Ensemble
Cascading Ensemble Classifier
This module implements the novel ensemble classifier.
The ensemble classifier combines two cascading taxonomic classification trees, one using the metadata classifiers and one using the image classifiers at each parent node. Together, the two trees produce a combined prediction that draws on each classifier's strengths while mitigating its weaknesses.
The classifier is run on the validation dataset. The results serve as a comparison against baseline flat image-classification methods.
Please note that the hierarchy is hard-coded because of the difficulty of getting the metadata and image classification models to contain the same number of predicted children. It was hard-coded to obtain a result that would show whether this avenue of classification was worth pursuing. It is, so the method will need to be refined to be scalable, efficient, and distributed before it can be used as a real-time classifier. For now, this script operates as a proof of concept on the database. Adjust the hard-coded hierarchy when classifying wildlife on a different dataset.
Attributes:
Name | Type | Description |
---|---|---|
data_path | str | The path to where the |
results_path | str | The path to where the results are stored. The results are stored within |
image_path | str | The path to the directory containing the validation images. The validation images and the data path sets are linked by observation id. |
model_path | str | The path to the base directory containing image and metadata classification models. |
image_model_path | str | The specific directory containing all image models (using |
meta_model_path | str | The specific directory containing all metadata models (using |
cluster_model_path | str | The specific directory containing all K-means models used to encode the observation locations (using |
base_image_classifier_path | str | The path to the base image classifier (the root image classifier, which classifies Felidae and Elephantidae as child classes). |
base_meta_classifier_path | str | The path to the base metadata classifier (the root metadata classifier, which classifies Felidae and Elephantidae as child classes). |
base_cluster_path | str | The path to the base K-means cluster model (the model encoding the Felidae and Elephantidae positions at the family taxon level). |
multiple_detections_id | list | The list of possible image suffixes due to multiple sub-images per observation. |
img_size | int | The size of the input images to the image classifier (528, 528, 3). |
taxonomic_levels | list | The list of taxonomic levels at which classification occurs in the dataset, in order from family to subspecies. |
hierarchy | dict | The taxonomic breakdown of the dataset. |
taxon_weighting | dict | The weighting of the metadata model predictions; the inverse gives the weighting of the image classification predictions. These values were chosen after observing the results of the metadata and image classification taxonomic-level experiments. |
avg_multi_image_predictions(images, model)
This method averages the predictions of sub-images to produce a single prediction per observation.
This method combines the sub-image wildlife predictions together, averages them, and constrains them to a valid probability distribution to represent a softmax output, in order to provide a single prediction per observation (original image).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list | A list of sub-image paths whose predictions will be combined. | required |
model | keras.Sequential | The image classification model used to classify the sub-images at the correct taxonomic level. | required |
Returns:
Type | Description |
---|---|
list | A summed, averaged, and constrained softmax output to provide a single image classification per observation. |
Source code in src/ensemble/ensemble.py
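The averaging step described for `avg_multi_image_predictions` can be sketched as follows. This is a minimal illustration rather than the code in `src/ensemble/ensemble.py`: it assumes the model has already produced one softmax vector per sub-image, and `average_predictions` is a hypothetical helper name.

```python
def average_predictions(predictions):
    """Average per-sub-image softmax vectors and renormalise them to sum to 1.

    `predictions` is a list of equal-length probability lists, one per sub-image.
    The model call that produces them is deliberately left out of this sketch.
    """
    if not predictions:
        raise ValueError("at least one sub-image prediction is required")
    n_classes = len(predictions[0])
    if any(len(p) != n_classes for p in predictions):
        raise ValueError("all predictions must cover the same classes")

    # Element-wise mean across the sub-images.
    mean = [sum(p[i] for p in predictions) / len(predictions) for i in range(n_classes)]

    # Constrain the result to a valid probability distribution.
    total = sum(mean)
    if total == 0:
        raise ValueError("predictions sum to zero and cannot be normalised")
    return [m / total for m in mean]


single = average_predictions([[0.8, 0.2], [0.4, 0.6]])
```

Averaging vectors that each sum to 1 already yields a vector that sums to 1, so the final normalisation mainly guards against rounding drift.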
ensemble_iteration()
This method performs the cascading ensemble classification on the validation dataset.
Source code in src/ensemble/ensemble.py
image_prediction(index, image_model)
This method handles the detection of sub-images and their mean image prediction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | The observation id, which provides the unique base identifier for the sub-images of the original observation. | required |
image_model | keras.Sequential | The image classification model, for the current taxon parent node, | required |
Returns:
Type | Description |
---|---|
list | A summed, averaged, and constrained softmax output to provide a single image classification per observation. |
Source code in src/ensemble/ensemble.py
instantiate_save_file()
This method creates the file that records the predictions of the ensemble model, its components, and the true labels.
Returns:
Name | Type | Description |
---|---|---|
dictwriter | DictWriter | A dictionary writer object enabling dictionaries to be written to file f. |
f | file_handle | The file handle of the file to which the prediction data is being stored. |
Source code in src/ensemble/ensemble.py
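As a rough sketch of how a `csv.DictWriter` and its file handle fit together for `instantiate_save_file`, the example below writes a header and one prediction row. The column names and the `open_prediction_writer` helper are assumptions made for illustration; this page does not list the columns of the real results file.

```python
import csv
import io


def open_prediction_writer(handle, fieldnames):
    """Wrap an open text handle in a DictWriter and write the header row."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    return writer


# Hypothetical columns, chosen only to illustrate the idea.
columns = ["id", "taxon_level", "meta_prediction", "image_prediction",
           "ensemble_prediction", "true_label"]

# io.StringIO stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
writer = open_prediction_writer(buffer, columns)
writer.writerow({
    "id": 7,
    "taxon_level": "taxon_family_name",
    "meta_prediction": "Felidae",
    "image_prediction": "Felidae",
    "ensemble_prediction": "Felidae",
    "true_label": "Felidae",
})
```

Returning the handle alongside the writer, as the method above does, lets the caller close the file once the whole validation run has been written.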
load_next_cluster_model(decision)
This method handles the loading of the correct K-means model, based on the provided decision.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
decision | str | The taxon label instructing the method which model to load. This is ordinarily the name of the taxonomic child. However "base" loads the root model. | required |
Returns:
Type | Description |
---|---|
KMeans | The pre-trained K-means model encoding the locations of the child nodes. |
Source code in src/ensemble/ensemble.py
load_next_image_model(decision)
This method handles the loading of the correct image model, based on the provided decision.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
decision | str | The taxon label instructing the method which model to load. This is ordinarily the name of the taxonomic child. However "base" loads the root model. | required |
Returns:
Type | Description |
---|---|
keras.Sequential | The pre-trained EfficientNet-B6 model classifying the children of the provided decision. |
Source code in src/ensemble/ensemble.py
load_next_meta_model(decision)
This method handles the loading of the correct metadata model, based on the provided decision.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
decision | str | The taxon label instructing the method which model to load. This is ordinarily the name of the taxonomic child. However "base" loads the root model. | required |
Returns:
Type | Description |
---|---|
xgboost | The pre-trained xgboost model classifying the children of the provided decision. |
Source code in src/ensemble/ensemble.py
metadata_prediction(X, index, meta_model)
This method handles the formatting of the metadata and the model prediction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | The input features to the metadata model. | required |
index | int | The unique id of each observation. | required |
meta_model | xgb | The metadata model to predict the wildlife classes of the current taxon parent node. | required |
Returns:
Type | Description |
---|---|
list | The metadata model softmax prediction. |
Source code in src/ensemble/ensemble.py
multiple_image_detections(index)
This method gathers all sub-images for a single observation.
Image pre-processing can produce multiple sub-images per observation. Each sub-image is centered and focused on an identified wildlife individual. This method accumulates all file names based on the observation id.
Sub-images are named in the following format: <id>_<alphabetical suffix>.jpg
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | This is the unique id value of each observation. | required |
Returns:
Type | Description |
---|---|
list | A list of file names leading to sub-images of the specified observation id (index) |
Source code in src/ensemble/ensemble.py
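The `<id>_<alphabetical suffix>.jpg` naming scheme suggests a lookup along the following lines. This is a sketch under stated assumptions: the suffixes are taken to be single lowercase letters (the actual contents of `multiple_detections_id` are not shown on this page), and `find_sub_images` is a hypothetical name.

```python
import os
import string
import tempfile


def find_sub_images(image_dir, observation_id, suffixes=string.ascii_lowercase):
    """Return the names of existing sub-image files for one observation id."""
    found = []
    for suffix in suffixes:
        name = f"{observation_id}_{suffix}.jpg"
        # Only keep names that exist on disk; gaps in the suffixes are skipped.
        if os.path.isfile(os.path.join(image_dir, name)):
            found.append(name)
    return found


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as image_dir:
    for name in ("12_a.jpg", "12_b.jpg", "13_a.jpg"):
        open(os.path.join(image_dir, name), "w").close()
    demo = find_sub_images(image_dir, 12)

print(demo)  # ['12_a.jpg', '12_b.jpg']
```

Probing a fixed list of suffixes avoids listing the whole image directory for every observation, which matters when the validation set is large.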
predict(index, data)
This method performs a cascading prediction for the specified observation.
This method cascades from the family taxon to the subspecies taxon, if labels are provided to that depth. It records the component classifications, the combined classification, and the true label at each taxon level for analysis. The cascading process halts when a misclassification occurs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | The unique id of the current observation to be classified. | required |
data | pd.DataFrame | The dataframe containing all observation and metadata. | required |
Source code in src/ensemble/ensemble.py
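The control flow of the cascade in `predict` can be illustrated with fixed probability vectors standing in for the model outputs. Everything concrete here is an assumption made for the sketch: the `cascade` helper, the example genera, the weights, and the reading of "inverse" as an image weight of one minus the metadata weight. The real method loads trained models at each node.

```python
def cascade(true_labels, classifiers, meta_weights):
    """Walk the hierarchy from the root, halting at the first misclassification.

    true_labels:  (taxon_level, label) pairs ordered from family downwards.
    classifiers:  parent node -> (child_labels, meta_probs, image_probs);
                  the probability lists stand in for real model predictions.
    meta_weights: taxon_level -> metadata weight in [0, 1].
    """
    records = []
    parent = "base"  # "base" denotes the root models
    for level, truth in true_labels:
        if parent not in classifiers:
            break  # no models exist below this node
        children, meta_probs, image_probs = classifiers[parent]
        w = meta_weights[level]
        # Assumption: the image weight is 1 - w.
        combined = [w * m + (1 - w) * i for m, i in zip(meta_probs, image_probs)]
        total = sum(combined)
        combined = [c / total for c in combined]
        predicted = children[combined.index(max(combined))]
        records.append({"level": level, "predicted": predicted, "true": truth})
        if predicted != truth:
            break  # a wrong parent makes every deeper prediction meaningless
        parent = predicted
    return records


# Made-up numbers for illustration only.
classifiers = {
    "base": (["Elephantidae", "Felidae"], [0.3, 0.7], [0.1, 0.9]),
    "Felidae": (["Lynx", "Panthera"], [0.6, 0.4], [0.2, 0.8]),
}
meta_weights = {"taxon_family_name": 0.1, "taxon_genus_name": 0.2}

correct = cascade([("taxon_family_name", "Felidae"), ("taxon_genus_name", "Panthera")],
                  classifiers, meta_weights)
halted = cascade([("taxon_family_name", "Elephantidae"), ("taxon_genus_name", "Loxodonta")],
                 classifiers, meta_weights)
```

In the first call both levels are recorded; in the second the family prediction is wrong, so only one record is produced and the genus level is never attempted.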
preprocess_meta_data(df, k_means, taxon_target)
This method processes and formats the metadata for model prediction, based on the taxon target and the location-encoding K-means model.
This processing pipeline is essential for each dataset, as it prepares the data for the given taxonomic level and for the models pre-trained at that level. The method resembles the data pipelines, but is modified because the validation dataset has already been processed; it only formats the data into the form the models require.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe to be processed into features and labels. | required |
k_means | KMeans | The pre-trained location encoding model. This is trained based on the appropriate taxon parent node and taxon target, to be most effective within this dataset. | required |
taxon_target | str | The taxonomic target level, to extract the correct labels (taxon_family_name, taxon_genus_name, taxon_species_name, subspecies) | required |
Returns:
Name | Type | Description |
---|---|---|
X | DataFrame | The dataframe of features to be used by the metadata models |
y | Series | The labels of each observation, extracted at the taxonomic target level. |
Source code in src/ensemble/ensemble.py
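To illustrate what location encoding with a pre-trained K-means model amounts to, the sketch below assigns a coordinate to its nearest centroid, which is the rule a fitted K-means model applies when predicting a cluster for a point. It is written in plain Python so that it runs without scikit-learn. The centroid values, the `encode_location` name, and the assumption that the resulting cluster index is used as a feature are all illustrative and are not taken from `ensemble.py`.

```python
def encode_location(latitude, longitude, centroids):
    """Return the index of the centroid nearest to (latitude, longitude).

    `centroids` is a list of (latitude, longitude) pairs, standing in for the
    cluster centres of a trained K-means model.
    """
    # Squared Euclidean distance is enough to find the nearest centre.
    distances = [
        (latitude - c_lat) ** 2 + (longitude - c_lon) ** 2
        for c_lat, c_lon in centroids
    ]
    return distances.index(min(distances))


# Made-up centroids for illustration.
centroids = [(-1.0, 36.0), (-25.0, 28.0)]
cluster_a = encode_location(-2.0, 35.0, centroids)
cluster_b = encode_location(-24.0, 29.0, centroids)
```

Because each parent node has its own K-means model, the same coordinate can map to different cluster indices at different taxonomic levels.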
read_position()
This method reads the position from position.csv to keep track of which observation in the validation dataset is currently being classified.
Returns:
Type | Description |
---|---|
int | The row location of the current observation to classify. |
Source code in src/ensemble/ensemble.py
taxon_weighted_decision(meta_prediction, image_prediction, taxon_level)
This method weights the metadata and image predictions by the determined taxonomic weighting to produce a single softmax output predicting the wildlife taxon.
The output is constrained to a valid probability distribution to match a softmax output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
meta_prediction | list | A list of metadata class probabilities. Note, the classes must match the image prediction. | required |
image_prediction | list | A list of image class probabilities. Note, the classes must match the meta prediction. | required |
taxon_level | str | Specification of the taxon level, to specify the component weighting. (taxon_family_name, taxon_genus_name, taxon_species_name, sub_species) | required |
Returns:
Type | Description |
---|---|
list | A constrained softmax output constructed from the weighted influence of the individual metadata and image prediction components. |
Source code in src/ensemble/ensemble.py
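One plausible reading of the weighting in `taxon_weighted_decision` is a convex combination of the two probability vectors, followed by normalisation. The sketch below assumes that the image weight is one minus the metadata weight and takes the weight directly instead of looking it up by taxon level; `weighted_decision` is a hypothetical name and the numbers are made up.

```python
def weighted_decision(meta_prediction, image_prediction, meta_weight):
    """Blend two class-probability lists and renormalise the result.

    Assumption: the image prediction is weighted by 1 - meta_weight.
    """
    if len(meta_prediction) != len(image_prediction):
        raise ValueError("both predictions must cover the same classes")
    if not 0.0 <= meta_weight <= 1.0:
        raise ValueError("meta_weight must lie in [0, 1]")

    combined = [
        meta_weight * m + (1.0 - meta_weight) * i
        for m, i in zip(meta_prediction, image_prediction)
    ]
    # Constrain the blend to a valid probability distribution.
    total = sum(combined)
    if total == 0:
        raise ValueError("predictions sum to zero and cannot be normalised")
    return [c / total for c in combined]


blended = weighted_decision([0.5, 0.5], [0.9, 0.1], meta_weight=0.25)
```

With a metadata weight of 0.25 the uninformative metadata prediction softens the confident image prediction without overturning it.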
update_models(label)
This method updates all three models based on the provided taxon label.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
label | str | The taxon label instructing the method which model to load. This is ordinarily the name of the taxonomic child. However "base" loads the root model. | required |
Returns:
Name | Type | Description |
---|---|---|
meta_model | xgboost | The metadata classification model for the label taxon parent node. |
image_model | keras.Sequential | The image classification model for the label taxon parent node. |
cluster_model | KMeans | The K-means encoding model for the children of the label taxon parent node. |
Source code in src/ensemble/ensemble.py
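The rule that "base" selects the root model while any other label selects the model for that taxonomic child can be sketched as a small path resolver. The file naming used here (label plus a suffix inside a model directory) and the `resolve_model_path` helper are purely hypothetical; this page does not document how the model files are actually named.

```python
import os


def resolve_model_path(label, base_path, model_dir, suffix):
    """Pick the model file for a taxon label.

    "base" maps to the root model; any other label maps to a file named after
    the label inside `model_dir`. The naming scheme is an assumption.
    """
    if label == "base":
        return base_path
    return os.path.join(model_dir, f"{label}{suffix}")


# Placeholder paths and suffix, for illustration only.
root = resolve_model_path("base", "models/meta/base.bin", "models/meta", ".bin")
child = resolve_model_path("Felidae", "models/meta/base.bin", "models/meta", ".bin")
```

Resolving the three model types through one shared rule would keep the metadata, image, and K-means models in step as the cascade moves down the hierarchy.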
update_position(prev_position)
This method updates the saved row location so that it points to the observation following the previous one.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prev_position | int | The row location of the previous observation that has been classified. | required |
Source code in src/ensemble/ensemble.py
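Together, `read_position` and `update_position` act as a simple checkpoint so that an interrupted run can resume where it stopped. The sketch below assumes that position.csv holds a single integer and that a missing file means starting from row 0; both are assumptions, as is passing the file path as an argument.

```python
import csv
import os
import tempfile


def read_position(path):
    """Return the saved row index, or 0 when no position file exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path, newline="") as f:
        row = next(csv.reader(f), None)
    return int(row[0]) if row else 0


def update_position(path, prev_position):
    """Overwrite the position file so it points at the next observation."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow([prev_position + 1])


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    position_file = os.path.join(tmp, "position.csv")
    start = read_position(position_file)
    update_position(position_file, start)
    after_one = read_position(position_file)
```

Writing the position after each observation keeps at most one classification from being repeated if the run is interrupted.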