Metadata Model Training
This file performs metadata model training for all proposed models, at all taxonomic levels.
The metadata model training is automated to train the proposed models (Decision Tree, Random Forest, Adaboost, XGBoost, and a Neural Network) at each taxonomic parent node of the dataset. This forms five cascading taxonomic classifiers, with a model at each taxonomic parent node. This enables comparison of the models at each taxonomic level, to determine the most robust and optimal model to use as a metadata classifier.
Attributes:
Name | Type | Description |
---|---|---|
model_abbreviations | dict | A dictionary containing the names of the classification models as keys, and their abbreviations as values. |
model_save_types | dict | A dictionary containing the names of the classification models as keys, and their relevant file types when saved. |
file_name_taxon | dict | A dictionary containing the taxonomic level indicators in the dataset, and their relevant abbreviations to be used in file naming. |
dataset_iteration(observation_file, metadata_file)
This method performs the full metadata training for all models at all available taxonomic levels for the provided dataset. Only a single dataset is trained at a time.
The information printed out is to be used within the model_comparison.ipynb notebook to direct the model validation and figure construction.
For more information, please review the model_comparison notebook.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observation_file | str | The name of the observation files within the | required |
metadata_file | str | The name of the metadata files within the | required |
Source code in src/models/meta/model_training.py
generate_file_name_start(restriction)
This method standardizes the parent taxonomic restriction to create a suitable file name for each model.
It replaces white space with underscores and converts the name to lower case.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
restriction | str | The label of the taxonomic parent node (restriction) | required |
Returns:
Type | Description |
---|---|
str | A standardized form of the restriction. |
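The standardization described above can be sketched as follows. This is a minimal illustration, not the code in model_training.py: it assumes that leading and trailing white space is dropped and that runs of white space collapse to a single underscore, neither of which is stated in the description.

```python
import re


def generate_file_name_start(restriction: str) -> str:
    """Illustrative sketch: standardize a parent taxon label for use in a file name."""
    # Assumption: surrounding white space is trimmed and each run of
    # white space becomes one underscore.
    return re.sub(r"\s+", "_", restriction.strip()).lower()


print(generate_file_name_start("Panthera leo"))  # panthera_leo
```

A standardized label such as `panthera_leo` can then be joined with a model abbreviation to give each trained model a unique file name.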
Source code in src/models/meta/model_training.py
model_iteration(observation_file, metadata_file)
This method iterates over all proposed models for a single dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observation_file | str | The processed iNaturalist observations dataset. | required |
metadata_file | str | The corresponding metadata for the observation file. | required |
Returns:
Name | Type | Description |
---|---|---|
models | list | The list of all models iterated over. |
model_name_collection | list | The list of all model names produced during the iteration of the dataset. |
taxon_target_collection | list | The list of all taxonomic targets iterated through within this dataset. |
abbreviations_collection | list | A list of the corresponding model abbreviations. |
Source code in src/models/meta/model_training.py
model_selection_execution(model, df, target_taxon, model_name, training_history, validation_file)
This method allows multiple models to be trained through the specification of the model type, and the subsequent execution of the required data pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Specification of the model to be used to classify the taxonomic child nodes. | required |
df | DataFrame | The combined observation and metadata dataframe with the taxonomic parent node restriction applied. Only taxonomic child labels are present in df. | required |
target_taxon | str | Specification of the taxonomic level of the taxon child nodes (not the taxonomic level of the parent node). | required |
model_name | str | The complete model name (parent taxon label and model abbreviation make the combined name unique). | required |
training_history | str | File name at which to save the model training history. | required |
validation_file | str | File name at which to save the validation dataset. | required |
Returns:
Type | Description |
---|---|
None | This method returns nothing. The return statement was used to ensure case stopping. |
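Selecting a data pipeline from a model specification can be sketched with a lookup table. Everything below is hypothetical: the key strings, the two placeholder pipeline functions, and the assumption that the tree-based models share one pipeline are illustrative and are not taken from model_training.py.

```python
def train_tree_based(model_name: str) -> str:
    """Placeholder for a tree-based training pipeline (hypothetical)."""
    return f"tree-based pipeline for {model_name}"


def train_neural_network(model_name: str) -> str:
    """Placeholder for a neural network training pipeline (hypothetical)."""
    return f"neural network pipeline for {model_name}"


# Assumed model specification strings, mapped to the pipeline to run.
PIPELINES = {
    "Decision Tree": train_tree_based,
    "Random Forest": train_tree_based,
    "Adaboost": train_tree_based,
    "XGBoost": train_tree_based,
    "Neural Network": train_neural_network,
}


def select_pipeline(model: str):
    """Return the pipeline for the given model specification."""
    if model not in PIPELINES:
        raise ValueError(f"Unknown model: {model!r}")
    return PIPELINES[model]


print(select_pipeline("XGBoost")("panthera_xgb"))
```

A lookup table fails loudly on an unrecognized model string, which serves a similar purpose to the early return described above.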
Source code in src/models/meta/model_training.py
model_simplification(df, model, target_taxon, model_save_type, file_name_start)
This method simplifies the model training, testing, and saving process. It completes a full cycle of training, testing, and saving for the specified model and dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The combined observation and metadata dataframe with the taxonomic parent node restriction applied. Only taxonomic child labels are present in df. | required |
model | str | Specification of the model to be used to classify the taxonomic child nodes. | required |
target_taxon | str | Specification of the taxonomic level of the taxon child nodes (not the taxonomic level of the parent node). | required |
model_save_type | str | Specification of the model file type (model suffix). | required |
file_name_start | str | The standardized taxon parent node label which will be used to construct a unique file name for the trained model. | required |
Source code in src/models/meta/model_training.py
taxonomic_analysis(df)
This method performs the taxonomic breakdown of the dataset at the following taxonomic levels: taxon_family_name, taxon_genus_name, taxon_species_name, and subspecies.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe containing the unrestricted observations and metadata to perform a taxonomic breakdown of the entire dataset. | required |
Returns:
Type | Description |
---|---|
dict | Keys specify the taxonomic level and the values are a list containing all unique labels in the dataset, forming a taxonomic breakdown. |
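A breakdown of this shape can be sketched with pandas. The column names come from the description above; skipping columns that are absent and dropping missing labels are assumptions, and the three example rows are illustrative only.

```python
import pandas as pd

# Taxonomic level columns named in the method description.
TAXON_LEVELS = [
    "taxon_family_name",
    "taxon_genus_name",
    "taxon_species_name",
    "subspecies",
]


def taxonomic_analysis(df: pd.DataFrame) -> dict:
    """Illustrative sketch: map each taxonomic level to its unique labels."""
    breakdown = {}
    for level in TAXON_LEVELS:
        # Assumption: levels missing from the dataframe are skipped,
        # and missing labels are ignored.
        if level in df.columns:
            breakdown[level] = df[level].dropna().unique().tolist()
    return breakdown


# Illustrative rows only.
observations = pd.DataFrame({
    "taxon_family_name": ["Felidae", "Felidae", "Elephantidae"],
    "taxon_genus_name": ["Panthera", "Panthera", "Loxodonta"],
    "taxon_species_name": ["Panthera leo", "Panthera pardus", "Loxodonta africana"],
})
breakdown = taxonomic_analysis(observations)
print(breakdown["taxon_genus_name"])  # ['Panthera', 'Loxodonta']
```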
Source code in src/models/meta/model_training.py
taxonomic_level_modelling(observation_file, metadata_file, model)
This method performs a taxonomic level breakdown and training at all taxonomic levels for the specified model.
This method performs the dataset taxonomic restriction at the parent node, modifying the dataset to fit each taxonomic parent node, such that only the taxonomic children of the parent node are within the dataset. This is done for the entire taxonomic structure within the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observation_file | str | The processed iNaturalist observations dataset. | required |
metadata_file | str | The corresponding metadata for the observation file. | required |
model | str | String specification of the model to be trained and evaluated. | required |
Returns:
Name | Type | Description |
---|---|---|
models | list | A list of file names, where each file name specifies the taxonomic parent node (the model classifies the taxonomic children). |
taxon_targets | list | Specifies the taxonomic target levels, in the same order as the models list. |
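The parent node restriction described above amounts to filtering the dataset to one parent label and keeping only rows that carry a child label. The helper below is a hypothetical sketch with illustrative rows; its name and signature are not part of model_training.py.

```python
import pandas as pd


def restrict_to_parent(df: pd.DataFrame, parent_level: str,
                       parent_label: str, child_level: str) -> pd.DataFrame:
    """Hypothetical helper: keep only the taxonomic children of one parent node."""
    children = df[df[parent_level] == parent_label]
    # Assumption: rows without a child label cannot be used as training targets.
    return children.dropna(subset=[child_level])


# Illustrative rows only.
observations = pd.DataFrame({
    "taxon_family_name": ["Felidae", "Felidae", "Elephantidae"],
    "taxon_genus_name": ["Panthera", "Panthera", "Loxodonta"],
    "taxon_species_name": ["Panthera leo", "Panthera pardus", "Loxodonta africana"],
})
panthera = restrict_to_parent(
    observations, "taxon_genus_name", "Panthera", "taxon_species_name"
)
print(panthera["taxon_species_name"].tolist())  # ['Panthera leo', 'Panthera pardus']
```

Repeating this for every parent label at every level gives one restricted dataset, and therefore one classifier, per taxonomic parent node.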
Source code in src/models/meta/model_training.py
train_base_model(model, target_taxon, file_name='base_meta')
This method trains the root node of the taxonomic tree.
The current model training requires the Felid and Elephant datasets to be kept separate to train all of their relevant taxonomic models. This, however, excludes the root classifier that distinguishes between the two taxon families. This method ensures the taxonomic root is trained. Note that this method can also be used to train a global metadata classifier by setting the target taxon to the species level.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Specification of the model to be used to classify the taxonomic child nodes. | required |
target_taxon | str | Specification of the taxonomic level of the taxon child nodes (not the taxonomic level of the parent node). | required |
file_name | str | The file name of the root classification model. | 'base_meta' |
Source code in src/models/meta/model_training.py