Image Classification Modelling
This file contains all methods needed to build, train, and evaluate the EfficientNet-B6 model on the specified dataset.
This file requires manual specification of the taxonomic parent node to model. Due to the massive memory and computational requirements of training a large CNN, only a single model can be trained at a time.
Please review the Animal-Detector repository to create the required image directories in order to train the models.
This training process is structured to run within a Docker container and train on a single GPU. Please review the documentation or README for how to run the training and validation processes. For easy access, here is the command to run the model training:
docker run --gpus all -u $(id -u):$(id -g) -v /path/to/project/root:/app/ -w /app -t ghcr.io/trav-d13/spatiotemporal_wildlife_classification/train_image:latest
Attributes:
Name | Type | Description |
---|---|---|
model_name | str | The saved name of the model. The file name must have the following format: taxonomic name + _taxon_classifier. Example: |
img_path | str | The path to the taxonomic parent node within the |
save_path | str | The path to where the model will be saved. In this case the |
img_size | int | The specified image size as input to the EfficientNet-B6 model (528) |
batch_size | int | The number of images within a single batch (32) |
epochs | int | The number of epochs in model training (25) |
construct_model(classes)
This method constructs an EfficientNet-B6 model whose output layer fits the specified number of classes.
The training makes use of transfer learning, so the EfficientNet-B6 model is created with ImageNet weights. The top softmax layer is removed from the original model and replaced with a Global Average Pooling 2D Layer, followed by a densely connected softmax layer classifying the specified number of classes.
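The architecture described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repository's implementation: the function name `build_classifier`, the `img_size` and `weights` arguments, and the optimizer and loss passed to `compile` are all chosen here for illustration.

```python
import tensorflow as tf


def build_classifier(classes, img_size=528, weights="imagenet"):
    """Hypothetical helper: frozen EfficientNet-B6 base plus a new softmax head."""
    # Load the convolutional base without its original classification top.
    base = tf.keras.applications.EfficientNetB6(
        include_top=False,
        weights=weights,
        input_shape=(img_size, img_size, 3),
    )
    # Freeze the pretrained layers so only the new head is trained.
    base.trainable = False

    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    # Assumption: integer labels and the Adam optimizer; the source does not
    # state which loss or optimizer is used.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Passing `weights=None` skips the ImageNet download, which can be convenient for a quick shape check before a full training run.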
Parameters:
Name | Type | Description | Default |
---|---|---|---|
classes | int | Integer specifying the number of classes to be classified. Determines the size of the softmax output layer. | required |
Returns:
Type | Description |
---|---|
Model | The complete model ready to be trained. |
Source code in src/models/image/taxonomic_modelling.py
get_image_labels(ds, classes)
This method generates the class label of each sample in the dataset. The labels are used to generate class weightings.
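As an illustration of the idea, the labels can be collected by iterating over the batches of the dataset. The sketch below is an assumption-laden example rather than the actual method: the name `labels_in_dataset_order` is hypothetical, and it assumes the dataset yields `(images, labels)` batches whose labels are integer indices into `classes`.

```python
def labels_in_dataset_order(ds, classes):
    """Hypothetical helper: return the class name of every sample, in iteration order.

    Assumes `ds` yields (images, labels) batches and that each label is an
    integer index into `classes`; the source does not state the encoding.
    """
    labels = []
    for _images, batch_labels in ds:
        for index in batch_labels:
            # int() also accepts eager scalar tensors, so this works for
            # plain Python lists and for batches from a tf.data.Dataset.
            labels.append(classes[int(index)])
    return labels
```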
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ds | tf.data.Dataset | Either the train or test dataset for which labels must be generated. | required |
classes | list | A list of the class labels (alphabetically ordered). | required |
Returns:
Type | Description |
---|---|
list | A list of labels in the provided dataset, in the same order as specified in the dataset. |
Source code in src/models/image/taxonomic_modelling.py
import_dataset(file_path)
This method imports the dataset from the specified directory, forming both a training and a test set.
This method uses the image_dataset_from_directory() method. For more information please visit: https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
This method accepts a file path and automatically infers the labels from the directory structure, which is why the directory structure replicates the taxonomic tree of the dataset. Additionally, the class names are displayed when this method is called.
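A minimal sketch of such an import is shown below. The function name `load_train_val`, the 20% validation fraction, and the seed are assumptions for illustration only; the source does not state how the split is configured.

```python
import tensorflow as tf


def load_train_val(file_path, img_size=528, batch_size=32,
                   val_fraction=0.2, seed=42):
    """Hypothetical helper: load train/validation splits from a class-per-folder tree."""
    common = dict(
        directory=file_path,
        labels="inferred",              # labels come from sub-directory names
        validation_split=val_fraction,  # assumed split, not from the source
        seed=seed,                      # same seed keeps the two subsets disjoint
        image_size=(img_size, img_size),
        batch_size=batch_size,
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(
        subset="training", **common)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        subset="validation", **common)
    # Class names are the sub-directory names in sorted order.
    print(train_ds.class_names)
    return train_ds, val_ds
```

Each immediate sub-directory of `file_path` becomes one class, so a directory tree that mirrors the taxonomic tree yields one class per child taxon.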
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | The path from the | required |
Returns:
Name | Type | Description |
---|---|---|
train_ds | tf.data.Dataset | The training dataset which will be used to train the model. |
val_ds | tf.data.Dataset | The testing dataset used to test the trained model. |
Source code in src/models/image/taxonomic_modelling.py
plot_hist(hist, title)
This method plots the accuracy and the validation set accuracy over the number of epochs and saves the figure.
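A plot of this kind might look like the sketch below. It assumes the history dictionary uses the keys `accuracy` and `val_accuracy`, which is what Keras records when a model is compiled with `metrics=["accuracy"]`; the function name `plot_accuracy` and the `out_path` argument are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed suitable inside a container
import matplotlib.pyplot as plt


def plot_accuracy(hist, title, out_path):
    """Hypothetical helper: plot train/validation accuracy per epoch and save it."""
    epochs = range(1, len(hist["accuracy"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, hist["accuracy"], label="train")
    ax.plot(epochs, hist["val_accuracy"], label="validation")
    ax.set_title(title)
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)  # release the figure so repeated calls do not accumulate
```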
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hist | dict | A dictionary containing the training and testing accuracies per epoch. | required |
Source code in src/models/image/taxonomic_modelling.py
setup_paths(file_name, dataset_path)
This method creates the correct model save path and dataset access path.
This method directly modifies the global path variables.
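The path construction can be illustrated with the sketch below. It deliberately differs from the method described here: it returns the paths instead of modifying globals, and the root directories are passed in as the hypothetical arguments `image_root` and `model_root`, because the actual roots are not given in this section. Only the `_taxon_classifier` naming rule is taken from the documentation.

```python
from pathlib import Path

SUFFIX = "_taxon_classifier"  # naming rule stated in the parameter description


def build_paths(file_name, dataset_path, image_root, model_root):
    """Hypothetical helper: return (img_path, save_path) for one training run.

    `image_root` and `model_root` are assumptions; the real base
    directories are not stated in this section.
    """
    # Require a non-empty taxonomic name followed by the fixed suffix.
    if not file_name.endswith(SUFFIX) or file_name == SUFFIX:
        raise ValueError(
            f"file_name must be '<taxonomic name>{SUFFIX}', got {file_name!r}")
    img_path = Path(image_root) / dataset_path
    save_path = Path(model_root) / file_name
    return img_path, save_path
```

Returning the paths rather than mutating module-level variables makes the function easier to test in isolation, at the cost of having to pass the results along explicitly.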
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_name | str | The file name must have the following format: taxonomic name + _taxon_classifier. Example: | required |
dataset_path | str | The path to the taxonomic parent node within the | required |
Source code in src/models/image/taxonomic_modelling.py
train_model(file_name, dataset_path, visualize=False)
This method runs the entire training process and simplifies the model naming and dataset specification procedure for training.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_name | str | The file name must have the following format: taxonomic name + _taxon_classifier. Example: | required |
dataset_path | str | The path to the taxonomic parent node within the | required |
visualize | bool | A boolean value indicating whether the training and testing over the number of epochs should be visualized and saved in a figure. | False |
Source code in src/models/image/taxonomic_modelling.py
train_model_top_weights(model, train_ds, val_ds)
Perform CNN training on the unfrozen top weights of the model.
The dataset is weighted to achieve a balanced impact of each class on the model training. This is used to combat the long-tail distribution of the dataset. A best model save policy is created so only the best model from the training epochs is saved.
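One common way to compute such weights is inverse-frequency ("balanced") weighting, sketched below. This is an assumption: the source does not state which weighting formula is used, and the name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels, classes):
    """Hypothetical helper: map class index -> total / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1, so
    each class contributes roughly equally to the training loss.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = {}
    for index, name in enumerate(classes):
        if counts[name] == 0:
            continue  # class absent from this dataset: no weight assigned
        weights[index] = total / (len(classes) * counts[name])
    return weights
```

A dictionary of this shape can be passed to Keras `Model.fit` through its `class_weight` argument. For the best-model save policy, `tf.keras.callbacks.ModelCheckpoint` with `save_best_only=True` is one option; whether the training code uses exactly these calls is not stated in this section.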
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Model | The created and prepared EfficientNet-B6 model, with all but the top layers frozen, for training on the provided dataset. | required |
train_ds | tf.data.Dataset | The image training dataset. | required |
val_ds | tf.data.Dataset | The image test dataset. | required |
Returns:
Name | Type | Description |
---|---|---|
model | Model | The trained model. |
hist | dict | The history of the model training process. |
Source code in src/models/image/taxonomic_modelling.py