Neural Network Model
This file creates and trains the neural network metadata classification model.
The neural network model tunes the learning rate hyperparameter because of the varying levels of abstraction within the taxonomic tree. The training process uses 5-fold cross validation to evaluate the model's performance for each hyperparameter value, with balanced accuracy as the evaluation metric. A best-model save policy is enforced using the mean accuracy across the 5 folds.
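Balanced accuracy, as commonly defined (for example by scikit-learn's `balanced_accuracy_score`), is the mean of the per-class recall, so every class counts equally regardless of how many observations it has; plain accuracy instead weights every observation equally. The pure-Python sketch below contrasts the two on a small imbalanced example. The function names and labels are illustrative only; how the project itself computes the metric is not shown on this page.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: each class contributes equally to the score."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Imbalanced toy labels and a model that always predicts the majority class.
y_true = ["Felidae"] * 8 + ["Canidae"] * 2
y_pred = ["Felidae"] * 10
print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class predictor looks reasonable under plain accuracy but scores only 0.5 under balanced accuracy, which is why the balanced form is useful when some taxa have far fewer observations than others.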
Attributes:
Name | Type | Description |
---|---|---|
root_path | str | The path to the project root. |
data_destination | str | The path where the neural network model and its training accuracy are saved. |
make_model(input_dimension, classes)
This method creates a new neural network model.
This neural network has the following architecture: an input layer whose size varies with the number of input features, followed by two dense (fully connected) hidden layers of 80 and 60 neurons, each with a ReLU activation function, and finally a softmax output layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_dimension | int | The number of input features to the neural network. | required |
classes | int | The number of classes that should be classified in the output layer. | required |
Returns:
Type | Description |
---|---|
keras.Sequential | The model constructed in the specified architecture, with appropriate input and output dimensions. |
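Because the hidden layers are fixed at 80 and 60 units, the size of the model depends only on `input_dimension` and `classes`. A dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The hypothetical helper below (not part of the project) applies that rule to each layer of the architecture described above.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_parameters(input_dimension, classes, hidden=(80, 60)):
    """Trainable parameters of input -> 80 ReLU -> 60 ReLU -> softmax(classes)."""
    sizes = [input_dimension, *hidden, classes]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


# 10 input features and 5 taxa: 880 + 4860 + 305 trainable parameters.
print(count_parameters(10, 5))  # 6045
```

In Keras terms this presumably corresponds to a `keras.Sequential` stack of `Dense(80, activation="relu")`, `Dense(60, activation="relu")` and `Dense(classes, activation="softmax")`; the source listing is not reproduced on this page, so treat that layer mapping as an assumption.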
Source code in src/models/meta/neural_network_model.py
neural_network_process(df, taxon_target, model_name, score_file, validation_file)
This method specifies the neural network modelling process.
Specifically, this method calls the required pipeline (the neural network pipeline) to generate the features and labels needed for training, and then passes them to the training process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe containing all data for each observation. | required |
taxon_target | str | The taxonomic target level, used to extract the correct labels (taxon_family_name, taxon_genus_name, taxon_species_name, subspecies). | required |
model_name | str | The name of the model type being trained. In this case 'Neural network'. | required |
score_file | str | The filename where the training data will be stored. This file is saved in the same location as the model. | required |
validation_file | str | The name of the file where the validation data will be stored. Also determines the names of the saved models. | required |
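The order of operations described above (pipeline first, then training) can be sketched as follows. This is a simplified, hypothetical stand-in rather than the project's pipeline: plain Python records replace the DataFrame, the feature keys `latitude` and `longitude` are illustrative, dropping observations without a label at the target level is an assumption, and `train_fn` stands in for the training step.

```python
def build_features_and_labels(records, taxon_target, feature_keys):
    """Hypothetical pipeline stand-in.

    Keeps observations that have a label at the requested taxonomic level and
    splits them into a feature matrix X and a label list y.
    """
    kept = [r for r in records if r.get(taxon_target)]
    X = [[r[k] for k in feature_keys] for r in kept]
    y = [r[taxon_target] for r in kept]
    return X, y


def process_sketch(records, taxon_target, feature_keys, train_fn):
    """Run the pipeline, derive the number of classes, then train."""
    X, y = build_features_and_labels(records, taxon_target, feature_keys)
    classes = len(set(y))  # number of output units for the softmax layer
    return train_fn(X, y, classes)


records = [
    {"latitude": -33.9, "longitude": 18.4, "taxon_genus_name": "Panthera"},
    {"latitude": -25.7, "longitude": 28.2, "taxon_genus_name": "Canis"},
    {"latitude": -29.8, "longitude": 31.0, "taxon_genus_name": None},
]
result = process_sketch(records, "taxon_genus_name", ["latitude", "longitude"],
                        lambda X, y, classes: (len(X), classes))
print(result)  # (2, 2)
```

Deriving `classes` from the labels at the chosen `taxon_target` lets the same process size the output layer correctly for every level of the taxonomic tree.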
Source code in src/models/meta/neural_network_model.py
train_neural_network(X, y, classes, model_name, score_file)
This method performs the neural network model training and hyperparameter tuning.
Hyperparameter tuning aims to determine the optimal learning rate for each classification model. The learning rates tuned over are [0.1, 0.01, 0.001, 0.0001]. The learning rate was chosen as the tuned hyperparameter because of the varying levels of abstraction within the cascading taxonomic structure. This process uses a best-model save policy based on the mean 5-fold categorical accuracy (balanced accuracy) evaluation metric.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input features to the neural network. | required |
y | Series | The categorical taxonomic labels of the observations corresponding to the features. | required |
classes | int | The number of unique classes for the model to classify. | required |
model_name | str | The name of the model type being trained. In this case 'Neural network'. | required |
score_file | str | The filename where the training data will be stored. | required |
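The tuning loop described above can be sketched in plain Python. This is an illustrative sketch, not the project's implementation: `fit_and_score` is a hypothetical callback standing in for building a model with `make_model`, training it with a given learning rate and scoring it on the held-out fold, and the folds here are contiguous and unshuffled for simplicity (the project may shuffle or stratify them). Where the real code saves the best model, this sketch only records the winning learning rate and its mean score.

```python
from statistics import mean

LEARNING_RATES = [0.1, 0.01, 0.001, 0.0001]


def k_fold_indices(n_samples, k=5):
    """Yield (train, validation) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def tune_learning_rate(n_samples, fit_and_score, k=5):
    """Return (best_rate, best_mean_score) over LEARNING_RATES.

    fit_and_score(train_idx, val_idx, lr) is a hypothetical callback that trains
    a fresh model on train_idx and returns its accuracy on val_idx.
    """
    best_rate, best_score = None, float("-inf")
    for lr in LEARNING_RATES:
        scores = [fit_and_score(tr, va, lr) for tr, va in k_fold_indices(n_samples, k)]
        fold_mean = mean(scores)
        if fold_mean > best_score:  # best-model policy: keep the higher k-fold mean
            best_rate, best_score = lr, fold_mean
    return best_rate, best_score


# Illustrative per-rate scores (not project results), identical on every fold.
fake_scores = {0.1: 0.61, 0.01: 0.78, 0.001: 0.74, 0.0001: 0.52}
best_lr, best_mean = tune_learning_rate(20, lambda tr, va, lr: fake_scores[lr])
print(best_lr)  # 0.01
```

Comparing learning rates on the mean over all five folds, rather than a single split, makes the choice less sensitive to which observations happen to land in one validation fold.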
Source code in src/models/meta/neural_network_model.py
write_training_accuracy(filename, fold_histories, learning_rate)
This method writes the mean training and evaluation scores to a CSV file for visualization and record keeping.
Note: the data written is the mean 5-fold categorical accuracy at each epoch of training. It is written for each learning rate used, so the file records the results of the hyperparameter tuning.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | The filename where the training data will be saved. | required |
fold_histories | str | The mean 5-fold categorical accuracy for each epoch of training for all models trained. | required |
learning_rate | str | The learning rate applied to the trained and evaluated model. | required |
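The per-epoch averaging described above can be sketched with the standard `csv` module. This is a hypothetical sketch: the column layout (learning rate, epoch, mean accuracy) is an assumption, and `fold_histories` is modelled as a list of per-fold accuracy lists (for instance, lists taken from the `history` dict of the Keras `History` object returned by `fit`), whereas the table above lists its type as str. An in-memory buffer stands in for the file so the example is self-contained.

```python
import csv
import io
from statistics import mean


def mean_per_epoch(fold_histories):
    """Average per-fold accuracy curves epoch by epoch (all folds same length)."""
    return [mean(epoch_scores) for epoch_scores in zip(*fold_histories)]


def write_mean_accuracy_rows(stream, fold_histories, learning_rate):
    """Write one CSV row per epoch: learning rate, epoch number, mean accuracy."""
    writer = csv.writer(stream)
    for epoch, acc in enumerate(mean_per_epoch(fold_histories), start=1):
        writer.writerow([learning_rate, epoch, round(acc, 4)])


# Two folds of three epochs each, for brevity (the documented process uses five).
histories = [[0.50, 0.60, 0.65], [0.70, 0.80, 0.85]]
buffer = io.StringIO()
write_mean_accuracy_rows(buffer, histories, 0.01)
print(buffer.getvalue())
```

With a real file, open it as `open(filename, "a", newline="")` so that rows for each learning rate accumulate in one file and the `csv` module controls the line endings.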
Source code in src/models/meta/neural_network_model.py