AdaBoost Model
This file creates and trains the AdaBoost metadata classification model. Training tunes a single hyperparameter, the number of estimators used within the ensemble, over the values 1, 21, 41, ..., 201 (the range [1, 201] in increments of 20). Each candidate is evaluated with 5-fold cross-validation. A best-model save policy is enforced, using mean balanced accuracy as the evaluation metric.
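The tuning loop described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes scikit-learn's `AdaBoostClassifier` and `cross_val_score`, and it uses synthetic data from `make_classification` in place of the real metadata features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Candidate ensemble sizes: 1, 21, 41, ..., 201 (increment of 20).
estimator_counts = list(range(1, 202, 20))

# Synthetic stand-in data (an assumption); the real features and labels
# come from the project's own pipeline.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=3, random_state=0
)

mean_scores = {}
for n_estimators in estimator_counts:
    model = AdaBoostClassifier(n_estimators=n_estimators, random_state=0)
    # 5-fold cross-validation, scored with balanced accuracy.
    fold_scores = cross_val_score(
        model, X, y, cv=5, scoring="balanced_accuracy"
    )
    mean_scores[n_estimators] = float(np.mean(fold_scores))

for n_estimators, score in mean_scores.items():
    print(f"n_estimators={n_estimators:3d}  mean balanced accuracy={score:.3f}")
```

Balanced accuracy averages the recall of each class, so it is less sensitive to class imbalance than plain accuracy.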
Attributes:
Name | Type | Description |
---|---|---|
root_path | str | The path to the project root. |
data_destination | str | The path where the AdaBoost model and its training accuracy are saved. |
adaboost_process(df, taxon_target, model_name, score_file, validation_file)
This method specifies the AdaBoost modelling process.
Specifically, this method calls the required pipeline (the decision tree pipeline) to generate the features and labels required for training, then passes that data to the training process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe containing all data for each observation. | required |
taxon_target | str | The taxonomic target level, used to extract the correct labels (taxon_family_name, taxon_genus_name, taxon_species_name, subspecies). | required |
model_name | str | The name of the model type being trained. In this case 'Adaboost'. | required |
score_file | str | The name of the file where the training data will be stored. This file is saved in the same location as the model. | required |
validation_file | str | The name of the file where the validation data will be stored. Also informs the name of the saved models. | required |
Source code in src/models/meta/adaboost_model.py
train_adaboost(X, y, model_name, score_file)
This method performs the AdaBoost training and hyperparameter tuning.
Hyperparameter tuning aims to determine the optimal number of estimators to be used within the AdaBoost model, for each classifier.
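One way to keep only the best-performing candidate during tuning is a save-if-better check. The sketch below is an assumption about how such a policy could look, not the project's code: `save_if_best`, the file name `adaboost_best.pkl`, and the scores are all hypothetical, and a plain dictionary stands in for a fitted classifier.

```python
import pickle
import tempfile
from pathlib import Path


def save_if_best(model, mean_score, best_score, model_path):
    """Persist `model` only when `mean_score` beats `best_score`.

    Returns the best score seen so far.
    """
    if mean_score > best_score:
        with open(model_path, "wb") as handle:
            pickle.dump(model, handle)
        return mean_score
    return best_score


# Made-up placeholder scores keyed by number of estimators (not real results).
candidate_scores = {1: 0.41, 21: 0.63, 41: 0.67, 61: 0.66}

# Hypothetical save location inside a temporary directory.
model_path = Path(tempfile.mkdtemp()) / "adaboost_best.pkl"

best_score = float("-inf")
for n_estimators, score in candidate_scores.items():
    # A dict stands in for a fitted classifier to keep the sketch light.
    stand_in_model = {"n_estimators": n_estimators}
    best_score = save_if_best(stand_in_model, score, best_score, model_path)

# Reload the file to confirm which candidate was kept on disk.
with open(model_path, "rb") as handle:
    saved = pickle.load(handle)
```

Because the file is overwritten only on improvement, the saved model always corresponds to the highest mean score seen so far, even if later candidates perform worse.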
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input features to the model. | required |
y | Series | The categorical taxonomic labels of the observations corresponding to the features. | required |
model_name | str | The name of the model type being trained. In this case 'Adaboost'. | required |
score_file | str | The name of the file where the training data will be stored. | required |
Source code in src/models/meta/adaboost_model.py