Taxonomic Directory Structure Documentation
This file creates a taxonomic directory structure, placing each cropped image in the directory that matches its taxonomic level.
The taxonomic directory is required for training the image classification model in the Wildlife Classification repository.
The training makes use of the image_dataset_from_directory() function to construct a labeled image dataset from the directory structure.
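
To illustrate why the directory layout matters, the sketch below mimics how labels can be derived from subdirectory names. It does not call Keras; it only relies on the documented behaviour that image_dataset_from_directory(), with inferred labels, treats each subdirectory as one class and orders class names alphanumerically. The taxon names, the file name 1_0.jpg, and both helper functions are hypothetical and not taken from dataset_structure.py.

```python
import tempfile
from pathlib import Path


def build_demo_tree(root, taxa):
    """Create one empty placeholder image per (hypothetical) taxon directory."""
    for taxon in taxa:
        taxon_dir = Path(root) / taxon
        taxon_dir.mkdir(parents=True, exist_ok=True)
        (taxon_dir / "1_0.jpg").touch()


def inferred_class_names(directory):
    """Return the sorted names of the immediate subdirectories."""
    return sorted(p.name for p in Path(directory).iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp, ["Felidae", "Bovidae", "Canidae"])
    class_names = inferred_class_names(tmp)
    # Each class name maps to the integer label a loader would assign.
    label_index = {name: i for i, name in enumerate(class_names)}

print(label_index)
```

Each directory in the taxonomic structure therefore acts as one class label for the images stored beneath it.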
Attributes:
Name | Type | Description |
---|---|---|
root_path | str | The absolute path to the root of the project directory. |
cropped_img_path | str | The path to the sub-images of the observations. |
img_path | str | The path to the training directory where the images are arranged within the taxonomic directory. |
test_path | str | The path to the validation directory where the images are arranged within the taxonomic directory. |
data_path | str | The path to where the iNaturalist observations are stored. |
multiple_detections_id | list | The possible suffixes of the sub-images, used to identify how many sub-images exist per observation. |
img_size | int | The size of the images, (528, 528), accepted by the EfficientNet-B6 classification model in Wildlife Classification. |
test_split | float | The fraction of images to be placed in the validation directory (a 15% validation split). |
taxonomy_list | list | The list of taxonomic column names on which the taxon hierarchy is based. |
count | int | The number of sub-images placed in the correct directory. |
length | int | The total number of observations. |
create_dataset(observations)
This method creates a DataFrame from the specified observations list.
Note that multiple dataset files can be specified and the taxonomic structure will remain correct. The method removes any null values from the image_urls, removes the erroneous Felis catus species, and drops unnecessary columns.
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame aggregating the specified observation files. |
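
The cleaning steps described above can be sketched with pandas as follows. This is a minimal illustration, not the code in dataset_structure.py: it takes in-memory DataFrames instead of reading observation files, and the column names id, image_url, scientific_name, and extra, as well as the function name, are assumptions.

```python
import pandas as pd


def create_dataset_sketch(frames):
    """Aggregate observation frames and apply the described cleaning steps."""
    df = pd.concat(frames, ignore_index=True)
    # Drop observations that have no image URL (assumed column name).
    df = df.dropna(subset=["image_url"])
    # Remove the erroneous species.
    df = df[df["scientific_name"] != "Felis catus"]
    # Keep only the columns needed later (a hypothetical selection).
    keep = ["id", "image_url", "scientific_name"]
    return df[keep].reset_index(drop=True)


first = pd.DataFrame({
    "id": [1, 2],
    "image_url": ["https://example.com/1.jpg", None],
    "scientific_name": ["Panthera leo", "Panthera leo"],
    "extra": ["x", "y"],
})
second = pd.DataFrame({
    "id": [3, 4],
    "image_url": ["https://example.com/3.jpg", "https://example.com/4.jpg"],
    "scientific_name": ["Felis catus", "Loxodonta africana"],
    "extra": ["z", "w"],
})

dataset = create_dataset_sketch([first, second])
print(dataset)
```

Observation 2 is dropped for its missing URL and observation 3 for its species, so only observations 1 and 4 remain.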
Source code in dataset_structure.py, lines 46-67.
image_access(x)
This method creates the taxonomic path based on the specified taxonomic levels. The resulting path indicates where in the taxon directory the image will be stored.
Additionally, this method splits the images into training and validation datasets.
This method is used in conjunction with the DataFrame.apply() method and a lambda expression.
This method updates the status bar upon saving the sub-image into the correct directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Row | The row of the DataFrame representing a single observation. | required |
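
A rough sketch of the two ideas described for image_access(), building a path from the taxonomic levels and choosing between the training and validation directories, might look like this. The column names, directory names, and the per-image random draw are assumptions; the documentation does not state how the split is actually made.

```python
import random
from pathlib import Path


def taxon_path(row, taxonomy_list, base_dir):
    """Join the row's value at each taxonomic level into one directory path."""
    parts = [str(row[level]) for level in taxonomy_list]
    return Path(base_dir).joinpath(*parts)


def choose_base_dir(rng, test_split, train_dir, validation_dir):
    """Pick the validation directory with probability test_split (assumed scheme)."""
    return validation_dir if rng.random() < test_split else train_dir


# Hypothetical taxonomic columns and observation row.
levels = ["taxon_family_name", "taxon_genus_name", "taxon_species_name"]
row = {
    "taxon_family_name": "Felidae",
    "taxon_genus_name": "Panthera",
    "taxon_species_name": "Panthera leo",
}
path = taxon_path(row, levels, "images/taxon_train")

# Check that roughly 15% of draws land in the validation directory.
rng = random.Random(0)
draws = [choose_base_dir(rng, 0.15, "train", "validation") for _ in range(1000)]
validation_fraction = draws.count("validation") / len(draws)
print(path, validation_fraction)
```

With a DataFrame, a function of this shape would typically be applied row by row through DataFrame.apply() with axis=1.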
Source code in dataset_structure.py, lines 108-138.
multiple_obs(id, path)
This method searches for multiple sub-images per observation in order to place them at the same file path within the taxonomic directories.
This method copies the images from the images/cropped/ directory to the specified target file path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | int | The unique observation id. | required |
path | str | The path to the correct taxon directory within the taxonomic directory structure. This path should be the same for all sub-images of an observation. | required |
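
The search-and-copy behaviour can be sketched as below. The file naming scheme (observation id, underscore, suffix, .jpg), the suffix values, and the function name are assumptions made for illustration; only the idea of probing a list of suffixes and copying matches to a shared target directory comes from the description above.

```python
import shutil
import tempfile
from pathlib import Path


def copy_sub_images(obs_id, target_dir, cropped_dir, suffixes):
    """Copy every existing sub-image of one observation into target_dir."""
    target = Path(target_dir)
    copied = []
    for suffix in suffixes:
        # Assumed naming scheme: <id>_<suffix>.jpg
        source = Path(cropped_dir) / f"{obs_id}_{suffix}.jpg"
        if source.exists():
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy(source, target / source.name)
            copied.append(source.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    cropped = Path(tmp) / "images" / "cropped"
    cropped.mkdir(parents=True)
    for name in ["42_a.jpg", "42_b.jpg", "7_a.jpg"]:
        (cropped / name).touch()

    target = Path(tmp) / "taxon" / "Felidae" / "Panthera"
    copied = copy_sub_images(42, target, cropped, ["a", "b", "c"])
    copied_files = sorted(p.name for p in target.iterdir())

print(copied)
```

Only the two sub-images belonging to observation 42 are copied; the missing suffix c and the unrelated observation 7 are skipped.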
Source code in dataset_structure.py, lines 141-157.
status_bar_update()
This method updates the visual status bar to represent the status of the image download.
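
One possible shape for such a status bar, driven by the count and length attributes, is sketched here. The rendering format, the width parameter, and the function name are hypothetical; the actual appearance of the bar is not documented.

```python
def render_status_bar(count, length, width=20):
    """Return a text bar showing how many of `length` items are done."""
    fraction = count / length if length else 0.0
    # Clamp so an overshooting count never overflows the bar.
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {count}/{length} ({fraction:.0%})"


quarter = render_status_bar(5, 20)
done = render_status_bar(20, 20)
print(quarter)
print(done)
```

Printing the returned string with a leading carriage return and no trailing newline would redraw the bar in place on most terminals.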
Source code in dataset_structure.py, lines 160-170.
sub_species_detection(x)
This method performs subspecies detection based on the information available in the scientific name column.
A scientific name is identified as a subspecies name when it consists of three distinct words, a form unique to subspecies.
This method is used in conjunction with the DataFrame.apply() method and a lambda expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Row | The row of the DataFrame representing a single observation. | required |
Returns:
Type | Description |
---|---|
Row | The same row augmented to include an additional column titled sub_species. |
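
The three-word rule can be sketched as follows. The row is modelled as a plain dict so the example runs without pandas. The column name scientific_name, the function name, and the choice to store the full three-word name (or None) in sub_species are assumptions; the documentation only states that a sub_species column is added.

```python
def detect_sub_species(row, name_column="scientific_name"):
    """Return a copy of the row with an added sub_species entry."""
    words = str(row[name_column]).split()
    augmented = dict(row)
    # Three words are taken to indicate a subspecies (trinomial) name.
    augmented["sub_species"] = row[name_column] if len(words) == 3 else None
    return augmented


subspecies_row = detect_sub_species({"scientific_name": "Panthera leo melanochaita"})
species_row = detect_sub_species({"scientific_name": "Panthera leo"})
print(subspecies_row)
print(species_row)
```

With a DataFrame, a function of this shape would typically be applied row by row through DataFrame.apply() with axis=1, returning a row object instead of a dict.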
Source code in dataset_structure.py, lines 70-86.
taxonomic_analysis(df)
This method performs a taxonomic analysis, whereby it identifies each unique label at the specified taxonomic levels.
In this case, the method can be used to visualize the potential unique taxonomic labels, but it is not required for building the taxon directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame containing all observations. | required |
Returns:
Type | Description |
---|---|
dict | A dictionary in which the keys are the specified taxon levels and the values are lists of the unique taxon labels at each level. |
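
A dictionary of this shape could be produced with pandas roughly as shown below. The column names and the function name are hypothetical, and sorting the labels is an addition made here only to keep the output deterministic.

```python
import pandas as pd


def unique_taxa(df, taxonomy_list):
    """Map each taxonomic level to the unique labels found in that column."""
    return {
        level: sorted(df[level].dropna().unique().tolist())
        for level in taxonomy_list
    }


observations = pd.DataFrame({
    "taxon_family_name": ["Felidae", "Felidae", "Bovidae"],
    "taxon_genus_name": ["Panthera", "Acinonyx", "Syncerus"],
})
labels = unique_taxa(observations, ["taxon_family_name", "taxon_genus_name"])
print(labels)
```

The length of each list gives the number of classes a classifier would need at that taxonomic level.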
Source code in dataset_structure.py, lines 89-105.