Code Documentation
This file performs the binary image labelling.
The file allows you to tailor the binary image labelling to suit the use case by adapting specific variables. Please consult the README or documentation for further information.
Attributes:
Name | Type | Description |
---|---|---|
root_path | str | The absolute path of the project root. |
data_path | str | The complete path (absolute + relative) to the project |
observation_path | str | The complete path to the |
labelled_path | str | The complete path to the |
labelled_file | str | The name of the file in which the labelling history is collected. |
image_path | str | The complete path to where the images to be labelled are found. |
labelled_image_path | str | The complete path to the directory where labelled images are saved, in binary directories that separate the classes. |
binary_labels | dict | Linking of the numerical key values to the expected labels. An |
positive_count | int | The number of positive labels recorded. |
negative_count | int | The number of negative labels recorded. |
ignore_class | str | Specifies a class for which labels are not recorded. Review the README or documentation for more information. |
aggregate_datasets(datasets)
This method aggregates the specified dataset list into a single dataframe for further use.
Note, this method can be used if the user requires the images to be matched to a dataset. In the current format, the labelling process only requires the images. This method is in place to offer the capability to extend the labelling process if required.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasets | list | A list of dataset file names. These will be aggregated into a single dataframe. | required |
Returns:
Type | Description |
---|---|
DataFrame | A single dataframe comprising the aggregated datasets. |
Source code in main.py
avoid_duplicate_images(filenames)
This method removes from filenames any images that have already been processed, to avoid repeated work.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filenames | list | The list of all image filenames to be labelled. | required |
Returns:
Type | Description |
---|---|
list | A list of filenames, with those already labelled removed. |
Source code in main.py
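The filtering step described for avoid_duplicate_images can be sketched as follows. This is a minimal illustration, not the code in main.py: it assumes the labelling history is a CSV file with a `filename` column, and it takes the history path as an extra `labelled_file` argument so the example is self-contained (the documented function takes only `filenames`).

```python
import csv
import tempfile
from pathlib import Path


def avoid_duplicate_images(filenames, labelled_file):
    """Return the filenames that do not yet appear in the labelling history.

    Assumption: the history is a CSV file with a "filename" column.
    """
    history = Path(labelled_file)
    if not history.exists():
        # Nothing has been labelled yet, so every image is still pending.
        return list(filenames)

    with history.open(newline="") as handle:
        already_labelled = {row["filename"] for row in csv.DictReader(handle)}

    # Keep the original order of the images that are still pending.
    return [name for name in filenames if name not in already_labelled]


# Usage with a throwaway history file.
with tempfile.TemporaryDirectory() as tmp:
    history_path = Path(tmp) / "labelled_file.csv"
    history_path.write_text("filename,label\na.png,positive\n")
    remaining = avoid_duplicate_images(["a.png", "b.png", "c.png"], history_path)
    print(remaining)
```

Collecting the labelled names into a set keeps each membership check cheap even when the history grows large.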
copy_to_labelled_images(filename, label)
This method copies the labelled image to the corresponding directory within data/labelled/images/.
Note, the binary directories will be created automatically based on the categorical names provided in the code.
In the end, the data/labelled/images/ directory will contain two additional directories housing images of each class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | The name of the file to be copied into the labelled images directory. | required |
label | str | The corresponding categorical label of the image. | required |
Source code in main.py
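A minimal sketch of the copy step for copy_to_labelled_images, under stated assumptions: the source and destination directories are passed in explicitly (the documented function takes only `filename` and `label`), the label name `positive` is hypothetical, and the return value is added here purely so the result can be inspected.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_labelled_images(filename, label, image_path, labelled_image_path):
    """Copy one labelled image into the directory named after its label."""
    target_dir = Path(labelled_image_path) / label
    # Create the class directory on first use; do nothing if it already exists.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / filename
    shutil.copy2(Path(image_path) / filename, destination)
    return destination


# Usage with temporary stand-ins for data/images/ and data/labelled/images/.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "data" / "images"
    labelled = Path(tmp) / "data" / "labelled" / "images"
    images.mkdir(parents=True)
    (images / "cat.png").write_bytes(b"not a real image")

    copied = copy_to_labelled_images("cat.png", "positive", images, labelled)
    copied_exists = copied.exists()
    copied_parent = copied.parent.name
    print(copied_parent, copied_exists)
```

Because the class directory is created on demand, the two binary directories appear as soon as the first image of each class is labelled.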
display_image(filename)
This method displays the image specified by the filename.
The image is assumed to be located in the data/images/ directory.
The image display closes once the key used to label the image is pressed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | The file name of the image to be displayed. | required |
Returns:
Type | Description |
---|---|
int | An integer encoding of the key pressed. |
Source code in main.py
labelling_process()
This method controls the image labelling process.
In summary, the process is as follows:
The data/images/ directory holds all of the images to be labelled.
A check is conducted to ensure no images are repeatedly labelled.
Each image is labelled, and the image is then copied into the corresponding directory.
A history of each image filename and its corresponding label is maintained.
As a result, there exists a labelled_file.csv containing the labelling history.
Additionally, within the data/labelled/images/ directory there exist two directories housing the binary labelled images.
Source code in main.py
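The process summarised for labelling_process can be sketched end to end as below. This is an illustration under assumptions rather than the code in main.py: the key-to-label mapping, the CSV column names, and the `get_key` callable (a stand-in for displaying the image and reading a key press) are all hypothetical, and the directories are passed in as arguments.

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Hypothetical key-to-label mapping; 49 and 48 are the ASCII codes of "1" and "0".
BINARY_LABELS = {49: "positive", 48: "negative"}


def labelling_process(image_dir, labelled_dir, history_file, get_key):
    """Label every pending image, copy it to its class directory, log the result."""
    history_file = Path(history_file)
    done = set()
    if history_file.exists():
        with history_file.open(newline="") as handle:
            done = {row["filename"] for row in csv.DictReader(handle)}

    write_header = not history_file.exists()
    with history_file.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(["filename", "label"])
        for image in sorted(Path(image_dir).iterdir()):
            if image.name in done:
                continue  # already labelled in an earlier session
            label = BINARY_LABELS.get(get_key(image))
            if label is None:
                continue  # unrecognised key: leave the image unlabelled
            target = Path(labelled_dir) / label
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(image, target / image.name)
            writer.writerow([image.name, label])


# Usage: a fake key function replaces the interactive image display.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    for name in ("a.png", "b.png"):
        (root / "images" / name).write_bytes(b"stub")

    def fake_key(image):
        # "a.png" is labelled positive, everything else negative.
        return 49 if image.name == "a.png" else 48

    labelling_process(root / "images", root / "labelled",
                      root / "labelled_file.csv", fake_key)
    history_rows = (root / "labelled_file.csv").read_text().splitlines()
    positives = sorted(p.name for p in (root / "labelled" / "positive").iterdir())
    print(history_rows, positives)
```

Passing the key source in as a callable keeps the loop testable without opening an image window.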
remove_already_processed_observations(df)
This method removes the already labelled observations from the dataset.
The method accesses the already labelled dataset and extracts the unique observation IDs. It removes the IDs if they are present in the current dataset to avoid repetition.
Additionally, the method updates the positive and negative counts to keep track of the number of each binary label in the labelled dataset.
Note, this method can be used if the user requires the images to be matched to a dataset. In the current format, the labelling process only requires the images. This method is in place to offer the capability to extend the labelling process if required.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The current dataset to still be labelled. | required |
Returns:
Type | Description |
---|---|
DataFrame | The dataframe is returned with already labelled observations removed from it. |
Source code in main.py
status_update(encoded_key)
This method updates the binary counts and prints the current counts to the terminal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
encoded_key | int | The encoded key value (numerical representation of the key pressed). | required |
Source code in main.py
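A minimal sketch of the counting behaviour described for status_update. The contents of `binary_labels` and the printed format are assumptions made for illustration; only the names `binary_labels`, `positive_count`, and `negative_count` come from the attribute list.

```python
# Hypothetical mapping from encoded keys to labels (ASCII codes of "1" and "0").
binary_labels = {49: "positive", 48: "negative"}
positive_count = 0
negative_count = 0


def status_update(encoded_key):
    """Increment the matching counter and print the running totals."""
    global positive_count, negative_count
    label = binary_labels.get(encoded_key)
    if label == "positive":
        positive_count += 1
    elif label == "negative":
        negative_count += 1
    # Any other key leaves both counts unchanged.
    print(f"positive: {positive_count} | negative: {negative_count}")


for key in (49, 49, 48, 27):  # 27 (Esc) is not a labelling key in this sketch
    status_update(key)
```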
update_binary_counts(df_labelled)
This method updates the binary counts based on the already labelled data.
The counts it updates are the file's global binary counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df_labelled | DataFrame | The dataframe containing the already labelled observations. | required |
Source code in main.py
write_to_file(filenames, labels)
This method writes the labelled filenames and their corresponding labels to the labelled_file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filenames | list | A list of filenames, in the same order as the labels list. | required |
labels | list | The categorical labels of the images. | required |
Source code in main.py
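One way the history write in write_to_file could look is sketched below. It assumes a CSV history with `filename` and `label` columns and takes the file path as an extra `labelled_file` argument; neither detail is confirmed by this page, and the length check is an addition for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_to_file(filenames, labels, labelled_file):
    """Append (filename, label) pairs to the labelling history CSV."""
    if len(filenames) != len(labels):
        raise ValueError("filenames and labels must have the same length")
    path = Path(labelled_file)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header only when the history file is first created.
            writer.writerow(["filename", "label"])
        writer.writerows(zip(filenames, labels))


# Usage: two separate calls append to the same history.
with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "labelled_file.csv"
    write_to_file(["a.png"], ["positive"], history)
    write_to_file(["b.png"], ["negative"], history)
    with history.open(newline="") as handle:
        rows = list(csv.reader(handle))
    print(rows)
```

Opening the file in append mode means an interrupted session keeps the labels already written.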