Code Documentation

This file performs the binary image labelling.

The file allows you to tailor the binary image labelling to suit the use case by adapting specific variables. Please consult the README or documentation for further information.

Attributes:

Name	Type	Description
`root_path`	`str`	The absolute path of the project root.
`data_path`	`str`	The complete path (absolute + relative) to the project `data` directory.
`observation_path`	`str`	The complete path to the `data/observations/` directory. This directory contains the iNaturalist observations.
`labelled_path`	`str`	The complete path to the `data/labelled/` directory. This directory contains a record of the labelled observations.
`labelled_file`	`str`	The name of the file in which the labelling history is recorded.
`image_path`	`str`	The complete path to where the images to be labelled are found.
`labelled_image_path`	`str`	The complete path to the directory where the labelled images are saved, in one directory per class so that the binary classes are kept separate.
`binary_labels`	`dict`	A mapping from the numerical key values to the expected labels. An `Ignore` label is also provided.
`positive_count`	`int`	The number of positive labels recorded.
`negative_count`	`int`	The number of negative labels recorded.
`ignore_class`	`str`	A class for which labels are not recorded. Review the README or documentation for more information.
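
To show how these attributes relate to one another, here is a minimal sketch of one possible configuration. Only the key codes 49 and 48 for the positive and negative labels appear in the source. The label names, the key chosen for the `Ignore` label, the file name, and the use of the current working directory as the project root are all assumptions made for illustration.

```python
import os

# Assumption: the project root is the current working directory.
root_path = os.path.join(os.getcwd(), "")            # trailing separator kept
data_path = os.path.join(root_path, "data", "")
observation_path = os.path.join(data_path, "observations", "")
labelled_path = os.path.join(data_path, "labelled", "")
labelled_file = "labelled_file.csv"                  # assumed file name
image_path = os.path.join(data_path, "images", "")
labelled_image_path = os.path.join(labelled_path, "images", "")

# Key codes of the pressed keys: ord("1") == 49 and ord("0") == 48.
binary_labels = {
    ord("1"): "Positive",   # assumed label name
    ord("0"): "Negative",   # assumed label name
    ord("2"): "Ignore",     # assumed key for the Ignore label
}

positive_count = 0
negative_count = 0
ignore_class = ""  # assumption: an empty string means no class is skipped
```

Every path ends with a separator so that a filename can be appended by plain string concatenation, which is how the functions below build their paths.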

`aggregate_datasets(datasets)`

This method aggregates the specified dataset list into a single dataframe for further use.

Note, this method can be used if the user requires the images to be matched to a dataset. In the current format, the labelling process only requires the images. This method is in place to offer the capability to extend the labelling process if required.

Parameters:

Name	Type	Description	Default
`datasets`	`list`	A list of dataset file names. These will be aggregated into a single dataframe.	required

Returns:

Type	Description
`DataFrame`	A single dataframe comprising the aggregated datasets.

Source code in main.py

def aggregate_datasets(datasets: list) -> pd.DataFrame:
    """This method aggregates the specified dataset list into a single dataframe for further use.

    Note, this method can be used if the user requires the images to be matched to a dataset.
    In the current format, the labelling process only requires the images. This method is in place to offer the
    capability to extend the labelling process if required.

    Args:
        datasets (list): A list of dataset file names. These will be aggregated into a single dataframe.

    Returns:
        (DataFrame): A single dataframe comprising the aggregated datasets.
    """
    df = pd.DataFrame()
    for dataset in datasets:  # Iterate through the datasets
        current_df = pd.read_csv(observation_path + dataset, index_col=0)  # Read in the current dataset as a dataframe

        current_df = current_df[current_df['taxon_species_name'] != 'Felis catus']  # Apply known Felis catus restriction
        df = pd.concat([df, current_df], sort=False)  # Concatenate the dataframes
    return df
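
The aggregation pattern can be tried on two small temporary CSV files, as in the minimal sketch below. The helper name `aggregate`, its `observation_dir` parameter, the file names and the sample rows are all made up for illustration. The body mirrors the source but takes the directory as an argument so that the example is self-contained.

```python
import os
import tempfile

import pandas as pd


def aggregate(datasets, observation_dir):
    """Read each CSV, drop 'Felis catus' rows, and concatenate the results."""
    frames = []
    for dataset in datasets:
        current_df = pd.read_csv(os.path.join(observation_dir, dataset), index_col=0)
        current_df = current_df[current_df["taxon_species_name"] != "Felis catus"]
        frames.append(current_df)
    # Concatenating once at the end avoids repeatedly copying a growing frame.
    return pd.concat(frames, sort=False) if frames else pd.DataFrame()


with tempfile.TemporaryDirectory() as observation_dir:
    # Two hypothetical observation files that share the same columns.
    pd.DataFrame(
        {"taxon_species_name": ["Felis catus", "Panthera leo"]},
        index=pd.Index([1, 2], name="id"),
    ).to_csv(os.path.join(observation_dir, "a.csv"))
    pd.DataFrame(
        {"taxon_species_name": ["Lynx lynx"]},
        index=pd.Index([3], name="id"),
    ).to_csv(os.path.join(observation_dir, "b.csv"))

    combined = aggregate(["a.csv", "b.csv"], observation_dir)

print(combined)
```

The `Felis catus` row is filtered out of each file before concatenation, so only the remaining observations reach the combined dataframe.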

`avoid_duplicate_images(filenames)`

This method removes images that have already been processed from `filenames`, to avoid repeated work.

Parameters:

Name	Type	Description	Default
`filenames`	`list`	The list of all image filenames to be labelled.	required

Returns:

Type	Description
`list`	A list of filenames, with those already labelled removed.

Source code in main.py

def avoid_duplicate_images(filenames: list) -> list:
    """This method removes images that have already been processed from filenames, to avoid repeated work.

    Args:
        filenames (list): The list of all image filenames to be labelled.

    Returns:
        (list): A list of filenames, with those already labelled removed.
    """
    if os.path.exists(labelled_path + labelled_file):
        df_labelled = pd.read_csv(labelled_path + labelled_file)  # Read in the labelled dataset
        labelled_images = set(df_labelled['id'].tolist())  # Collect the IDs in the labelled dataset
        filenames = [name for name in filenames if name not in labelled_images]  # Keep only unlabelled images, as a list
        update_binary_counts(df_labelled)  # Update the positive and negative counts
    else:
        with open(labelled_path + labelled_file, 'w') as file:  # Create the file with a header row if it doesn't exist
            file.write("id,label\n")

    return filenames
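
The filtering step can be tried in isolation. The sketch below uses made-up filenames and a hypothetical helper, `remaining_files`. It also shows why a list comprehension is the safer choice when the documented return type is a list: in Python 3 the built-in `filter` returns a lazy iterator, not a list.

```python
def remaining_files(filenames, labelled_ids):
    """Return the filenames that do not appear in labelled_ids, preserving order."""
    labelled = set(labelled_ids)  # set membership is O(1) on average
    return [name for name in filenames if name not in labelled]


all_files = ["1.jpg", "2.jpg", "3.jpg"]   # hypothetical image filenames
already_labelled = ["2.jpg"]              # hypothetical labelling history

lazy = filter(lambda name: name not in already_labelled, all_files)
print(type(lazy).__name__)                            # filter
print(remaining_files(all_files, already_labelled))   # ['1.jpg', '3.jpg']
```

A lazy iterator can only be consumed once and has no length, which matters if the caller later wants to report how many images remain.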

`copy_to_labelled_images(filename, label)`

This method copies the labelled image to the corresponding directory within `data/labelled/images/`.

Note, the binary directories are created automatically from the categorical names provided in the code. In the end, the `data/labelled/images/` directory will contain two additional directories, one housing the images of each class.

Parameters:

Name	Type	Description	Default
`filename`	`str`	The name of the file to be copied into the labelled images directory.	required
`label`	`str`	The corresponding categorical label of the image.	required

Source code in main.py

def copy_to_labelled_images(filename: str, label: str):
    """This method copies the labelled image to the corresponding directory within `data/labelled/images/`

    Note, the binary directories will be created automatically based on the categorical names provided in the code.
    In the end, the `data/labelled/images/` directory will contain two additional directories housing images of each class.

    Args:
        filename (str): The name of the file to be copied into the labelled images directory.
        label (str): The corresponding categorical label of the image.
    """
    file_origin = image_path + filename
    file_destination_directory = labelled_image_path + label + "/"
    file_destination = file_destination_directory + filename

    os.makedirs(file_destination_directory, exist_ok=True)  # Create the directory if it doesn't already exist

    shutil.copy(file_origin, file_destination)  # Copy the file from the origin directory into the newly specified directory.
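
The copy step can be reproduced in a temporary directory to see the resulting layout. This is a minimal sketch: the directory names, the placeholder file and the `Positive` label are assumptions for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    image_dir = os.path.join(tmp, "images", "")
    labelled_dir = os.path.join(tmp, "labelled", "images", "")
    os.makedirs(image_dir)

    # A placeholder file stands in for a real image.
    with open(image_dir + "1.jpg", "wb") as fh:
        fh.write(b"not a real image")

    label = "Positive"  # assumed label name
    destination_dir = os.path.join(labelled_dir, label, "")
    os.makedirs(destination_dir, exist_ok=True)  # creates parent directories too
    shutil.copy(image_dir + "1.jpg", destination_dir + "1.jpg")

    class_dirs = sorted(os.listdir(labelled_dir))
    class_files = os.listdir(destination_dir)
    original_kept = os.path.exists(image_dir + "1.jpg")

print(class_dirs, class_files, original_kept)
```

Because the file is copied and not moved, the original stays in the image directory and the class directory receives a duplicate.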

`display_image(filename)`

This method displays the image specified by the filename.

The image is assumed to be located in the `data/images/` directory. The function blocks until a key is pressed, and that key press is used to label the image.

Parameters:

Name	Type	Description	Default
`filename`	`str`	The file name of the image to be displayed.	required

Returns:

Type	Description
`int`	An integer encoding of the key pressed.

Source code in main.py

def display_image(filename: str):
    """This method displays the image specified by the filename.

    The image is assumed to be located in the `data/images/` directory.
    The function blocks until a key is pressed, and that key press is used to label the image.

    Args:
        filename (str): The file name of the image to be displayed.

    Returns:
        (int): An integer encoding of the key pressed.
    """
    img = cv2.imread(image_path + filename)
    cv2.imshow('Current image', img)
    key_pressed = cv2.waitKey(0)
    return key_pressed
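
The integer returned here is the code of the pressed key, so `ord("1") == 49` and `ord("0") == 48`. The sketch below shows how such a code could be decoded defensively. The helper `decode_key` and the label names are hypothetical. Masking with `0xFF` is a common idiom with `cv2.waitKey`, used on the assumption that some platforms set extra high bits in the returned value.

```python
def decode_key(key_code, labels):
    """Map a raw key code to a label, or None if the key is not mapped."""
    if key_code < 0:  # a negative value means no key was pressed
        return None
    return labels.get(key_code & 0xFF)  # keep only the low byte


labels = {ord("1"): "Positive", ord("0"): "Negative"}  # assumed label names

print(decode_key(49, labels))        # Positive
print(decode_key(ord("q"), labels))  # None
```

Returning `None` for an unmapped key lets the caller decide whether to stop, skip the image, or ask again, without relying on a `KeyError`.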

`labelling_process()`

This method controls the image labelling process.

In summary, the process is as follows. The `data/images/` directory holds all of the images to be labelled. A check is conducted to ensure that no image is labelled twice. Each image is labelled and then copied into the directory for its class. A history of each image filename and its corresponding label is maintained.

As a result, a `labelled_file.csv` file contains the labelling history. Additionally, the `data/labelled/images/` directory contains two directories housing the binary labelled images.

Source code in main.py

def labelling_process():
    """This method controls the image labelling process.

    In summary, the process is as follows:
    The `data/images/` directory holds all of the images to be labelled.
    A check is conducted to ensure that no image is labelled twice.
    Each image is labelled and then copied into the directory for its class.
    A history of each image filename and its corresponding label is maintained.

    As a result, a `labelled_file.csv` file contains the labelling history.
    Additionally, the `data/labelled/images/` directory contains two directories housing the binary labelled images.
    """
    filenames = os.listdir(image_path)  # Gather filenames from the image directory. The filenames are assumed to be the IDs
    filenames = avoid_duplicate_images(filenames)  # Avoid already labelled images

    labelled_files = []  # Storing the labelled files
    labels = []  # Storing the labels

    for filename in filenames:
        encoded_key = display_image(filename)

        try:
            label = binary_labels[encoded_key]  # Decode the label

            if label != ignore_class and label != 'Ignore':  # Skip the ignore class and the Ignore label
                labels.append(label)
                labelled_files.append(filename)

                status_update(encoded_key)  # Binary count update
                copy_to_labelled_images(filename, label)  # Copy image to data/labelled/images

        except Exception:  # Any failure, including an unmapped key, ends the session
            write_to_file(labelled_files, labels)  # Write the labelling history on exit.
            sys.exit()

    write_to_file(labelled_files, labels)  # Write the labelling history once every image has been processed
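
The control flow can be exercised without OpenCV by replacing the key press with a scripted lookup. The sketch below is a simplified stand-in and not the function above: `run_labelling`, its `key_for` parameter, the filenames and the key assigned to `Ignore` are all hypothetical. It illustrates two points: an unmapped key ends the session, and ignored images are left out of the history.

```python
def run_labelling(filenames, key_for, labels, ignore_class=None):
    """Label files until an unmapped key is seen; return the recorded files and labels."""
    recorded_files, recorded_labels = [], []
    for filename in filenames:
        key = key_for(filename)       # stand-in for display_image
        if key not in labels:         # an unmapped key ends the session
            break
        label = labels[key]
        if label != ignore_class and label != "Ignore":
            recorded_files.append(filename)
            recorded_labels.append(label)
    return recorded_files, recorded_labels


labels = {49: "Positive", 48: "Negative", 50: "Ignore"}  # key 50 is an assumption
scripted_keys = {"a.jpg": 49, "b.jpg": 50, "c.jpg": 48, "d.jpg": 113, "e.jpg": 49}

files, recorded = run_labelling(sorted(scripted_keys), scripted_keys.get, labels)
print(files, recorded)
```

Here `b.jpg` is skipped because it receives the `Ignore` label, and the unmapped key on `d.jpg` stops the loop before `e.jpg` is shown.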

`remove_already_processed_observations(df)`

This method removes the already labelled observations from the dataset.

The method accesses the already labelled dataset and extracts the unique observation IDs. It removes the IDs if they are present in the current dataset to avoid repetition.

Additionally, the method updates the positive and negative counts to keep track of the number of each binary label in the labelled dataset.

Note, this method can be used if the user requires the images to be matched to a dataset. In the current format, the labelling process only requires the images. This method is in place to offer the capability to extend the labelling process if required.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	The dataset that is still to be labelled.	required

Returns:

Type	Description
`DataFrame`	The dataframe with the already labelled observations removed.

Source code in main.py

def remove_already_processed_observations(df: pd.DataFrame):
    """This method removes the already labelled observations from the dataset.

    The method accesses the already labelled dataset and extracts the unique observation IDs.
    It removes the IDs if they are present in the current dataset to avoid repetition.

    Additionally, the method updates the positive and negative counts to keep track of the number of each binary label in the
    labelled dataset.

    Note, this method can be used if the user requires the images to be matched to a dataset.
    In the current format, the labelling process only requires the images. This method is in place to offer the
    capability to extend the labelling process if required.

    Args:
        df (DataFrame): The dataset that is still to be labelled.

    Returns:
        (DataFrame): The dataframe with the already labelled observations removed.
    """

    if os.path.exists(labelled_path + labelled_file):
        df_labelled = pd.read_csv(labelled_path + labelled_file)  # Read in the labelled dataset
        labelled_ids = df_labelled['id'].tolist()  # Generate a list of IDs in the labelled dataset
        df = df.drop(labelled_ids, errors='ignore')  # Drop the rows with the same id (id is the index); IDs absent from df are skipped.

        update_binary_counts(df_labelled)  # Update the binary counts
    else:
        with open(labelled_path + labelled_file, 'w') as file:  # Create an empty file if it doesn't exist
            file.write("id,label\n")

    return df
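
By default, `DataFrame.drop` raises a `KeyError` when asked to drop labels that are absent from the index. That can happen here if the labelling history contains IDs that are not in the current dataset. The sketch below, on made-up data, shows the default behaviour and two ways of tolerating missing IDs.

```python
import pandas as pd

df = pd.DataFrame(
    {"taxon_species_name": ["Panthera leo", "Lynx lynx", "Puma concolor"]},
    index=pd.Index([10, 11, 12], name="id"),
)
labelled_ids = [11, 99]  # 99 is deliberately absent from df

try:
    df.drop(labelled_ids)
    raised = False
except KeyError:
    raised = True  # the default errors='raise' rejects the unknown label

remaining = df.drop(labelled_ids, errors="ignore")   # unknown labels are skipped
alternative = df[~df.index.isin(labelled_ids)]       # boolean-mask equivalent

print(raised, remaining.index.tolist(), alternative.index.tolist())
```

Both variants keep rows 10 and 12. The mask form avoids the `errors` argument entirely, at the cost of being slightly less direct to read.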

`status_update(encoded_key)`

This method updates the binary counts and prints the current counts to the terminal.

Parameters:

Name	Type	Description	Default
`encoded_key`	`int`	The encoded key value (numerical representation of the key pressed).	required

Source code in main.py

def status_update(encoded_key: int):
    """This method updates the binary counts and prints the current counts to the terminal.

    Args:
        encoded_key (int): The encoded key value (numerical representation of the key pressed).
    """
    global positive_count, negative_count

    if encoded_key == 49:  # Positive encoding
        positive_count += 1
    elif encoded_key == 48:  # Negative encoding
        negative_count += 1

    print(binary_labels[49] + ' count: ' + str(positive_count) + ', ' +
          binary_labels[48] + ' count: ' + str(negative_count))
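
The same bookkeeping can be written without module-level globals. The sketch below is one possible alternative and not the source's approach: it keeps the counts in a `collections.Counter` and returns the status line. The helper name `status_line`, the named key constants and the label names are assumptions.

```python
from collections import Counter

POSITIVE_KEY, NEGATIVE_KEY = ord("1"), ord("0")  # 49 and 48
labels = {POSITIVE_KEY: "Positive", NEGATIVE_KEY: "Negative"}  # assumed names


def status_line(counts, key_code):
    """Increment the count for a binary key and return the formatted status."""
    if key_code in (POSITIVE_KEY, NEGATIVE_KEY):
        counts[key_code] += 1
    return (f"{labels[POSITIVE_KEY]} count: {counts[POSITIVE_KEY]}, "
            f"{labels[NEGATIVE_KEY]} count: {counts[NEGATIVE_KEY]}")


counts = Counter()  # missing keys read as 0
for key in (49, 49, 48):
    line = status_line(counts, key)

print(line)  # Positive count: 2, Negative count: 1
```

Naming the key codes also removes the bare `49` and `48` literals, which makes the link to the `1` and `0` keys explicit.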

`update_binary_counts(df_labelled)`

This method updates the binary counts based on the already labelled data.

It sets the module-level `positive_count` and `negative_count` variables.

Parameters:

Name	Type	Description	Default
`df_labelled`	`DataFrame`	The dataframe containing the already labelled observations.	required

Source code in main.py

def update_binary_counts(df_labelled: pd.DataFrame):
    """This method updates the binary counts based on the already labelled data.

    It sets the module-level `positive_count` and `negative_count` variables.

    Args:
        df_labelled (DataFrame): The dataframe containing the already labelled observations.
    """
    global positive_count, negative_count

    if not df_labelled.empty:  # Update the binary label counts if the file is not empty
        counts = df_labelled['label'].value_counts().to_dict()  # Convert counts to a dictionary

        for label in counts.keys():  # Label matching to update counts
            if label == binary_labels[49]:
                positive_count = counts[label]
            elif label == binary_labels[48]:
                negative_count = counts[label]
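
The counting step relies on `Series.value_counts`, which tallies each distinct value in a column. A minimal sketch on a made-up labelling history, with assumed label names:

```python
import pandas as pd

# Hypothetical labelling history in the same id,label layout as the labelled file.
df_labelled = pd.DataFrame({
    "id": ["1.jpg", "2.jpg", "3.jpg"],
    "label": ["Positive", "Negative", "Positive"],
})

counts = df_labelled["label"].value_counts().to_dict()

# dict.get with a default covers a class that has not been recorded yet.
positive_count = counts.get("Positive", 0)
negative_count = counts.get("Negative", 0)

print(counts, positive_count, negative_count)
```

Looking the labels up with a default value is an alternative to iterating over the dictionary keys, and gives a well-defined count of zero for a class with no entries.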

`write_to_file(filenames, labels)`

This method writes the labelled files and their corresponding labels to the `labelled_file`.

Parameters:

Name	Type	Description	Default
`filenames`	`list`	A list of filenames, in the same order as the `labels` list.	required
`labels`	`list`	The categorical labels of the images.	required

Source code in main.py

def write_to_file(filenames: list, labels: list):
    """This method writes the labelled files and their corresponding labels to the `labelled_file`.

    Args:
        filenames (list): A list of filenames, in the same order as the labels list.
        labels (list): The categorical labels of the images.
    """
    results_df = pd.DataFrame({'id': filenames, 'label': labels})
    results_df.to_csv(labelled_path + labelled_file, mode='a', index=False, header=False)
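
Appending without a header works because the header row is written once, when the file is first created. The same append pattern can be sketched with the standard `csv` module. The helper `append_history` and the file name are hypothetical, and this is an illustration of the pattern, not a drop-in replacement.

```python
import csv
import os
import tempfile


def append_history(path, filenames, labels):
    """Append (id, label) rows to path, writing the header only for a new file."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["id", "label"])
        writer.writerows(zip(filenames, labels))


with tempfile.TemporaryDirectory() as tmp:
    history = os.path.join(tmp, "labelled.csv")  # assumed file name
    append_history(history, ["1.jpg"], ["Positive"])
    append_history(history, ["2.jpg", "3.jpg"], ["Negative", "Positive"])

    with open(history, newline="") as fh:
        rows = list(csv.reader(fh))

print(rows)
```

Two separate sessions produce a single file with one header row followed by every labelled image in the order it was recorded.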