diff --git a/docs/source/User-guide/Annotate.rst b/docs/source/User-guide/Annotate.rst index 30deb39c..3d29211a 100644 --- a/docs/source/User-guide/Annotate.rst +++ b/docs/source/User-guide/Annotate.rst @@ -5,168 +5,203 @@ Annotate MapReader's ``Annotate`` subpackage is used to interactively annotate images (e.g. maps). -This is done in three simple steps: - -1. :ref:`Create_file` -2. :ref:`Annotate_images` -3. :ref:`Save_annotations` +.. _Annotate_images: -.. _Create_file: +Annotate your images +---------------------- -Create an annotation tasks file ------------------------------------ +.. note:: Run these commands in a Jupyter notebook (or other IDE), ensuring you are in your `mr_py38` python environment. -To set up your annotation tasks, you will need to create a separate ``annotation_tasks.yaml`` file. -An example file which can be used as a template can be found in ``MapReader/worked_examples/``. -.. todo:: Note that you can do this via text editor in windows or something like ??? in mac/linux +To prepare your annotations, you must specify a number of parameters when initializing the Annotator class. +We will use a 'rail_space' annotation task to demonstrate how to set up the annotator. -Your ``annotation_tasks.yaml`` file needs to contain two sections: ``tasks`` and ``paths``. +The simplest way to initialize your annotator is to provide file paths for your patches and parent images using the ``patch_paths`` and ``parent_paths`` arguments, respectively. +e.g. : -The ``tasks`` section is used to specify annotation tasks and their labels. -This section can contain as many tasks/labels as you would like and should be formatted as follows: +.. code-block:: python -.. 
code-block:: yaml + from mapreader import Annotator + + # EXAMPLE + annotator = Annotator( + patch_paths="./patches_100_pixel/*.png", + parent_paths="./maps/*.png", + metadata_path="./maps/metadata.csv", + annotations_dir="./annotations", + task_name="railspace", + labels=["no_rail_space", "rail_space"], + username="rosie", + ) - tasks: - your_task_name: - labels: ["your_label_1", "your_label_2", "your_label_3"] - your_task_name_2: - labels: ["your_label_1", "your_label_2"] +Alternatively, if you have created/saved a ``patch_df`` and ``parent_df`` from MapReader's Load subpackage, you can replace the ``patch_paths`` and ``parent_paths`` arguments with ``patch_df`` and ``parent_df`` arguments, respectively. +e.g. : -.. note:: When annotating, for each patch you will only be able to select one label from your label list. So, if you envisage wanting to label something as "label_1" **and also** "label_2", you will need to create a separate label combining "label_1 and label_2". +.. code-block:: python -The ``paths`` section is used to specify file paths to sets of images you would like to annotate (annotation sets). -This section can contain as many annotation sets as you would like and should be formatted as follows: + from mapreader import Annotator -.. code-block:: yaml + # EXAMPLE + annotator = Annotator( + patch_df="./patch_df.csv", + parent_df="./parent_df.csv", + annotations_dir="./annotations", + task_name="railspace", + labels=["no_rail_space", "rail_space"], + username="rosie", + ) - paths: - your_annotation_set: - patch_paths: "./path/to/patches/" - parent_paths: "./path/to/parents/" - annot_dir: "./path/to/save/annotations" - your_annotation_set_2: - patch_paths: "./path/to/patches_2/" - parent_paths: "./path/to/parents_2/" - annot_dir: "./path/to/save/annotations_2" +.. note:: You can pass either a pandas DataFrame or the path to a csv file as the ``patch_df`` and ``parent_df`` arguments. 
-For example, if you want to annotate 'rail_space' (as in `this paper `_) and have been using the recommended/default directory structure, your ``annotation_tasks.yaml`` should look like this: +In the above examples, the following parameters are also specified: -.. code-block:: yaml +- ``annotations_dir``: The directory where your annotations will be saved (e.g., ``"./annotations"``). +- ``task_name``: The specific annotation task you want to perform, in this case ``"railspace"``. +- ``labels``: A list of labels for the annotation task, such as ``"no_rail_space"`` and ``"rail_space"``. +- ``username``: Your unique identifier, which can be any string (e.g., ``"rosie"``). - #EXAMPLE - tasks: - rail_space: - labels: ["no_rail_space", "rail_space"] +Other arguments that you may want to be aware of when initializing the ``Annotator`` instance include: - paths: - set_001: - patch_paths: "./patches/patch-*png" - parent_paths: "./maps/*png" - annot_dir: "./annotations_one_inch" +- ``show_context``: Whether to show a context image in the annotation interface (default: ``False``). +- ``surrounding``: How many surrounding patches to show in the context image (default: ``1``). +- ``sortby``: The name of the column to use to sort the patch Dataframe (e.g. "mean_pixel_R" to sort by red pixel intensities). +- ``delimiter``: The delimiter to use when reading your data files (default: ``","`` for csv). -.. _Annotate_images: +After setting up the ``Annotator`` instance, you can interactively annotate a sample of your images using: -Annotate your images ----------------------- +.. code-block:: python -.. note:: Run these commands in a Jupyter notebook (or other IDE), ensuring you are in your `mr_py38` python environment. + annotator.annotate() -To prepare your annotations, you must specify a ``userID``, ``annotation_tasks_file`` (i.e. the ``annotation_task.yaml``), tell MapReader which ``task`` you'd like to run and which ``annotation_set`` you would like to run on. 
+Patch size +~~~~~~~~~~ -.. todo:: Give big list of different options here -.. todo:: Explain that things don't autosave +By default, your patches will be shown to you as their original size in pixels. +This can make annotating difficult if your patches are very small. +To resize your patches when viewing them in the annotation interface, you can pass the ``resize_to`` keyword argument when initializing the ``Annotator`` instance or when calling the ``annotate()`` method. -e.g. following our 'rail_space' example from earlier: +e.g. to resize your patches so that their largest edge is 300 pixels: .. code-block:: python - #EXAMPLE - from mapreader.annotate.utils import prepare_annotation - - annotation = prepare_annotation( - userID="rosie", - annotation_tasks_file="annotation_tasks.yaml", - task="rail_space", - annotation_set="set_001", + # EXAMPLE + annotator = Annotator( + patch_df="./patch_df.csv", + parent_df="./parent_df.csv", + annotations_dir="./annotations", + task_name="railspace", + labels=["no_rail_space", "rail_space"], + username="rosie", + resize_to=300, ) -You can then interactively annotate a sample of your images using: +Or, equivalently, : .. code-block:: python - annotation + annotator.annotate(resize_to=300) + +.. note:: Passing the ``resize_to`` argument when calling the ``annotate()`` method overrides the ``resize_to`` argument passed when initializing the ``Annotator`` instance. -.. image:: ../figures/annotate.png - :width: 400px +Context +~~~~~~~ -To help with annotating, you can set the annotation interface to show a context image using ``context_image=True``. -This creates a second panel in the annotation interface, showing your patch in the context of a larger region whose size, in pixels, is set by ``xoffset`` and ``yoffset``. +As well as resizing your patches, you can also set the annotation interface to show a context image using ``show_context=True``. 
+This creates a panel of patches in the annotation interface, highlighting your patch in the middle of its immediately surrounding patches. +As above, you can pass the ``show_context`` argument either when initializing the ``Annotator`` instance or when calling the ``annotate()`` method. e.g. : .. code-block:: python - #EXAMPLE - annotation=prepare_annotation( - userID="rosie", - annotation_tasks_file="annotation_tasks.yaml", - task="rail_space", - annotation_set="set_001", - context_image=True, - xoffset=100, - yoffset=100) + # EXAMPLE + annotator = Annotator( + patch_df="./patch_df.csv", + parent_df="./parent_df.csv", + annotations_dir="./annotations", + task_name="railspace", + labels=["no_rail_space", "rail_space"], + username="rosie", + show_context=True, + ) + + annotator.annotate() + +Or, equivalently, : + +.. code-block:: python - annotation + annotator.annotate(show_context=True) -.. image:: ../figures/annotate_context.png - :width: 400px +.. note:: Passing the ``show_context`` argument when calling the ``annotate()`` method overrides the ``show_context`` argument passed when initializing the ``Annotator`` instance. + +By default, your ``Annotator`` will show one surrounding patch on each side of your patch in the context image (i.e. a 3x3 grid). +You can change this by passing the ``surrounding`` argument when initializing the ``Annotator`` instance and/or when calling the ``annotate()`` method. + +e.g. to show two surrounding patches in the context image: + +.. code-block:: python -By default, your patches will be shown to you in a random order but, to help with annotating, can be sorted by their mean pixel intensities using ``sortby="mean"``. + annotator.annotate(show_context=True, surrounding=2) -You can also specify ``min_mean_pixel`` and ``max_mean_pixel`` to limit the range of mean pixel intensities shown to you and ``min_std_pixel`` and ``max_std_pixel`` to limit the range of standard deviations within the mean pixel intensities shown to you. -This is particularly useful if your images (e.g. 
maps) have collars or margins that you would like to avoid. +Sort order +~~~~~~~~~~ +By default, your patches will be shown to you in a random order but, to help with annotating, they can be sorted using the ``sortby`` argument. +This argument takes the name of a column in your patch DataFrame and sorts the patches by the values in that column. e.g. : .. code-block:: python - annotation=prepare_annotation( - userID="rosie", - annotation_tasks_file="annotation_tasks.yaml", - task="rail_space", - annotation_set="set_001", - context_image=True, - xoffset=100, - yoffset=100, - min_mean_pixel=0.5, - max_mean_pixel=0.9 + # EXAMPLE + annotator = Annotator( + patch_df="./patch_df.csv", + parent_df="./parent_df.csv", + annotations_dir="./annotations", + task_name="railspace", + labels=["no_rail_space", "rail_space"], + username="rosie", + sortby="mean_pixel_R", ) - annotation +This will sort your patches by the mean red pixel intensity in each patch, in ascending order by default. +This is particularly useful if your images (e.g. maps) have collars, margins or blank regions that you would like to avoid. + +.. note:: If you would like to sort in descending order, you can also pass ``ascending=False``. + +You can also specify ``min_values`` and ``max_values`` to limit the range of values shown to you. +e.g. to sort your patches by the mean red pixel intensity in each patch but only show patches with a mean blue pixel intensity between 0.5 and 0.9: + +.. code-block:: python + + # EXAMPLE + annotator = Annotator( + patch_df="./patch_df.csv", + parent_df="./parent_df.csv", + annotations_dir="./annotations", + task_name="railspace", + labels=["no_rail_space", "rail_space"], + username="rosie", + sortby="mean_pixel_R", + min_values={"mean_pixel_B": 0.5}, + max_values={"mean_pixel_B": 0.9}, + ) .. 
_Save_annotations: Save your annotations ---------------------- -Once you have annotated your images, you should save your annotations using: +Your annotations are automatically saved as a ``csv`` file as you progress through the annotation task (unless you set the ``auto_save`` keyword argument to ``False`` when setting up the ``Annotator`` instance). -.. code-block:: python +If you need to know the name of the annotations file, you can read the ``annotations_file`` attribute of your ``Annotator`` instance: - #EXAMPLE - from mapreader.annotate.utils import save_annotation +.. code-block:: python - save_annotation( - annotation, - userID="rosie", - task="rail_space", - annotation_tasks_file="annotation_tasks.yaml", - annotation_set="set_001", - ) + annotator.annotations_file -This saves your annotations as a ``csv`` file in the ``annot_dir`` specified in your annotation tasks file. +The file will be located in the ``annotations_dir`` that you passed as a keyword argument when you set up the ``Annotator`` instance. +If you didn't provide this keyword argument, it will be in the ``./annotations`` directory. For example, if you have downloaded your maps using the default settings of our ``Download`` subpackage or have set up your directory as recommended in our `Input Guidance `__, and then saved your patches using the default settings: @@ -185,5 +220,5 @@ For example, if you have downloaded your maps using the default settings of our │ ├── patch-100-200-#map1.png#.png │ ├── patch-200-300-#map1.png#.png │ └── ... - └──annotations_one_inch - └──rail_space_#rosie#.csv + └──annotations + └──railspace_#rosie#-123hjkfr298jIUHfs808da.csv diff --git a/mapreader/__init__.py b/mapreader/__init__.py index 98d02cda..25d63b7a 100644 --- a/mapreader/__init__.py +++ b/mapreader/__init__.py @@ -15,6 +15,8 @@ from mapreader.process import process +from mapreader.annotate.annotator import Annotator + from . 
import _version __version__ = _version.get_versions()["version"] diff --git a/mapreader/annotate/annotator.py b/mapreader/annotate/annotator.py new file mode 100644 index 00000000..88503581 --- /dev/null +++ b/mapreader/annotate/annotator.py @@ -0,0 +1,924 @@ +from __future__ import annotations + +import functools +import hashlib +import json +import os +import random +import string +import warnings +from ast import literal_eval +from itertools import product +from pathlib import Path + +import ipywidgets as widgets +import numpy as np +import pandas as pd +from IPython.display import clear_output, display +from numpy import array_split +from PIL import Image, ImageOps + +from ..load.loader import load_patches + +warnings.filterwarnings("ignore", category=UserWarning) + +MAX_SIZE = 1000 + +_CENTER_LAYOUT = widgets.Layout( + display="flex", flex_flow="column", align_items="center" +) + + +class Annotator(pd.DataFrame): + """ + Annotator class for annotating patches with labels. + + Parameters + ---------- + patch_df : str or pd.DataFrame or None, optional + Path to a CSV file or a pandas DataFrame containing patch data, by default None + parent_df : str or pd.DataFrame or None, optional + Path to a CSV file or a pandas DataFrame containing parent data, by default None + labels : list, optional + List of labels for annotation, by default None + patch_paths : str or None, optional + Path to patch images, by default None + Ignored if patch_df is provided. + parent_paths : str or None, optional + Path to parent images, by default None + Ignored if parent_df is provided. 
+ metadata_path : str or None, optional + Path to metadata CSV file, by default None + annotations_dir : str, optional + Directory to store annotations, by default "./annotations" + patch_paths_col : str, optional + Name of the column in which image paths are stored in patch DataFrame, by default "image_path" + label_col : str, optional + Name of the column in which labels are stored in patch DataFrame, by default "label" + show_context : bool, optional + Whether to show context when loading patches, by default False + auto_save : bool, optional + Whether to automatically save annotations, by default True + delimiter : str, optional + Delimiter used in CSV files, by default "," + sortby : str or None, optional + Name of the column to use to sort the patch DataFrame, by default None. + Default sort order is ``ascending=True``. Pass ``ascending=False`` keyword argument to sort in descending order. + **kwargs + Additional keyword arguments + + Raises + ------ + FileNotFoundError + If the provided patch_df or parent_df file path does not exist + ValueError + If patch_df or parent_df is not a valid path to a CSV file or a pandas DataFrame + If patch_df or patch_paths is not provided + If the DataFrame does not have the required columns + If sortby is not a string or None + If labels provided are not in the form of a list + SyntaxError + If labels provided are not in the form of a list + + Notes + ----- + + Additional kwargs: + + - ``username``: Username to use when saving annotations file. Default: Randomly generated string. + - ``task_name``: Name of the annotation task. Default: "task". + - ``min_values``: A dictionary consisting of column names (keys) and minimum values as floating point values (values). Default: {}. + - ``max_values``: A dictionary consisting of column names (keys) and maximum values as floating point values (values). Default: {}. + - ``buttons_per_row``: Number of buttons to display per row. Default: None. 
+ - ``ascending``: Whether to sort the DataFrame in ascending order. Default: True. + - ``surrounding``: The number of surrounding images to show for context. Default: 1. + - ``max_size``: The size in pixels for the longest side to which constrain each patch image. Default: 1000. + - ``resize_to``: The size in pixels for the longest side to which resize each patch image. Default: None. + """ + + def __init__( + self, + patch_df: str | pd.DataFrame | None = None, + parent_df: str | pd.DataFrame | None = None, + labels: list = None, + patch_paths: str | None = None, + parent_paths: str | None = None, + metadata_path: str | None = None, + annotations_dir: str = "./annotations", + patch_paths_col: str = "image_path", + label_col: str = "label", + show_context: bool = False, + auto_save: bool = True, + delimiter: str = ",", + sortby: str | None = None, + **kwargs, + ): + if labels is None: + labels = [] + if patch_df is not None: + if isinstance(patch_df, str): + if os.path.exists(patch_df): + patch_df = pd.read_csv( + patch_df, + index_col=0, + sep=delimiter, + ) + else: + raise FileNotFoundError(f"[ERROR] Could not find {patch_df}.") + if not isinstance(patch_df, pd.DataFrame): + raise ValueError( + "[ERROR] ``patch_df`` must be a path to a csv or a pandas DataFrame." + ) + self._eval_df(patch_df) # eval tuples/lists in df + + if parent_df is not None: + if isinstance(parent_df, str): + if os.path.exists(parent_df): + parent_df = pd.read_csv( + parent_df, + index_col=0, + sep=delimiter, + ) + else: + raise FileNotFoundError(f"[ERROR] Could not find {parent_df}.") + if not isinstance(parent_df, pd.DataFrame): + raise ValueError( + "[ERROR] ``parent_df`` must be a path to a csv or a pandas DataFrame." 
+ ) + self._eval_df(parent_df) # eval tuples/lists in df + + if patch_df is None: + # If we don't get patch data provided, we'll use the patches and parents to create the dataframes + if patch_paths: + parent_paths_df, patch_df = self._load_dataframes( + patch_paths=patch_paths, + parent_paths=parent_paths, + metadata_path=metadata_path, + delimiter=delimiter, + ) + + # only take this dataframe if parent_df is None + if parent_df is None: + parent_df = parent_paths_df + else: + raise ValueError( + "[ERROR] Please specify one of ``patch_df`` or ``patch_paths``." + ) + + # Check for metadata + data + if not isinstance(patch_df, pd.DataFrame): + raise ValueError("[ERROR] No patch data available.") + if not isinstance(parent_df, pd.DataFrame): + raise ValueError("[ERROR] No metadata (parent data) available.") + + # Check for url column and add to patch dataframe + if "url" in parent_df.columns: + patch_df = patch_df.join(parent_df["url"], on="parent_id") + else: + raise ValueError( + "[ERROR] Metadata (parent data) should contain a 'url' column." + ) + + # Add label column if not present + if label_col not in patch_df.columns: + patch_df[label_col] = None + patch_df["changed"] = False + + # Check for image paths column + if patch_paths_col not in patch_df.columns: + raise ValueError( + f"[ERROR] Your DataFrame does not have the image paths column: {patch_paths_col}." 
+ ) + + image_list = json.dumps( + sorted(patch_df[patch_paths_col].to_list()), sort_keys=True + ) + + # Set up annotations file + username = kwargs.get( + "username", + "".join( + [random.choice(string.ascii_letters + string.digits) for n in range(30)] + ), + ) + task_name = kwargs.get("task_name", "task") + id = hashlib.md5(image_list.encode("utf-8")).hexdigest() + + annotations_file = task_name.replace(" ", "_") + f"_#{username}#-{id}.csv" + annotations_file = os.path.join(annotations_dir, annotations_file) + + # Ensure labels are of type list + if not isinstance(labels, list): + raise SyntaxError("[ERROR] Labels provided must be as a list") + + # Ensure unique values in list + labels = sorted(set(labels), key=labels.index) + + # Test for existing file + if os.path.exists(annotations_file): + print(f"[INFO] Loading existing annotations for {username}.") + existing_annotations = pd.read_csv( + annotations_file, index_col=0, sep=delimiter + ) + + if label_col not in existing_annotations.columns: + raise ValueError( + f"[ERROR] Your existing annotations do not have the label column: {label_col}." 
+ ) + + print(existing_annotations[label_col].dtype) + + if existing_annotations[label_col].dtype == int: + # convert label indices (ints) to labels (strings) + # this is to convert old annotations format to new annotations format + existing_annotations[label_col] = existing_annotations[label_col].apply( + lambda x: labels[x] + ) + + patch_df = patch_df.join( + existing_annotations, how="left", lsuffix="_x", rsuffix="_y" + ) + patch_df[label_col] = patch_df[f"{label_col}_y"].fillna(patch_df[f"{label_col}_x"]) + patch_df = patch_df.drop( + columns=[ + f"{label_col}_x", + f"{label_col}_y", + ] + ) + patch_df["changed"] = patch_df[label_col].apply( + lambda x: True if x else False + ) + + patch_df[patch_paths_col] = patch_df[f"{patch_paths_col}_x"] + patch_df = patch_df.drop( + columns=[ + f"{patch_paths_col}_x", + f"{patch_paths_col}_y", + ] + ) + + # initiate as a DataFrame + super().__init__(patch_df) + + ## pixel_bounds = x0, y0, x1, y1 + self["min_x"] = self.pixel_bounds.apply(lambda x: x[0]) + self["min_y"] = self.pixel_bounds.apply(lambda x: x[1]) + self["max_x"] = self.pixel_bounds.apply(lambda x: x[2]) + self["max_y"] = self.pixel_bounds.apply(lambda x: x[3]) + + # Sort by sortby column if provided + if isinstance(sortby, str): + if sortby in self.columns: + self.sort_values( + sortby, ascending=kwargs.get("ascending", True), inplace=True + ) + else: + raise ValueError(f"[ERROR] {sortby} is not a column in the DataFrame.") + elif sortby is not None: + raise ValueError("[ERROR] ``sortby`` must be a string or None.") + + self._labels = labels + self.label_col = label_col + self.patch_paths_col = patch_paths_col + self.annotations_file = annotations_file + self.show_context = show_context + self.auto_save = auto_save + self.username = username + self.task_name = task_name + + # set up for the annotator + self.buttons_per_row = kwargs.get("buttons_per_row", None) + self._min_values = kwargs.get("min_values", {}) + self._max_values = kwargs.get("max_values", {}) # 
pixel_bounds = x0, y0, x1, y1 + + self.patch_width, self.patch_height = self.get_patch_size() + + # Create annotations_dir + Path(annotations_dir).mkdir(parents=True, exist_ok=True) + + # Set up standards for context display + self.surrounding = kwargs.get("surrounding", 1) + self.max_size = kwargs.get("max_size", MAX_SIZE) + self.resize_to = kwargs.get("resize_to", None) + + # set up buttons + self._buttons = [] + + # Set max buttons + if not self.buttons_per_row: + if (len(self._labels) % 2) == 0: + if len(self._labels) > 4: + self.buttons_per_row = 4 + else: + self.buttons_per_row = 2 + else: + if len(self._labels) == 3: + self.buttons_per_row = 3 + else: + self.buttons_per_row = 5 + + # Set indices + self.current_index = -1 + self.previous_index = 0 + + # Setup buttons + self._setup_buttons() + + # Setup box for buttons + self._setup_box() + + # Setup queue + self._queue = self.get_queue() + + @staticmethod + def _load_dataframes( + patch_paths: str | None = None, + parent_paths: str | None = None, + metadata_path: str | None = None, + delimiter: str = ",", + ) -> tuple[pd.DataFrame, pd.DataFrame]: + """ + Load parent and patch dataframes by loading images from file paths. + + Parameters + ---------- + patch_paths : str | None, optional + Path to the patches, by default None + parent_paths : str | None, optional + Path to the parent images, by default None + metadata_path : str | None, optional + Path to the parent metadata file, by default None + delimiter : str, optional + Delimiter used in CSV files, by default "," + + Returns + ------- + tuple[pd.DataFrame, pd.DataFrame] + A tuple containing the parent dataframe and patch dataframe. 
+ """ + if patch_paths: + print(f"[INFO] Loading patches from {patch_paths}.") + if parent_paths: + print(f"[INFO] Loading parents from {parent_paths}.") + + maps = load_patches(patch_paths=patch_paths, parent_paths=parent_paths) + # Add pixel stats + maps.calc_pixel_stats() + + try: + maps.add_metadata(metadata_path, delimiter=delimiter) + print(f"[INFO] Adding metadata from {metadata_path}.") + except ValueError: + raise FileNotFoundError( + f"[INFO] Metadata file at {metadata_path} not found. Please specify the correct file path using the ``metadata_path`` argument." + ) + + parent_df, patch_df = maps.convert_images() + + return parent_df, patch_df + + def _eval_df(self, df): + for col in df.columns: + try: + df[col] = df[col].apply(literal_eval) + except (ValueError, TypeError, SyntaxError): + pass + + def get_patch_size(self): + """ + Calculate and return the width and height of the patches based on the + first patch of the DataFrame, assuming the same shape of patches + across the frame. + + Returns + ------- + Tuple[int, int] + Width and height of the patches. + """ + patch_width = ( + self.sort_values("min_x").max_x[0] - self.sort_values("min_x").min_x[0] + ) + patch_height = ( + self.sort_values("min_y").max_y[0] - self.sort_values("min_y").min_y[0] + ) + + return patch_width, patch_height + + def _setup_buttons(self) -> None: + """ + Set up buttons for each label to be annotated. + """ + for label in self._labels: + btn = widgets.Button( + description=label, + button_style="info", + layout=widgets.Layout(flex="1 1 0%", width="auto"), + ) + btn.style.button_color = "#9B6F98" + + def on_click(lbl, *_, **__): + self._add_annotation(lbl) + + btn.on_click(functools.partial(on_click, label)) + self._buttons.append(btn) + + def _setup_box(self) -> None: + """ + Set up the box which holds all the buttons. 
+ """ + if len(self._buttons) > self.buttons_per_row: + self.box = widgets.VBox( + [ + widgets.HBox(self._buttons[x : x + self.buttons_per_row]) + for x in range(0, len(self._buttons), self.buttons_per_row) + ] + ) + else: + self.box = widgets.HBox(self._buttons) + + # back button + prev_btn = widgets.Button( + description="« previous", layout=widgets.Layout(flex="1 1 0%", width="auto") + ) + prev_btn.on_click(self._prev_example) + + # next button + next_btn = widgets.Button( + description="next »", layout=widgets.Layout(flex="1 1 0%", width="auto") + ) + next_btn.on_click(self._next_example) + + self.navbox = widgets.VBox([widgets.HBox([prev_btn, next_btn])]) + + def get_queue( + self, as_type: str | None = "list" + ) -> list[int] | (pd.Index | pd.Series): + """ + Gets the indices of rows which are eligible for annotation. + + Parameters + ---------- + as_type : str, optional + The format in which to return the indices. Options: "list", + "index". Default is "list". If any other value is provided, it + returns a pandas.Series. + + Returns + ------- + List[int] or pandas.Index or pandas.Series + Depending on "as_type", returns either a list of indices, a + pd.Index object, or a pd.Series of eligible rows. + """ + + def check_eligibility(row): + if row.label is not None: + return False + + test = [ + row[col] >= min_value for col, min_value in self._min_values.items() + ] + [row[col] <= max_value for col, max_value in self._max_values.items()] + + if not all(test): + return False + + return True + + test = self.copy() + test["eligible"] = test.apply(check_eligibility, axis=1) + test = test[ + ["eligible"] + [col for col in test.columns if not col == "eligible"] + ] + + indices = test[test.eligible].index + if as_type == "list": + return list(indices) + if as_type == "index": + return indices + return test[test.eligible] + + def get_context(self): + """ + Provides the surrounding context for the patch to be annotated. 
+ + Returns + ------- + ipywidgets.VBox + An IPython VBox widget containing the surrounding patches for + context. + """ + + def get_path(image_path, dim=True): + # Resize the image + im = Image.open(image_path) + + # Dim the image + if dim is True or dim == "True": + im_array = np.array(im) + im_array = 256 - (256 - im_array) * 0.4 # lighten image + im = Image.fromarray(im_array.astype(np.uint8)) + return im + + def get_empty_square(): + im = Image.new( + size=(self.patch_width, self.patch_height), + mode="RGB", + color="white", + ) + return im + + if self.surrounding > 3: + display( + widgets.HTML( + """

Warning: More than 3 surrounding tiles may crowd the display and not display correctly.

""" + ) + ) + + ix = self._queue[self.current_index] + + x = self.at[ix, "min_x"] + y = self.at[ix, "min_y"] + current_parent = self.at[ix, "parent_id"] + + parent_frame = self.query(f"parent_id=='{current_parent}'") + + deltas = list(range(-self.surrounding, self.surrounding + 1)) + y_and_x = list( + product( + [y + y_delta * self.patch_height for y_delta in deltas], + [x + x_delta * self.patch_width for x_delta in deltas], + ) + ) + queries = [f"min_x == {x} & min_y == {y}" for y, x in y_and_x] + items = [parent_frame.query(query) for query in queries] + + # derive ids from items + ids = [x.index[0] if len(x.index) == 1 else None for x in items] + ids = [x != ix for x in ids] + + # derive images from items + image_paths = [ + x.at[x.index[0], "image_path"] if len(x.index) == 1 else None for x in items + ] + + # zip them + image_list = list(zip(image_paths, ids)) + + # split them into rows + per_row = len(deltas) + images = [ + [get_path(x[0], dim=x[1]) if x[0] else get_empty_square() for x in lst] + for lst in array_split(image_list, per_row) + ] + + total_width = (2 * self.surrounding + 1) * self.patch_width + total_height = (2 * self.surrounding + 1) * self.patch_height + + context_image = Image.new("RGB", (total_width, total_height)) + + y_offset = 0 + for row in images: + x_offset = 0 + for image in row: + context_image.paste(image, (x_offset, y_offset)) + x_offset += self.patch_width + y_offset += self.patch_height + + if self.resize_to is not None: + context_image = ImageOps.contain( + context_image, (self.resize_to, self.resize_to) + ) + # only constrain to max size if not resize_to + elif max(context_image.size) > self.max_size: + context_image = ImageOps.contain( + context_image, (self.max_size, self.max_size) + ) + + return context_image + + def annotate( + self, + show_context: bool | None = None, + min_values: dict | None = None, + max_values: dict | None = None, + surrounding: int | None = None, + resize_to: int | None = None, + max_size: int | None 
= None, + ) -> None: + """ + Renders the annotation interface for the first image. + + Parameters + ---------- + show_context : bool or None, optional + Whether or not to display the surrounding context for each image. + Default is None. + min_values : dict or None, optional + Minimum values for each property to filter images for annotation. + It should be provided as a dictionary consisting of column names + (keys) and minimum values as floating point values (values). + Default is None. + max_values : dict or None, optional + Maximum values for each property to filter images for annotation. + It should be provided as a dictionary consisting of column names + (keys) and maximum values as floating point values (values). + Default is None. + surrounding : int or None, optional + The number of surrounding images to show for context. Default: 1. + max_size : int or None, optional + The size in pixels for the longest side to which constrain each + patch image. Default: 1000. + + Returns + ------- + None + """ + if min_values is not None: + self._min_values = min_values + if max_values is not None: + self._max_values = max_values + + self.current_index = -1 + for button in self._buttons: + button.disabled = False + + if show_context is not None: + self.show_context = show_context + if surrounding is not None: + self.surrounding = surrounding + if resize_to is not None: + self.resize_to = resize_to + if max_size is not None: + self.max_size = max_size + + # re-set up queue + self._queue = self.get_queue() + + self.out = widgets.Output(layout=_CENTER_LAYOUT) + display(self.box) + display(self.navbox) + display(self.out) + + # self.get_current_index() + # TODO: Does not pick the correct NEXT... + self._next_example() + + def _next_example(self, *_) -> tuple[int, int, str]: + """ + Advances the annotation interface to the next image. + + Returns + ------- + Tuple[int, int, str] + Previous index, current index, and path of the current image. 
+ """ + if not len(self._queue): + self.render_complete() + return + + if isinstance(self.current_index, type(None)) or self.current_index == -1: + self.current_index = 0 + else: + current_index = self.current_index + 1 + + try: + self._queue[current_index] + self.previous_index = self.current_index + self.current_index = current_index + except IndexError: + pass + + ix = self._queue[self.current_index] + + img_path = self.at[ix, self.patch_paths_col] + + self.render() + return self.previous_index, self.current_index, img_path + + def _prev_example(self, *_) -> tuple[int, int, str]: + """ + Moves the annotation interface to the previous image. + + Returns + ------- + Tuple[int, int, str] + Previous index, current index, and path of the current image. + """ + if not len(self._queue): + self.render_complete() + return + + current_index = self.current_index - 1 + + if current_index < 0: + current_index = 0 + + try: + self._queue[current_index] + self.previous_index = current_index - 1 + self.current_index = current_index + except IndexError: + pass + + ix = self._queue[self.current_index] + + img_path = self.at[ix, self.patch_paths_col] + + self.render() + return self.previous_index, self.current_index, img_path + + def render(self) -> None: + """ + Displays the image at the current index in the annotation interface. + + If the current index is greater than or equal to the length of the + dataframe, the method disables the "next" button and saves the data. 
+ + Returns + ------- + None + """ + # Check whether we have reached the end + if self.current_index >= len(self) - 1: + self.render_complete() + return + + # ix = self.iloc[self.current_index].name + ix = self._queue[self.current_index] + + # render buttons + for button in self._buttons: + if button.description == "prev": + # disable previous button when at first example + button.disabled = self.current_index <= 0 + elif button.description == "next": + # disable skip button when at last example + button.disabled = self.current_index >= len(self) - 1 + elif button.description != "submit": + if self.at[ix, self.label_col] == button.description: + button.icon = "check" + else: + button.icon = "" + + # display new example + with self.out: + clear_output(wait=True) + image = self.get_patch_image(ix) + if self.show_context: + context = self.get_context() + self._context_image = context + display(context.convert("RGB")) + else: + display(image.convert("RGB")) + add_ins = [] + if self.at[ix, "url"]: + url = self.at[ix, "url"] + text = f'

Click to see entire map.

' + add_ins += [widgets.HTML(text)] + + value = self.current_index + 1 if self.current_index else 1 + description = f"{value} / {len(self._queue)}" + add_ins += [ + widgets.IntProgress( + value=value, + min=0, + max=len(self._queue), + step=1, + description=description, + orientation="horizontal", + barstyle="success", + ) + ] + display( + widgets.VBox( + add_ins, + layout=_CENTER_LAYOUT, + ) + ) + + def get_patch_image(self, ix: int) -> Image: + """ + Returns the image at the given index. + + Parameters + ---------- + ix : int + The index of the image in the dataframe. + + Returns + ------- + PIL.Image + A PIL.Image object of the image at the given index. + """ + image_path = self.at[ix, self.patch_paths_col] + image = Image.open(image_path) + + if self.resize_to is not None: + image = ImageOps.contain(image, (self.resize_to, self.resize_to)) + # only constrain to max size if not resize_to + elif max(image.size) > self.max_size: + image = ImageOps.contain(image, (self.max_size, self.max_size)) + + return image + + def _add_annotation(self, annotation: str) -> None: + """ + Adds the provided annotation to the current image. + + Parameters + ---------- + annotation : str + The label to add to the current image. + + Returns + ------- + None + """ + # ix = self.iloc[self.current_index].name + ix = self._queue[self.current_index] + self.at[ix, self.label_col] = annotation + self.at[ix, "changed"] = True + if self.auto_save: + self._auto_save() + self._next_example() + + def _auto_save(self): + """ + Automatically saves the annotations made so far. + + Returns + ------- + None + """ + self.get_labelled_data(sort=True).to_csv(self.annotations_file) + + def get_labelled_data( + self, + sort: bool = True, + index_labels: bool = False, + include_paths: bool = True, + ) -> pd.DataFrame: + """ + Returns the annotations made so far. 
+ + Parameters + ---------- + sort : bool, optional + Whether to sort the dataframe by the order of the images in the + input data, by default True + index_labels : bool, optional + Whether to return the label's index number (in the labels list + provided in setting up the instance) or the human-readable label + for each row, by default False + include_paths : bool, optional + Whether to return a column containing the full path to the + annotated image or not, by default True + + Returns + ------- + pandas.DataFrame + A dataframe containing the labelled images and their associated + label index. + """ + if index_labels: + col1 = self.filtered[self.label_col].apply(lambda x: self._labels.index(x)) + else: + col1 = self.filtered[self.label_col] + + if include_paths: + col2 = self.filtered[self.patch_paths_col] + df = pd.DataFrame( + {self.patch_paths_col: col2, self.label_col: col1}, + index=pd.Index(col1.index, name="image_id"), + ) + else: + df = pd.DataFrame(col1, index=pd.Index(col1.index, name="image_id")) + if not sort: + return df + + df["sort_value"] = df.index.to_list() + df["sort_value"] = df["sort_value"].apply( + lambda x: f"{x.split('#')[1]}-{x.split('#')[0]}" + ) + return df.sort_values("sort_value").drop(columns=["sort_value"]) + + @property + def filtered(self) -> pd.DataFrame: + _filter = ~self[self.label_col].isna() + return self[_filter] + + def render_complete(self): + """ + Renders the completion message once all images have been annotated. + + Returns + ------- + None + """ + clear_output() + display( + widgets.HTML("

All annotations done with current settings.

") + ) + if self.auto_save: + self._auto_save() + for button in self._buttons: + button.disabled = True diff --git a/mapreader/annotate/utils.py b/mapreader/annotate/utils.py index 674c78ff..7cda2749 100644 --- a/mapreader/annotate/utils.py +++ b/mapreader/annotate/utils.py @@ -5,21 +5,19 @@ import random import sys +# Ignore warnings +import warnings +from typing import Literal + import matplotlib.pyplot as plt import numpy as np import pandas as pd import requests import yaml from ipyannotate.annotation import Annotation -from ipyannotate.buttons import ( - BackButton as Back, -) -from ipyannotate.buttons import ( - NextButton as Next, -) -from ipyannotate.buttons import ( - ValueButton as Button, -) +from ipyannotate.buttons import BackButton as Back +from ipyannotate.buttons import NextButton as Next +from ipyannotate.buttons import ValueButton as Button from ipyannotate.canvas import OutputCanvas from ipyannotate.tasks import Task, Tasks from ipyannotate.toolbar import Toolbar @@ -27,148 +25,9 @@ from mapreader import load_patches, loader - -def display_record(record: tuple[str, str, str, int, int]) -> None: - """ - Displays an image and optionally, a context image with a patch border. - - Parameters - ---------- - record : tuple - A tuple containing the following elements: - - str : The name of the patch. - - str : The path to the image to be displayed. - - str : The path to the parent image, if any. - - int : The index of the task, if any. - - int : The number of times this patch has been displayed. - - Returns - ------- - None - - Notes - ----- - This function should be called from ``prepare_annotation``, there are - several global variables that are being set in the function. - - This function uses ``matplotlib`` to display images. If the context image - is displayed, the border of the patch is highlighted in red. - - Refer to ``ipyannotate`` and ``matplotlib`` for more info. 
- """ - - # setup the images - gridsize = (5, 1) - plt.clf() - plt.figure(figsize=(12, 12)) - if treelevel == "patch" and contextimage: - plt.subplot2grid(gridsize, (2, 0)) - else: - plt.subplot2grid(gridsize, (0, 0), rowspan=2) - plt.imshow(Image.open(record[1])) - plt.xticks([]) - plt.yticks([]) - plt.title(f"{record[0]}", size=20) - - if treelevel == "patch" and contextimage: - parent_path = os.path.dirname( - annotation_tasks["paths"][record[3]]["parent_paths"] - ) - # Here, we assume that min_x, min_y, max_x and max_y are in the patch - # name - split_path = record[0].split("-") - min_x, min_y, max_x, max_y = ( - int(split_path[1]), - int(split_path[2]), - int(split_path[3]), - int(split_path[4]), - ) - - # context image - plt.subplot2grid(gridsize, (0, 0), rowspan=2) - - # --- - path = os.path.join(parent_path, record[2]) - par_img = Image.open(path).convert("RGB") - min_y_par = max(0, min_y - y_offset) - min_x_par = max(0, min_x - x_offset) - max_x_par = min(max_x + x_offset, np.shape(par_img)[1]) - max_y_par = min(max_y + y_offset, np.shape(par_img)[0]) - - # par_img = par_img[min_y_par:max_y_par, min_x_par:max_x_par] - par_img = par_img.crop((min_x_par, min_y_par, max_x_par, max_y_par)) - - plt.imshow(par_img, extent=(min_x_par, max_x_par, max_y_par, min_y_par)) - # --- - - plt.xticks([]) - plt.yticks([]) - - # plot the patch border on the context image - plt.plot([min_x, min_x], [min_y, max_y], lw=2, zorder=10, color="r") - plt.plot([min_x, max_x], [min_y, min_y], lw=2, zorder=10, color="r") - plt.plot([max_x, max_x], [max_y, min_y], lw=2, zorder=10, color="r") - plt.plot([max_x, min_x], [max_y, max_y], lw=2, zorder=10, color="r") - - """ - # context image - plt.subplot2grid(gridsize, (3, 0), rowspan=2) - min_y_par = 0 - min_x_par = 0 - max_x_par = par_img.shape[1] - max_y_par = par_img.shape[0] - plt.imshow(par_img[min_y_par:max_y_par, min_x_par:max_x_par], - extent=(min_x_par, max_x_par, max_y_par, min_y_par)) - plt.plot([min_x_par, min_x_par], - 
[min_y_par, max_y_par], - lw=2, zorder=10, color="k") - plt.plot([min_x_par, max_x_par], - [min_y_par, min_y_par], - lw=2, zorder=10, color="k") - plt.plot([max_x_par, max_x_par], - [max_y_par, min_y_par], - lw=2, zorder=10, color="k") - plt.plot([max_x_par, min_x_par], - [max_y_par, max_y_par], - lw=2, zorder=10, color="k") - - plt.xticks([]) - plt.yticks([]) - - # plot the patch border on the context image - plt.plot([min_x, min_x], - [min_y, max_y], - lw=2, zorder=10, color="r") - plt.plot([min_x, max_x], - [min_y, min_y], - lw=2, zorder=10, color="r") - plt.plot([max_x, max_x], - [max_y, min_y], - lw=2, zorder=10, color="r") - plt.plot([max_x, min_x], - [max_y, max_y], - lw=2, zorder=10, color="r") - """ - - plt.tight_layout() - plt.show() - - print(20 * "-") - print("Additional info:") - print(f"Counter: {record[-1]}") - if url_main: - try: - map_id = record[2].split("_")[-1].split(".")[0] - url = f"{url_main}/{map_id}" - # stream=True so we don't download the whole page, only check if - # the page exists - response = requests.get(url, stream=True) - assert response.status_code < 400 - print() - print(f"URL: {url}") - except: - url = False - pass +warnings.filterwarnings("ignore") +# warnings.filterwarnings( +# "ignore", message="Pandas doesn't allow columns to be created via a new attribute name") def prepare_data( @@ -227,7 +86,7 @@ def prepare_data( # annotate all patches in the pandas dataframe pass - tar_param = "mean_pixel_R" + tar_param = "mean_pixel_RGB" if tar_param in df.columns: try: pd.options.mode.chained_assignment = None @@ -267,7 +126,7 @@ def annotation_interface( list_labels: list, list_colors: list[str] | None = None, annotation_set: str | None = "001", - method: str | None = "ipyannotate", + method: Literal["ipyannotate", "pigeonxt"] | None = "ipyannotate", list_shortcuts: list[str] | None = None, ) -> Annotation: """ @@ -286,7 +145,7 @@ def annotation_interface( annotation_set : str, optional String representing the annotation set, 
specified in the yaml file or via function argument, by default ``"001"``. - method : str, optional + method : Literal["ipyannotate", "pigeonxt"], optional String representing the method for annotation, by default ``"ipyannotate"``. list_shortcuts : list, optional @@ -302,7 +161,7 @@ def annotation_interface( Raises ------ SystemExit - If ``method`` parameter is not ``"ipyannotate"``. + If ``method`` parameter is neither ``"ipyannotate"`` nor ``"pigeonxt"``. Notes ----- @@ -312,7 +171,152 @@ def annotation_interface( if list_colors is None: list_colors = ["red", "green", "blue", "green"] - if method == "ipyannotate": + if method.lower() == "ipyannotate": + + def display_record(record: tuple[str, str, str, int, int]) -> None: + """ + Displays an image and optionally, a context image with a patch + border. + + Parameters + ---------- + record : tuple + A tuple containing the following elements: + - str : The name of the patch. + - str : The path to the image to be displayed. + - str : The path to the parent image, if any. + - int : The index of the task, if any. + - int : The number of times this patch has been displayed. + + Returns + ------- + None + + Notes + ----- + This function should be called from ``prepare_annotation``, there + are several global variables that are being set in the function. + + This function uses ``matplotlib`` to display images. If the + context image is displayed, the border of the patch is highlighted + in red. + + Refer to ``ipyannotate`` and ``matplotlib`` for more info. 
+ """ + + # setup the images + gridsize = (5, 1) + plt.clf() + plt.figure(figsize=(12, 12)) + if treelevel == "patch" and contextimage: + plt.subplot2grid(gridsize, (2, 0)) + else: + plt.subplot2grid(gridsize, (0, 0), rowspan=2) + plt.imshow(Image.open(record[1])) + plt.xticks([]) + plt.yticks([]) + plt.title(f"{record[0]}", size=20) + + if treelevel == "patch" and contextimage: + parent_path = os.path.dirname( + annotation_tasks["paths"][record[3]]["parent_paths"] + ) + # Here, we assume that min_x, min_y, max_x and max_y are in the patch + # name + split_path = record[0].split("-") + min_x, min_y, max_x, max_y = ( + int(split_path[1]), + int(split_path[2]), + int(split_path[3]), + int(split_path[4]), + ) + + # context image + plt.subplot2grid(gridsize, (0, 0), rowspan=2) + + # --- + path = os.path.join(parent_path, record[2]) + par_img = Image.open(path).convert("RGB") + min_y_par = max(0, min_y - y_offset) + min_x_par = max(0, min_x - x_offset) + max_x_par = min(max_x + x_offset, np.shape(par_img)[1]) + max_y_par = min(max_y + y_offset, np.shape(par_img)[0]) + + # par_img = par_img[min_y_par:max_y_par, min_x_par:max_x_par] + par_img = par_img.crop((min_x_par, min_y_par, max_x_par, max_y_par)) + + plt.imshow(par_img, extent=(min_x_par, max_x_par, max_y_par, min_y_par)) + # --- + + plt.xticks([]) + plt.yticks([]) + + # plot the patch border on the context image + plt.plot([min_x, min_x], [min_y, max_y], lw=2, zorder=10, color="r") + plt.plot([min_x, max_x], [min_y, min_y], lw=2, zorder=10, color="r") + plt.plot([max_x, max_x], [max_y, min_y], lw=2, zorder=10, color="r") + plt.plot([max_x, min_x], [max_y, max_y], lw=2, zorder=10, color="r") + + """ + # context image + plt.subplot2grid(gridsize, (3, 0), rowspan=2) + min_y_par = 0 + min_x_par = 0 + max_x_par = par_img.shape[1] + max_y_par = par_img.shape[0] + plt.imshow(par_img[min_y_par:max_y_par, min_x_par:max_x_par], + extent=(min_x_par, max_x_par, max_y_par, min_y_par)) + plt.plot([min_x_par, min_x_par], + 
[min_y_par, max_y_par], + lw=2, zorder=10, color="k") + plt.plot([min_x_par, max_x_par], + [min_y_par, min_y_par], + lw=2, zorder=10, color="k") + plt.plot([max_x_par, max_x_par], + [max_y_par, min_y_par], + lw=2, zorder=10, color="k") + plt.plot([max_x_par, min_x_par], + [max_y_par, max_y_par], + lw=2, zorder=10, color="k") + + plt.xticks([]) + plt.yticks([]) + + # plot the patch border on the context image + plt.plot([min_x, min_x], + [min_y, max_y], + lw=2, zorder=10, color="r") + plt.plot([min_x, max_x], + [min_y, min_y], + lw=2, zorder=10, color="r") + plt.plot([max_x, max_x], + [max_y, min_y], + lw=2, zorder=10, color="r") + plt.plot([max_x, min_x], + [max_y, max_y], + lw=2, zorder=10, color="r") + """ + + plt.tight_layout() + plt.show() + + print(20 * "-") + print("Additional info:") + print(f"Counter: {record[-1]}") + if url_main: + try: + map_id = record[2].split("_")[-1].split(".")[0] + url = f"{url_main}/{map_id}" + # stream=True so we don't download the whole page, only check if + # the page exists + response = requests.get(url, stream=True) + assert response.status_code < 400 + print() + print(f"URL: {url}") + except: + url = False + pass + if not list_shortcuts: list_shortcuts = [ "1", @@ -369,7 +373,7 @@ def annotation_interface( return annotation sys.exit( - f"method: {method} is not implemented. Currently, we support: ipyannotate" # noqa + f"method: {method} is not implemented. Currently, we support: ipyannotate and pigeonxt" # noqa ) @@ -395,6 +399,7 @@ def prepare_annotation( urlmain: str | None = "https://maps.nls.uk/view/", random_state: str | int | None = "random", list_shortcuts: list[tuple] | None = None, + method: Literal["ipyannotate", "pigeonxt"] | None = "ipyannotate", ) -> dict: """Prepare image data for annotation and launch the annotation interface. @@ -469,6 +474,9 @@ def prepare_annotation( list_shortcuts : list of tuples, optional A list of tuples containing shortcut key assignments for label names. Default is ``None``. 
+ method : Literal["ipyannotate", "pigeonxt"], optional + String representing the method for annotation, by default + ``"ipyannotate"``. Returns ------- @@ -484,6 +492,8 @@ def prepare_annotation( """ # Specify global variables so they can be used in display_record function + if custom_labels is None: + custom_labels = [] if custom_labels is None: custom_labels = [] global annotation_tasks @@ -616,6 +626,7 @@ def prepare_annotation( list_labels=list_labels, annotation_set=annotation_set, list_shortcuts=list_shortcuts, + method=method, ) return annotation diff --git a/setup.py b/setup.py index b4fc4481..20a51ffc 100644 --- a/setup.py +++ b/setup.py @@ -42,7 +42,7 @@ "torchvision>=0.11.1,<0.12.1", "jupyter>=1.0.0,<2.0.0", "ipykernel>=6.5.1,<7.0.0", - "ipywidgets>=7.7.3,<8.0.0", + "ipywidgets>=8.0.0,<9.0.0", "ipyannotate==0.1.0-beta.0", "Cython>=0.29.24,<0.30.0", # "proj>=0.2.0,<0.3.0", @@ -51,12 +51,12 @@ "parhugin>=0.0.3,<0.0.4", "geopy==2.1.0", "rasterio>=1.2.10,<2.0.0", - "keplergl>=0.3.2,<0.4.0", "simplekml>=1.3.6,<2.0.0", "versioneer>=0.28", "tqdm<5.0.0", "torchinfo<2.0.0", "openpyxl<4.0.0", + "geopandas<1.0.0", ], extras_require={ "dev": [ diff --git a/tests/test_annotator.py b/tests/test_annotator.py new file mode 100644 index 00000000..77feb5a9 --- /dev/null +++ b/tests/test_annotator.py @@ -0,0 +1,257 @@ +from __future__ import annotations + +from pathlib import Path + +import pytest + +from mapreader import Annotator, loader + + +@pytest.fixture +def sample_dir(): + return Path(__file__).resolve().parent / "sample_files" + + +@pytest.fixture +def load_dfs(sample_dir, tmp_path): + my_maps = loader(f"{sample_dir}/cropped_74488689.png") + my_maps.add_metadata(f"{sample_dir}/ts_downloaded_maps.csv") + my_maps.patchify_all( + patch_size=3, path_save=f"{tmp_path}/patches/" + ) # creates 9 patches + parent_df, patch_df = my_maps.convert_images() + parent_df.to_csv(f"{tmp_path}/parent_df.csv") + patch_df.to_csv(f"{tmp_path}/patch_df.csv") + return parent_df, 
patch_df, tmp_path + + +def test_init_with_dfs(load_dfs): + parent_df, patch_df, tmp_path = load_dfs + annotator = Annotator( + patch_df=patch_df, + parent_df=parent_df, + labels=["a", "b"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + ) + assert len(annotator) == 9 + assert isinstance(annotator.iloc[0]["coordinates"], tuple) + + +def test_init_with_csvs(load_dfs): + _, _, tmp_path = load_dfs + annotator = Annotator( + patch_df=f"{tmp_path}/patch_df.csv", + parent_df=f"{tmp_path}/parent_df.csv", + labels=["a", "b"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + ) + assert len(annotator) == 9 + assert isinstance(annotator.iloc[0]["coordinates"], tuple) + + +def test_init_with_fpaths(load_dfs, sample_dir): + _, _, tmp_path = load_dfs + annotator = Annotator( + patch_paths=f"{tmp_path}/patches/*png", + parent_paths=f"{sample_dir}/cropped_74488689.png", + metadata_path=f"{sample_dir}/ts_downloaded_maps.csv", + labels=["a", "b"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + ) + assert len(annotator) == 9 + assert "mean_pixel_R" in annotator.columns + + +def test_init_with_fpaths_tsv(load_dfs, sample_dir): + _, _, tmp_path = load_dfs + annotator = Annotator( + patch_paths=f"{tmp_path}/patches/*png", + parent_paths=f"{sample_dir}/cropped_74488689.png", + metadata_path=f"{sample_dir}/ts_downloaded_maps.tsv", + labels=["a", "b"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + delimiter="\t", + ) + assert len(annotator) == 9 + assert "mean_pixel_R" in annotator.columns + + +def test_no_labels(load_dfs): + parent_df, patch_df, tmp_path = load_dfs + annotator = Annotator( + patch_df=patch_df, + parent_df=parent_df, + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + ) + assert len(annotator) == 9 + assert annotator._labels == [] + + annotator._labels = ["a", "b"] + assert annotator._labels == ["a", "b"] + + +def test_duplicate_labels(load_dfs): + parent_df, patch_df, tmp_path = 
load_dfs + annotator = Annotator( + patch_df=patch_df, + parent_df=parent_df, + labels=["a", "b", "a"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + ) + assert len(annotator) == 9 + assert annotator._labels == ["a", "b"] + + +def test_labels_sorting(load_dfs): + parent_df, patch_df, tmp_path = load_dfs + annotator = Annotator( + patch_df=patch_df, + parent_df=parent_df, + labels=["b", "a"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + ) + assert len(annotator) == 9 + assert annotator._labels == ["b", "a"] + + +def test_sortby(load_dfs): + parent_df, patch_df, tmp_path = load_dfs + annotator = Annotator( + patch_df=patch_df, + parent_df=parent_df, + labels=["a", "b"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + sortby="min_x", + ascending=False, + ) + queue = annotator.get_queue() + assert len(queue) == 9 + assert queue[0] == "patch-6-0-9-3-#cropped_74488689.png#.png" + assert queue[-1] == "patch-0-6-3-9-#cropped_74488689.png#.png" + + +def test_min_values(load_dfs): + parent_df, patch_df, tmp_path = load_dfs + annotator = Annotator( + patch_df=patch_df, + parent_df=parent_df, + labels=["a", "b"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + min_values={"min_x": 3}, + ) + queue = annotator.get_queue() + assert len(queue) == 6 + assert queue[0] == "patch-3-0-6-3-#cropped_74488689.png#.png" + assert queue[-1] == "patch-6-6-9-9-#cropped_74488689.png#.png" + + +def test_max_values(load_dfs): + parent_df, patch_df, tmp_path = load_dfs + annotator = Annotator( + patch_df=patch_df, + parent_df=parent_df, + labels=["a", "b"], + annotations_dir=f"{tmp_path}/annotations/", + auto_save=False, + max_values={"min_x": 0}, + ) + queue = annotator.get_queue() + assert len(queue) == 3 + assert queue[0] == "patch-0-0-3-3-#cropped_74488689.png#.png" + assert queue[-1] == "patch-0-6-3-9-#cropped_74488689.png#.png" + + +# errors + + +def test_incorrect_csv_paths(load_dfs): + with 
pytest.raises(FileNotFoundError): + Annotator( + patch_df="fake_df.csv", + parent_df="fake_df.csv", + ) + _, _, tmp_path = load_dfs + with pytest.raises(FileNotFoundError): + Annotator( + patch_df=f"{tmp_path}/patch_df.csv", + parent_df="fake_df.csv", + ) + + +def test_incorrect_delimiter(load_dfs): + _, _, tmp_path = load_dfs + with pytest.raises(ValueError): + Annotator( + patch_df=f"{tmp_path}/patch_df.csv", + parent_df=f"{tmp_path}/parent_df.csv", + delimiter="|", + ) + + +def test_init_dfs_value_error(load_dfs): + with pytest.raises(ValueError, match="path to a csv or a pandas DataFrame"): + Annotator( + patch_df=1, + parent_df=1, + ) + _, _, tmp_path = load_dfs + with pytest.raises(ValueError, match="path to a csv or a pandas DataFrame"): + Annotator( + patch_df=f"{tmp_path}/patch_df.csv", + parent_df=1, + ) + + +def test_no_url_col(load_dfs): + parent_df, patch_df, _ = load_dfs + parent_df = parent_df.drop(columns=["url"]) + with pytest.raises(ValueError, match="should contain a 'url' column"): + Annotator( + patch_df=patch_df, + parent_df=parent_df, + ) + + +def test_no_image_path_col(load_dfs): + parent_df, patch_df, _ = load_dfs + patch_df = patch_df.drop(columns=["image_path"]) + with pytest.raises(ValueError, match="does not have the image paths column"): + Annotator( + patch_df=patch_df, + parent_df=parent_df, + ) + + +def test_sortby_value_errors(load_dfs): + parent_df, patch_df, _ = load_dfs + with pytest.raises(ValueError, match="not a column"): + Annotator( + patch_df=patch_df, + parent_df=parent_df, + sortby="fake_col", + ) + with pytest.raises(ValueError, match="must be a string or None"): + Annotator( + patch_df=patch_df, + parent_df=parent_df, + sortby=1, + ) + + +def test_fpaths_metadata_filenotfound_error(load_dfs, sample_dir): + _, _, tmp_path = load_dfs + with pytest.raises(FileNotFoundError): + Annotator( + patch_paths=f"{tmp_path}/patches/*png", + parent_paths=f"{sample_dir}/cropped_74488689.png", + metadata_path="fake_df.csv", + ) diff 
--git a/tests/test_import.py b/tests/test_import.py index ef69c23f..ab2a67c2 100644 --- a/tests/test_import.py +++ b/tests/test_import.py @@ -1,8 +1,8 @@ - def test_import(): # This is based on all the imports found in the various tutorial notebooks from mapreader import ( + Annotator, ClassifierContainer, load_patches, AnnotationsLoader, @@ -11,11 +11,4 @@ def test_import(): PatchDataset, Downloader, SheetDownloader, - ) - from mapreader.annotate.utils import prepare_annotation, save_annotation - - # These imports are the various geo packages that previously where a separate subpackage - import geopy - import rasterio - import keplergl - import simplekml + ) \ No newline at end of file diff --git a/worked_examples/annotation/how-to-annotate-patches.ipynb b/worked_examples/annotation/how-to-annotate-patches.ipynb new file mode 100644 index 00000000..ed0b51a1 --- /dev/null +++ b/worked_examples/annotation/how-to-annotate-patches.ipynb @@ -0,0 +1,355 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# How to annotate patches on a map sheet.\n", + "\n", + "In this example, we will download one sheet — WFS ID 439, and start annotating it." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import all necessary components of MapReader" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from mapreader import SheetDownloader, loader" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define parameters and query for SheetDownloader" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_ts = SheetDownloader(\n", + " metadata_path=\"../geospatial/NLS_metadata/metadata_OS_One_Inch_GB_WFS_light.json\",\n", + " download_url=\"https://mapseries-tilesets.s3.amazonaws.com/1inch_2nd_ed/{z}/{x}/{y}.png\",\n", + ")\n", + "\n", + "my_ts.query_map_sheets_by_wfs_ids(439)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set zoom level\n", + "my_ts.get_grid_bb()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Download\n", + "my_ts.download_map_sheets_by_queries(path_save=\"./download/maps\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Patchify!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# First load all the files\n", + "my_files = loader(\"./download/maps/*.png\")\n", + "my_files.add_metadata(metadata=\"./download/maps/metadata.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Then patchify!\n", + "my_files.patchify_all(path_save=\"./download/patches\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "parent_df, patch_df = my_files.convert_images(save=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "patch_df.head()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preview the map\n", + "\n", + "Let's have a quick look at our map." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_files.show_sample(num_samples=1)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Annotate patches\n", + "\n", + "Now, we're ready to annotate patches. 
Let's set up an `Annotator`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from mapreader import Annotator" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator = Annotator(\n", + " patch_paths=\"./download/patches/*.png\",\n", + " parent_paths=\"./download/maps/*.png\",\n", + " metadata_path=\"./download/maps/metadata.csv\",\n", + " delimiter=\",\",\n", + " labels=[\"label 1\", \"label 2\"],\n", + " username=\"rosie\",\n", + " sortby=\"mean_pixel_R\",\n", + " ascending=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator.annotate(resize_to=300)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you're progressing through the patches to annotate them, you'll see a file in the `annotations` directory here:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "[str(x) for x in Path(\"./annotations/\").glob(\"*.csv\")]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also access the annotations as a DataFrame here in Jupyter:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator.filtered" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you'd like to see the context image, set ``show_context`` to ``True``." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator.annotate(show_context=True, resize_to=None)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, we can use our patch and parent dataframes directly (i.e. 
instead of loading parents/patches from file paths)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator = Annotator(\n", + " patch_df = patch_df,\n", + " parent_df = parent_df,\n", + " delimiter=\",\",\n", + " labels=[\"label 1\", \"label 2\"],\n", + " username=\"rosie\",\n", + " context=False,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator.annotate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or, since we saved these as CSV files, we can pass the path to the parent/patch CSVs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator = Annotator(\n", + " patch_df = \"./patch_df.csv\",\n", + " parent_df = \"./parent_df.csv\",\n", + " delimiter=\",\",\n", + " labels=[\"label 1\", \"label 2\"],\n", + " username=\"new\",\n", + " context=False,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator.annotate()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "annotator.filtered" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "mr_dev", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}
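Reviewer note on ``get_labelled_data(sort=True)`` in the ``Annotator`` code of this diff: the sort key is built as a string (parent name, then patch prefix), so rows appear to be grouped by parent image and then ordered lexicographically rather than numerically. The sketch below is a standalone re-implementation of that lambda, not MapReader code. ``sort_key`` is a hypothetical helper name, and the ``map_a.png`` / ``map_b.png`` IDs are made up for illustration, following the patch ID pattern seen in ``tests/test_annotator.py`` (e.g. ``patch-6-0-9-3-#cropped_74488689.png#.png``).

```python
def sort_key(image_id: str) -> str:
    """Hypothetical helper mirroring the lambda in get_labelled_data.

    The diff uses: f"{x.split('#')[1]}-{x.split('#')[0]}"
    i.e. the parent name first, then the patch prefix.
    """
    parts = image_id.split("#")
    return f"{parts[1]}-{parts[0]}"


# Made-up patch IDs (assumption: two parent images, 3-pixel patches).
image_ids = [
    "patch-3-0-6-3-#map_b.png#.png",
    "patch-3-0-6-3-#map_a.png#.png",
    "patch-0-0-3-3-#map_b.png#.png",
    "patch-10-0-13-3-#map_a.png#.png",
    "patch-0-0-3-3-#map_a.png#.png",
]

# Sort on the derived string key, as the sort_value column does in the diff.
sorted_ids = sorted(image_ids, key=sort_key)

for image_id in sorted_ids:
    print(image_id)
```

Because the comparison is on strings, ``patch-10-...`` sorts before ``patch-3-...`` within the same parent. If numeric order matters downstream, one option might be to sort on integer columns such as ``min_x`` and ``min_y`` instead; that is a suggestion, not something this diff does.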