CoralNet documentation

StephenChan edited this page Apr 7, 2013 · 14 revisions

This page is for information on the CoralNet codebase that developers need to know. It only contains information specific to CoralNet; information general to Django can be found in the Django notes page.

Machine annotator

Details about the system

The automated system consists of several modules:

  • Preprocess image
  • Extract image features
  • Classify images
  • Train classifier

For the discussion below, let's define TI = 50 to be a per-source threshold on the number of annotated images needed before training begins. I also assume the following fields exist in the Image and Source objects.

  • Image.status.preprocessed
  • Image.status.featuresextracted
  • Image.status.hasRandomPoints
  • Image.status.annotatedByRobot
  • Image.status.annotatedByHuman
  • Image.status.featureFileHasHumanLabels
  • Image.status.usedForTraining
  • Image.processDate - this is set by tasks.preProcess(Image) and is appended to all derived files used in the feature extraction process
  • Image.latestRobotAnnotationVersion
  • Source.TI = 50 (by default)

Also, a robot table is created with the fields listed below. [For now, each robot will be tied to a Source, not a Label Set. I need to do some experiments first before we decide about this.]

  • [long int] id
  • [long int] Source (there will be one robot per source)
  • [long int] Version
  • [string] pathToModel
  • [datetime object] datetime
  • [long int] time_to_train
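
As a rough illustration, the robot table and the lookup of the latest robot version for a source could be modelled as below. This is a minimal sketch in plain Python 3, not the actual Django model: the class name Robot, the snake_case attribute names, and the helper latest_robot_version are assumptions made for the example, and the unit of time_to_train is not stated above.

```python
import datetime as dt
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Robot:
    """One row of the robot table (hypothetical plain-Python stand-in)."""
    id: int
    source: int             # id of the Source; one robot series per source
    version: int
    path_to_model: str
    datetime: dt.datetime   # when this robot version was created
    time_to_train: int      # unit not specified in the notes above


def latest_robot_version(robots: List[Robot], source_id: int) -> Optional[int]:
    """Return the highest version among the robots of one source, or None."""
    versions = [r.version for r in robots if r.source == source_id]
    return max(versions) if versions else None
```

A source with no robots yields None, which a caller can treat as "no classifier has been trained yet".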

[NOTE: we use a single, global dictionary for all feature mapping. If it is updated, all feature extraction must be rerun manually. The dictionary is stored in the preProcessParameters.mat struct.]

Generate Random points

  • tasks.GenerateRandomPoints(image)
  • PREREQ: (annotation area)

Preprocess image

  • tasks.PreprocessImage(image)
  • PREREQ: (the imageHeight field of the image object) OR (the imageHeight field of the source is set)
  • SYNTAX: coralnet_preprocessImage(imageFile, preProcessedImageFile, preProcessParameterFile, imageHeightFile, errorLogfile);
  • DATABASE MOD: Image.processDate is set to the current date.
  • DATABASE MOD: Image.status.preprocessed := 'true'.
  • INPUT preProcessedImageFile should be <processing root>/preprocess/<imageId>_<Image.processDate>.mat
  • INPUT imageFile is the original input image file.
  • INPUT preProcessParameterFile is stored in <processing root>/preprocess/preProcessParameters.mat
  • INPUT imageHeightFile (pixel-cm) is a file stored in <processing root>/preprocess/<imageId>_imageHeight.txt (this file only contains a single integer indicating the image height)
  • INPUT errorLogfile is <processing root>/errorlogs/preprocess_error.txt
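
The file locations listed above can be assembled with a small helper. The following is only a sketch: the function name preprocess_paths is hypothetical, the dictionary keys reuse the argument names from the SYNTAX line, and the string format of Image.processDate is not specified here, so the caller passes it in already formatted.

```python
import posixpath


def preprocess_paths(processing_root, image_id, process_date):
    """Build the file paths used by the preprocessing step.

    process_date is passed in as an already formatted string, because
    the notes above do not say how Image.processDate is rendered.
    """
    pre_dir = posixpath.join(processing_root, "preprocess")
    return {
        "preProcessedImageFile": posixpath.join(
            pre_dir, "%s_%s.mat" % (image_id, process_date)),
        "preProcessParameterFile": posixpath.join(
            pre_dir, "preProcessParameters.mat"),
        "imageHeightFile": posixpath.join(
            pre_dir, "%s_imageHeight.txt" % image_id),
        "errorLogfile": posixpath.join(
            processing_root, "errorlogs", "preprocess_error.txt"),
    }
```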

Feature extraction

  • tasks.MakeFeatures(image)
  • PREREQ: Image.status.preprocessed = true
  • NOTE: we assume random points are ALWAYS available for an image.
  • SYNTAX: coralnet_makeFeatures(preProcessedImageFile, featureFile, rowColFile, featureExtractionParameterFile, errorLogfile);
  • DATABASE MOD: Image.status.featuresextracted := 'true'
  • INPUT preProcessedImageFile should be <processing root>/preprocess/<imageId>_<Image.processDate>.mat
  • INPUT featureFile is <processing root>/preprocess/<imageId>_<Image.processDate>.dat
  • INPUT rowColFile is an N-by-2 text file (.txt) with one [row col] entry per line.
  • INPUT featureExtractionParameterFile is stored in <processing root>/features/featureExtractionParameters.mat
  • INPUT errorLogfile is <processing root>/errorlogs/features_error.txt
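
To make the rowColFile layout concrete, here is a minimal sketch of writing one. The function name write_row_col_file is hypothetical, and the single-space separator is an assumption; the notes above only say that each line holds a row and a column.

```python
def write_row_col_file(points, path):
    """Write (row, col) pairs to a text file, one pair per line.

    Assumption: the two integers on each line are separated by a
    single space; the exact separator is not specified in the notes.
    """
    with open(path, "w") as f:
        for row, col in points:
            f.write("%d %d\n" % (row, col))
```

Three random points would therefore produce a three-line file, with the row first and the column second on each line.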

Classify Images

  • tasks.Classify(image)
  • PREREQ: Image.status.featuresextracted AND NOT Image.status.annotatedByHuman AND a robot exists for that source AND Image.latestRobotAnnotationVersion < RV
  • NOTE: RV is the latest robot version of this source, found by looking in the robot-table.
  • SYNTAX: coralnet_classify(featureFile, modelFile, labelFile, classifyErrorLog);
  • INPUT featureFile is <processing root>/preprocess/<imageId>_<Image.processDate>.dat
  • INPUT modelFile is <processing root>/models/<image.source.id>_<RV>_<datestr>.txt
  • INPUT labelFile is <processing root>/classify/<imageId>_<Image.processDate>.txt
  • DATABASE MOD: Image.status := '3'.
  • DATABASE MOD: Image.latestRobotAnnotationVersion := RV.
  • DATABASE MOD: Reads the labelfile and creates new annotations using the robot user together with RV.
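
The classification prerequisite can be read as a single boolean check. The sketch below is illustrative only: the function name can_classify and its parameter names are hypothetical, and treating a missing previous annotation version (None) as older than any robot version is an assumption not stated above.

```python
def can_classify(features_extracted, annotated_by_human,
                 latest_annotation_version, robot_version):
    """Return True if the image is eligible for robot classification.

    robot_version is RV, the latest robot version of the image's
    source, or None if no robot exists for that source yet.
    """
    if robot_version is None:
        return False  # no robot exists for this source
    if not features_extracted or annotated_by_human:
        return False
    # Assumption: an image never annotated by a robot has version None,
    # which counts as older than any existing robot version.
    if latest_annotation_version is None:
        return True
    return latest_annotation_version < robot_version
```

With this reading, an image already annotated by version RV is skipped until a newer robot is trained for its source.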

Add Labels To Feature Files

  • tasks.addLabelsToFeatures(image)
  • PREREQ: Image.status.annotatedByHuman AND Image.status.featuresextracted
  • INPUT featureFile is <processing root>/preprocess/<imageId>_<Image.processDate>.dat
  • DATABASE MOD: Image.status.featureFileHasHumanLabels := true

Train Classifier

  • tasks.Train(Source)
  • PREREQ: there exists an image where Image.status.featureFileHasHumanLabels AND NOT Image.status.usedForTraining
  • NOTE: we do not require (#images with Image.status.featureFileHasHumanLabels > Source.TI)
  • SYNTAX:
  • DATABASE MOD: Adds an entry to the robot-table specifying the robot version, the date, the path to the new model, and other metadata fields.
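
The training prerequisite amounts to asking whether a source has at least one labelled image that no robot has been trained on yet. A minimal sketch, assuming each image's status is available as a dictionary keyed by the field names above (the function name needs_training and the dictionary representation are hypothetical):

```python
def needs_training(image_statuses):
    """Return True if any image has human labels in its feature file
    but has not yet been used for training.

    image_statuses is an iterable of dicts with the boolean keys
    'featureFileHasHumanLabels' and 'usedForTraining'.
    """
    return any(
        status["featureFileHasHumanLabels"] and not status["usedForTraining"]
        for status in image_statuses
    )
```

Note that this check deliberately ignores the TI threshold, in line with the NOTE above.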

Maintenance notice

When a developer is updating the server code, manually changing something in the database, or simply restarting the server, it's highly recommended to put a notice on the site saying that the site is under maintenance. That way, users get the message that they should pause their work (or continue at their own risk) until the site is no longer under maintenance.

To put up the maintenance notice:

  1. Locate templates/maintenance_notice.html on the server and open it with an editor.
  2. If you're not sure what to do, read the comments in maintenance_notice.html (they are instructions).
  3. Locate the line AFTER {% endcomment %} which reads something like {# {% set_maintenance_time "10:30 PM" as maintenance_time %} #}. Uncomment this line (remove the {# and #}) and set the time according to the above instructions. It's recommended to set the maintenance time at least several minutes later than the current time, to give users a bit of advance warning.
  4. Save your changes.

Then, once maintenance time has begun, start your maintenance work.

Once you've finished your maintenance work, take down the maintenance notice by commenting out the set_maintenance_time line.

Settings files

A Django project gets its settings from settings.py. However, we've split up our settings into two files: settings.py, and settings_2.py (which settings.py imports from).

  • settings.py is for settings that should be the same for our production server and developers' copies (such as installed apps). This file is under Git control.
  • settings_2.py is for settings that should be different between the production server and developers' copies (such as passwords). This file is ignored by Git, and should stay that way.

Each developer needs to make sure that their own settings_2.py has all the necessary settings in it. The settings that settings_2.py should have are listed in settings_2_example.py. settings_2_example.py isn't used by any Django module; it's there for developers' reference. If you add a setting that needs to go in settings_2.py, please update settings_2_example.py accordingly for others' reference.
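
One way to check that a local settings_2.py is complete is to compare its setting names against settings_2_example.py. The helper below is a hypothetical sketch, not part of the codebase; it assumes settings follow the usual Django convention of all-uppercase names.

```python
def missing_settings(example_module, actual_module):
    """List setting names present in the example module but absent
    from the actual module.

    Only all-uppercase attribute names are treated as settings,
    following the usual Django convention.
    """
    def setting_names(module):
        return {name for name in dir(module) if name.isupper()}

    return sorted(setting_names(example_module) - setting_names(actual_module))
```

With both modules importable, this could be called as missing_settings(settings_2_example, settings_2); an empty list means nothing is missing.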

Unit tests

See Django notes - Unit tests for some basic info.

Test locations

As of 2012 October 8: All CoralNet unit tests are in tests.py files under each app. lib/tests.py is for tests that don't pertain to a particular app.

Requirements

  • As noted in Setup - Python utilities and packages, you need to install PyYAML. The fixtures we use in tests are written in YAML; compared to JSON or XML, YAML eases the process of writing fixtures by hand. However, Python itself doesn't have a built-in YAML parser, so PyYAML is needed to be able to load these YAML fixtures.
  • settings.TEST_MEDIA_ROOT and TEST_PROCESSING_ROOT:
    • TEST_MEDIA_ROOT is the directory that will act as MEDIA_ROOT (the location where uploaded files are placed) during media-related unit tests. TEST_MEDIA_ROOT must be different from the normal MEDIA_ROOT. It is recommended to have TEST_MEDIA_ROOT located outside of your Git repository directory, because it will never be committed to Git.
      • The same goes for TEST_PROCESSING_ROOT; it's the PROCESSING_ROOT that's used during unit tests.
    • As noted in Setup - Final setup steps for Django, you may need to create these test directories and their relevant sub-directories.
    • These directories and sub-directories must not contain any files at the start of the test, or an exception will be raised. See "Safety note" below for more info.
  • Set settings.SOUTH_TESTS_MIGRATE to an appropriate value. If True, then all South migrations are run when preparing the test database (which is built before running any unit tests). If False, then the test database is built with the default syncdb behavior instead of with South migrations. False is recommended because it's much faster to not run the migrations. The only drawback of using False is that it may miss out on important data that certain migrations add (such as the Imported and Robot users), but our initial_data fixture (lib/fixtures/initial_data.yaml) should eliminate this potential problem.

Safety note

To ensure that accidental file loss cannot occur from setting TEST_MEDIA_ROOT to the wrong directory, the following sanity checks are automatically made for each test: (1) before the test starts, the test aborts if TEST_MEDIA_ROOT has files in it; (2) during post-testing file cleanup, if TEST_MEDIA_ROOT has any files that were apparently created before the test started, those files are not deleted and an error is raised.

The same goes for TEST_PROCESSING_ROOT.
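
The two sanity checks can be sketched as follows. This is not the code in lib.test_utils: the function names are hypothetical, and using a file's modification time as the sign that it was "apparently created before the test started" is an assumption made for the example.

```python
import os


def list_files(root):
    """Return the paths of all files under root, including sub-directories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found


def check_empty_before_test(root):
    """Check (1): abort if the test directory already contains files."""
    files = list_files(root)
    if files:
        raise RuntimeError(
            "%s already contains %d file(s); refusing to run." % (root, len(files)))


def clean_up_after_test(root, test_start_time):
    """Check (2): delete files made during the test, keep older ones.

    test_start_time is a timestamp as returned by time.time().
    Assumption: a modification time earlier than test_start_time means
    the file predates the test.
    """
    older = []
    for path in list_files(root):
        if os.path.getmtime(path) < test_start_time:
            older.append(path)   # leave untouched
        else:
            os.remove(path)
    if older:
        raise RuntimeError(
            "Not deleted, apparently created before the test: %s" % older)
```

The point of the second check is that a misconfigured TEST_MEDIA_ROOT pointing at real data results in an error rather than in deleted files.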

Running unit tests

Django's test management command, when run with no arguments, runs all unit tests from all installed apps, including third-party ones. However, running third-party apps' tests is usually not necessary; it is mainly done to confirm compatibility and correct installation of these apps.

Thus, CoralNet has a custom management command, mytest, which runs only CoralNet's tests, not third-party apps' tests. Run this management command with python manage.py mytest. This is the recommended way to run regression tests for CoralNet.

Our apps are defined in settings.MY_INSTALLED_APPS. The mytest command is a custom management command defined in lib/management/commands/mytest.py.

Writing unit tests

Use the settings.SAMPLE_UPLOADABLES directory to store files that will be uploaded during unit tests.

When writing a new test class:

  • Inherit from lib.test_utils.ClientTest if the test will be using the test client. This allows testing somewhat high-level web browser operations like requesting URLs and submitting forms.
  • Inherit from lib.test_utils.BaseTest if the test won't be using the test client.
  • If other functionality is needed:
    • Write your own subclass of BaseTest or ClientTest. If you write your own setUp() or tearDown() methods, don't forget to call the parent class's setUp() or tearDown() as well.
    • If you want to write pieces of functionality that can be combined into an arbitrary set of components, write test component classes. These test components can be included in any test class by overriding the BaseTest class variable extra_components. For an example of a test component class, see lib.test_utils.MediaTestComponent; for an example of a test class that uses test components, see images.tests.ImageProcessingTaskTest.
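
The advice about calling the parent class's setUp() and tearDown() can be illustrated with a small, self-contained sketch. The BaseTest class here is a local stand-in built on unittest.TestCase so that the example runs on its own; it is not the real lib.test_utils.BaseTest, and ScratchListTest and its attributes are made up for the example. The syntax is Python 3.

```python
import unittest


class BaseTest(unittest.TestCase):
    """Local stand-in for lib.test_utils.BaseTest (assumption)."""

    def setUp(self):
        super().setUp()
        self.base_ready = True      # pretend shared setup work

    def tearDown(self):
        self.base_ready = False     # pretend shared cleanup work
        super().tearDown()


class ScratchListTest(BaseTest):
    """Hypothetical subclass adding its own setup and cleanup."""

    def setUp(self):
        super().setUp()             # run the parent's setup first
        self.scratch = []

    def tearDown(self):
        del self.scratch[:]         # undo our own setup...
        super().tearDown()          # ...then let the parent clean up

    def test_parent_setup_ran(self):
        self.assertTrue(self.base_ready)
        self.assertEqual(self.scratch, [])
```

Calling the parent's setUp() first and its tearDown() last keeps the subclass's extra state nested inside the parent's, so cleanup runs in the reverse order of setup.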