# QC Portal

The [QC Portal](https://qc.allenneuraldynamics.org/qc_portal_app) is a browser application that allows users to view and interact with the AIND QC metadata and to annotate `PENDING` metrics with qualitative evaluations. The portal is currently maintained by Dan Birman in Scientific Computing; reach out with any questions or concerns.

The portal works by pulling the metadata from the Document Database (DocDB) and pulling reference figures from Code Ocean (CO) data assets, or from storage in Kachery Cloud.

The portal allows users to annotate `PENDING` metrics. Logged-in users can modify the value, state, and notes on metrics. When you make changes the **Submit** button will be enabled. Submitting pushes your updates to DocDB along with a timestamp and your name.

For general documentation about the QC metadata, go [here](https://aind-data-schema.readthedocs.io/en/latest/quality_control.html).

## Defining metrics for the QC portal

For AIND users, we expect your metrics to have actionable `value` fields: either the value should be a number that a rule can be applied to (e.g. a threshold), or it should refer to the state of the reference (e.g. "high drift" when linked to a drift map, or "acceptable contrast" when linked to a video).

All metrics should have a `reference` figure attached. Even if you are just calculating numbers, your reference figures can put those numbers in context for viewers.

**Q: How do reference URLs get pulled into the QC Portal?**

Each metric is associated with a reference figure. We support:

- Vector files (svg, pdf)
- Images (png, jpg, etc.)
- Videos (mp4)
- Neuroglancer links (url)
- Rerun files (rrd)

Figures, images, and videos can be any size, but they will fit best on the screen if they are landscape and shaped roughly like a computer screen (for example, 1280×800 or 1900×1200 px).

You can link to your references in one of four ways:

- Provide a relative path to a file in the data asset's S3 bucket, e.g. "figures/my_figure.png". The mount/asset name should not be included.
- Provide a URL to a publicly accessible file, e.g. "https://mywebsite.com/myfile.png".
- Provide a path to any public S3 bucket, e.g. "s3://bucket/myfile.png".
- Provide a kachery-cloud hash, e.g. "sha1://uuid.ext". Note that for these FigURL hashes you **must append the filetype**; the easiest way to do this is to set the `label` field to the filename, as described below.
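For example, a metric with a thresholdable numeric value and a relative-path reference might look like the sketch below (this assumes the QC classes in `aind-data-schema`; exact field names may differ between versions):

```{python}
from datetime import datetime

# Sketch only: uses the quality_control classes from aind-data-schema;
# field names may differ between package versions.
from aind_data_schema.core.quality_control import QCMetric, QCStatus, Status

drift_metric = QCMetric(
    name="Probe A drift (um)",
    value=14.2,  # a number that a rule (e.g. a threshold) can act on
    reference="figures/probe_a_drift_map.png",  # relative path, no mount/asset name
    status_history=[
        QCStatus(
            evaluator="automated pipeline",
            status=Status.PENDING,  # an annotator will confirm against the reference
            timestamp=datetime.now(),
        )
    ],
)
```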
**Q: I saw fancy things like dropdowns in the QC Portal, how do I do that?**

The portal supports a few special cases that allow a bit more flexibility, or that constrain the actions manual annotators can take. Install the [`aind-qcportal-schema`](https://github.com/AllenNeuralDynamics/aind-qcportal-schema/blob/dev/src/aind_qcportal_schema/metric_value.py) package and set the `value` field to the corresponding pydantic object to use these. Current options include:

- Dropdowns (optionally, the options can auto-set the value)
- Checkboxes (again, the options can auto-set the value)
- Rule-based metrics (the rule is automatically run to set the value)
- Multi-asset metrics, where each asset is assigned its own value

There are also some custom rules for the `value` field. If you provide:

- Two strings separated by a semicolon (`;`), they will be displayed in a "Swipe" pane that lets you swipe back and forth between the two items. This is mostly useful for overlay images.
- A dictionary where every value is a list of equal length, it will be displayed as a table where the keys are column headers and the values are rows. If a key "index" is included, its values will be used to name the rows.
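For example, a dropdown that constrains annotators to preset options (and auto-sets the status from the chosen option) might look like the sketch below. This assumes the `DropdownMetric` model in `aind-qcportal-schema`; check `metric_value.py` in that package for the exact fields:

```{python}
from datetime import datetime

# Sketch only: assumes the DropdownMetric model from aind-qcportal-schema
# and the QC classes from aind-data-schema; fields may differ by version.
from aind_data_schema.core.quality_control import QCMetric, QCStatus, Status
from aind_qcportal_schema.metric_value import DropdownMetric

contrast_metric = QCMetric(
    name="Video contrast",
    value=DropdownMetric(
        value="",  # left empty until an annotator picks an option
        options=["Acceptable contrast", "Unacceptable contrast"],
        status=[Status.PASS, Status.FAIL],  # picking option i auto-sets status i
    ),
    reference="figures/video_contrast.png",
    status_history=[
        QCStatus(evaluator="", status=Status.PENDING, timestamp=datetime.now())
    ],
)
```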
## How to upload data from CO Capsules

### Preferred workflow

Use the preferred workflow if you are **generating a data asset**, e.g. when uploading raw data or generating a new derived data asset. Your `quality_control.json` will go in the top level and your figures will go in a folder. Follow the steps below:

1. Develop your QC pipeline, generating metrics and reference figures as needed. Place references in the `results/` folder.
2. Populate your `QCEvaluation` objects with metrics. The `reference` field should contain the path *relative to the results folder*, i.e. the file `results/figures/my_figure.png` should be included as `QCMetric.reference = "figures/my_figure.png"`.
3. Write the standard QC file: `QualityControl.write_standard_file()`

Make sure to follow the standard instructions for building derived assets: copy all metadata files, upgrade the data_description to derived, and name your asset according to the expected conventions. Tag your data asset as `derived` so that it will be picked up by the indexer.

Done! In the preferred workflow no additional permissions are required. Your QC data will appear in the portal within four hours of creation.

### Alternate workflow

Use the alternate workflow if you are **not generating a data asset** and therefore need to push your QC data back to an already existing data asset. You will push your `QCEvaluation` objects directly to DocDB, and you will push your figures to `kachery-cloud`, an external repository that generates permanent links to uploaded files.

Two things need to be set up in your capsule:

1. Run `pip install kachery-cloud` and `pip install aind-data-access-api[docdb]` as part of your environment setup.
2. In your capsule settings, attach the `aind-codeocean-power-user` role. If you don't have access to this role, ask someone in Scientific Computing to attach it for you.

#### (1) Acquire your DocDB _id using your data asset's name

To upload directly to DocDB you'll need to know your asset's DocDB `_id`. You can get it by adding a helper like this to your capsule and calling `query_docdb_id(asset_name)`; the body below is a sketch using the `MetadataDbClient` from `aind-data-access-api`. Note that this *is not the data asset id in Code Ocean*!

```{python}
from aind_data_access_api.document_db import MetadataDbClient

# Sketch only: looks up the metadata record for an asset by name and
# returns its DocDB _id.
API_GATEWAY_HOST = "api.allenneuraldynamics.org"
DATABASE = "metadata_index"
COLLECTION = "data_assets"

docdb_api_client = MetadataDbClient(
    host=API_GATEWAY_HOST,
    database=DATABASE,
    collection=COLLECTION,
)

def query_docdb_id(asset_name: str):
    """Return the DocDB _id for the named data asset, or None if not found."""
    response = docdb_api_client.retrieve_docdb_records(
        filter_query={"name": asset_name},
        projection={"_id": 1},
    )
    if len(response) == 0:
        return None

    docdb_id = response[0]["_id"]
    return docdb_id
```

#### (2) Generate your QC data

Generate your metrics and reference figures. Put your figures in folders in `results/`, e.g. `results/figures/`, and keep track of the filepaths.

#### (3) Push figures to `kachery-cloud`

Your figures should already exist in folders in your `results/`. Then, in your capsule code, pull the Kachery Cloud credentials using a function like this (the body below is a sketch that reads a shared secret from AWS Secrets Manager; the secret name is an assumption, ask Scientific Computing for the real one):

```{python}
import json
import os

import boto3

def get_kachery_secrets():
    """Fetch the kachery-cloud credentials and export them as environment variables."""
    # Sketch only: the secret name is an assumption; ask Scientific Computing
    # for the name of the shared kachery-cloud secret.
    secret_name = "kachery-cloud-credentials"
    region_name = "us-west-2"

    # Pull the secret payload from AWS Secrets Manager
    session = boto3.session.Session()
    client = session.client(service_name="secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    kachery_secrets = json.loads(response["SecretString"])

    os.environ['KACHERY_ZONE'] = kachery_secrets['KACHERY_ZONE']
    os.environ['KACHERY_CLOUD_CLIENT_ID'] = kachery_secrets['KACHERY_CLOUD_CLIENT_ID']
    os.environ['KACHERY_CLOUD_PRIVATE_KEY'] = kachery_secrets['KACHERY_CLOUD_PRIVATE_KEY']

get_kachery_secrets()
```

The credentials are now stored as environment variables.

Each of your figures should then be uploaded using the `store_file` function:

```{python}
import kachery_cloud as kcl

file_path = "your_file_path.ext"
uri = kcl.store_file(file_path, label=file_path)
```

#### (4) Generate your QCEvaluation objects

Now generate your `QCEvaluation` objects. Make sure to set the `QCMetric.reference` field of each metric to the URI returned for that figure (`QCMetric.reference = uri`). Each URI is a unique hashed string that allows the portal to recover your file. Make sure to include the `label` parameter when storing files, or we won't be able to identify your filetype in the portal.

Store all your `QCEvaluation` objects in a list.
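As an illustration, a single evaluation wrapping one uploaded figure might look like the sketch below (again assuming the QC classes in `aind-data-schema` and the modality models in `aind-data-schema-models`; `uri` is the value returned by `kcl.store_file` above):

```{python}
from datetime import datetime

# Sketch only: field names may differ between aind-data-schema versions.
from aind_data_schema.core.quality_control import (
    QCEvaluation, QCMetric, QCStatus, Stage, Status,
)
from aind_data_schema_models.modalities import Modality

metric = QCMetric(
    name="Drift map",
    value="",  # to be annotated in the portal
    reference=uri,  # the kachery-cloud URI returned by kcl.store_file
    status_history=[
        QCStatus(evaluator="", status=Status.PENDING, timestamp=datetime.now())
    ],
)

evaluations = [
    QCEvaluation(
        name="Drift evaluation",
        modality=Modality.ECEPHYS,
        stage=Stage.PROCESSING,
        metrics=[metric],
    )
]
```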
#### (5) Push metadata to DocDB

Run the following code snippet; you can pass all your evaluations as a list or pass them one at a time:

```{python}
import boto3
import requests
from aws_requests_auth.aws_auth import AWSRequestsAuth

# Sign the request with the capsule's AWS credentials
session = boto3.Session()
credentials = session.get_credentials()
host = "api.allenneuraldynamics.org"

auth = AWSRequestsAuth(
    aws_access_key=credentials.access_key,
    aws_secret_access_key=credentials.secret_key,
    aws_token=credentials.token,
    aws_host="api.allenneuraldynamics.org",
    aws_region='us-west-2',
    aws_service='execute-api',
)

# docdb_id comes from step (1); qc_eval is a QCEvaluation (or list of them) from step (4)
url = f"https://{host}/v1/add_qc_evaluation"
post_request_content = {"data_asset_id": docdb_id,
                        "qc_evaluation": qc_eval.model_dump(mode='json')}
response = requests.post(url=url, auth=auth,
                         json=post_request_content)

if response.status_code != 200:
    print(response.status_code)
    print(response.text)
```

If you get errors, contact Dan for help debugging.

### Reference/Figure recommendations

You can use gifs (<10 MB) or mp4 files (<100 MB). Make sure your mp4 files are playable in a browser.

#### Neuroglancer

You can set the reference directly to a neuroglancer link; it will open embedded in the portal and can be easily switched to fullscreen mode.

#### Rerun

Rerun files (.rrd) can be linked in the reference; they will open in the rerun app embedded in the portal and can be easily switched to fullscreen mode.

#### Other

We're prepared to support basically any kind of browser-displayable object. Reach out with ideas.

## Development