Linking images to metadata #5
We currently use "load_data_with_illum.csv" to get image and flatfield filepaths and well/site metadata. Right now I can't access the path (it's within .../source/workspace/load_data_csv) within any of the sources. I'd like to see if I can get access and make sure that the same file is present across sources to use it as a consistent input. |
@dmikeando That's great you're using
Can you verify that? |
In https://github.com/jump-cellpainting/datasets-private/issues/11#issuecomment-1304031422 we concluded that the embedder uses all these columns from
We now additionally include these two columns in the
What additional information do we need to link to cells? I think we can get everything else the embedder needs by querying the SQLite files (and storing the result as a parquet file). This is essentially what DeepProfiler does too.
backend_file=/Users/shsingh/work/projects/2015_Bray_GigaScience/workspace/backend/CDRP/25738/25738.sqlite
sqlite3 -header -csv "${backend_file}" "select Image.Image_Metadata_Plate as Metadata_Plate,Image.Image_Metadata_Well as Metadata_Well,Image.Image_Metadata_Site as Metadata_Site,Nuclei.ObjectNumber,Nuclei.Nuclei_Location_Center_X,Nuclei.Nuclei_Location_Center_Y from Nuclei inner join Image on Nuclei.ImageNumber=Image.ImageNumber and Nuclei.TableNumber=Image.TableNumber limit 10"
Rendered as a table:
So we can join this parquet file (that we'd create using the query above) with the load data. In other words, if we create a per-plate parquet file (maybe sharded across wells) with these columns, one row per cell, is that all you really need?
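For illustration, here is a minimal Python sketch of the extraction step described above. It is only a sketch under stated assumptions: the function names `fetch_cell_locations` and `write_cell_locations_parquet` are hypothetical, the query is the same Image-Nuclei join as in the command above (without the `limit 10`, and with explicit aliases on every column), and the parquet step assumes pandas and pyarrow are installed.

```python
import sqlite3
from contextlib import closing

# Same Image-Nuclei join as the sqlite3 command above, without `limit 10`.
CELL_QUERY = """
SELECT
    Image.Image_Metadata_Plate      AS Metadata_Plate,
    Image.Image_Metadata_Well       AS Metadata_Well,
    Image.Image_Metadata_Site       AS Metadata_Site,
    Nuclei.ObjectNumber             AS ObjectNumber,
    Nuclei.Nuclei_Location_Center_X AS Nuclei_Location_Center_X,
    Nuclei.Nuclei_Location_Center_Y AS Nuclei_Location_Center_Y
FROM Nuclei
INNER JOIN Image
    ON Nuclei.ImageNumber = Image.ImageNumber
    AND Nuclei.TableNumber = Image.TableNumber
"""


def fetch_cell_locations(conn):
    """Return (column_names, rows) with one row per nucleus."""
    cursor = conn.execute(CELL_QUERY)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def write_cell_locations_parquet(backend_file, parquet_file):
    """Write the per-cell table for one plate to parquet.

    Hypothetical helper: needs pandas and pyarrow, which are imported
    lazily so that fetch_cell_locations works with the stdlib alone.
    """
    import pandas as pd

    with closing(sqlite3.connect(backend_file)) as conn:
        cells = pd.read_sql_query(CELL_QUERY, conn)
    cells.to_parquet(parquet_file, index=False)
```

Sharding across wells, if wanted, could be done by grouping the resulting table on `Metadata_Well` before writing; that part is left out here.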
|
Thanks @shntnu . Your analysis looks correct to me. As we discussed, some of the columns (e.g. the illum filepaths/names) will be very repetitive, so using a dictionary/enum/categorical type could save on disk space and load time. https://arrow.apache.org/docs/python/data.html#dictionary-arrays |
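To make the dictionary/categorical suggestion concrete, here is a small pure-Python sketch of the idea: each distinct string is stored once and every row keeps only a small integer index. The function names and the file names in the usage example are hypothetical; in practice one would likely rely on pyarrow's `dictionary_encode()` or the pandas `category` dtype rather than hand-rolled code like this.

```python
def dictionary_encode(values):
    """Split a column into (dictionary, indices)."""
    dictionary = []  # each distinct value, stored once, in first-seen order
    positions = {}   # value -> its index in `dictionary`
    indices = []     # one small integer per row
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from its dictionary and indices."""
    return [dictionary[i] for i in indices]


# Hypothetical illum file names, repeated once per cell on the same plate.
illum_column = ["PlateA_IllumDNA.npy"] * 4 + ["PlateB_IllumDNA.npy"] * 2
dictionary, indices = dictionary_encode(illum_column)
```

With a column like this, the two long strings are stored once each, and the six rows carry only the indices `0` or `1`.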
Ah yes, I'm thinking we'd not actually save out the join with the load data but rather do that join on the fly. We'd only save out the Image-Nuclei join, and just those 6 columns I've listed below. Sorry if this is not clear (on my phone)
(Still, would be good to use enum for the first 3) |
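A rough sketch of what the on-the-fly join could look like, using only the Python standard library. The assumptions are mine: that the load data CSV carries `Metadata_Plate`, `Metadata_Well` and `Metadata_Site` columns matching the per-cell table, and that keys compare equal once cast to strings. The function names and the `FileName_OrigDNA` / `FileName_IllumDNA` columns in the example are hypothetical.

```python
import csv
import io

# Columns assumed to be shared by the load data and the per-cell table.
KEY = ("Metadata_Plate", "Metadata_Well", "Metadata_Site")


def index_load_data(load_data_rows):
    """Map (plate, well, site) -> load-data row, with keys cast to str."""
    return {tuple(str(row[k]) for k in KEY): row for row in load_data_rows}


def join_cells_with_load_data(cell_rows, load_data_index):
    """Yield each cell row merged with its image-level load-data row."""
    for cell in cell_rows:
        key = tuple(str(cell[k]) for k in KEY)
        image_row = load_data_index.get(key)
        if image_row is None:
            continue  # inner join: skip cells without a matching image row
        yield {**image_row, **cell}


# Hypothetical load data with one image row; real files have more columns.
load_data_csv = io.StringIO(
    "Metadata_Plate,Metadata_Well,Metadata_Site,FileName_OrigDNA,FileName_IllumDNA\n"
    "25738,A01,1,a01_s1_dna.tif,25738_IllumDNA.npy\n"
)
load_data_index = index_load_data(csv.DictReader(load_data_csv))

cells = [
    {"Metadata_Plate": "25738", "Metadata_Well": "A01", "Metadata_Site": 1,
     "ObjectNumber": 1, "Nuclei_Location_Center_X": 10.5, "Nuclei_Location_Center_Y": 20.5},
    {"Metadata_Plate": "25738", "Metadata_Well": "B02", "Metadata_Site": 1,
     "ObjectNumber": 1, "Nuclei_Location_Center_X": 3.0, "Nuclei_Location_Center_Y": 4.0},
]
joined = list(join_cells_with_load_data(cells, load_data_index))
```

Only the per-cell table needs to be stored; the image-level columns are attached at load time, so the repetitive file paths never get written out once per cell.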
This is now being addressed in cytomining/pycytominer#257 |
Let's use this issue to discuss how we can link images to metadata.
@dmikeando Presumably all that you are using as input right now is the images, but no other information about them.
To help you get started on how to link images to metadata, can you clarify how you get Source, Plate, Batch, Well, Site information from the images? Presumably from their paths?