
Expected features of this tool

Chris Churas edited this page Jan 25, 2018 · 3 revisions

Objective:

Extract all Cell Image Library (CIL) image data, including video files, from the OMERO server.

SQL for retrieving the IDs of public video files:

select replace(image_id,'CIL_', '') as image_id from cil_data_type 
where is_public = true and is_video = true

SQL for retrieving the IDs (and has_raw flag) of public image files:

select replace(image_id,'CIL_', '') as image_id, has_raw from cil_data_type 
where is_public = true and is_video = false
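A minimal Python sketch of turning the two queries above into a single work list. It assumes a DB-API 2.0 cursor (for example one from psycopg2); `load_work_items` and the dict keys are hypothetical names. The video query returns no `has_raw` column, so that field is left as `None` for videos.

```python
from typing import Any, Dict, List

VIDEO_SQL = (
    "select replace(image_id,'CIL_', '') as image_id from cil_data_type "
    "where is_public = true and is_video = true"
)
IMAGE_SQL = (
    "select replace(image_id,'CIL_', '') as image_id, has_raw from cil_data_type "
    "where is_public = true and is_video = false"
)


def load_work_items(cur: Any) -> List[Dict[str, Any]]:
    """Hypothetical helper: run both queries and merge the rows into one list."""
    items: List[Dict[str, Any]] = []
    cur.execute(VIDEO_SQL)
    for (image_id,) in cur.fetchall():
        # Videos have no has_raw column in the query above.
        items.append({"image_id": str(image_id), "is_video": True, "has_raw": None})
    cur.execute(IMAGE_SQL)
    for image_id, has_raw in cur.fetchall():
        items.append({"image_id": str(image_id), "is_video": False,
                      "has_raw": bool(has_raw)})
    return items
```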

First, do a fake hit on the Cell Image Library page for the image.

For example, if downloading images or videos for ID 40580, use curl or lynx to request the page:

http://www.cellimagelibrary.org/images/40580
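The page names only curl or lynx for this step; a rough Python equivalent using the standard library's `urllib.request` is sketched below. `page_url` and `fake_hit` are hypothetical helpers, and reading and discarding the response body is an assumption about what counts as a hit.

```python
import urllib.request

CIL_PAGE_URL = "http://www.cellimagelibrary.org/images/{image_id}"


def page_url(image_id: str) -> str:
    """Hypothetical helper: build the Cell Image Library page URL for an ID."""
    return CIL_PAGE_URL.format(image_id=image_id)


def fake_hit(image_id: str, timeout: float = 30.0) -> int:
    """Hypothetical helper: request the page once and return the HTTP status."""
    with urllib.request.urlopen(page_url(image_id), timeout=timeout) as resp:
        resp.read()  # read and discard the body, like `curl -s -o /dev/null`
        return resp.status
```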

How to download video files

wget http://www.cellimagelibrary.org/images/download_jpeg/10409.jpg

wget http://www.cellimagelibrary.org/videos/10409.flv

wget http://grackle.crbs.ucsd.edu:8080/OmeroWebService/images/10409.raw
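A sketch of mapping one video ID to the three files fetched above, with a plain `urllib` downloader standing in for wget. `video_urls` and `download_all` are hypothetical names; the URL patterns are copied from the commands above.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Dict

CIL_BASE = "http://www.cellimagelibrary.org"
OMERO_BASE = "http://grackle.crbs.ucsd.edu:8080/OmeroWebService"


def video_urls(image_id: str) -> Dict[str, str]:
    """Hypothetical helper: local file name -> source URL for one video ID."""
    return {
        f"{image_id}.jpg": f"{CIL_BASE}/images/download_jpeg/{image_id}.jpg",
        f"{image_id}.flv": f"{CIL_BASE}/videos/{image_id}.flv",
        f"{image_id}.raw": f"{OMERO_BASE}/images/{image_id}.raw",
    }


def download_all(urls: Dict[str, str], folder: Path, timeout: float = 60.0) -> None:
    """Hypothetical helper: stream every URL into `folder`, creating it if needed."""
    folder.mkdir(parents=True, exist_ok=True)
    for name, url in urls.items():
        with urllib.request.urlopen(url, timeout=timeout) as resp, \
                open(folder / name, "wb") as out:
            shutil.copyfileobj(resp, out)
```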

Note: 10409.raw needs to be renamed based on its MIME type. For example, rename 10409.raw to "10409.mpg" (the extension is derived from the content type or the internal file name). In addition, put this renamed file into a zip file named <ID>.zip, i.e. 10409.zip.

We should create a folder named after the image ID. For example, the folder "10409" should contain "10409.mpg", "10409.flv", "10409.jpg", and "10409.zip".
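The rename-and-zip step inside the per-ID folder could look roughly like this. The MIME-to-extension table is an assumption (only `video/mpeg` to `.mpg` follows from the example above), as is where the "internal file name" comes from (for example a Content-Disposition header); `video_extension` and `package_video` are hypothetical names.

```python
import zipfile
from pathlib import Path
from typing import Optional

# ASSUMPTION: illustrative table; extend it with the types the server returns.
MIME_TO_EXT = {
    "video/mpeg": ".mpg",
    "video/quicktime": ".mov",
    "video/mp4": ".mp4",
    "video/x-msvideo": ".avi",
}


def video_extension(content_type: str, internal_name: Optional[str] = None) -> str:
    """Hypothetical helper: extension from Content-Type, else from the file name."""
    base_type = content_type.split(";")[0].strip().lower()
    if base_type in MIME_TO_EXT:
        return MIME_TO_EXT[base_type]
    if internal_name and Path(internal_name).suffix:
        return Path(internal_name).suffix.lower()
    raise ValueError(f"cannot determine extension for {content_type!r}")


def package_video(folder: Path, image_id: str, content_type: str,
                  internal_name: Optional[str] = None) -> Path:
    """Hypothetical helper: rename <ID>.raw, then copy it into <ID>.zip."""
    raw = folder / f"{image_id}.raw"
    renamed = folder / f"{image_id}{video_extension(content_type, internal_name)}"
    raw.rename(renamed)
    zip_path = folder / f"{image_id}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(renamed, arcname=renamed.name)  # renamed file stays in the folder
    return zip_path
```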

In addition, the FLV file should be converted to MP4, named using the format <ID>_web.mp4, and recorded in the JSON and in the database. See issue #4.
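This page does not name a conversion tool. Assuming ffmpeg is installed and on the PATH, the FLV-to-MP4 step might be sketched as below; `flv_to_mp4_cmd` and `convert_flv` are hypothetical names, and codec selection is left to ffmpeg's defaults for the MP4 container.

```python
import subprocess
from pathlib import Path
from typing import List


def flv_to_mp4_cmd(folder: Path, image_id: str) -> List[str]:
    """Hypothetical helper: ffmpeg command turning <ID>.flv into <ID>_web.mp4."""
    src = folder / f"{image_id}.flv"
    dst = folder / f"{image_id}_web.mp4"
    # -y overwrites an existing output file; codecs are ffmpeg's MP4 defaults.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_flv(folder: Path, image_id: str) -> Path:
    """Hypothetical helper: run the conversion; raises CalledProcessError on failure."""
    subprocess.run(flv_to_mp4_cmd(folder, image_id), check=True)
    return folder / f"{image_id}_web.mp4"
```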

How to download all other image files

NOTE: For each image there should be 3 downloaded files and 3 entries in the database.

wget http://www.cellimagelibrary.org/images/download_jpeg/40580.jpg

wget http://grackle.crbs.ucsd.edu:8080/OmeroWebService/images/40580.tif

- Download the raw file only if the has_raw field is set to true:

wget http://grackle.crbs.ucsd.edu:8080/OmeroWebService/images/40580.raw

Note: 40580.raw should be renamed to a zip file, "40580.zip".
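For non-video images the set of URLs depends on `has_raw`. This sketch (hypothetical `image_urls`) saves the `.raw` download directly under the `.zip` name, which amounts to the rename described in the note; the URL patterns are copied from the commands above.

```python
from typing import Dict

CIL_BASE = "http://www.cellimagelibrary.org"
OMERO_BASE = "http://grackle.crbs.ucsd.edu:8080/OmeroWebService"


def image_urls(image_id: str, has_raw: bool) -> Dict[str, str]:
    """Hypothetical helper: local file name -> source URL for a non-video image."""
    urls = {
        f"{image_id}.jpg": f"{CIL_BASE}/images/download_jpeg/{image_id}.jpg",
        f"{image_id}.tif": f"{OMERO_BASE}/images/{image_id}.tif",
    }
    if has_raw:
        # The .raw payload is a zip archive, so store it as <ID>.zip right away.
        urls[f"{image_id}.zip"] = f"{OMERO_BASE}/images/{image_id}.raw"
    return urls
```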

In addition, extract the image file from the zip file and name it:

<ID>_orig.<FORMAT>

We should create a folder named after the image ID. For example, the folder "40580" should contain "40580.jpg", "40580.tif", "40580.zip", and "40580_orig.tif".

NOTE: If the .raw file (i.e. the .zip file) has multiple entries, make an _orig file for each entry.
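A sketch of the `_orig` extraction using the standard `zipfile` module. The page does not say how to name the files when the zip holds several entries, so the 1-based `_orig_<N>` suffix below is an assumption, as is taking `<FORMAT>` from each entry's file extension; `extract_originals` is a hypothetical name.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List


def extract_originals(folder: Path, image_id: str) -> List[Path]:
    """Hypothetical helper: write each entry of <ID>.zip to an _orig file.

    ASSUMPTION: with several entries, names become <ID>_orig_<N>.<FORMAT>.
    A corrupted archive raises zipfile.BadZipFile.
    """
    written: List[Path] = []
    with zipfile.ZipFile(folder / f"{image_id}.zip") as zf:
        entries = [info for info in zf.infolist() if not info.is_dir()]
        for n, info in enumerate(entries, start=1):
            fmt = Path(info.filename).suffix.lstrip(".")
            stem = f"{image_id}_orig" if len(entries) == 1 else f"{image_id}_orig_{n}"
            target = folder / (f"{stem}.{fmt}" if fmt else stem)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```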

In addition, use the <ID>.jpg file to derive 3 new thumbnail files. By default these should be 512x512, 140x140, and 88x88, and the sizes should be customizable. The files should be named <ID>_thumbnailx<RES>. See issue #3.
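The thumbnail naming and configurable sizes might be handled as below. Resizing uses the third-party Pillow library, imported only inside `make_thumbnails`; the `.jpg` extension on thumbnail names is an assumption (the pattern above gives none), and both helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

DEFAULT_THUMBNAIL_SIZES = (512, 140, 88)  # customizable, per the spec above


def thumbnail_path(folder: Path, image_id: str, res: int) -> Path:
    """Hypothetical helper: <ID>_thumbnailx<RES>.jpg (extension is an assumption)."""
    return folder / f"{image_id}_thumbnailx{res}.jpg"


def make_thumbnails(folder: Path, image_id: str,
                    sizes: Iterable[int] = DEFAULT_THUMBNAIL_SIZES) -> List[Path]:
    """Hypothetical helper: resize <ID>.jpg to RESxRES for each size."""
    from PIL import Image  # third-party: pip install Pillow

    written: List[Path] = []
    with Image.open(folder / f"{image_id}.jpg") as img:
        rgb = img.convert("RGB")  # JPEG output needs RGB mode
        for res in sizes:
            out = thumbnail_path(folder, image_id, res)
            rgb.resize((res, res)).save(out)  # exact square; may stretch
            written.append(out)
    return written
```

`Image.resize` forces the exact RESxRES size named above, stretching non-square sources; Pillow's `Image.thumbnail` would keep the aspect ratio instead but would not guarantee square output.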

Checksum

We should compute an MD5 checksum for every downloaded file.
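MD5 checksums can be computed with the standard `hashlib` module; reading in chunks keeps memory use flat for large raw files. `md5_of_file` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```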

Technical challenges

  • The OMERO server restarts every few hours.
  • Some data may be corrupted or missing.
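One way to cope with the periodic OMERO restarts is to retry failed requests with a growing delay. This is a sketch under assumed values (5 attempts, starting at 30 seconds and doubling); `with_retries` is a hypothetical helper. `urllib`'s `URLError` and `HTTPError`, as well as connection resets and timeouts, are all `OSError` subclasses, so catching `OSError` covers them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 5, first_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical helper: call `action`, retrying on OSError with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # give up and let the caller record a failed download
            sleep(delay)
            delay *= 2
    raise RuntimeError("unreachable")
```

Because `HTTPError` is also an `OSError`, a genuinely missing file (for example HTTP 404) would be retried too; a real run might treat 4xx responses as missing data and record them as failed instead.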

Report the download status to the database

insert into cil_download_status(id, image_id, is_video, file_name, download_success,
    download_time, checksum, mime_type, num_of_bytes, checksum_value)
values (nextval('cil_downloader_seq'), 2, false, '2.jpg', true, now(), true,
    'image/jpeg', 1231242343, '96de7ce05d864f7a3dc43f074116e246');
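A sketch of writing one status row from Python with bound parameters instead of string formatting. It assumes a psycopg2-style cursor (`%s` placeholders) and treats the boolean `checksum` column as a success flag for the checksum step, which is an assumption based on the example above; `report_status` is a hypothetical name.

```python
from typing import Any

# id and download_time are filled by the database, as in the SQL example.
INSERT_STATUS_SQL = (
    "insert into cil_download_status(id, image_id, is_video, file_name, "
    "download_success, download_time, checksum, mime_type, num_of_bytes, "
    "checksum_value) values (nextval('cil_downloader_seq'), %s, %s, %s, %s, "
    "now(), %s, %s, %s, %s)"
)


def report_status(cur: Any, image_id: int, is_video: bool, file_name: str,
                  success: bool, checksum_ok: bool, mime_type: str,
                  num_bytes: int, md5_hex: str) -> None:
    """Hypothetical helper: insert one download-status row via a DB-API cursor."""
    cur.execute(INSERT_STATUS_SQL, (image_id, is_video, file_name, success,
                                    checksum_ok, mime_type, num_bytes, md5_hex))
```

The caller still owns the transaction, so a `commit()` on the connection is needed after one or more calls.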