
DAAC Data Subscriber Script Usage

Hyun Lee edited this page Apr 27, 2022 · 2 revisions

Location

$HOME/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py

Dependencies

Your $HOME/.netrc file must contain the following entry, with your own Earthdata Login credentials substituted:

machine urs.earthdata.nasa.gov

login YOUR_USERNAME_HERE

password YOUR_PASSWORD_HERE
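Before running the subscriber, it can help to confirm that the .netrc entry parses and covers the Earthdata host. The following is a minimal sketch using Python's standard `netrc` module; the function name `has_earthdata_login` is hypothetical and is not part of the subscriber script.

```python
import netrc
from pathlib import Path

# Host name taken from the .netrc snippet above.
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path: Path) -> bool:
    """Return True if netrc_path holds a login and password for EARTHDATA_HOST.

    Hypothetical helper for illustration; the subscriber script may read
    the file differently.
    """
    try:
        entry = netrc.netrc(str(netrc_path)).authenticators(EARTHDATA_HOST)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file.
        return False
    if entry is None:
        # No entry for the host (and no "default" entry either).
        return False
    login, _account, password = entry
    return bool(login) and bool(password)
```

Calling `has_earthdata_login(Path.home() / ".netrc")` returns False when the file is missing, malformed, or has no usable entry for the host.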

Additionally, you must create the data subscriber catalog Elasticsearch index by running this script: $HOME/mozart/ops/opera-pcm/data_subscriber/create_catalog.py

Help Dialog

(mozart) hysdsops@ip-100-104-40-96:~/mozart/ops/opera-pcm/data_subscriber$ python ./daac_data_subscriber.py --help

usage: daac_data_subscriber.py [-h] -c COLLECTION -s S3_BUCKET [-sd STARTDATE] [-ed ENDDATE] [-b BBOX] [-m MINUTES] [-e EXTENSION_LIST] [-v] [-p PROVIDER] [-i INDEX_MODE]

optional arguments:

  -h, --help            show this help message and exit

  -c COLLECTION, --collection-shortname COLLECTION

                        The collection shortname for which you want to retrieve data.

  -s S3_BUCKET, --s3bucket S3_BUCKET

                        The s3 bucket where data products will be downloaded.

  -sd STARTDATE, --start-date STARTDATE

                        The ISO date time after which data should be retrieved. For Example, --start-date 2021-01-14T00:00:00Z

  -ed ENDDATE, --end-date ENDDATE

                        The ISO date time before which data should be retrieved. For Example, --end-date 2021-01-14T00:00:00Z

  -b BBOX, --bounds BBOX

                        The bounding rectangle to filter result in. Format is W Longitude,S Latitude,E Longitude,N Latitude without spaces.

                        Due to an issue with parsing arguments, to use this command, please use the -b="-180,-90,180,90" syntax when calling from the command line. Default: "-180,-90,180,90".

  -m MINUTES, --minutes MINUTES

                        How far back in time, in minutes, should the script look for data. If running this script as a cron, this value should be equal to or greater than how often your cron runs (default: 60 minutes).

  -e EXTENSION_LIST, --extension_list EXTENSION_LIST

                        The file extension mapping of products to download (band/mask). Defaults to all .tif files.

  -v, --verbose         Verbose mode.

  -p PROVIDER, --provider PROVIDER

                        Specify a provider for collection search. Default is LPCLOUD.

  -i INDEX_MODE, --index-mode INDEX_MODE

                        -i "query" will execute the query and update the ES index without downloading files. -i "download" will download all files from the ES index marked as not yet downloaded.
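The note on -b in the help text can be illustrated with a small sketch. It assumes the script parses its arguments with Python's `argparse` (the usage output has that shape, but this is not confirmed here). Under that assumption, a value such as -180,-90,180,90 begins with a dash and is taken for another option unless it is attached with `=`. The parser below is a hypothetical stand-in that models only -b/--bounds, and `split_bbox` is an illustrative helper, not the script's own code.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the script's parser; only -b/--bounds is modelled.
    parser = argparse.ArgumentParser(prog="bbox_demo")
    parser.add_argument("-b", "--bounds", dest="bbox", default="-180,-90,180,90")
    return parser


def try_parse(argv):
    """Return the parsed bbox string, or None if argparse rejects argv."""
    # argparse reports errors on stderr and raises SystemExit; silence and catch both.
    with contextlib.redirect_stderr(io.StringIO()):
        try:
            return build_parser().parse_args(argv).bbox
        except SystemExit:
            return None


def split_bbox(bbox: str):
    """Split 'W,S,E,N' into four floats and check that they are in range."""
    west, south, east, north = (float(part) for part in bbox.split(","))
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= S <= N <= 90")
    return west, south, east, north


# Space-separated form: the value looks like an option, so parsing fails.
rejected = try_parse(["-b", "-180,-90,180,90"])
# Attached form recommended by the help text: parsing succeeds.
accepted = try_parse(["-b=-180,-90,180,90"])
```

Here `rejected` is None and `accepted` is the string "-180,-90,180,90", which `split_bbox` turns into west, south, east, and north values.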

Example Query

python3 daac_data_subscriber.py -m 60 -c HLSL30 -s opera-dev-isl-fwd-mplotkin -e L30

This queries the HLSL30 collection for files from the previous 60 minutes that match the L30 extension mapping and uploads the results to the S3 bucket opera-dev-isl-fwd-mplotkin.
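To make the meaning of -m concrete, the sketch below computes the time window that a look-back of a given number of minutes would cover, formatted like the --start-date and --end-date examples in the help text. How the script itself derives its window is not shown on this page, so `lookback_window` is a hypothetical illustration only.

```python
from datetime import datetime, timedelta, timezone

# Timestamp layout used by the --start-date / --end-date examples.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def lookback_window(minutes: int, now: datetime):
    """Return (start, end) ISO strings covering the `minutes` before `now`.

    `now` must be timezone-aware; it is converted to UTC before formatting.
    """
    end = now.astimezone(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return start.strftime(ISO_FORMAT), end.strftime(ISO_FORMAT)


# A 60-minute look-back evaluated at 02:00 UTC on 2022-03-27.
window = lookback_window(60, datetime(2022, 3, 27, 2, 0, 0, tzinfo=timezone.utc))
```

This also shows why the help text asks for -m to be at least as long as the cron interval: consecutive windows shorter than the interval would leave gaps between runs.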

python3 daac_data_subscriber.py -sd 2022-03-27T01:00:00Z -ed 2022-03-27T02:00:00Z -c HLSL30 -s opera-dev-isl-fwd-mplotkin -e L30

This queries the HLSL30 collection for files from 2022-03-27T01:00:00Z to 2022-03-27T02:00:00Z that match the L30 extension mapping and uploads the results to the S3 bucket opera-dev-isl-fwd-mplotkin.
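When driving the script from another Python program, the date-range invocation can be assembled and sanity-checked before it is run. The sketch below uses only flags listed in the help dialog. The helper `build_command` and its checks are hypothetical, and the script's own validation may differ.

```python
from datetime import datetime

# Timestamp layout used by the --start-date / --end-date examples.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
# Values documented for -i/--index-mode in the help dialog.
INDEX_MODES = ("query", "download")


def build_command(collection, bucket, start, end, extension=None, index_mode=None):
    """Build the argv list for a date-range run of daac_data_subscriber.py."""
    # strptime raises ValueError if either timestamp is malformed.
    start_dt = datetime.strptime(start, ISO_FORMAT)
    end_dt = datetime.strptime(end, ISO_FORMAT)
    if start_dt >= end_dt:
        raise ValueError("start must be earlier than end")
    argv = ["python3", "daac_data_subscriber.py",
            "-sd", start, "-ed", end, "-c", collection, "-s", bucket]
    if extension is not None:
        argv += ["-e", extension]
    if index_mode is not None:
        if index_mode not in INDEX_MODES:
            raise ValueError("index_mode must be 'query' or 'download'")
        argv += ["-i", index_mode]
    return argv


command = build_command(
    "HLSL30", "opera-dev-isl-fwd-mplotkin",
    "2022-03-27T01:00:00Z", "2022-03-27T02:00:00Z", extension="L30",
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the data_subscriber directory. Passing `index_mode="query"` or `index_mode="download"` appends the corresponding -i option.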
