A bowerbird source data set has an id, which is associated with a name, a description, one or more source URLs, and possibly filters that specify which files or paths to keep or ignore.
In the simple case that id is enough to isolate a particular data set, but a given set of files can contain more than one data set of interest, or include files that are not normally of interest (but are occasionally important for detailed usage, citation, or checking).
An id can be used to find the right paths in the file tree to explore, but it won't identify exactly which files to find; they might, for example, be multiple files extracted from an archive.
As a concrete example, the id "10.5067/U8C09DWVX9LM" relates directly to the source URL "ftp://sidads.colorado.edu/pub/DATASETS/nsidc0081_nrt_nasateam_seaice/", which contains
sea ice concentration data. There are two separate data sets, one for the northern hemisphere and one for the southern. So an intention to find only the southern files does not match the single id. The actual location in the file system is
"./PUBLIC/raad/data/sidads.colorado.edu/", which is the address of the source with the "ftp://" part removed.
This tree includes paths like "DATASETS/nsidc0051_gsfc_nasateam_seaice/final-gsfc/north/daily/1978/north/" which contain the actual data in ".bin" files, differentiated as
north or south by the "north/" or "south/" path component. Another path, "DATASETS/seaice/polar-stereo/tools/", holds auxiliary grid and coordinate information in ".msk" or ".dat" files.
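The path handling in this example can be sketched roughly as follows. This is only an illustration in Python, not bowerbird or raadfiles code: the function names `source_url_to_local_dir` and `hemisphere_of` are hypothetical, and it assumes the local layout is simply the local root, then the host, then the URL path.

```python
import posixpath
import re
from urllib.parse import urlsplit


def source_url_to_local_dir(source_url, local_root):
    """Map a source URL to a local directory by dropping the scheme prefix.

    Assumption: files are mirrored under local_root/<host>/<url path>.
    """
    parts = urlsplit(source_url)
    return posixpath.join(local_root, parts.netloc, parts.path.lstrip("/"))


def hemisphere_of(path):
    """Return "north" or "south" from the first such path component, else None."""
    match = re.search(r"(?:^|/)(north|south)/", path)
    return match.group(1) if match else None


local_dir = source_url_to_local_dir(
    "ftp://sidads.colorado.edu/pub/DATASETS/nsidc0081_nrt_nasateam_seaice/",
    "./PUBLIC/raad/data",
)
data_path = "DATASETS/nsidc0051_gsfc_nasateam_seaice/final-gsfc/north/daily/1978/north/"
tools_path = "DATASETS/seaice/polar-stereo/tools/"
print(local_dir)
print(hemisphere_of(data_path), hemisphere_of(tools_path))
```

The id only gets us as far as `local_dir`; separating the southern files still needs a rule like `hemisphere_of` that looks at the paths themselves.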
So, we can't have a clean relationship between the files for a data set and the source id used by bowerbird. The bowerbird source is really a parent. What we could use that parent for is:
1. the id is a parent, the "getter" for the files of interest
2. the source URLs require processing to be used for file identification (remove the prefix, the same as is done during download)
3. a data set includes file filters, applied to the information from 1 and 2
Currently bowerbird is the source of 1 and 2, and raadfiles is the source of 3. A data set currently has no identity beyond the name of its filename-getter (and its arguments). raadtools then provides a read function that uses that filename-getter.
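One way to picture the parent/child split is the sketch below. It is a Python illustration only, under stated assumptions: the `Source` and `DataSet` classes, their fields, the regex filter, and the example file names are all hypothetical and do not reflect how bowerbird, raadfiles, or raadtools are actually implemented.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    """Parent: what the bowerbird source provides (points 1 and 2)."""
    id: str
    source_urls: list

    def local_prefixes(self):
        # Point 2: strip the scheme prefix so URLs match the local file tree.
        return [re.sub(r"^[a-z][a-z0-9+.-]*://", "", u) for u in self.source_urls]


@dataclass
class DataSet:
    """Child: a named file filter applied on top of a parent source (point 3)."""
    name: str
    parent: Source
    include: str  # regular expression a file path must match

    def files(self, all_files):
        prefixes = self.parent.local_prefixes()
        pattern = re.compile(self.include)
        return [
            f for f in all_files
            if any(f.startswith(p) for p in prefixes) and pattern.search(f)
        ]


source = Source(
    id="10.5067/U8C09DWVX9LM",
    source_urls=["ftp://sidads.colorado.edu/pub/DATASETS/nsidc0081_nrt_nasateam_seaice/"],
)
prefix = source.local_prefixes()[0]
# Hypothetical file names, relative to the local data root.
all_files = [
    prefix + "north/a.bin",
    prefix + "south/b.bin",
    prefix + "south/readme.txt",
    "example.org/other/south/c.bin",
]
south = DataSet(name="south_ice", parent=source, include=r"/south/.*\.bin$")
print(south.files(all_files))
```

Here the data set has an identity of its own (`name` plus `include`), while the id stays with the parent, so several data sets can share one source.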
some notes, there are several concepts in play