Identifier function false positives #12
Comments
We could check whether a Coordinate Reference System (CRS) is defined; that should show whether it is spatial data or not. Something like:
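A minimal sketch of such a check, assuming rasterio (mentioned later in this thread for reading metadata) is used to open the file. The helper `has_crs` and the body of `is_geospatial` shown here are illustrative assumptions, not the project's actual code:

```python
def has_crs(dataset):
    """Return True if an opened dataset reports a coordinate reference system."""
    # rasterio datasets expose a `crs` attribute, which is None when the
    # file defines no CRS; a truthiness check treats that as "not spatial".
    crs = getattr(dataset, "crs", None)
    return bool(crs)


def is_geospatial(path):
    """Hypothetical identifier: accept a file only if it carries a CRS."""
    try:
        import rasterio  # imported lazily so the check degrades gracefully

        with rasterio.open(path) as src:
            return has_crs(src)
    except Exception:
        # rasterio is not installed, or the file could not be opened as a raster
        return False
```

With a check like this, a plain photo with no georeferencing would be left to the generic RGB reader.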
Actually, the problem with this is that various 'geospatial' datasets have no CRS defined. This can happen for several reasons: lazy programmers (e.g. some of my test algorithms don't propagate the CRS info between files properly), processing errors, or a deliberate choice not to provide georeferencing information in the file itself (sometimes it is provided in a separate metadata file, for some unknown reason). Is it a particular issue if
@robintw - if you have a random PNG file (say of a cat), then the main difference between the current RGB data factory and the geospatial one is that the names of the components will be Are there a limited number of extensions that are used for geospatial data, or are JPEG and PNG used for instance? I guess we just need to decide on the priority of the data factories - we could for instance give the generic RGB reader priority if and only if no metadata is present in the RGB file. But in this case, would you still want the components named |
Yes, we can probably do this based on extensions: satellite data are never (to my knowledge) in JPG or PNG. Some are, however, in JPEG2000 (extension I have no particular preferences about standard RGB data: probably Red, Green and Blue are better as names of components for them. How do we set the priorities for DataFactories? Is it a single static constant for each factory, or can it change as you get more information (e.g. we try getting metadata using rasterio, and if we can't find any then we decrease the relative priority of the geospatial reader)?
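As a rough illustration of the extension-based idea, the sketch below rules out the two formats said above never to hold satellite data. The function name and the extension set are assumptions made for this example; other formats would still need a metadata check:

```python
import os

# Extensions that, per the comment above, are not used for satellite data.
PLAIN_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def may_be_geospatial(path):
    """Return False for files whose extension marks them as ordinary images."""
    ext = os.path.splitext(path)[1].lower()
    return ext not in PLAIN_IMAGE_EXTENSIONS
```

This only filters out obvious non-candidates; a file that passes is not guaranteed to be geospatial.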
@robintw - the priority is set by an argument in the https://github.com/glue-viz/glue/blob/master/glue/core/data_factories/hdf5.py#L41 I hadn't thought of having the identifier return the priority - that would be even better, since it would allow more fine-tuning as you suggest.
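The idea of an identifier that returns a priority rather than a yes/no answer could look roughly like the sketch below. All names and numeric values here are hypothetical and chosen only for illustration; this is not glue's actual data factory API:

```python
# Hypothetical baseline priority for the generic RGB reader.
GENERIC_RGB_PRIORITY = 100


def rgb_identifier():
    """The generic RGB reader always offers its fixed baseline priority."""
    return GENERIC_RGB_PRIORITY


def geospatial_identifier(has_georeferencing):
    """Return a priority that depends on whether georeferencing metadata was found."""
    if has_georeferencing:
        return GENERIC_RGB_PRIORITY + 100  # outrank the generic reader
    return GENERIC_RGB_PRIORITY - 50  # step aside when no metadata is present


def pick_reader(has_georeferencing):
    """Choose the reader whose identifier reports the highest priority."""
    candidates = {
        "geospatial": geospatial_identifier(has_georeferencing),
        "rgb": rgb_identifier(),
    }
    return max(candidates, key=candidates.get)
```

Under these assumed values, the generic RGB reader wins if and only if no metadata is found, which matches the behaviour proposed earlier in the thread.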
At the moment, the is_geospatial function is recognizing pretty much any RGB image file:
@robintw - I wonder whether there is some kind of metadata we can look for that would identify files as being specifically geospatial data?