Request for new fields to index and expose in search and download #666

Open
2 of 42 tasks
MortenHofft opened this issue Feb 10, 2022 · 6 comments
@MortenHofft
Member

MortenHofft commented Feb 10, 2022

There are quite a lot of ideas for fields to add to the index, so that users can search and download using these new filters.
I've collected them all here (at least the ones I could find).

From #400

  • earliestEonOrLowestEonothem
  • latestEonOrHighestEonothem
  • earliestEraOrLowestErathem
  • latestEraOrHighestErathem
  • earliestPeriodOrLowestSystem
  • latestPeriodOrHighestSystem
  • earliestEpochOrLowestSeries
  • latestEpochOrHighestSeries
  • earliestAgeOrLowestStage
  • latestAgeOrHighestStage
  • lowestBiostratigraphicZone
  • highestBiostratigraphicZone
  • group
  • formation
  • member
  • bed

From #7

  • acceptedKey

#182

  • verbatimTaxonKey

From #425

  • gbifRegion
  • publishedByGbifRegion

From #515

From #662

  • datasetName (has since been added)
  • datasetID (has since been added)

From #664

  • otherCatalogNumbers

#666 (comment)

  • taxonConceptID

From #515 (comment)

  • isSequenced

Added 31 oct 2024: From gbif/portal-feedback#5541

  • Extended Measurement Or Facts:measurementType
  • DNA derived data:DNA_sequence
  • GGBN Permit Extension:permitType
  • Darwin Core Resource Relationship:relationshipOfResource
  • Taxon Description:type

Added 31 oct 2024 from #1099

  • dnaSequenceID

MortenHofft changed the title from "Request for new fields to index and expose in API and downloads" to "Request for new fields to index and expose in search and download" on Feb 10, 2022
@timrobertson100
Member

Thanks for collating this. We could take the approach of responding to requests as they come in, which is not to be discounted for sure, but perhaps we might just consider what it would take to index everything? Most of the fields will be incredibly sparsely populated, and I'm not sure our original concerns from a decade ago about blowing up index sizes would hold true today.

@CecSve

CecSve commented Apr 13, 2022

First of all, let me know if this is not the right issue to raise this in.

Second, the following dataset recently changed its DwC terms between versions 1.8 and 1.9, shifting from 'individualCount' to 'organismQuantity' + 'organismQuantityType': https://www.gbif.org/dataset/91fa1a0d-a208-40aa-8a6e-f2c0beb9b253. When a user downloads the simple version of the dataset, the 'individualCount' column is populated only with NAs, and the updated information stored in 'organismQuantity' + 'organismQuantityType' is not included. Is there a way to ensure that data is not 'lost' when publishers begin using new terms for the same thing?
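
Until both terms are available in the same download, a consumer who has the full record could reconcile them on their own side. The following is only a minimal sketch of that idea, not anything GBIF provides: the function name `effective_count` is hypothetical, and treating an 'organismQuantityType' of "individuals" as equivalent to 'individualCount' is an assumption that would need checking against each dataset.

```python
from typing import Optional

# Values treated as "no data" in this sketch (assumption).
MISSING = (None, "", "NA")

# Assumption: these organismQuantityType labels denote a plain count of individuals.
COUNT_TYPES = {"individual", "individuals"}


def effective_count(record: dict) -> Optional[float]:
    """Prefer individualCount; otherwise fall back to organismQuantity
    when organismQuantityType indicates a count of individuals."""
    count = record.get("individualCount")
    if count not in MISSING:
        return float(count)

    quantity = record.get("organismQuantity")
    quantity_type = (record.get("organismQuantityType") or "").strip().lower()
    if quantity not in MISSING and quantity_type in COUNT_TYPES:
        return float(quantity)

    # Other quantity types (cover, biomass, ...) are not counts, so nothing is returned.
    return None
```

Quantities of other types are deliberately left alone, because converting them into a count would invent data rather than recover it.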

@mdoering
Member

taxonConceptID would be another one to search for, e.g. Avibase IDs or taxonid.org identifiers:
https://www.gbif.org/occurrence/3457928716

@CecSve

CecSve commented Jan 27, 2023

  • sex (currently indexed as keywords)

We are working on a controlled vocabulary gbif/vocabulary#83 but it is not finalized yet.

@muttcg muttcg self-assigned this Nov 30, 2023
muttcg added a commit that referenced this issue Dec 5, 2023
muttcg added a commit that referenced this issue Dec 5, 2023
@MortenHofft
Member Author

MortenHofft commented Oct 31, 2024

This issue hasn't been closed, so I assume at least some of the above is still not implemented. I also just added more fields, each with a reference to the original issue.

@muttcg
Member

muttcg commented Oct 31, 2024

@MortenHofft
Many of the original issues are closed; as far as I remember, some needed discussion. I have a list of fields that are ready (group_ is probably not relevant anymore because of SQL downloads):

| Field | Index | Hive | API field | API search | Download field | Small filtered download | Big downloads | Result |
|---|---|---|---|---|---|---|---|---|
| earliestEonOrLowestEonothem | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| latestEonOrHighestEonothem | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| earliestEraOrLowestErathem | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| latestEraOrHighestErathem | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| earliestPeriodOrLowestSystem | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| latestPeriodOrHighestSystem | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| earliestEpochOrLowestSeries | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| latestEpochOrHighestSeries | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| earliestAgeOrLowestStage | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| latestAgeOrHighestStage | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| lowestBiostratigraphicZone | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| highestBiostratigraphicZone | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| group | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | Hive reserved word: group_ |
| formation | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| member | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| bed | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| gbifRegion | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| publishedByGbifRegion | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| fieldNumber | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| preparations | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| sex | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| startDayOfYear | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| endDayOfYear | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| higherGeography | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| island | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| islandGroup | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| georeferencedBy | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| previousIdentifications | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| datasetName | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| datasetID | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| otherCatalogNumbers | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| taxonConceptID | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| isSequenced | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
| associatedSequences | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | OK |
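
For readers who want to try the fields marked as searchable, here is a rough sketch of how a search URL might be put together, and of the `group_` renaming noted in the table. It assumes each field is exposed as a query parameter with the same name on the public occurrence search endpoint, which the table suggests but this thread does not spell out. The helper names `hive_column` and `search_url`, the contents of `HIVE_RESERVED`, and the example filter values are all illustrative assumptions.

```python
from urllib.parse import urlencode

# Public occurrence search endpoint; the filter names passed to it are assumptions.
SEARCH_ENDPOINT = "https://api.gbif.org/v1/occurrence/search"

# Illustrative only: the table names `group` as a Hive reserved word.
# A real list of reserved words would be longer.
HIVE_RESERVED = {"group"}


def hive_column(term: str) -> str:
    """Append '_' to a term that collides with a reserved word, else return it unchanged."""
    return term + "_" if term.lower() in HIVE_RESERVED else term


def search_url(**filters: str) -> str:
    """Build a search URL, assuming each field is a same-named query parameter."""
    # Sort the filters so the resulting URL is stable and reproducible.
    return SEARCH_ENDPOINT + "?" + urlencode(sorted(filters.items()))


if __name__ == "__main__":
    print(hive_column("group"))
    print(hive_column("formation"))
    print(search_url(gbifRegion="EUROPE", isSequenced="true"))
```

The sketch only builds the URL and does not send a request, so whether a given filter is accepted still has to be checked against the live API.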
