GBIF Repatriated Datasets #907

Open
sadeghim opened this issue May 24, 2023 · 7 comments

@sadeghim
Member

Run the GBIF Repatriated Datasets job in collectory-test.

Current GBIF data resources in collectory-test

Arizona State University Lichen Herbarium
Auckland Museum Botany Collection
Auckland Museum Land Vertebrates Collection
BMSM Bailey-Matthews National Shell Museum
Bee Biology and Systematics Laboratory
Bell Museum lichens
Bernice P. Bishop Museum
CONN
Diversity of the Indo-Pacific (DIPnet)
Duke University Herbarium Lichen Collection
EOD – eBird Observation Dataset
Entomological Collections (NHRS), Swedish Museum of Natural History (NRM)
Essig Museum of Entomology
Fishbase
Fungal Biodiversity Centre (CBS) - Fungi strains
GBIF test - UNITE's INSDC sequence data
Geneva Herbarium – General Collection (G)
Global Lacustrine Diatoms
Harvard University Herbaria: All Records
Hexacorallians of the world
INSDC Environment Sample Sequences
INSDC Host Organism Sequences
INSDC Sequences
Ichthyology Collection - Royal Ontario Museum
International Fossil Shell Museum (NL) - Mollusca Collection
LACM Entomology Collection
LACM Malacology
LACM Vertebrate Collection
Lichens at Herbarium Berolinense, Berlin (B)
Lund Botanical Museum (LD)
Lund Museum of Zoology - Insect collections (MZLU)
Lyman Entomological Museum (LEMQ)
MAL
MBM - Herbário do Museu Botânico Municipal
MVZ Bird Collection (Arctos)
MVZ Herp Collection (Arctos)
MVZ Mammal Collection (Arctos)
Macaulay Library Audio and Video Collection
Manchester Museum, University of Manchester, Botany Collection
Meise Botanic Garden Herbarium (BR)
Michigan State University Herbarium Lichens
Mosquito Occurrence Dataset
Natural History Museum Rotterdam - Specimens
Natural History Museum, Vienna - Herbarium W
Non-vertebrate Paleontology, Jackson School Museum of Earth History, University of Texas at Austin
Ornithology Collection Passeriformes - Royal Ontario Museum
Paleobiology Database
Planetary Biodiversity Inventory Eumycetozoan Databank
RSA - Rancho Santa Ana Botanic Garden Herbarium
Royal Botanic Garden Edinburgh Herbarium (E)
Royal Botanic Gardens, Kew - Economic Botany Collection Specimens
SCAR Biogeographic Atlas of the Southern Ocean - Porifera - Data
SP - Herbário do Estado "Maria Eneyda P. Kaufmann Fidalgo" - Coleção de Fanerógamas
Texas A&M University Insect Collection
Texas Tech University - Invertebrate Zoology
The Lichens Collection at the Botanische Staatssammlung München
UMNH Reptiles and Amphibians Collection (Arctos)
UTEP Plants (Arctos)
UWBM Ornithology Collection
United Herbaria of the University and ETH Zurich
United States National Plant Germplasm System Collection
University of British Columbia Herbarium (UBC) - Algae Collection
University of British Columbia Herbarium (UBC) - Bryophytes Collection
University of British Columbia Herbarium (UBC) - Vascular Plant Collection
University of California Museum of Paleontology
University of Michigan Herbarium
University of Michigan Museum of Zoology, Division of Birds
University of Michigan Museum of Zoology, Division of Mollusks
University of Michigan Museum of Zoology, Division of Reptiles & Amphibians
University of Tennessee Bryophyte Herbarium
University of Vermont, Pringle Herbarium
Vulnerable marine ecosystems in the South Pacific Ocean region
WFVZ Bird Collections
Xeno-canto - Bird sounds from around the world
agent actions test
agents-4

Identify missing collections and see how we can add them.

@peggynewman
Contributor

Go to: https://collections-test.ala.org.au/admin
  • Repatriate datasets
  • Max 5,000 datasets
  • Min 1,000 records
  • Max 10m records
  • Review, deselect bad ones and the ones we already have
  • Hit load
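
As a rough sketch of the review step above (not how collectory implements it), the limits and the "ones we already have" check could be expressed as a simple filter. The `Candidate` type, the `select_candidates` function and the record counts in the usage are hypothetical; only the thresholds come from the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one GBIF dataset offered for repatriation."""
    title: str
    record_count: int


def normalise(title: str) -> str:
    """Collapse whitespace and ignore case so near-identical titles compare equal."""
    return " ".join(title.split()).casefold()


def select_candidates(candidates, existing_titles,
                      min_records=1_000, max_records=10_000_000,
                      max_datasets=5_000):
    """Keep datasets within the record limits that are not already loaded."""
    existing = {normalise(t) for t in existing_titles}
    kept = [
        c for c in candidates
        if min_records <= c.record_count <= max_records
        and normalise(c.title) not in existing
    ]
    return kept[:max_datasets]


# Illustrative data only: the record counts are made up.
existing = ["Fishbase", "Paleobiology Database"]
candidates = [
    Candidate("Fishbase", 50_000),              # already a data resource
    Candidate("Example small dataset", 200),    # below the minimum
    Candidate("Example new dataset", 120_000),  # kept
]
selected = select_candidates(candidates, existing)
print([c.title for c in selected])  # ['Example new dataset']
```

Matching on title is only a heuristic; the manual "deselect bad ones" review would still be needed.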

@peggynewman
Contributor

Doesn't work, raised an issue on collectory

@peggynewman
Contributor

Fixed, waiting for me to retry

@peggynewman
Contributor

peggynewman commented Dec 16, 2024

Giving up on collectory-test, and now running in prod.

Rose, thanks for loading:
dr22496 - 126,223 records
dr22497 - 177,263 records
dr22501 - 122,982 records
dr22525 - 159,518 records

Now can you load these?
(Params I set = 50 count, 10000-5m records)
dr22305
dr22496
dr22497
dr22499

dr22505
dr22506
dr22507
dr22508
dr22509

dr22514
dr22515
dr22516
dr22517
dr22518
dr22519
dr22520
dr22521

@rosemaryjoconnor
Contributor

rosemaryjoconnor commented Dec 17, 2024

17/12/2024

Prod

  • All loaded and ingested successfully
  • dr22505 and dr22521 - no indication of how many UUIDs were attempted in either stderr or stdout.
  • Checked dwca-imports and the dwca was created with occurrence records:
    • dr22505 - 37,537
    • dr22521 - 15,175

| DR | Load Dataset | Ingest | # Records |
| --- | --- | --- | --- |
| dr22305 | Success | Success | 117,146 |
| dr22496 | Success | Success | 126,223 |
| dr22497 | Success | Success | 177,263 |
| dr22499 | Success | Success | 74,468 |
| dr22505 | Success | Success | 37,537 |
| dr22506 | Success | Success | 55,584 |
| dr22507 | Success | Success | 46,317 |
| dr22508 | Success | Success | 40,280 |
| dr22509 | Success | Success | 34,161 |
| dr22514 | Success | Success | 38,726 |
| dr22515 | Success | Success | 25,294 |
| dr22516 | Success | Success | 15,657 |
| dr22517 | Success | Success | 32,623 |
| dr22518 | Success | Success | 20,506 |
| dr22519 | Success | Success | 15,655 |
| dr22520 | Success | Success | 21,661 |
| dr22521 | Success | Success | 15,175 |

Total new records: 894,276
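
The total can be cross-checked against the per-resource counts. A minimal sketch, with the counts copied from the table above (the `records` dict is just for illustration):

```python
# Record counts per data resource, copied from the table above.
records = {
    "dr22305": 117_146,
    "dr22496": 126_223,
    "dr22497": 177_263,
    "dr22499": 74_468,
    "dr22505": 37_537,
    "dr22506": 55_584,
    "dr22507": 46_317,
    "dr22508": 40_280,
    "dr22509": 34_161,
    "dr22514": 38_726,
    "dr22515": 25_294,
    "dr22516": 15_657,
    "dr22517": 32_623,
    "dr22518": 20_506,
    "dr22519": 15_655,
    "dr22520": 21_661,
    "dr22521": 15_175,
}

total = sum(records.values())
print(len(records), f"{total:,}")  # 17 894,276
```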

@peggynewman
Contributor

peggynewman commented Dec 19, 2024

Working on these now:

| druid | dataset title |
| --- | --- |
| dr22498 | INSDC Sequences |
| dr22500 | NMNH Extant Specimen Records (USNM, US) |
| dr22501 | Natural History Museum (London) Collection Specimens |
| dr22502 | Naturalis Biodiversity Center (NL) - Botany |
| dr22522 | Naturalis Biodiversity Center (NL) - Mollusca |
| | Neptune Deep-Sea Microfossil Occurrence Database |
| | Observation.org, Nature data from around the World |
| | Paleobiology Database |
| | Phanerogamic Botanical Collections (S) |
| | Pl@ntNet automatically identified occurrences |
| | Queensland Marine Sediment |
| | Snow Entomological Museum Collection |
| | The New York Botanical Garden Herbarium (NY) |
| | The Retrospective Analysis of Antarctic Tracking (Standardised) Data from the Scientific Committee on Antarctic Research |
| | Tropicos Specimens Non-MO |
| | UF Invertebrate Zoology |
| | Vulnerable marine ecosystems in the South Pacific Ocean region |

| druid | No connection parameters |
| --- | --- |
| dr29607 | EURISCO, The European Genetic Resources Search Catalogue |
| dr29608 | Field Museum of Natural History (Zoology) Invertebrate Collection |

@peggynewman
Contributor

peggynewman commented Dec 20, 2024

Raised an issue for some problems I've come across:
AtlasOfLivingAustralia/collectory#234

Watch out for new runs creating duplicate data resources
Some data resources have a zip sitting in connection parameters but for some reason the load didn't work
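
One way to spot duplicate data resources after a run is to group them by normalised title and flag any title that has more than one druid. This is a sketch only: the `find_duplicates` helper and the druids in the usage are made up, it assumes the (druid, title) pairs have already been exported from the dataset listing, and it does not call any collectory API.

```python
from collections import defaultdict


def find_duplicates(resources):
    """Return {normalised title: [druids]} for titles held by more than one druid."""
    by_title = defaultdict(list)
    for druid, title in resources:
        # Collapse whitespace and ignore case so near-identical titles group together.
        key = " ".join(title.split()).casefold()
        by_title[key].append(druid)
    return {
        title: sorted(druids)
        for title, druids in by_title.items()
        if len(druids) > 1
    }


# Hypothetical druids, for illustration only.
resources = [
    ("dr00001", "Paleobiology Database"),
    ("dr00002", "paleobiology  database"),
    ("dr00003", "INSDC Sequences"),
]
duplicates = find_duplicates(resources)
print(duplicates)
```

Titles are only a proxy here. If the GBIF dataset key is available for each data resource, grouping on that would probably be more reliable.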

iBOL would be great

Dataset listing: https://collections.ala.org.au/datasets#filters=contentTypes%3Agbif%20import
