I am working with biocache-store v2.4.1 within bioatlas/ala-docker. While building an alpha site, I attempted to bootstrap the db by loading a GBIF download of ~4.3 million records. In v2.4.1, 'biocache load drxx' appeared to hang after retrieving the zip file from the collectory and unzipping it. Looking at /data/biocache-load/drxx, the pre-processing step that creates e.g. occurrence.txt-sorted was taking a long time: 103 minutes.
I reverted to biocache-store v2.2 within the same bioatlas/ala-docker system. In that case, the same call to 'biocache load drxx' completed the pre-process sorting in 3 minutes.
I believe that the configuration parameters are the same for both, so the difference appears to be the released version.
This is possibly a performance regression caused by an upstream fix that switched to safe CSV sorting. The previous method used the GNU coreutils sort program, which is unsafe for CSV because it works line by line and so relies on the files never containing quoted newline characters.
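To illustrate why a line-based sort is unsafe here, below is a minimal Python sketch. It is not the biocache-store implementation; the sample data and the function names `sort_by_line` and `sort_by_record` are made up for the example. It shows a record with a quoted newline being torn apart by a line-based sort, while a sort that parses CSV records first keeps it whole.

```python
import csv
import io

# Hypothetical sample: record "3" has a quoted newline inside its locality field.
RAW = 'id,locality\n3,"Lake\n0 km north"\n1,Hilltop\n2,Valley\n'


def sort_by_line(text):
    """Line-based sort, similar in spirit to a plain `sort`: every newline
    is treated as a record boundary, even inside a quoted field."""
    header, *lines = text.splitlines()
    return [header] + sorted(lines)


def sort_by_record(text):
    """Record-aware sort: parse the CSV first, so a quoted newline stays
    inside its record, then sort the parsed records by the first column."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, records = rows[0], rows[1:]
    return [header] + sorted(records, key=lambda r: r[0])


line_sorted = sort_by_line(RAW)
record_sorted = sort_by_record(RAW)

# The line-based result has one extra "row", and the two halves of
# record 3 are no longer adjacent.
print(line_sorted)
# The record-aware result keeps record 3 intact, newline included.
print(record_sorted)
```

Parsing every record is more work than splitting on newlines, which may account for some of the extra time, although this sketch does not measure that.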
See attached file.
2.2-vs-2.4.1-biocache-load-dr7-gbif-download.txt