- v5.3.7 Release notes
- v5.3.6 Release notes
- v5.2.20 Release notes
- v5.2.19 Release notes
- v5.2.18 Release notes
- What’s Changed
- v5.2.16 Release notes
- v5.2.15 Release notes
- v5.2.14 Release notes
- v5.2.13 Release notes
- v5.2.12 Release notes
- v5.2.11 Release notes
- v5.2.10 Release notes
- v5.2.9 Release notes
- v5.2.8 Release notes
- v5.2.7 Release notes
- v5.2.6 Release notes
- v5.2.5 Release notes
- v5.2.4 Release notes
- v5.2.3 Release notes
- v5.2.2 Release notes
- v5.2.1 Release notes (v5.1.15 and v5.2.0 included)
- v5.1.14 Release notes
- v5.1.13 Release notes
- v5.1.12 Release notes
- v5.1.11 Release notes
- v5.1.10 Release notes
- v5.1.6 Release notes (v5.1.5 included)
- v5.1.4 Release notes
- v5.1.3 Release notes
- v5.1.2 Release notes
- v5.1.1 Release notes (v5.1.0 included)
- v5.0.7 Release notes
- v5.0.6 Release notes
- v5.0.5 Release notes
- v5.0.4 Release notes
- v5.0.3 Release notes
- v5.0.2 Release notes
- v5.0.1 Release notes
- v5.0.0 Release notes
- v4.2.7 Release notes
- v4.2.6 Release notes
- v4.2.5 Release notes
- v4.2.3 Release notes
- v4.2.2 Release notes
- v4.2.1 Release Notes
- 4.2.0
- 4.1.15
- 4.1.14
- 4.1.13
- 4.1.12
- 4.1.11
- 4.1.10
- 4.1.9
- 4.1.8
- 4.1.7
- 4.1.6
- 4.1.5
- 4.1.4
- 4.1.3
- 4.1.2
- 4.1.1
- 4.1.0
- 4.0.7
- 4.0.6
- 4.0.5
- 4.0.4
- 4.0.3
- 4.0.2
- 4.0.1
- 4.0.0
This page contains links to release notes for BioSamples version 4.0.0 and higher. Version 4.0.0 was a comprehensive overhaul, so earlier release notes are no longer applicable.
-
MICROBE logo in front page
-
NCBI and ENA sample mirroring fixes
-
Option added in BioSamples to perform JSON schema validation of all WEBIN samples
-
Fixes in documentation template
-
Fixes in handling NCBI sample mirroring in BioSamples
-
add public filter for INSDC status != suppressed by @theisuru in #630
-
Bsd 2292 taxon importer codon by @theisuru in #629
-
fix release date/ sample status bug in accessioning of V1 and V2, sam… by @dipayan1985 in #631
-
Adding sample post release action to CI/CD by @dipayan1985 in #632
-
Fix Gitlab CI file for post release action pipeline by @dipayan1985 in #633
-
Fix CI/CD for sample post release action by @dipayan1985 in #634
-
Too large artifact error in CICD by @dipayan1985 in #635
-
Configure micrometer stackdriver by @dipayan1985 in #620
-
Stackdriver monitoring by @dipayan1985 in #636
-
BSD release 5.2.17 by @dipayan1985 in #637
-
Elixir biovalidator upgrade
Elixir biovalidator upgraded to the latest version. This version includes improved performance and error handling.
-
EVA logo in external links
Now EVA links will be identified and shown in the external links section with the logo.
=== Bug fixes
-
At sample submission time if AAP domain is not provided, the first AAP domain of the user is used by default.
-
File Upload Submissions
Case-insensitive column names are now accepted for file uploader submissions.
Improved error reporting for failed submissions: errors related to authentication, access and file format problems are clearly reported back to the submitter.
-
Performance improvement in accessioning
Over the past few months, several request timeouts were observed when ENA attempted to create ~1000 sample accessions from BioSamples in a single API call. Several bottlenecks in the BioSamples accession generation process were identified and eliminated, and the process was replaced with a much simpler one, improving accessioning and submission performance. An example from the ENA logs of May 4, 2022 shows that BioSamples can now generate ~10000 accessions in a single API call:
“Requested 9985 accessions from BioSamples and registering 9985 new BioSample accessions took 81229 milliseconds”
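As a rough reading of the quoted log line, the figures work out to about 8 ms per accession. The short Python calculation below is only illustrative arithmetic on the two numbers in the log; the variable names are ours.

```python
# Figures taken from the ENA log line quoted above.
accessions = 9985
elapsed_ms = 81229

# Average cost of one accession and the resulting throughput.
ms_per_accession = elapsed_ms / accessions
accessions_per_second = accessions / (elapsed_ms / 1000)

print(f"{ms_per_accession:.1f} ms per accession")
print(f"{accessions_per_second:.0f} accessions per second")
```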
-
Reference to private BioSamples while doing an ENA (WEBIN) submission
It is now possible to create private samples in BioSamples and refer to their BioSamples accessions in an ENA submission. Such private samples are automatically made public in BioSamples when the runs/analyses that refer to them are made public in ENA.
-
Generic structured data model
BioSamples structured data submission was restricted to only a few structured data types, like AMR, histological data, etc. Now with this release it is possible to submit any type of structured data to BioSamples.
-
Structured data is additional information specific to a sample, for example antibiogram data, which is an overall profile of antimicrobial susceptibility testing results.
-
-
Improved NCBI and ENA sample imports
BioSamples import pipelines import newly created or updated samples from ENA and NCBI daily to remain consistent with other INSDC databases. Publication information of samples was not imported by BioSamples until now; we are starting to import it from this release. The ENA Browser is in its final phase of testing before it starts indexing samples from BioSamples, and the ENA Browser team requested this feature because they would like to query a single database to get all related information for a sample.
-
Submissions with WEBIN authentication
Submitters are no longer required to pass the parameter authProvider=WEBIN for submissions done with WEBIN authentication.
-
Filtered searches for both samples and accessions that contained a mix of private and public samples were returning inconsistent results; this has now been fixed.
-
Solr out-of-memory issues have been resolved, which ensures consistency in searching and filtering.
-
Structured data PUT and GET endpoints
1.1 PUT to add structured data to already submitted sample: PUT structureddata/<accession>
1.2 GET to fetch structured data of a sample GET structureddata/<accession>
Documentation and example available here
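A minimal Python sketch of how the two endpoints above share one URL. The base URL is an assumption taken from the curl examples elsewhere in these notes, and `structured_data_url` is a hypothetical helper, not part of any BioSamples client. Nothing is sent over the network.

```python
from urllib.parse import quote

# Assumption: the production base URL used in the curl examples in these notes.
BASE_URL = "https://www.ebi.ac.uk/biosamples"


def structured_data_url(accession: str, base_url: str = BASE_URL) -> str:
    """Return the structureddata/<accession> URL used for both PUT and GET."""
    return f"{base_url.rstrip('/')}/structureddata/{quote(accession, safe='')}"


url = structured_data_url("SAMEA3993565")
print("GET", url)  # fetch the structured data of the sample
print("PUT", url)  # add structured data to the already submitted sample
```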
-
File upload submissions: We had a restriction of unique sample names per submitter for file uploader submissions. This has now been removed, as we have received requests to allow multiple samples with the same sample name to be submitted by the same submitter or community for re-sequencing projects. The sample metadata should still be different.
-
Improved error handling for the file uploader submissions with more user friendly error messages
-
Allowing case insensitive column names for the file uploader submission files
-
Improvement of handling structured data in BioSamples
-
Performance improvements in accessioning
-
Improvements in the ENA import pipeline
-
New pipeline added to handle sample release in BioSamples when ENA data (runs/analyses) referring to such samples are released
-
Private sample search using Webin Authentication
Private samples submitted using Webin Authentication can now be searched using the GET API, including:
1.1. Single private sample search using accession
1.2. Filtered search result containing only private samples
1.3. Filtered search results containing a mix of private and public samples
Example API calls,
For 1.1 - curl 'https://www.ebi.ac.uk/biosamples/samples/<accession>?authProvider=WEBIN' -i -X GET -H "Content-Type: application/json;charset=UTF-8" -H "Accept: application/hal+json" -H "Authorization: Bearer $TOKEN"
For 1.2 and 1.3 - curl 'https://www.ebi.ac.uk/biosamples/samples?filter=attr:<attribute_name>:<attribute_value>&authProvider=WEBIN' -i -X GET -H "Content-Type: application/json;charset=UTF-8" -H "Accept: application/hal+json" -H "Authorization: Bearer $TOKEN"
To get the Webin Authentication token,
TOKEN=$(curl --location --request POST 'https://www.ebi.ac.uk/ena/submit/webin/auth/token' --header 'Content-Type: application/json' --data-raw '{ "authRealms": [ "ENA" ], "password": "<password_here>", "username": "<username_here>" }')
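The same requests can be assembled programmatically. The Python sketch below only builds the URL and headers shown in the curl examples and does not send anything; `private_search_request` is a hypothetical helper and the `organism` filter is just an example value.

```python
from urllib.parse import urlencode

# Base URL as used in the curl examples above.
BASE_URL = "https://www.ebi.ac.uk/biosamples"


def private_search_request(token, accession=None, attribute_name=None, attribute_value=None):
    """Build (url, headers) for a Webin-authenticated GET; nothing is sent."""
    params = {"authProvider": "WEBIN"}
    if accession is not None:
        # Case 1.1: single private sample by accession.
        path = f"/samples/{accession}"
    else:
        # Cases 1.2 and 1.3: filtered search on an attribute.
        path = "/samples"
        params = {"filter": f"attr:{attribute_name}:{attribute_value}", **params}
    headers = {
        "Content-Type": "application/json;charset=UTF-8",
        "Accept": "application/hal+json",
        "Authorization": f"Bearer {token}",
    }
    # Keep ':' readable in the filter; spaces are encoded as '+'.
    return f"{BASE_URL}{path}?{urlencode(params, safe=':')}", headers


url, headers = private_search_request("TOKEN", attribute_name="organism", attribute_value="Homo sapiens")
print(url)
```

The returned pair can be passed to any HTTP client; the token is the value obtained from the Webin authentication call above.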
-
Additional field support for the drag’n’drop uploader
Publications, contacts and organizations can now be added to sample metadata for submission using the drag’n’drop uploader. For more details please refer to https://www.ebi.ac.uk/biosamples/docs/cookbook/upload_files.html
-
Generic structured data submission model
We have refactored the structured data API to accept generic data structures. This removes the need to update the code whenever a new data type is requested. Additionally, the structured data section of the sample now has its own owner, allowing it to fully support cases where structured data is added separately from the original sample metadata.
More details about the API can be found in our documentation here https://www.ebi.ac.uk/biosamples/docs/references/api/submit#_submit_structured_data_to_sample.
-
BioSamples API documentation has been fixed to include the HTTP request and HTTP response snippets
New submission and accession endpoints will be deployed with this release to increase availability, in particular for bulk accessioning. Those will be first integrated with ENA for accessioning and monitored for performance. Metrics will be documented and made available for users. Expected date of general availability is December 10, 2021, if performance results are as expected. Target availability is 99.5%.
File uploader improvements
BioSamples file uploader has been changed to handle larger file uploads. Any file upload with over 200 samples is queued and the submitter is provided with a submission ID. Submitters can enter the submission ID in the View Submissions tab to check the status of their uploads. If the submission ID is valid, the submitter gets a result JSON file with the submission status and the sample accessions mapped against sample names.
A submission can have one of three statuses: ACTIVE, COMPLETED or FAILED.
ACTIVE status: The submission is waiting to be processed or is being processed.
COMPLETED status: The submission has been processed. Either the samples have been created and accessions generated, or the samples have failed validation against the minimal validation rules of the BioSamples database or against the checklist specified by the submitter during the file upload.
FAILED status: The submission has failed. This can happen if the uploaded file was invalid and BioSamples was not able to parse it, or if a technical issue in the BioSamples database prevented the submission from being processed.
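The queueing rule and the three statuses can be modelled as follows. This is a minimal Python sketch for client-side bookkeeping, assuming the 200-sample threshold described above; the names `QUEUE_THRESHOLD`, `is_queued` and `is_finished` are hypothetical and not part of the BioSamples code base.

```python
from enum import Enum

# Uploads with more than this many samples are queued (per the notes above).
QUEUE_THRESHOLD = 200


class SubmissionStatus(Enum):
    ACTIVE = "ACTIVE"        # waiting to be processed or being processed
    COMPLETED = "COMPLETED"  # processed: samples created, or failed validation
    FAILED = "FAILED"        # file could not be parsed, or a technical issue occurred


def is_queued(sample_count: int) -> bool:
    """True when an upload of this size is queued and gets a submission ID."""
    return sample_count > QUEUE_THRESHOLD


def is_finished(status: SubmissionStatus) -> bool:
    """A submission needs no further status checks once COMPLETED or FAILED."""
    return status in (SubmissionStatus.COMPLETED, SubmissionStatus.FAILED)


print(is_queued(250), is_finished(SubmissionStatus("ACTIVE")))
```

Note that COMPLETED does not by itself mean that accessions were created, so the result file still has to be inspected.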
JSON schema-store integration and BioSamples checklist (BSDC) ID space
BioSamples now has a checklist ID space starting from BSDC00001. This is to clearly distinguish between ENA checklists and BioSamples checklists. We have also imported ENA checklists into the BioSamples schema-store, preserving the ENA checklist IDs. The ‘checklist’ attribute in a sample triggers validation at sample submission time: the checklist is retrieved from the schema-store and the sample is validated against it using the Elixir biovalidator.
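A small Python sketch of telling the two ID spaces apart by prefix. The exact patterns are assumptions inferred from the example IDs in these notes (BSDC00001 and ERC100001), not a published specification, and `checklist_id_space` is a hypothetical helper.

```python
import re

# Assumed formats, inferred from the example IDs in these notes.
BSDC_PATTERN = re.compile(r"^BSDC\d{5}$")  # e.g. BSDC00001
ERC_PATTERN = re.compile(r"^ERC\d{6}$")    # e.g. ERC100001


def checklist_id_space(checklist_id: str) -> str:
    """Return 'BSDC', 'ERC' or 'unknown' for a checklist ID."""
    if BSDC_PATTERN.match(checklist_id):
        return "BSDC"
    if ERC_PATTERN.match(checklist_id):
        return "ERC"
    return "unknown"


print(checklist_id_space("BSDC00001"))
```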
-
ENA import pipeline fix for BioSamples authority samples
This bugfix release ensures that BioSamples authority samples, i.e. samples submitted to BioSamples and referred to in an ENA submission, are not re-updated with the Webin submission account ID while the SRA accession is attached to the sample. Updating the sample with the SRA accession is a requirement of the ENA browser.
-
JSON Schema store integration with BioSamples
We have integrated the JSON Schema store with BioSamples. JSON Schema store is an application for storing and managing JSON Schemas. All BioSamples’ checklists will be stored and managed in the JSON Schema store. In the future we plan to expose the API with authentication.
-
BioSamples File uploader
We have introduced a new drag and drop style file uploader for bulk uploading of samples. This is mostly intended for our non-programmatic submitters who want to fill in their sample metadata in a file for uploading and persisting samples in BioSamples. The drag and drop uploader in BioSamples supports both Webin and AAP authentication. More details on the uploader can be found in a newly added uploader guide. The guide has the required details about the file format, mandatory fields and other pre-conditions.
-
ENA taxonomy service integration with BioSamples
Samples submitted to BioSamples using ENA Webin authentication are put through additional checks to be compliant with ENA. All ENA samples must have taxonomy information and the taxonomy must be valid against the ENA taxonomy service. In BioSamples we have added a submission time validation of the mandatory organism attribute against the ENA taxonomy service.
-
BioSamples client changes
BioSamples client version 5.1.0 has undergone technical changes to support Webin authentication. The latest version of the client can be used to submit samples, curate samples or certify samples in BioSamples using Webin authentication.
-
Improved DUO code rendering
DUO code rendering on the sample page has been improved. When the mouse pointer is moved over a DUO code, its description is displayed as a tooltip.
-
Re-introduce missing samples/validate endpoint
In the last release we removed the samples/validate endpoint in favour of the validate endpoint. But since most users are using samples/validate, we will keep it and deprecate it in a future release.
-
Support both json and hal+json for accept header
The validate endpoint did not support the hal+json accept header in the last release. We will include support for this.
-
Enable ENA to pre-accession samples using WEBIN authentication instead of AAP
ENA will pre-accession samples using a WEBIN super user (prefixed SU-) and the metadata submission will be done by a non super user. During metadata submission we will check if the sample has been accessioned by the ENA registered super user and if yes then we will allow submission by any general webin user who wants to submit metadata against the accession.
-
Authentication
We have added additional authentication support in BioSamples. With this release BioSamples users can authenticate using EMBL-EBI’s European Nucleotide Archive (ENA) WEBIN authentication service. This is especially useful for users who intend to submit their sample metadata to BioSamples and sequencing data to ENA, as the same WEBIN credentials can be used to submit to both BioSamples and ENA. BioSamples continues to support the existing AAP authentication mechanism. AAP authentication is the default mode, and current users using AAP authentication to submit sample metadata to BioSamples are not required to make any changes to their submission routines. More information related to authentication can be found here.
-
Sample search results bulk download
A new API enables searching and bulk downloading of results, up to a maximum of 100,000 samples. The API supports text search and sample filtering. When search results exceed the maximum allowable download size, only the first 100,000 samples will be downloaded. Download buttons were also added to the search user interface. Currently this supports downloading samples as JSON, XML or accession list only.
-
Validation checklist in samples body (similar to existing ENA checklists)
Samples are validated at submission time. They are by default validated against the biosamples-minimal (ERC100001) checklist. Users can additionally provide the name of a known checklist in the sample body; when provided, this is also used for validation. If validation fails, the submission will be rejected. This enables users to define their preferred validation checklist at submission time. Please refer to the validation guide to see available checklists. The validation API is also available independently of submission and can be used to validate samples without submitting. We have updated our documentation to reflect these changes in certification and validation.
-
Link to new ENA browser - Samples having external reference to ENA were using the old ENA browser links. This has now been updated to link to the new ENA browser.
-
Private samples are searchable by authenticated users
Previously, private samples were only available for direct retrieval after logging in. This release enables searching of private samples through the API by their owner. The sample search endpoint requires a JWT and returns the private samples the user is authorised for.
-
Add Plant-MIAPPE checklist to BioSamples' schemas
We have added the Plant-MIAPPE checklist to BioSamples' schemas. At sample submission time, the certification service will verify whether the given sample complies with this checklist. If it does, a Plant-MIAPPE compliance certificate will be attached to the sample. Please find more about certification and validation in our documentation here.
-
Remove holiday notification banner from the website
-
Further changes in representation of BioSamples dates
1.1 In response to additional user feedback, a few changes in how we present dates in the BioSamples user interface have been implemented. The “ID created date” was removed from the user interface. This internal bookkeeping date was generating confusion with the sample submission date. More information is available at https://wwwdev.ebi.ac.uk/biosamples/docs/faq#_why_was_the_code_id_created_on_code_field_removed
1.2 A collapsible section “BioSamples record history” has been added and contains the following dates:
Submitted on: The earliest date at which valid metadata has been provided by the submitter. This attribute is generated by BioSamples and other INSDC partners.
Released on: The user-supplied date at which the sample metadata is made available publicly for the first time.
Last reviewed: The date at which a new curation object has been created or the automatic curation pipelines have been run on a sample metadata. This field is only present if at least one curation object has been added by the curation pipelines. The last reviewed date is updated when the curation objects are reviewed, even if they are found still valid and are not modified and indicates that the sample is compliant with the latest BioSamples curation rules [https://www.ebi.ac.uk/biosamples/docs/guides/curation]. This attribute is generated by BioSamples.
Please refer to our documentation and FAQ section for further details, at https://www.ebi.ac.uk/biosamples/docs/guides/dates and https://wwwdev.ebi.ac.uk/biosamples/docs/faq
-
Modification to EBI search engine export pipeline
The “host” attribute is now represented as “host scientific name” in the daily sample export. This change has been done to accommodate a request from the EBI Search team around a new facet in EBI search.
-
Change in representation of BioSamples dates
In response to user feedback, and to alleviate possible confusion between sample ID creation and submission dates, we have updated the label ‘created on’ to ‘ID created on’ and added the ‘Submitted on’ date for newly added samples. We have also added documentation for the following dates, which will be displayed in the UI going forward:
-
ID created on: The date at which the sample accession is created. This attribute is generated by BioSamples. IDs can be created in advance of collection or submission; BioSamples allows the pre-registration of sample accession to support cross-archive data exchange and data provenance management.
-
Submitted on: The earliest date at which valid metadata has been provided by the submitter. This attribute is generated by BioSamples and other INSDC partners.
-
Released on: The user-supplied date at which the sample metadata is made available publicly for the first time.
-
Updated on: The last date at which the sample was updated. Samples can be updated for curation needs and other technical purposes. More information about curation is available in the documentation [https://www.ebi.ac.uk/biosamples/docs/guides/curation]. This attribute is generated by BioSamples.
-
-
Organism has been made a mandatory attribute for samples
Samples submitted to BioSamples must have either an organism attribute or a species attribute. Samples with neither an organism nor a species will not be persisted, and the submission request will be rejected with HTTP status code 400 (Bad Request).
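The rule can be sketched as a client-side pre-check before submitting. This is a minimal Python illustration, not the server implementation: the attribute layout (name mapped to a list of value objects) follows the JSON fragments shown elsewhere in these notes, and case-insensitive matching of attribute names is our assumption.

```python
from typing import Optional

# At least one of these attribute names must be present with a value.
REQUIRED_ANY = ("organism", "species")


def has_organism_or_species(attributes: dict) -> bool:
    """True when an 'organism' or 'species' attribute has at least one value."""
    # Assumption: attribute names are compared case-insensitively.
    names = {name.strip().lower() for name, values in attributes.items() if values}
    return any(required in names for required in REQUIRED_ANY)


def rejection_status(attributes: dict) -> Optional[int]:
    """Return 400 when the sample would be rejected, otherwise None."""
    return None if has_organism_or_species(attributes) else 400


print(rejection_status({"description": [{"text": "no organism given"}]}))
```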
-
Certification Service
A new service has been added to BioSamples for sample validation using JSON schema checklists. Samples validated against a checklist are deemed certified by that checklist, and certificates are added to the sample. Please see the BioSamples user guide and API guide on the certification service for more details:
User guide - http://www.ebi.ac.uk/biosamples/docs/guides/certification
API reference - http://www.ebi.ac.uk/biosamples/docs/references/api/certify
First use case - The certification service has been used to validate the existence of organism or species in sample metadata submitted to BioSamples.
Schema reference - https://github.com/EBIBioSamples/biosamples-v4/blob/dev/webapps/core/src/main/resources/schemas/certification/biosamples-minimal.json
-
Structured data support for new types
Structured data support was extended to include new data formats: CHICKEN_DATA, HISTOLOGY_MARKERS, MOLECULAR_MARKERS and FATTY_ACIDS. This has been done to support the ‘HoloFood’ project, which involves the microbiome of agricultural animals (salmon and chicken). As part of this project, various submitters are going to generate data, some of which is suitable to go into ENA. Some of the data in structured form falls outside ENA’s remit (e.g. histological summaries for the samples), and BioSamples will provide support to store such structured data.
-
Sample recommendations endpoint
A new endpoint has been introduced for use along with the validation endpoint. Before submitting a sample, the submitter can check if the sample conforms to the BioSamples recommended format and get suggestions for changes. Submitting a sample in the recommended format will increase the FAIRness of the data. Please refer to the API guide for more details - http://www.ebi.ac.uk/biosamples/docs/references/api/validate
-
Relationship curations
Previously, curations could only be applied to attributes and external references. Now curations can also be applied to relationships. This enables third parties to apply relationships to samples.
-
Retrospective KILLED samples handler added to the ENA pipeline
The ENA import pipeline that imports samples from ENA to BioSamples has been modified to retrospectively check if samples have been KILLED in ENA. The status is updated accordingly in BioSamples so that sample metadata is consistent with ENA.
-
Cross-origin resource sharing (CORS) has been enabled for the BioSamples APIs for all origins and all methods
-
BioSamples sample XML view has been modified to include AMR Antibiogram model as well. Please download the XML from the example sample - https://wwwdev.ebi.ac.uk/biosamples/samples/SAMN09711403 to see the XML modelling of AMR data
-
Bug fix in EBI search pipeline to not include killed and suppressed samples in the exported data
-
Bug fix in NCBI samples to avoid 400 bad requests while processing samples that don’t have an organism. Certification service rejects samples without an organism
-
Bug fix in pipelines to deal with HTTP 404 errors while trying to fetch samples with blank curation domain. Pipeline failure avoided in such cases and error logging is improved
-
The EBI search data export pipeline has been modified so that the data export dump includes only the 100 attributes that occur most often across all samples in the BioSamples database. Other attributes are left out of the sample metadata sent to the EBI search engine. This has been done because the EBI search engine permits up to 100 query parameters and no more
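The selection described above can be sketched in Python as a frequency count followed by a filter. This is an illustration under our own assumptions (each sample is a plain dict of attribute name to value, and an attribute is counted once per sample); it is not the pipeline's actual code, and the function names are hypothetical.

```python
from collections import Counter


def top_attributes(samples, limit=100):
    """Return the `limit` attribute names present in the most samples."""
    counts = Counter()
    for sample in samples:
        counts.update(set(sample))  # count each attribute once per sample
    return [name for name, _ in counts.most_common(limit)]


def export_view(sample, allowed):
    """Keep only the attributes that are in the allowed set."""
    return {name: value for name, value in sample.items() if name in allowed}


samples = [
    {"organism": "Homo sapiens", "sex": "female"},
    {"organism": "Gallus gallus", "age": "3"},
    {"organism": "Salmo salar", "sex": "male"},
]
allowed = set(top_attributes(samples, limit=2))
print([export_view(sample, allowed) for sample in samples])
```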
-
Retiring SampleTab API
The SampleTab, legacy-json and legacy-xml APIs have been retired in this release. Please contact us at [email protected] if you have any questions/concerns. The following endpoints are no longer supported:
-
Sample groups API:
Sample group API, which was present in SampleTab, is now present in the JSON API. However, we are still discussing whether there is a real user requirement for this. We would be happy to hear from users who have a use case in mind for sample groups.
-
Sample graph search API, interface and new neo4j dependency:
Sample graph search is an experimental feature that enables exploring sample-to-sample and sample-to-external-resource relationships. It is backed by the neo4j graph database, so neo4j is introduced as a new dependency. The experimental interface (which will change in future) supports simple relationship queries and lists the results.
-
Domain transfer from old SampleTab domain to new AAP domain:
We have started moving old SampleTab domains to new DSP subs domains. This is done only on user request. Let us know if you need to move your samples from old domains to a new AAP domain.
-
Sample relationship source validation and relationship documentation:
In a sample relationship, the source should be equal to the accession of the containing sample. This is validated at sample submission time. A new section has been added to the user guide to explain sample relationships.
-
Clearinghouse import:
We now have all the scripts in place for importing curations from clearinghouse. As a result we have also changed how we curate "not collected" and "not provided" values. This is described in the documentation.
-
Improvements to EBI Search Engine data dump pipeline
-
BioSamples support to ENA presentation: External reference to ENA is added to samples submitted through BioSamples, i.e. BioSamples authority samples
-
Improve BioSamples documentation
-
Remove alt text from the h1 tag in the UI. Alt text in the h1 tag had caused Google to index BioSamples wrongly in search results.
-
Include missing domain validation when updating samples:
Domain validation in the sample update service was missing in the previous version. This has been added in the new version. Now, if a user has access to an existing sample, they can update the sample using any domain they have access to.
-
Fix the curation pipeline to retain meaningful attributes having values like “not provided”, “not collected”
-
NCBI Exchange - There are cases of missing SRA accessions in NCBI samples imported to EBI BioSamples. In such cases NCBI samples are cross checked with ENA Oracle database and if SRA accession is found in ENA Oracle database, the NCBI samples are updated with the same
-
Samples that had been made private in NCBI often failed to be updated to private in EBI BioSamples; this has been fixed in this release
-
Changes to BioSamples indexing: Solr CDCR process is quite slow when we re-index BioSamples at the weekend. Therefore at the weekend, instead of using CDCR for datacenter replication, we will copy Solr index to the second datacenter and keep CDCR process down while copying.
-
Pipeline statistics: We will store pipeline related statistics in a new collection in MongoDB. This will enable us to have insight into BioSamples sample distribution and later enable visualization of BioSamples usage.
-
AMR Structured data support: AMR Structured data submission support has been added to BioSamples. You can further read the documentation to know how to submit AMR structured data in BioSamples. Structured data submission has retention of access rights. If the sample submitter and the structured data submitter are different, then the sample submitter can only update the sample metadata and structured data submitter can only update the structured data
-
Livelist pipeline has been improved to generate live samples list, suppressed samples list and killed samples list
-
New pipeline added to provide dump of biosamples to the EBI search engine with the scope of further improvements based on review of data dump
-
BioSamples support to ENA presentation: Feature has been added to ENA Pipeline to update SRA accession in samples submitted through BioSamples, i.e. BioSamples authority samples
-
Include COVID-19 query in BioSamples home page: BioSamples contains samples related to COVID-19 disease. COVID-19 related samples can be easily accessed by following the link on the home page.
-
Curation pipelines have been fixed to accept samples having blank attribute values
-
Bug fix in handling attribute name and measurement in ENA AMR import pipeline
-
Removed duplicate BioSamples accessions
A new pipeline has been developed for dealing with duplicate ERS identifiers in BioSamples. This pipeline will initially be used to remove duplicate BioSamples accessions generated by imports from ENA and ArrayExpress. The duplication happened because BioSamples imports data from both ENA and ArrayExpress, each of which creates its own BioSamples IDs. ArrayExpress also includes a reference to ENA, which creates the duplicate of the ENA accessions. The pipeline is generic and can be configured to remove similar duplicates in future.
-
Improvements to the /accessions endpoint to add pagination and wildcard search
The accessions endpoint now has the same capabilities as the /samples endpoint, including text search, filters and paging; the only difference is that it returns just the accession numbers and not the full sample content. This has been requested by the NCBI. Instead of a list of accessions, it now returns a page with paging information.
-
Ontology annotations to AMR structured data added through Zooma. AMR structured data support in BioSamples was added in our last release, https://www.ebi.ac.uk/biosamples/samples/SAMEA3993565
-
Improvements in BioSamples Web UI
4.1 Broken hyperlinks have been removed through our curation pipelines.
4.2 Original ontology hyperlinks of attributes are maintained where links couldn’t be resolved by OLS.
4.3 Timestamps of samples have been moved to the bottom of the sample display webpage.
4.4 BioSamples sample search page could be slow to load due to long facet generation time. We now return samples immediately, while facets are being loaded.
Planned maintenance message has been added.
-
BioSamples support for ENA Presentation – BioSamples will use NCBI sample attribute name and not attribute display names to form BioSample sample attribute names.
-
Some of our services are currently undergoing planned maintenance which is due to complete on 4th April 2020. There should be no impact on our users. If you experience any issues, please contact our helpdesk ([email protected]) directly for support.
-
The planned maintenance will affect the Data Submission Portal (DSP). Consequently, and to provide ample time for our users to test and migrate to DSP, the BioSamples Sample tab APIs will be deprecated on May 1, 2020 (instead of April 1, 2020).
1. Incorporation of AMR structured data support in BioSamples and addition of the new ENA-AMR import pipeline. The ENA-AMR import pipeline queries the ENA API for AMR data of samples. It receives back the samples having AMR information and the FTP links to the AMR information. It then attempts to get the AMR data from the FTP links, adds it to the sample and updates the sample in BioSamples. NCBI AMR data comes as part of the NCBI sample XML, and BioSamples imports it when the NCBI pipeline executes.
2. The recommendations below from ENA Presentation have been implemented to support the BioSamples support for ENA Presentation use case:
-
BioSamples JSON will have core attributes like description, title and organism in lower case
-
If a user-provided attribute of the same name exists but is in upper case, then the two will be treated as separate attributes in the BioSamples JSON
"Description" : [ { "text" : "user provided description in ENA sample", "tag" : "attribute" } ],
"description" : [ { "text" : "core description in ENA sample" } ]
-
If a user-provided attribute of the same name exists and is also in lower case, then the two will be elements of a single array within one attribute in the BioSamples JSON
"description" : [ { "text" : "core description in ENA sample" }, { "text" : "user provided description in ENA sample", "tag" : "attribute" } ]
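The two cases above follow from a case-sensitive merge of attribute names. The Python sketch below reproduces that behaviour on plain dicts; it is an illustration of the rule as described in these notes, and `merge_attributes` is a hypothetical helper rather than the import pipeline's code.

```python
def merge_attributes(core: dict, user: dict) -> dict:
    """Merge core and user-provided attributes (name -> text) into BioSamples-style JSON.

    Core attribute names are lower-cased. User attributes keep their case and
    carry the tag "attribute"; names are matched case-sensitively.
    """
    merged = {name.lower(): [{"text": text}] for name, text in core.items()}
    for name, text in user.items():
        # Same lower-case name: appended to the core array; otherwise a separate attribute.
        merged.setdefault(name, []).append({"text": text, "tag": "attribute"})
    return merged


core = {"description": "core description in ENA sample"}
print(merge_attributes(core, {"Description": "user provided description in ENA sample"}))
print(merge_attributes(core, {"description": "user provided description in ENA sample"}))
```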
-
Fixing the curami pipeline to deal with attributes having blank values
-
Fixing the curami pipeline to deal with attributes having a tag. The curami pipeline was removing the tags while creating curation objects.
Please note: “tag” is used to specify any additional information about the attribute, for example the namespace of an external ID or a submitter ID, or to indicate that an attribute has been provided specifically by the user. A couple of examples are given below:
"Submitter Id" : [ { "text" : "E-MTAB-565:FOXK2_Dox_treated", "tag" : "Namespace:UNIVERSITY OF MANCHESTER" } ],
"DiseaseState" : [ { "text" : "Osteosarcoma", "tag" : "attribute" } ]
Here the tag "attribute" indicates a user-provided attribute.
-
Modification of the /accessions POST endpoint to improve pre-accessioning performance. Pre-accessioning of samples is used by ENA, which used our SampleTab APIs in the past. SampleTab is going to be deprecated from April 01, 2020, and the new, improved /accessions POST endpoint can be used for pre-accessioning.
-
Improvements to the /accessions GET endpoint: search filters, pagination and sizing have been added in response to requests from NCBI. NCBI was using the BioSamples legacy-xml endpoints, and these improvements were required so that the alternative accessions REST endpoint can provide similar functionality to NCBI before the legacy-xml endpoints are deprecated.
-
RDF release pipeline has been added to BioSamples for continuous RDF release. The frequency of the release can be configured.
-
Improvement of BioSamples pipeline to report back error statuses and log correct error messages and failure cases.
-
The following recommendations from ENA Presentation have been implemented to make top-level attributes and user-provided attributes easy to tell apart, and to leave out any attribute that is not meaningful to ENA Presentation. This takes effect for all ENA and NCBI samples imported to BioSamples and relates to ENA Presentation querying the BioSamples APIs for sample metadata: 5.1. Add the tag “attribute” to all user-provided attributes. 5.2. Remove the tag “core” from specific top-level attributes (description, for example).
-
BioSamples will retain the create date of NCBI samples that are being imported. Previously it overwrote the create date with the date and time at which the sample was saved in BioSamples.
-
Handler added to check and update sample status in BioSamples for SUPPRESSED samples in ENA/NCBI. SUPPRESSED samples that exist in ENA and not in BioSamples are created in BioSamples. This helps to have a consistent view of the samples in ENA and BioSamples.
-
Full contact details (name, role, email, affiliation, etc.) are saved and displayed by default. If the request parameter setfulldetails is set to false in the request URI, the full contact details are not saved.
-
ENA BioSamples integration changes have been made in this release. These enable ENA Presentation to query the BioSamples API for sample metadata. A short description of the changes is given below:
-
Retaining of ArrayExpress elements in ENA imported samples
-
Mapping of alias in ENA sample XML to name (top-attribute) in BioSamples JSON
-
Mapping of SAMPLE_ATTRIBUTE/alias in ENA sample XML to characteristics/alias in BioSamples JSON
-
Removing tagging of core attributes from Synonyms for ENA/NCBI/DDBJ samples. SUBMITTER_ID, EXTERNAL_ID, UUID, ANONYMIZED_NAME, INDIVIDUAL_NAME attributes were earlier mapped to synonyms. With this release they are mapped to individual attributes under characteristics in BioSamples JSON, like characteristics/External Id, characteristics/Submitter Id and so on
-
Introduction of tag in BioSamples JSON for mapping namespace values in ENA/NCBI/DDBJ samples. An example: "External_id" : [ { "text" : "GM18582", "tag" : "Namespace: Coriell" } ], "Submitter Id" : [ { "text" : "ZF_CR_MPX22_279-sc-2227782", "tag" : "Namespace:SC" } ]
-
Handling of multiple descriptions (core description and SAMPLE_ATTRIBUTE description) for ENA/NCBI/DDBJ samples. The tag is reused to show whether a description is a core attribute or a sample attribute. An example: "Description" : [ { "text" : "Protocols: U2OS cells …..)", "tag" : "core" }, { "text" : "This sample has been re-named", "tag" : "attribute" } ]
-
Removing characteristics/synonym from BioSamples JSON for ENA/NCBI/DDBJ samples. All attributes that were tagged under synonyms now have individual attributes under characteristics, so synonym is no longer required. Alias is now also mapped to name, which makes synonym redundant.
-
PRIMARY_ID of NCBI/DDBJ samples mapped to characteristics/SRA accession in BioSamples JSON. This will bring samples metadata in BioSamples in sync for ENA/NCBI/DDBJ samples.
-
Title was mapped to characteristics/Title (for ENA samples) and characteristics/description title (for NCBI/DDBJ samples). Title is now mapped to characteristics/Title for all ENA/NCBI/DDBJ samples
-
GenBank common name handled in characteristics/Common Name for NCBI/DDBJ samples. Provision is kept for ENA samples too if such an attribute exists.
-
Performance improvements of ENA pipeline
-
Create date added for ENA/NCBI/DDBJ samples
-
Retaining of ENA prefixed attributes in BioSamples JSON
-
UI bugfix to display contact role. Earlier it used to show name instead of role.
-
Change the curation-view pipeline to read samples from MongoDB. To crawl all the samples available in BioSamples, we cannot use the biosamples-client "get all samples" method, as it does not return non-indexed samples (e.g. suppressed samples).
-
Deprecation of SampleTab submission format.
-
Adding static collection for samples+curations.
-
Modify applying order for the curation objects.
-
Add link to sample accession.
-
Add DUO attribute to external reference class
-
Add script to import EGA data
-
Add presto connector as a BioSamples client module
-
Added API in biosamples-client to utilize JWT tokens
-
Resolved issue where ENA pipeline failed if FIRST_PUBLIC date is not available
-
Replicate required ENA XML Dump functionality in the ENA pipeline
-
Added an annotation 'submitted via USI' to USI samples
-
Added support for suppressed samples imported through the ENA pipeline
-
Added user documentation of JSON schema
-
Added logging and retry logic for reindexing pipeline
-
Refined the NCBI pipeline to check that suppressed samples are in the Solr index before removing them
-
Added support for suppressed samples to enable dbGaP data loading
-
Fix confusion between suppressed and private samples in dbGaP data
-
Livelist file: adding flush to make sure file is written
-
Add validation and accessioning service
-
Fix SampleTab template download link
-
Added a Curation Undo Pipeline to allow for removal of erroneous curations.
-
Fix an issue where long attributes break the sample box UI.
-
Corrected error in curation pipeline which caused sample characteristics to be removed erroneously
-
Added holiday message
-
Added libraries to enable applications to use Graylog to allow configuration of aggregated logging
-
Switched to the AAP explore environment at https://explore.api.aai.ebi.ac.uk
-
Updated the default AAP URL used by the BioSamples client
-
Included sampletab template file in the sampletab documentation
-
Included ETAG and Curation Object recipes to the BioSamples cookbook
-
Removed name and API key lookup functionality from SampleTab process
-
Addition of AMR structured data into BioSamples
-
Submission of samples with a relationship not targeting a valid accession now return an error
-
Fixed a bug where the Phenopacket export was not able to extract metadata for Orphanet terms
-
Updated user interface to use the newer version of the EBI visual framework
-
Improved documentation navigation experience adopting a new menu style
-
Fixed a bug where search failed when using a colon with a non-indexed field, e.g. taxon:9696
-
Added the BioSamples cookbook
-
Fixed issue where there are duplicate organism attributes with different cases in a sample
-
Updated the error message in the SampleTab UI to take into account large submissions timeout
-
Attributes with the value "not_applicable" are now removed as part of the curation pipeline
-
Date titles on the sample page are now "Releases on" and "Updated on" rather than "Release" and "Update"
-
An initial accession endpoint has been added to the REST API to enable ENA to get a list of accessions for a project
-
A multi-step Docker build has been added to allow Docker images to be distributed on quay.io
-
A fix has been made for an issue that caused the Zooma Pipeline to fail on wwwdev
-
Additional sample attributes required by ENA are now available including a single, top-level taxId field
-
The export box for a sample is now renamed "download" and contains a list of serialisations that always download as a file, fixing a blocked-popups issue in Safari
-
The search results now have an updated look and feel based on feedback from ENA
-
Sample JSON now contains a numeric taxId field at the top level
-
IRIs of ontology terms now resolve to the defining ontology when they are available in multiple ontologies
-
Responses to requests for a sample now contain a computed ETag header to identify changes
-
When requesting a private sample an explanation message is now provided in addition to the 403 error code
-
The search UI now contains a clear filters button
-
Expose the BioSchemas markup with enhanced context and Sample ontology code
-
SampleTab submission pipeline has been rewritten for better robustness
-
On the sample results page, the sample name and the sample accession now link to the single sample page
-
Fixed various broken hyperlinks on the home page and in documentation
-
GDPR:
-
SampleTab submissions enforce explicit acceptance of the terms of service and the privacy information
-
GDPR notices added throughout
-
SampleTab submissions where the target of a relationship is neither a sample name nor a sample accession are now rejected, and the user is given additional information about the problematic data
-
Bioschema.org entities are exported in BioSamples and available both in the UI - embedded in a script tag - and through the API
This is a bugfix release that addresses the following issues:
-
GDPR notices
-
Update format of the Sitemap file
This is a bugfix release that addresses the following issues:
-
Improves search handling of special characters in facets
-
Improves search handling of special characters in search terms
-
Fix issue with curation link URLs
-
Implemented DataCatalog, Dataset and DataRecord profiles on JSON+LD
-
Add ability to control which curation domains are applied to a sample
-
Updated and improved API documentation
-
Updated and improved SampleTab documentation
-
Fix links to XML and JSON serialisation in the UI
-
Fix bug in handling special characters in SampleTab submission
-
Add export pipeline
-
Add copy down pipeline
This is a bugfix release that addresses the following issues:
-
Improved consistency of paged search results if any of the samples are added or modified whilst paging
-
Improved search update throughput by using Solr transaction log
-
Updated JSON+LD format to the latest version
-
Correctly accept XML sample groups and their related samples
-
Fix issue related to search query terms not being applied to legacy XML and legacy JSON endpoints.
-
Fix incorrect HAL links on autocomplete endpoint
-
Replace SampleTab submitted relationships by name with accessions. As a consequence, they can now be consistently cross referenced by accession in user interface and API
-
Improved indexing of samples when they are rapidly updated or curated
-
Updated Elixir Deposition Database banner URL
-
Reduce number of Zooma calls by not attempting to map "unknown" or "other" attributes
-
Reduce load on OLS by ensuring Zooma does not requery OLS as any results from OLS would not be used by BioSamples
This is a bugfix release that addresses the following issues:
-
Persistence of search terms and filters when using HAL paging links
-
SameAs relation in the legacy JSON API works as intended
-
Removed residual test endpoints from legacy JSON API
-
Details relation in legacy JSON API now correctly resolves
-
Added informative and specific title to webpages
-
Added Elixir Deposition Database banner
This is a bugfix release that addresses the following issues:
-
Forward legacy group URLs /biosamples/groups/SAMEGxxxx to /biosamples/samples/SAMEGxxxxx
-
Missing or malformed update and release date on legacy XML group submission will default to current datetime. It is not recommended that users intentionally rely on this.
-
Index legacy XML group submissions, which was not happening due to an unexpected consequence of the interaction of components.
-
Redirect /biosamples/sample and /biosamples/group URLs in case of typo
This is a bugfix release that addresses the following issues:
-
Fix JavaScript on SampleTab submission and accession
-
Handle load-balanced accessioning
-
Fix for storage of relationships source on new samples
This is a bugfix release that addresses the following issues:
-
Fix submission of new unaccessioned samples with relationships by inserting an assigned accession into the source of any relationships that are missing it.
-
Fix the curation pipeline, which was converting a numeric organism IRI to the literal string "http://purl.obolibrary.org/obo/NCBITaxon_+taxId" when it should have been "http://purl.obolibrary.org/obo/NCBITaxon_"+taxId, e.g. http://purl.obolibrary.org/obo/NCBITaxon_9606
-
Allow CORS requests for legacy XML APIs.
-
Updated homepage project sample links to use a filter search rather than a text search.
Version v4.0.0 represents a re-architecture and re-engineering of the BioSamples software stack. It is now based on the Java Spring-Boot framework, utilising MongoDB for storage and Solr for indexing and search. It tries to follow up-to-date web standards and conventions, while remaining backwards compatible. This will also give us a strong and stable foundation to build more features and improvements from, more reliably and more rapidly.
Highlights include:
-
Submissions and updates will be available immediately via accession, and will be available via search within a few minutes or less. There is also improved handling of submissions and updates, with fewer errors and better feedback about any problems.
-
Integration with EBI AAP for login management and access to pre-publication samples, including use of ELIXIR AAI single sign-on accounts.
-
Separation of submitted sample information from curation of that information, including the ability for 3rd party (re-)curation of samples. Please contact us if you would be interested in more information and/or to supply curation information.
-
Improved handling of non-alphanumeric characters in attribute types e.g. "geographic location (country and/or sea)"
-
Improved faceting allowing selection of multiple values within same facet, fixed re-use and re-distribution of search URLs. This will be expanded in future with additional facet types where appropriate.
-
Support and recommend the use of content negotiation for accessing multiple formats at the same URIs. In addition to the content (HTML vs XML vs JSON) this also supports compression and caching through standard mechanisms.
-
Java client using Spring, and a Spring-Boot starter module for easy use. This is used by BioSamples internally and other teams at EMBL-EBI, so is high performance and battle tested.
-
Containerisation using Docker and Docker-Compose, which makes it easier to run a local version for client development or for local storage of sample information.
-
Ontology terms: numeric tax IDs (e.g. 9606) and short ontology terms (e.g. PATO:0000384) are being replaced with full IRIs (e.g. http://purl.obolibrary.org/obo/NCBITaxon_9606 and http://purl.obolibrary.org/obo/PATO_0000384 ) in many places, and eventually everywhere.
-
Groups will continue to exist for backwards compatibility purposes. However, we are investigating future development to reduce or remove many of these in favour of alternatives such as filtering samples by external link, or delegating grouping of samples to other EMBL-EBI archives such as BioStudies.
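The replacement of numeric tax IDs and short ontology terms with full IRIs, mentioned in the highlights above, can be sketched in Python as follows. This is an illustration only, not the BioSamples code: the function name `to_iri` is hypothetical, and the sketch assumes every short term follows the OBO PREFIX:ID pattern shown in the two examples, which will not hold for ontologies published under other base URLs.

```python
import re

# Base URL taken from the two example IRIs in the release notes.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def to_iri(term):
    """Expand a numeric tax ID or a short ontology term into a full IRI."""
    term = str(term).strip()
    if term.startswith(("http://", "https://")):
        return term  # already a full IRI, leave untouched
    if re.fullmatch(r"[0-9]+", term):
        # Bare numbers are treated as NCBI taxonomy identifiers.
        return OBO_BASE + "NCBITaxon_" + term
    match = re.fullmatch(r"([A-Za-z][A-Za-z0-9]*):(\w+)", term)
    if match:
        prefix, local_id = match.groups()
        return f"{OBO_BASE}{prefix}_{local_id}"
    raise ValueError(f"unrecognised ontology term: {term!r}")
```

For example, `to_iri(9606)` gives the NCBITaxon IRI shown above and `to_iri("PATO:0000384")` gives the PATO one. Anything that matches neither pattern raises an error rather than being guessed at.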
This is the preferred API for use. It uses the same URIs as the HTML
pages and utilises content negotiation to provide a JSON response.
It is designed as a hypermedia as the engine of application state
(HATEOAS) API, and we therefore recommend that users do not use specific
URLs but rather follow relationships between API endpoints, much like a
user would use links between HTML pages. It is similar to the
/biosamples/api JSON format, with a few critical differences:
-
added release in full ISO 8601 format including time. The backwards-compatible releaseDate exists but should be considered deprecated and will be removed in a future release.
-
added update in full ISO 8601 format including time. The backwards-compatible updateDate exists but should be considered deprecated and will be removed in a future release.
-
removed description as a separate field; it is now available as a characteristic.
-
removed relations rel link; equivalent information is now embedded in sample in relationships and externalReferences lists.
-
removed sample rel link; with relations now embedded, this link serves no purpose.
-
added curationLinks rel link.
-
ordering may be different.
-
fields are not displayed if empty or null.
-
characteristic names accurately reflect what was submitted and may now be multiple words and may include non-alphanumeric characters (e.g. brackets, Greek letters, etc.). In the /biosamples/api responses, characteristic names were always camelCased and had non-alphanumeric characters removed.
-
external references are directly embedded in the samples and the groups.
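For client authors, the differences listed above suggest two small habits: follow named relationships instead of building URLs, and read `release` in preference to the deprecated `releaseDate`. The Python sketch below illustrates both. It assumes a HAL-style `_links` object with an `href` per relationship, and the function names `follow` and `release_datetime` are hypothetical. The `fetch` argument is any callable that takes a URL and returns parsed JSON, so no particular HTTP library is implied.

```python
from datetime import datetime


def follow(resource, rel, fetch):
    """Follow a named relationship from a HAL-style JSON resource.

    `fetch` is injected by the caller; it takes a URL and returns parsed JSON.
    """
    links = resource.get("_links", {})
    if rel not in links:
        raise KeyError(f"resource has no {rel!r} link")
    return fetch(links[rel]["href"])


def release_datetime(sample):
    """Prefer the full ISO 8601 `release` field over deprecated `releaseDate`."""
    value = sample.get("release") or sample.get("releaseDate")
    if value is None:
        return None  # empty or null fields are omitted from responses
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

Passing `fetch` in as a parameter keeps the sketch testable without network access. A real client would supply a function that sends an `Accept: application/json` header so that content negotiation returns JSON.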
We are maintaining this for backwards compatibility. Later in 2018 we
will be consulting about future development of this API, particularly in
the context of the improved JSON /biosamples API using content
negotiation and several long-standing issues with limitations arising
from the XML schema in use.
-
XML element TermSourceREF element Name and element URI are removed.
-
XML element Property attributes characteristic and comment are always false.
-
elements and attributes may be in different order.
-
only one IRI is allowed per attribute, so in the rare cases where an attribute has multiple IRIs the output will not be complete.
-
Query parameter query now has a default value of * if none is provided.
-
Query parameter sort is ignored for the search, due to undefined behaviour and lack of usage.
This API should be considered deprecated and we will aim to remove
it by 2019. Any users of this should move to using the /biosamples
URIs to retrieve JSON representations with an improved schema via
content negotiation. Further announcements will be made in future for
specific updates and deadlines.
-
ordering may be different from previous versions, and is not guaranteed for future versions.
-
fields are not displayed if empty or null.
-
/api/externallinksrelations/{id}/sample and /api/externallinksrelations/{id}/group are removed due to lack of usage.
-
fixed externalReferences and publications to be nested objects and not JSON strings.
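Clients written against the earlier behaviour may still decode externalReferences and publications from JSON strings. A tolerant reader could accept both forms during a migration, as in the Python sketch below. The helper name `as_nested` is hypothetical and the shape of the decoded objects is not specified in these notes.

```python
import json


def as_nested(value):
    """Return `value` as a nested object, decoding it first if it is a JSON string."""
    if isinstance(value, str):
        return json.loads(value)
    return value
```

With this helper, `as_nested(record["externalReferences"])` gives the same result whether the field arrives as a string (old behaviour) or as a nested list (fixed behaviour), where `record` stands for any parsed legacy JSON response.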