This is a Java implementation of the sequence collection specification to represent INSDC assemblies. To learn more about the sequence collection specification, please refer to seqCol, seqcol-spec and/or the specification.
Briefly, the main issue that the seqcol-spec addresses is that central genome providers such as INSDC (e.g. NCBI, ENA), Ensembl or UCSC may agree on the sequences being used but often differ on how those sequences are named.
The main goals of this API are to provide:
- A mechanism to ingest a sequence collection object into the database.
- A mechanism to fetch/resolve a sequence collection object given its level 0 digest.
- A mechanism to compare two sequence collection objects to understand their compatibility.
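To make the naming problem and the comparison goal concrete, here is a minimal sketch of how two collections could be compared attribute by attribute. It is an illustration only, not the service's implementation: the class name `SeqColCompareSketch`, its methods, and the sample values are hypothetical, and it assumes that arrays contain no duplicate elements.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: compares one attribute (e.g. names or sequences)
// of two collections. Assumes no duplicate elements within an array.
class SeqColCompareSketch {

    /** Elements present in both arrays, in the order they appear in a. */
    static Set<String> shared(List<String> a, List<String> b) {
        Set<String> common = new LinkedHashSet<>(a);
        common.retainAll(new HashSet<>(b));
        return common;
    }

    /** True if the shared elements appear in the same relative order in a and b. */
    static boolean sharedInSameOrder(List<String> a, List<String> b) {
        Set<String> common = shared(a, b);
        List<String> fromA = a.stream().filter(common::contains).collect(Collectors.toList());
        List<String> fromB = b.stream().filter(common::contains).collect(Collectors.toList());
        return fromA.equals(fromB);
    }
}
```

With made-up inputs, two providers that publish the same three sequence digests under the names `chr1, chr2, chrM` and `1, 2, MT` would share all three sequences in the same order but none of the names, which is the kind of partial compatibility the comparison is meant to expose.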
After multiple evaluations of different data models, we agreed to use the following model:
Note: the seqCol service is currently deployed on server 45.88.81.158, under port 8081.
PUT - SERVER_IP:PORT/eva/webservices/seqcol/admin/seqcols/{asm_accession}
GET - SERVER_IP:PORT/eva/webservices/seqcol/collection/{seqCol_digest}?level={level}
GET - SERVER_IP:PORT/eva/webservices/seqcol/comparison/{seqColA_digest}/{seqColB_digest}
POST - SERVER_IP:PORT/eva/webservices/seqcol/comparison/{seqColA_digest}; body = {level 2 JSON representation of another seqCol}
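The endpoints above can be called from any HTTP client. The following sketch builds the corresponding URIs and requests with the JDK's `java.net.http` API (Java 11 or later). The class `SeqColEndpoints`, its method names, the `http://` scheme, and the `application/json` content type are assumptions made for illustration; only the paths come from the list above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper that mirrors the endpoint paths listed above.
class SeqColEndpoints {
    private static final String CONTEXT = "/eva/webservices/seqcol";
    private final String root;

    /** baseUrl is e.g. "http://SERVER_IP:PORT" (scheme assumed). */
    SeqColEndpoints(String baseUrl) {
        this.root = baseUrl + CONTEXT;
    }

    /** PUT target for ingesting the seqCol of an assembly accession. */
    URI ingest(String asmAccession) {
        return URI.create(root + "/admin/seqcols/" + asmAccession);
    }

    /** GET target for resolving a seqCol digest at the given level. */
    URI collection(String digest, int level) {
        return URI.create(root + "/collection/" + digest + "?level=" + level);
    }

    /** GET target for comparing two stored seqCols by digest. */
    URI comparison(String digestA, String digestB) {
        return URI.create(root + "/comparison/" + digestA + "/" + digestB);
    }

    /** POST request comparing a stored seqCol with a level 2 JSON body. */
    HttpRequest compareWithBody(String digestA, String level2Json) {
        return HttpRequest.newBuilder(URI.create(root + "/comparison/" + digestA))
                .header("Content-Type", "application/json") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofString(level2Json))
                .build();
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The sketch does not URL-encode its arguments, so it assumes digests and accessions contain only URL-safe characters.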
For detailed, user-friendly documentation of the API endpoints, please visit the seqCol Swagger page.
This web service has some authenticated endpoints. The current approach to secure them is to provide the credentials in the src/main/resources/application.properties file at compilation time, using maven profiles.
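As a client-side illustration, the sketch below builds an `Authorization` header from a username and password. It assumes the authenticated endpoints accept HTTP Basic authentication, which this document does not state; check the Swagger page or the security configuration before relying on it. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; assumes the service uses HTTP Basic authentication.
class SeqColAuthSketch {

    /** Returns the value of an HTTP Basic "Authorization" header. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

The returned value would be attached to a request with something like `requestBuilder.header("Authorization", SeqColAuthSketch.basicAuthHeader(user, password))`, with the credentials read from the environment rather than hard-coded.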
The application also requires a connection to an external database (PostgreSQL by default) to function. The credentials for this database need to be provided at compilation time using the same Maven profiles.
You can edit the Maven profile values in pom.xml by locating the section below and either changing the values manually or setting the corresponding environment variables. Alternatively, you can make the changes directly in the application.properties file.
Use <ftp.proxy.host> and <ftp.proxy.port> to configure proxy settings for accessing FTP servers (such as NCBI's). Set them to null and 0 to prevent overriding the default proxy configuration.
Set a boolean flag using <contig-alias.scaffolds-enabled> to enable or disable parsing and storing of scaffolds in the database.
<profiles>
    <profile>
        <id>seqcol</id>
        <properties>
            <spring.profiles.active>seqcol</spring.profiles.active>
            <seqcol.db-url>jdbc:postgresql://${env.SERVER_IP}:${env.POSTGRES_PORT}/seqcol_db</seqcol.db-url>
            <seqcol.db-username>${env.POSTGRES_USER}</seqcol.db-username>
            <seqcol.db-password>${env.POSTGRES_PASS}</seqcol.db-password>
            <seqcol.ddl-behaviour>${env.DDL_BEHAVIOUR}</seqcol.ddl-behaviour>
            <seqcol.admin-user>${env.ADMIN_USER}</seqcol.admin-user>
            <seqcol.admin-password>${env.ADMIN_PASSWORD}</seqcol.admin-password>
            <ftp.proxy.host>${optional default=null}</ftp.proxy.host>
            <ftp.proxy.port>${optional default=0}</ftp.proxy.port>
            <contig-alias.scaffolds-enabled>${optional default=false}</contig-alias.scaffolds-enabled>
        </properties>
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
    </profile>
</profiles>
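Because the profile above reads its values from environment variables, a build started with one of them unset will produce a broken configuration. The following pre-flight check is a minimal sketch of how this could be caught early; it is not part of the project. The class `SeqColProfileEnv` is hypothetical, and the variable names are taken from the `${env.*}` placeholders in the profile. It requires Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the environment variables used by the seqcol profile.
class SeqColProfileEnv {
    static final List<String> REQUIRED = List.of(
            "SERVER_IP", "POSTGRES_PORT", "POSTGRES_USER", "POSTGRES_PASS",
            "DDL_BEHAVIOUR", "ADMIN_USER", "ADMIN_PASSWORD");

    /** Names of required variables that are absent or blank in env. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    /** The JDBC URL that the profile's seqcol.db-url placeholder expands to. */
    static String jdbcUrl(Map<String, String> env) {
        return "jdbc:postgresql://" + env.get("SERVER_IP") + ":"
                + env.get("POSTGRES_PORT") + "/seqcol_db";
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (!missing.isEmpty()) {
            System.err.println("Missing environment variables: " + missing);
            System.exit(1);
        }
        System.out.println("Database URL: " + jdbcUrl(System.getenv()));
    }
}
```

Running this class before the Maven build prints the resolved database URL, or exits with a non-zero status and the list of variables that still need to be set.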
Once that's done, you can trigger the variable replacement with the -P option in Maven. Example: mvn clean install -Pseqcol to compile the service including tests, or mvn clean install -Pseqcol -DskipTests to skip the tests.
You can then run mvn spring-boot:run to start the service.
- Spring Boot v2.7.13
- PostgreSQL Database v15.2
- Swagger v3 (springdoc-openapi implementation)
- seqCol, seqcol-spec, specification (Specification's details and docs)
- GA4GH refget API meetings (Minutes for the refget API meetings)
- Python implementation (A python implementation of the sequence collection specification)
- CRAM Reference Registry