The Global Alliance for Genomics and Health (GA4GH) is an international coalition formed to enable the responsible, voluntary, and secure sharing of genomic and health-related data. This awesome list collects resources, projects, tools, and standards from the GA4GH ecosystem that support this mission.
The following resources are adapted from the paper "The GA4GH Task Execution API: Enabling Easy Multi Cloud Task Execution":
```bibtex
@article{kanitz2024ga4gh,
  title={The GA4GH Task Execution API: Enabling Easy Multi Cloud Task Execution},
  author={Kanitz, Alexander and McLoughlin, Matthew H and Beckman, Liam and Malladi, Venkat S and Ellrott, Kyle},
  journal={Computing in Science \& Engineering},
  year={2024},
  publisher={IEEE}
}
```
A listing of available server, proxy, and client implementations that use the TES API.
Type | Project | Description | Source |
---|---|---|---|
API | TES | OpenAPI definition of the specification | GitHub |
API | TES | Conformance test runner | GitHub |
Server | Funnel | TES server implementation for HPC/HTC systems including AWS Batch, Google Cloud, Kubernetes, Slurm, GridEngine, and HTCondor | GitHub |
Server | Pulsar | TES server implementation for the Galaxy/Pulsar federated distributed network | Docs |
Server | TES-Azure | TES server implementation for Microsoft Azure | GitHub |
Server | TESK | TES server implementation for Kubernetes/Native Cloud systems | GitHub |
Proxy | proTES | Proxy service for injecting middleware into GA4GH TES requests | GitHub |
Client | Cromwell | Workflow management system for executing workflows composed in the Workflow Description Language (WDL) | Docs |
Client | cwl-tes | Workflow management system for executing workflows composed in the Common Workflow Language (CWL) | GitHub |
Client | ELIXIR Cloud Components | Web Component library for interacting with TES services (and other GA4GH APIs) | Site |
Client | Nextflow | Workflow management system for executing workflows composed in the Nextflow DSL | Site |
Client | py-tes | Python client library for interacting with TES services | GitHub |
Client | Snakemake | Workflow management system for executing workflows composed in the Snakemake DSL | Site |
Client | Toil | Workflow management system for executing workflows composed in the Toil and CWL DSLs | Docs |
Common TES use cases. The TES API wraps compute environments, providing a standard way to execute tasks. Researchers can write and package their tasks and data as workflows in a domain-specific language (DSL), then hand orchestration of those tasks over to the corresponding workflow management system. The workflow management system can in turn use a TES client to distribute the tasks across different compute environments.
Alternatively, users can submit individual tasks directly to TES servers through command-line interfaces (CLIs) or graphical user interfaces (GUIs). TES therefore makes it easier for researchers to use a variety of compute environments seamlessly. Applications can support a new compute environment by integrating once with the TES API, rather than developing a separate connection for each environment.
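Integrating with the TES API ultimately comes down to plain HTTP calls. The following minimal sketch uses only Python's standard library to build (but not send) the request that creates a task. It assumes a TES v1 server at the hypothetical base URL `https://tes.example.org` and the `POST /ga4gh/tes/v1/tasks` route; the helper name `build_create_task_request` is illustrative. For day-to-day use, a dedicated client such as py-tes from the table above is likely more convenient.

```python
import json
import urllib.request

# Hypothetical server URL; replace with a real TES endpoint.
TES_BASE_URL = "https://tes.example.org"


def build_create_task_request(base_url: str, task: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a TES task.

    Assumes the TES v1 route layout: POST {base_url}/ga4gh/tes/v1/tasks.
    """
    url = base_url.rstrip("/") + "/ga4gh/tes/v1/tasks"
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# A minimal task: a single executor, no inputs or outputs.
hello_task = {
    "name": "hello-tes",
    "executors": [
        {"image": "alpine:3.19", "command": ["echo", "hello from TES"]}
    ],
}

request = build_create_task_request(TES_BASE_URL, hello_task)
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request); the JSON response
# is expected to contain the new task's "id".
```

Keeping request construction separate from sending makes the same helper reusable for any TES server: only the base URL changes between environments.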
TES Execution Architecture. An outline of the layers found in current TES server implementations. The client talks to a server, which allocates a worker node on HPC/HTC or cloud infrastructure. The TES worker transfers inputs, runs user code, captures logs, and stores outputs.
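To make the worker's responsibilities concrete, here is a minimal, hypothetical sketch in Python. The names `run_task` and `WorkerResult` and the injected `download`/`run`/`upload` callables are illustrative assumptions, not part of any TES implementation; real workers also deal with container runtimes, storage backends, and log capture. Following the TES specification's description that executors run one at a time and execution stops on the first error, the sketch skips the remaining executors after a non-zero exit code and reports the `EXECUTOR_ERROR` state (skipping output upload in that case is a simplification of this sketch).

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the worker's real side effects.
Transfer = Callable[[str, str], None]  # (url, path) -> None
Runner = Callable[[dict], int]         # executor -> exit code


@dataclass
class WorkerResult:
    state: str
    exit_codes: list = field(default_factory=list)


def run_task(task: dict, download: Transfer, run: Runner, upload: Transfer) -> WorkerResult:
    """Sketch of a TES worker: stage inputs, run executors in order, store outputs."""
    # 1. Transfer each input from its URL to the declared container path.
    for inp in task.get("inputs", []):
        download(inp["url"], inp["path"])

    # 2. Run executors sequentially; stop at the first non-zero exit code.
    exit_codes = []
    for executor in task["executors"]:
        code = run(executor)
        exit_codes.append(code)
        if code != 0:
            return WorkerResult("EXECUTOR_ERROR", exit_codes)

    # 3. Store outputs: copy each declared path back to its URL.
    for out in task.get("outputs", []):
        upload(out["url"], out["path"])
    return WorkerResult("COMPLETE", exit_codes)


# Usage with fake transfers and a fake runner (no containers involved).
transfers = []
task = {
    "inputs": [{"url": "gs://bucket/in.txt", "path": "/data/in.txt"}],
    "outputs": [{"url": "gs://bucket/out.txt", "path": "/output/out.txt"}],
    "executors": [
        {"image": "a", "command": ["true"]},
        {"image": "b", "command": ["false"]},
    ],
}
result = run_task(
    task,
    download=lambda url, path: transfers.append(("down", url, path)),
    run=lambda ex: 0 if ex["command"] == ["true"] else 1,
    upload=lambda url, path: transfers.append(("up", url, path)),
)
print(result)  # WorkerResult(state='EXECUTOR_ERROR', exit_codes=[0, 1])
```

Injecting the side effects as callables keeps the control flow testable without a container runtime or object store.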
An example TES task demonstrating the use of inputs, outputs, and logging.
Full Packet Example
```json
{
  "inputs": [
    {
      "name": "input-genome-data",
      "url": "gs://genomics-bucket/input-data/genome-data.bam",
      "path": "/data/genome-data.bam",
      "type": "FILE"
    },
    {
      "name": "reference-genome",
      "url": "gs://genomics-bucket/reference/human-reference.fa",
      "path": "/data/human-reference.fa",
      "type": "FILE"
    }
  ],
  "outputs": [
    {
      "name": "output-processed-data",
      "url": "gs://genomics-bucket/processed-data/processed-output.bam",
      "path": "/output/processed-output.bam",
      "type": "FILE"
    },
    {
      "name": "log-file",
      "url": "gs://genomics-bucket/logs/task-log.txt",
      "path": "/output/task-log.txt",
      "type": "FILE"
    }
  ],
  "resources": {
    "cpu_cores": 8,
    "ram_gb": 32,
    "disk_gb": 100,
    "preemptible": true,
    "zones": ["us-west1-a", "us-west1-b"],
    "backend_parameters": {
      "VmSize": "Standard_D64_v3"
    }
  },
  "volumes": ["/mnt/workdir"],
  "executors": [
    {
      "image": "bioinformatics/pipeline",
      "command": [
        "bash",
        "/tools/process-genome.sh",
        "/data/genome-data.bam",
        "/data/human-reference.fa",
        "/output/processed-output.bam"
      ],
      "stdout": "/output/task-log.txt",
      "stderr": "/output/task-error-log.txt",
      "workdir": "/mnt/workdir",
      "env": {
        "GENOME_ENV": "production",
        "MAX_THREADS": "8"
      }
    }
  ],
  "tags": {
    "department": "bioinformatics",
    "project": "genome-analysis"
  },
  "logs": [
    {
      "start_time": "2023-12-25T00:00:00+00:00",
      "end_time": "2023-12-25T12:12:12+00:00",
      "logs": [
        {
          "start_time": "2023-12-25T00:00:01+00:00",
          "end_time": "2023-12-25T12:12:12+00:00",
          "exit_code": 0
        }
      ]
    }
  ]
}
```
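In a task packet like this, the client fills in the inputs, outputs, resources, and executors, while the server populates `logs`. The following is a hypothetical helper sketch (not part of any TES library) for sanity-checking a task before submission and summarizing its logs afterward. Its checks follow the required fields as we understand the TES schema (inputs need a `path` plus a `url` or `content`; outputs need a `path` and `url`; executors need an `image` and `command`), and treating a missing `type` as `FILE` is an assumption. Each entry in `logs` is treated as one attempt, since retried tasks append a new entry.

```python
from datetime import datetime

VALID_IO_TYPES = {"FILE", "DIRECTORY"}


def check_task(task: dict) -> list:
    """Return a list of problems found in a TES task message (empty if none)."""
    problems = []
    for i, inp in enumerate(task.get("inputs", [])):
        if "path" not in inp:
            problems.append(f"inputs[{i}]: missing 'path'")
        if "url" not in inp and "content" not in inp:
            problems.append(f"inputs[{i}]: needs 'url' or 'content'")
        # Assumption: a missing type is treated as FILE.
        if inp.get("type", "FILE") not in VALID_IO_TYPES:
            problems.append(f"inputs[{i}]: unknown type {inp['type']!r}")
    for i, out in enumerate(task.get("outputs", [])):
        for key in ("path", "url"):
            if key not in out:
                problems.append(f"outputs[{i}]: missing {key!r}")
        if out.get("type", "FILE") not in VALID_IO_TYPES:
            problems.append(f"outputs[{i}]: unknown type {out['type']!r}")
    if not task.get("executors"):
        problems.append("task has no executors")
    for i, ex in enumerate(task.get("executors", [])):
        if not ex.get("image"):
            problems.append(f"executors[{i}]: missing 'image'")
        if not ex.get("command"):
            problems.append(f"executors[{i}]: missing 'command'")
    return problems


def summarize_logs(task: dict) -> list:
    """Return (attempt, executor index, seconds, exit_code) for each executor log."""
    rows = []
    for attempt, task_log in enumerate(task.get("logs", [])):
        for idx, ex_log in enumerate(task_log.get("logs", [])):
            start = datetime.fromisoformat(ex_log["start_time"])
            end = datetime.fromisoformat(ex_log["end_time"])
            seconds = (end - start).total_seconds()
            rows.append((attempt, idx, seconds, ex_log.get("exit_code")))
    return rows


# A trimmed copy of the packet above.
example = {
    "inputs": [{"name": "reference-genome",
                "url": "gs://genomics-bucket/reference/human-reference.fa",
                "path": "/data/human-reference.fa", "type": "FILE"}],
    "outputs": [{"name": "log-file",
                 "url": "gs://genomics-bucket/logs/task-log.txt",
                 "path": "/output/task-log.txt", "type": "FILE"}],
    "executors": [{"image": "bioinformatics/pipeline",
                   "command": ["bash", "/tools/process-genome.sh"]}],
    "logs": [{"logs": [{"start_time": "2023-12-25T00:00:01+00:00",
                        "end_time": "2023-12-25T12:12:12+00:00",
                        "exit_code": 0}]}],
}
print(check_task(example))      # []
print(summarize_logs(example))  # [(0, 0, 43931.0, 0)]
```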
- GA4GH: Global Alliance for Genomics and Health
- DRS: Data Repository Service
- TRS: Tool Registry Service
- TES: Task Execution Service
- WES: Workflow Execution Service
- Implementations: Implementations of GA4GH Products
If you're working with TES or would like to add other programs here, please reach out or open a PR or issue. We'd love to hear about it!
Thanks to @dec0dOS for the amazing-github-template ⚡