Geney Queries
A Geney query is represented in JSON with three keys: filters, features, and metaTypes. The features and metaTypes keys are both arrays of strings representing the columns that the user would like included in the response. The filters key, however, is a little more complicated.
"Filters" is an object that maps each metadata type to the desired values for that metadata type. For discrete metadata types, this is simply an array of strings. For continuous types, it is an array of objects, with each object being a "continuous filter".
A "continuous filter" is simply an object with two keys: operator and value. The operator has to be in the set of standard comparison operators { >, <, >=, <=, ==, != }, and the value must be a number.
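As a rough illustration of the shape described above, a continuous filter could be checked by hand as follows. This is only a sketch: the function name `is_continuous_filter` and the constant `ALLOWED_OPERATORS` are made up for this example and are not part of Geney, and this is not the project's JSON Schema.

```python
# Hypothetical helper, not part of Geney: checks the shape of one continuous filter.
ALLOWED_OPERATORS = {">", "<", ">=", "<=", "==", "!="}


def is_continuous_filter(obj):
    """Return True if obj looks like {"operator": <comparison>, "value": <number>}."""
    if not isinstance(obj, dict) or set(obj) != {"operator", "value"}:
        return False
    op, value = obj["operator"], obj["value"]
    if not isinstance(op, str) or op not in ALLOWED_OPERATORS:
        return False
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```

With this check, `{"operator": ">=", "value": 0}` is accepted, while `{"operator": "=>", "value": 0}` and `{"operator": "<=", "value": "3"}` are rejected.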
Let's pretend we have a dataset of cancer samples that includes some clinical data about each patient as well as quantified gene expression data for all genes. The following could be a valid Geney query:
{
  "filters": {
    "cancerType": ["Lung", "Breast"],
    "yearsOfSurvival": [
      {
        "operator": ">=",
        "value": 0
      },
      {
        "operator": "<=",
        "value": 3
      }
    ]
  },
  "features": ["BRCA1", "BRCA2", "EGFR"],
  "metaTypes": ["cancerType", "yearsOfSurvival", "gender"]
}
Geney starts by going through all of the filters and, for each one, looking for the samples that match it. For a discrete metadata type, a sample has to have ANY of the values in the list in order to be matched. For a continuous metadata type, a sample has to match ALL of the "continuous filters" in order to be matched. The set of samples that match the first filter is intersected with the set of samples that match the next filter, and so on, leaving only the samples that match all of the filters. So in our fictional dataset, the above query would match all samples from patients who had "Lung" or "Breast" cancer and who survived between 0 and 3 years, inclusive. Note that a sample does not have to have BOTH "Lung" and "Breast" cancer to match.
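The matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not Geney's actual implementation: the in-memory `samples` dictionary, its made-up sample IDs and values, and the function names are all assumptions for the example. It also assumes that a sample lacking a filtered metadata type does not match that filter.

```python
import operator

# Comparison operators allowed in a continuous filter.
OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


def matches_filter(value, wanted):
    """Apply one filter: ALL continuous filters must hold, or ANY discrete value."""
    if wanted and isinstance(wanted[0], dict):
        return all(OPS[f["operator"]](value, f["value"]) for f in wanted)
    return value in wanted


def matching_sample_ids(samples, filters):
    """Intersect the per-filter matches so only samples passing every filter remain."""
    matched = set(samples)
    for meta_type, wanted in filters.items():
        matched &= {
            sample_id
            for sample_id, meta in samples.items()
            if meta_type in meta and matches_filter(meta[meta_type], wanted)
        }
    return matched


# Made-up data for illustration only.
samples = {
    "S1": {"cancerType": "Lung", "yearsOfSurvival": 2.5},
    "S2": {"cancerType": "Breast", "yearsOfSurvival": 5},
    "S3": {"cancerType": "Colon", "yearsOfSurvival": 1},
    "S4": {"cancerType": "Breast", "yearsOfSurvival": 0},
}
filters = {
    "cancerType": ["Lung", "Breast"],
    "yearsOfSurvival": [
        {"operator": ">=", "value": 0},
        {"operator": "<=", "value": 3},
    ],
}
print(sorted(matching_sample_ids(samples, filters)))  # ['S1', 'S4']
```

Here S2 is dropped by the yearsOfSurvival filter and S3 by the cancerType filter, so only the samples that pass both remain.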
The values provided in the features and metaTypes arrays are used to make sure only the requested information about each sample is downloaded. In addition to any values provided in metaTypes, sampleID will always be added, so it is not necessary to include it. In our running example, if we were to download the result of this query in CSV format, our header row would look like this:
sampleID,cancerType,yearsOfSurvival,gender,BRCA1,BRCA2,EGFR
Note that we requested an additional metadata type (gender) that we did not include in our filters. Also, if you leave either of these arrays empty, Geney will assume you want ALL of the metaTypes and/or features included in your output.
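The column-selection behavior described above could be sketched like this. The function `header_row` and its `all_meta_types` / `all_features` arguments are hypothetical stand-ins for whatever the dataset actually contains. The column order (sampleID, then metaTypes, then features) is inferred from the example header row rather than from Geney's code.

```python
def header_row(query, all_meta_types, all_features):
    """Build a CSV header: sampleID first, then metadata types, then features."""
    # An empty (or missing) array is treated as "include everything".
    meta_types = query.get("metaTypes") or all_meta_types
    features = query.get("features") or all_features
    # sampleID is always added, so drop it if the caller listed it explicitly.
    meta_types = [m for m in meta_types if m != "sampleID"]
    return ",".join(["sampleID", *meta_types, *features])


query = {
    "features": ["BRCA1", "BRCA2", "EGFR"],
    "metaTypes": ["cancerType", "yearsOfSurvival", "gender"],
}
print(header_row(query, ["cancerType", "gender"], ["BRCA1", "TP53"]))
# sampleID,cancerType,yearsOfSurvival,gender,BRCA1,BRCA2,EGFR
```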
We have created a JSON Schema that can be (and is) used to validate Geney queries, and it is located here