
Geney Queries

PJ Tatlow edited this page Nov 28, 2017 · 2 revisions

What does a Geney Query look like?

A Geney query is represented in JSON with three keys: filters, features and metaTypes. The features and metaTypes keys are both arrays of strings naming the columns that the user would like included in the response. The filters key, however, is a little more complicated.

The filters key holds an object that maps each metadata type to the desired values for that metadata type. For discrete metadata types, this is simply an array of strings. For continuous types, it is an array of objects, with each object being a "continuous filter".

A "continuous filter" is simply an object with two keys: operator and value. The operator has to be in the set of standard comparison operators { >, <, >=, <=, ==, != }, and the value must be a number.

Let's pretend we have a dataset of cancer samples that includes some clinical data about each patient as well as quantified gene expression data for all genes. The following could be a valid Geney query.

{
  "filters": {
    "cancerType": ["Lung", "Breast"],
    "yearsOfSurvival": [
      {
        "operator": ">=",
        "value": 0
      },
      {
        "operator": "<=",
        "value": 3
      }
    ]
  },
  "features": ["BRCA1","BRCA2", "EGFR"],
  "metaTypes": ["cancerType", "yearsOfSurvival", "gender"]
}
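
As a rough illustration of the structure described above, the following Python sketch checks the shape of a query by hand. It is not Geney's validator (the JSON Schema mentioned at the end of this page is the real one). The function name validate_query is hypothetical, and treating all three keys as required is an assumption.

```python
from numbers import Real

# Comparison operators this page lists as valid for continuous filters.
OPERATORS = {">", "<", ">=", "<=", "==", "!="}


def is_continuous_filter(entry):
    """True if entry looks like {"operator": <comparison>, "value": <number>}."""
    return (
        isinstance(entry, dict)
        and set(entry) == {"operator", "value"}
        and isinstance(entry["operator"], str)
        and entry["operator"] in OPERATORS
        and isinstance(entry["value"], Real)
        and not isinstance(entry["value"], bool)  # JSON true/false is not a number
    )


def validate_query(query):
    """Return a list of problems found in a query dict (empty list if none).

    Hypothetical helper: assumes all three keys are required.
    """
    problems = []
    for key in ("filters", "features", "metaTypes"):
        if key not in query:
            problems.append(f"missing key: {key}")

    for key in ("features", "metaTypes"):
        values = query.get(key, [])
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            problems.append(f"{key} must be an array of strings")

    filters = query.get("filters", {})
    if not isinstance(filters, dict):
        problems.append("filters must be an object")
        return problems

    for meta_type, entries in filters.items():
        if not isinstance(entries, list):
            problems.append(f"{meta_type}: expected an array")
            continue
        for entry in entries:
            # A string is a discrete value; anything else must be a continuous filter.
            if not isinstance(entry, str) and not is_continuous_filter(entry):
                problems.append(f"{meta_type}: invalid filter entry {entry!r}")
    return problems
```

Calling validate_query on the example query returns an empty list. Note that this sketch does not reject an array that mixes strings and continuous filters.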

How does it work?

Geney starts by evaluating each filter in turn to find the samples that satisfy it. For a discrete metadata type, a sample matches if it has ANY of the values in the list. For a continuous metadata type, a sample has to match ALL of the "continuous filters" in order to be matched. The set of samples that match the first filter is intersected with the set of samples that match the next filter, and so on, leaving only the samples that match all of the filters. So in our fictional dataset, the above query would match all samples from patients who had "Lung" or "Breast" cancer and who survived between 0 and 3 years (inclusive). Note that a sample does not have to have BOTH "Lung" and "Breast" cancer to match.
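
The matching rules above can be sketched in a few lines of Python. This is only an illustration of the ANY/ALL/intersect logic, not Geney's actual implementation. The in-memory samples dictionary, the function names, and the choice to detect a continuous filter list by the type of its entries are all assumptions.

```python
import operator

# Map each operator string from a continuous filter to a comparison function.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def matches_filter(value, criteria):
    """Check one sample's metadata value against one filter's criteria list."""
    if value is None:
        return False  # assumption: a sample lacking this metadata never matches
    if criteria and all(isinstance(c, dict) for c in criteria):
        # Continuous type: the value must satisfy ALL continuous filters.
        return all(OPS[c["operator"]](value, c["value"]) for c in criteria)
    # Discrete type: the value must equal ANY of the listed strings.
    return value in criteria


def matching_samples(samples, filters):
    """Intersect the per-filter match sets.

    samples is a hypothetical dict of sample ID -> {metadata type: value}.
    """
    matched = set(samples)
    for meta_type, criteria in filters.items():
        matched &= {
            sample_id
            for sample_id, meta in samples.items()
            if matches_filter(meta.get(meta_type), criteria)
        }
    return matched
```

With the filters from the example query, a "Breast" sample with yearsOfSurvival of 0 matches, while a "Breast" sample with yearsOfSurvival of 5 does not, because it fails the second continuous filter.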

The values provided in the features and metaTypes arrays are used to make sure only the requested information about each sample is downloaded. In addition to any values provided in metaTypes, sampleID will always be added, so it is not necessary to include it. In our running example, if we were to download the result of this query in CSV format, our header row would look like this:

sampleID,cancerType,yearsOfSurvival,gender,BRCA1,BRCA2,EGFR

Note that we included an additional metadata type that we did not include in our filters. Also, if you leave either of these arrays empty, Geney will assume you want ALL of the metaTypes and/or features included in your output.
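
To make the column rules concrete, here is a small Python sketch that builds the header row for a query. The function header_row and its all_meta_types and all_features parameters are hypothetical. The column order (sampleID, then metaTypes, then features) is inferred from the example header above, and dropping a user-supplied sampleID to avoid a duplicate column is an assumption.

```python
def header_row(query, all_meta_types, all_features):
    """Build a CSV header for a query.

    all_meta_types and all_features stand in for every column the dataset
    offers; an empty array in the query means "include all of them".
    """
    meta_types = query.get("metaTypes") or all_meta_types
    features = query.get("features") or all_features
    # sampleID is always added first, so skip it if the user listed it too.
    meta_types = [m for m in meta_types if m != "sampleID"]
    return ",".join(["sampleID", *meta_types, *features])
```

For the running example this produces the header shown above. If features were an empty array, every entry of all_features would follow the metadata columns instead.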

Schema

We have created a JSON Schema that can be (and is) used to validate Geney queries, and it is located here
