Skip to content

DB and API structure

Dmitry Romanov edited this page Apr 26, 2018 · 9 revisions

Database structure explained

RCDB sql schema

The essential database schema is pretty simple. Tables could be split in four conceptual groups:

  1. runs table. That is where a run numbers are stored.
  2. File storage. Each file has many to many relationship with run numbers.
  3. Conditions. Name-value pairs are stored there
  4. Meta. Logs and SQL DB schema version

Python API data model classes resembles this structure. Most common python classes that you work with:

  • Run - represents run
  • Condition - stores data for the run
  • ConditionType - stores condition name, field type and other

All classes have properties to reference each other. The main properties for conditions management are:

class Run(ModelBase):
    number                  # int - The run number
    start_time              # datetime - Run start time
    end_time                # datetime - Run end time
    conditions              # list[Condition] - Conditions associated with the run


class ConditionType(ModelBase):
    name               # str(max 255) - A name of condition
    value_type         # str(max 255) - Type name. One of XXX_FIELD below
    values             # query[Condition] - query to look condition values for runs

    # Constants, used for declaration of value_type
    STRING_FIELD = "string"
    INT_FIELD = "int"
    BOOL_FIELD = "bool"
    FLOAT_FIELD = "float"
    JSON_FIELD = "json"
    BLOB_FIELD = "blob"
    TIME_FIELD = "time"


class Condition(ModelBase):
    time           # datetime - time related to condition (when it occurred in example)
    run_number     # int - the run number

    @property
    value          # int, float, bool or string - depending on type. The condition value

    text_value     # holds data if type STRING_FIELD,JSON_FIELD or BLOB_FIELD
    int_value      # holds data if type INT_FIELD
    float_value    # holds data if type FLOAT_FIELD
    bool_value     # holds data if type BOOL_FIELD

    run            # Run - Run object associated with the run_number
    type           # ConditionType - link to associated condition type
    name           # str - link to type.name. See ConditionType.name
    value_type     # str - link to type.value_type. See ConditionType.value_type

How data is stored in the DB

In general, one just uses Condition.value to get the right value for the condition. But what happens under the hood?

As you may noticed from comments above, in reality data is stored in one of the fields:

Storage field Value type
text_value STRING_FIELD, JSON_FIELD or BLOB_FIELD
int_value INT_FIELD
float_value FLOAT_FIELD
bool_value BOOL_FIELD

When you call Condition.value property, Condition class checks for type.value_type and returns an appropriate xxx_value.

Why is it so? - because we would like to have queries like: "give me runs where event_count > 100 000"

i.e., if we know that *event_count is int, we would like database to operate it as int.

At the same time we would like to store strings and more general data with blobs. To have it, RCDB uses so called "hybrid approach to object-attribute-value model". If value is int, float, bool or time, it is stored in appropriate field, which allows to use its type when querying. Finally it is possible search over ints, floats and time and, at the same time, to store more complex objects as JSON or blobs... to figure out them lately