ES scheme
Marina Golosova edited this page Jun 9, 2020 · 38 revisions
Currently, DKB uses Elasticsearch as its final storage; the mapping can be found here. Each index (`production_tasks`, `analysis_tasks`) stores two types of documents: `task` and `output_dataset`. The following tables list the fields of these documents.
Columns:
- Field name
- Type - note that the Elasticsearch mapping has no special definition for lists: for example, an integer and a list of integers are both defined as "integer", and what the field actually holds depends on what was put into it. Some fields are stored as multiple types; in such cases the additional types are listed in brackets.
- Source - the system from which the information is retrieved ("derivative" means that the field is not present in any source and is constructed from other fields; "service" means that the field is not part of the data and serves other purposes).
- Comment
- Value - how the field is calculated; "as-is" means the value of the field with the same name in the source.
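The note on the Type column can be illustrated with a small sketch. The mapping fragment below is hypothetical: the field types follow the tables on this page, but the sub-field name `keyword` and the helper `as_list` are assumptions made for illustration and are not taken from the actual DKB mapping.

```python
# Hypothetical mapping fragment. "chain_data" is declared as "integer" even
# though it may hold a list; "taskname" is a "text (keyword)" field, sketched
# here as a text field with an additional keyword sub-field (name assumed).
mapping_fragment = {
    "properties": {
        "chain_data": {"type": "integer"},
        "taskname": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        },
    }
}

# Both documents are compatible with the same "integer" mapping.
single_value_doc = {"chain_data": 101}
list_value_doc = {"chain_data": [101, 102, 103]}


def as_list(value):
    """Normalize a field that may hold one value, a list, or nothing."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]
```

Client code that reads such fields may want to normalize them first, for example with `as_list(doc.get("chain_data"))`, since the mapping alone does not say whether a single value or a list was stored.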
Documents of type `task` represent the tasks processing ATLAS data.
Field name | Type | Source | Comment | Value |
---|---|---|---|---|
architecture | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
campaign | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
chain_data | integer | Oracle, table ATLAS_DEFT.t_production_task | task chain is a sequence of related tasks: each task's output is used as input for the next one | list of ids of all tasks in the chain that includes this task, constructed by subquery (tasks after this one are omitted) |
conditions_tags | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
core_count | short | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
ctag | keyword | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
description | text | Oracle, table ATLAS_DEFT.t_prodmanager_request | | as-is |
end_time | date | Oracle, table ATLAS_DEFT.t_production_task | | source field `endtime` |
energy_gev | integer | Oracle, table ATLAS_DEFT.t_prodmanager_request | | as-is |
geometry_version | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
hashtag_list | keyword | Oracle, table ATLAS_DEFT.t_hashtag | | aggregation of source field `hashtag`, lowercased and split into a list |
n_events_per_job | long | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
n_files_per_job | short | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
n_files_to_be_used | integer | Oracle, table ATLAS_DEFT.t_production_task | | source field `filestobeused` |
output_formats | keyword | Oracle, table ATLAS_DEFT.t_production_task | | as-is, but split into a list |
phys_group | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
pr_id | integer (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
primary_input | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
processed_events | long | Oracle, table ATLAS_PANDA.jedi_datasets | | sum of source's neventsused for the given taskid (input datasets only) if it is not "Null", total_events otherwise |
project | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
requested_events | long | Oracle, table ATLAS_PANDA.jedi_datasets | | sum of source's nevents for the given taskid (input datasets only) |
run_number | integer (keyword) | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
start_time | date | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
status | keyword | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
step_id | integer | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
step_name | text (keyword) | Oracle, table ATLAS_DEFT.t_step_template | | as-is |
subcampaign | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
task_timestamp | date | Oracle, table ATLAS_DEFT.t_production_task | | source field `timestamp` |
taskid | integer (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
taskname | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
ticket_id | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
total_events | long | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
trans_home | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
trans_path | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
trans_uses | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
trigger_config | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
user_name | keyword | Oracle, table ATLAS_DEFT.t_production_task | | source field `username` |
vo | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field `jedi_task_parameters` |
input_bytes | long | Rucio | | as-is, "-1" if it is missing or an error occurs |
primary_input_deleted | boolean | Rucio | | "False" if input_bytes is successfully retrieved from source, "True" otherwise |
primary_input_events | long | Rucio | | as-is |
hs06 | long | Chicago ES, index tasks_archive_* | | source field `cputime` |
toths06 | long | Chicago ES, index jobs_archive_* | CPU resources used by the task | sum of source's hs06sec where jobstatus is "failed" or "finished" |
toths06_failed | long | Chicago ES, index jobs_archive_* | 'wasted' CPU resources | sum of source's hs06sec where jobstatus is "failed" |
toths06_finished | long | Chicago ES, index jobs_archive_* | CPU resources the task would use in the perfect world | sum of source's hs06sec where jobstatus is "finished" |
chain_id | integer | Derivative | id of the chain's root (the first, initial task in it) | derived from `chain_data` |
input_events | long | Derivative | | calculated from several other fields' values |
phys_category | keyword | Derivative | physics category with which the task can be associated | determined by hashtag_list and taskname |
_update_required | boolean | Service | marks documents that contain incomplete information about the object and thus must be updated sooner or later | "True" if the record is incomplete and should be updated, "False" otherwise |
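Several of the values above are computed rather than copied, and a short sketch may make the rules easier to follow. The functions below are illustrative only and are not DKB code: the function names, the plain-dict record layout, and the assumption that `chain_data` is ordered with the chain's root first are all hypothetical.

```python
def hs06_totals(jobs):
    """Sum hs06sec per job status, mirroring the three toths06* fields."""
    def total(status):
        # A missing or null hs06sec is treated as 0 (an assumption).
        return sum(j.get("hs06sec") or 0 for j in jobs
                   if j.get("jobstatus") == status)

    failed = total("failed")
    finished = total("finished")
    return {
        "toths06": failed + finished,
        "toths06_failed": failed,
        "toths06_finished": finished,
    }


def processed_events(neventsused_values, total_events):
    """Sum neventsused over input datasets; fall back to total_events
    when there is nothing to sum (the sum would be NULL)."""
    known = [v for v in neventsused_values if v is not None]
    return sum(known) if known else total_events


def chain_id(chain_data):
    """Return the chain's root id, assuming chain_data lists the root first."""
    return chain_data[0] if chain_data else None
```

For instance, a task with one failed job, one finished job, and one job in any other status would have only the first two counted in `toths06`.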
Documents of type `output_dataset` represent the datasets generated by the tasks while processing ATLAS data.
Field name | Type | Source | Comment | Value |
---|---|---|---|---|
datasetname | text (keyword) | Oracle, table ATLAS_PANDA.jedi_datasets | full name of the dataset | as-is |
bytes | long | Rucio | size of the dataset | as-is, "-1" if dataset was not found in source |
deleted | boolean | Rucio | whether the dataset was deleted from source or not | as-is, "True" if dataset was not found in source |
events | long | Rucio | number of events in the dataset | as-is |
data_format | keyword | Derivative | | extracted from `datasetname` |
cross_section | double | AMI | | source field `crossSection` |
cross_section_ref | keyword | AMI | | source field `crossSectionRef` |
gen_filt_eff | double | AMI | | source field `genFiltEff` |
k_factor | double | AMI | | source field `kFactor` |
me_pdf | keyword | AMI | | source field `mePDF` |
process_group | keyword | AMI | | source field `processGroup` |
_update_required | boolean | Service | marks documents that contain incomplete information about the object and thus must be updated sooner or later | "True" if the record is incomplete and should be updated, "False" otherwise |
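Because `bytes` uses "-1" as a placeholder for datasets not found in Rucio, consumers probably need to filter it out before aggregating. The sketch below shows one way to do that, together with a query body that selects documents still marked for update. The helper names and the sample documents are hypothetical; the query uses the standard Elasticsearch `term` query, but whether DKB queries the index this way is an assumption.

```python
MISSING_SIZE = -1  # placeholder stored in "bytes" when the dataset was not found


def known_size(doc):
    """Return the dataset size in bytes, or None if it is unknown."""
    size = doc.get("bytes")
    if size is None or size == MISSING_SIZE:
        return None
    return size


def total_known_bytes(docs):
    """Sum sizes over datasets, skipping those with an unknown size."""
    sizes = (known_size(d) for d in docs)
    return sum(s for s in sizes if s is not None)


# Query body (Elasticsearch query DSL) selecting incomplete documents.
update_required_query = {"query": {"term": {"_update_required": True}}}
```

Summing `bytes` directly would subtract 1 for every missing dataset, which is why the placeholder is dropped first.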