
enterprise-components

Overview

Kdb+ enterprise architectures for the management and analysis of large datasets are split into various components (building blocks). Each component strictly follows a common set of design principles:

  • public functions as fully documented APIs
  • consistent logging
  • interfaces for monitoring and process control

Data feeding

feedCsv - Generic CSV file reader and publisher

  • provides a fully customizable and configurable CSV file parser that publishes data to tickLF and/or tickHF (a parsing sketch follows this list)
  • enables automatic detection of CSV input files in specified locations on a local drive
  • shields the system from corrupted data sets
  • archives processed files (if required)
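
At its core, parsing a CSV in q is a one-liner with the 0: operator. The sketch below shows only that underlying idiom; the column layout, types, and file path are assumptions for illustration, and the real feedCsv drives all of this through configuration:

```q
/ assumed CSV layout with a header row: time,sym,price,size
fmt:"TSFJ";                                / column types: time, symbol, float, long
parseCsv:{[file] (fmt;enlist",") 0: file};

/ hypothetical usage: parse a detected input file into a table
t:parseCsv `:/data/incoming/trades.csv
```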

Data distribution

tickHF - Publishing and distribution of High Frequency data

  • tuned for handling large volumes of data
  • ensures minimal publishing latency
  • maintains subscription lists
  • provides a data recovery facility (all records are stored in a binary journal; a sketch follows this list)
  • performs journal rollover at end-of-day
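
The pattern tickHF follows is the classic kdb+ tickerplant: keep a subscription list, append every message to a binary journal, and publish asynchronously. Below is a minimal sketch of that pattern; the variable names and journal path are assumptions, not the component's actual API:

```q
subs:(`int$())!();                 / map: connection handle -> subscribed tables
.u.sub:{[ts] subs[.z.w]:(),ts};    / subscribers register over IPC

jrn:`:tickHF.journal; jrn set ();  / initialize an empty binary journal
l:hopen jrn;                       / handle for appending messages

.u.pub:{[t;x]
  l enlist (`upd;t;x);                          / log every message for recovery
  (neg each where t in/: subs) @\: (`upd;t;x);  / async publish to subscribers
 };
```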

tickLF - Distribution of Low Frequency data

  • designed for reference data handling
  • handles inserts, upserts, updates and deletes
  • allows data validation (e.g. model consistency check)
  • allows data enrichment via user defined plugins (e.g. adding time column, adjustments of data content, etc.)
  • provides data recovery facility (all records stored in binary journal)
  • performs journal rollover at end-of-day
  • maintains subscription lists
  • supports custom plugins

Custom plugins are helpful when data or the model needs manipulation. For example, if the received data does not contain a time column required by the data model, a custom plugin can dynamically add the missing column to the output table.
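
To make that example concrete, such an enrichment step is a one-line q function; the function name is hypothetical, and tickLF wires plugins in through its own configuration:

```q
/ hypothetical enrichment plugin: add a time column if the feed lacks one
addTime:{[data] update time:.z.t from data};

/ addTime ([] sym:`a`b; px:1 2f)  / -> same table with a time column added
```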

Data processing and storage

rdb - In-memory database

  • allows configuration of subscriptions and the end-of-day process
  • automatically reconnects and re-subscribes to tickHF and tickLF (a sketch follows this list)
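
In essence an rdb is a subscriber that inserts incoming records into in-memory tables and persists them at end-of-day. A minimal sketch under assumed names and port (the real component adds configuration, re-subscription, and logging on top):

```q
trade:flip `time`sym`price`size!"TSFJ"$\:();   / empty in-memory table

upd:{[t;x] t insert x};                        / append records published by tickHF

.u.end:{[d]                                    / end of day: persist, then clear
  .Q.dpft[`:/data/hdb;d;`sym;`trade];
  delete from `trade;};

h:hopen `::5010;                               / tickerplant port assumed
h(".u.sub";`trade);                            / subscribe (signature assumed)
```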

hdb - Historical database

hdbWriter - writes historical data directly into the hdb

  • writes data directly to the hdb process
  • supports writing to multiple partitions (a sketch follows this list)
  • supports appending data to existing partitions
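
Writing a table into a date partition and appending to it are standard q idioms, shown below with assumed paths and sample data; this is the mechanism hdbWriter builds on, not its API:

```q
trade:flip `sym`price`size!(`a`b;1.0 2.0;100 200);

/ splay trade into an hdb date partition, parted by sym
.Q.dpft[`:/data/hdb;2024.01.15;`sym;`trade];

/ append further rows to the same partition; symbols must be enumerated first,
/ and appended rows should keep sym order so the parted attribute holds
`:/data/hdb/2024.01.15/trade/ upsert .Q.en[`:/data/hdb] select from trade where sym=`b;
```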

eodMng - End-of-day processing

  • provides mechanism for hdb data synchronization between different machines
  • allows hdb housekeeping via predefined plugins (deletion, compression, conflation; a deletion sketch follows this list)
  • exposes API for defining custom plugins
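
As a flavor of what a housekeeping plugin does, the sketch below drops date partitions older than n days. Everything here (names, layout, the recursive delete helper) is an illustrative assumption; real eodMng plugins are defined through its API:

```q
/ recursively delete a file or directory (hdel alone removes only files and empty dirs)
rmrf:{[p] if[11h=type k:key p; rmrf each ` sv/: p,/:k]; hdel p};

/ hypothetical deletion plugin: drop hdb date partitions older than n days
purge:{[hdb;n]
  d:"D"$string parts:key hdb;             / parse partition dir names as dates
  rmrf each ` sv/: hdb,/:parts where (not null d) & d < .z.d - n;
 };

/ usage: purge[`:/data/hdb;30]
```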

rproc - Realtime processing

  • performs calculations on live data streams
  • subscribes to a realtime source data stream (e.g. tickHF) and computes derived data streams
  • the rproc component alone does not provide any useful functionality - it is just a container
  • requires additional code, called a plugin, which defines the logic for calculating the derived data
  • the rproc package includes sample predefined plugins (snap, mrvs, ohlc) that show how plugin code can be defined; those plugins are described in more detail below (an ohlc-style sketch follows this list)
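
To make the plugin idea concrete, here is the gist of an ohlc-style derivation: cache the live trades received from tickHF and aggregate them into bars. This shows only the aggregation idea under assumed names, not the shipped ohlc plugin:

```q
trade:flip `time`sym`price`size!"TSFJ"$\:();   / cache of today's trades

upd:{[t;x] t insert x};                        / fed by the tickHF subscription

/ derive 5-minute OHLC bars from the cached trades
bars:{select open:first price, high:max price, low:min price, close:last price
  by sym, 5 xbar time.minute from trade};
```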

stream - Stream-based data processing

  • allows on-the-fly processing of data published via tickHF
  • provides access to derived data via accessPoint
  • allows storing derived data in rdb/hdb
  • facilitates quick recovery after restart (a replay sketch follows this list)
  • performs journal rollover at end-of-day
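
Quick recovery in kdb+ stream processing conventionally means replaying the binary journal through the upd handler with -11!. A minimal sketch, with the journal path assumed:

```q
/ the journal stores (`upd;table;data) messages, so defining upd and
/ replaying the file re-applies every message in order
upd:{[t;x] t insert x};
-11!`:stream.journal;
```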

Data access

accessPoint - End users' entry point

  • performs user authentication and authorization
  • facilitates connections to other q processes (e.g. rdb, hdb)
  • enforces access control by restricting each user to a permitted set of functions (a sketch follows this list)
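
Function-level access control can be pictured as a gate on incoming IPC calls. The sketch below assumes clients send (function;args) lists and uses a hypothetical permission map; accessPoint's real mechanism is configuration-driven:

```q
/ hypothetical permission map: user -> functions that user may call
perms:`alice`bob!(`getTrades`getQuotes; enlist `getQuotes);

/ gate synchronous IPC calls (.z.u is the authenticated user)
.z.pg:{[x]
  allowed:$[.z.u in key perms; perms .z.u; 0#`];
  if[not (f:first x) in allowed; '"access denied: ",string f];
  value x};
```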

System maintenance

yak - Process Management Tool

  • manages (starts/stops/restarts) different types of processes
  • manages groups of processes (in resolved dependency order)
  • provides convenient access to process status and detailed process information
  • offers robust configuration mechanism
  • delivers bootstrapping for managed processes
  • allows interrupting managed processes (UNIX only)
  • supports custom pager/viewer for log files, standard output, and standard error
  • provides top-like functionality

hk - Housekeeping

  • performs cleanup of system artifacts (compress and delete actions)
  • integrated into Enterprise Components deployment structure (no cron dependencies for execution scheduling)

Monitor server

monitor - Server monitoring tool

  • captures various information about the system state, inter-process communication, system events (such as component initialization, subscriptions, and journal replay), and OS resource usage
  • publishes data via the tickHF protocol, so it can be stored in rdb and hdb for later analysis or processed by any tickHF protocol-compatible tool, e.g. a stream process

Testing

qtest - Test framework

  • execution of tests organized into test suites
  • test cases defined as q functions (a sketch follows this list)
  • set of convenient assertions
  • integration with enterprise-components - process management helper functions, remote assertions
  • facilities for test debugging
  • exports results to an XML file (compatible with the JUnit format)
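
The sketch below illustrates only the idea of a test case as a plain q function with an assertion; assertEq and the test name are hypothetical, not the qtest API:

```q
/ hypothetical assertion helper: signal on mismatch
assertEq:{[actual;expected]
  $[actual~expected; 1b; '"expected ",(-3!expected),", got ",-3!actual]};

/ a test case as a q function
test_upsertKeepsLatest:{[]
  t:([sym:`a`b] px:1 2f);
  t:t upsert ([sym:enlist`a] px:enlist 9f);
  assertEq[(t`a)`px; 9f]};
```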

mock - Set of mocks used for various tests