From 6145ca0875c8e75c0ee0fe30452b552aa4ddc52b Mon Sep 17 00:00:00 2001
From: Aarthy Adityan
Date: Tue, 6 Aug 2024 18:47:42 -0400
Subject: [PATCH] docs(dev): update local development and config docs

---
 README.md                 | 27 ++++++++++++++++-----------
 docs/configuration.md     | 23 +++++++++++++++++++++--
 docs/local_development.md | 27 +++++++++++++++++++++++----
 3 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index 8b12d87..badf1a5 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,11 @@
 # DataOps Data Quality TestGen
 ![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F&query=results%5B0%5D.name&label=latest%20version&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F&query=pull_count&style=flat&label=docker%20pulls&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/#!dataops-testgen-help/dataops-testgen-help) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat&logo=slack)](https://data-observability-slack.datakitchen.io/join)

-*DataOps Data Quality TestGen can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling,  new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. DataOps TestGen is part of DataKitchen's Open Source Data Observability.*
+*DataOps Data Quality TestGen, or "TestGen" for short, can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. TestGen is part of DataKitchen's Open Source Data Observability.*

 ## Features

-What does DataKitchen's DataOps Data Quality TestGen do? It helps you understand and find data issues in new data.
+What does DataKitchen's DataOps Data Quality TestGen do? It helps you understand and find data issues in new data.

 DatKitchen Open Source Data Quality TestGen Features - New Data

@@ -39,7 +39,7 @@ On Unix-based operating systems, use the following command to download it to the
 curl -o dk-installer.py 'https://raw.githubusercontent.com/DataKitchen/data-observability-installer/main/dk-installer.py'
 ```

-* Alternatively, you can manually download the [`dk-installer.py`](https://github.com/DataKitchen/data-observability-installer/blob/main/dk-installer.py) file from the [data-observability-installer](https://github.com/DataKitchen/data-observability-installer) repo.
+* Alternatively, you can manually download the [`dk-installer.py`](https://github.com/DataKitchen/data-observability-installer/blob/main/dk-installer.py) file from the [data-observability-installer](https://github.com/DataKitchen/data-observability-installer) repository.
 * All commands listed below should be run from the folder containing this file.
 * For usage help and command options, run `python3 dk-installer.py --help` or `python3 dk-installer.py --help`.

@@ -50,6 +50,7 @@ The installation downloads the latest Docker images for TestGen and deploys a ne
 ```shell
 python3 dk-installer.py tg install
 ```
+
 The `--port` option may be used to set a custom localhost port for the application (default: 8501).

 To enable SSL for HTTPS support, use the `--ssl-cert-file` and `--ssl-key-file` options to specify local file paths to your SSL certificate and key files.
@@ -63,16 +64,16 @@ The [Data Observability quickstart](https://docs.datakitchen.io/articles/open-so
 ```shell
 python3 dk-installer.py tg run-demo
 ```
+
 In the TestGen UI, you will see that new data profiling and test results have been generated.

 ## Product Documentation

-[DataOps TestGen](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help)
+[DataOps Data Quality TestGen](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help)

 ## Useful Commands

-The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the installed TestGen application. All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yaml` files used by the installation.
-
+The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the installed TestGen application. All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yml` files used by the installation.

 ### Remove demo data

@@ -93,6 +94,7 @@ New releases of TestGen are announced on the `#releases` channel on [Data Observ

 ### Uninstall the application
 The following command uninstalls the Docker Compose application and removes all data, containers, and images related to TestGen from your machine.
+
 ```shell
 python3 dk-installer.py tg delete
 ```
@@ -100,18 +102,21 @@ python3 dk-installer.py tg delete

 ### Access the _testgen_ CLI
 The [_testgen_ command line](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-commands-and-details) can be accessed within the running container.
-```
+
+```shell
 docker compose exec engine bash
 ```

 Use `exit` to return to the regular terminal.

 ### Stop the application
+
 ```shell
 docker compose down
 ```

 ### Restart the application
+
 ```shell
 docker compose up -d
 ```
@@ -122,13 +127,13 @@ docker compose up -d
 We recommend you start by going through the [Data Observability Overview Demo](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview).

 ### Support
-For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and ask post on #support channel.
+For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and post on the `#support` channel.

 ### Connect to your database
 Follow [these instructions](https://docs.datakitchen.io/articles/#!dataops-testgen-help/connect-your-database) to improve the quality of data in your database.

 ### Community
-Talk and Learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.
+Talk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.

 Join our community here:

@@ -150,7 +155,7 @@ Join our community here:

 ### Contributing
-For details on contributing or running the project for development, check out our contributing guide.
+For details on contributing or running the project for development, check out our [contributing guide](CONTRIBUTING.md).

 ### License
-DataKitchen DataOps TestGen is Apache 2.0 licensed.
+DataKitchen's DataOps Data Quality TestGen is Apache 2.0 licensed.
diff --git a/docs/configuration.md b/docs/configuration.md
index 45234a6..2b844b1 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -2,6 +2,12 @@

 This document describes the environment variables supported by TestGen.

+#### `TESTGEN_DEBUG_LOG_LEVEL`
+
+Sets logs to the debug level.
+
+default: `no`
+
 #### `TESTGEN_DEBUG`

 Invalidates the cache with the bootstrapped application causing the changes to the routing and plugins to take effect
@@ -12,9 +18,22 @@ Also, changes the logging level for the `testgen.ui` logger from `INFO` to `DEBU
 default: `no`

 #### `TESTGEN_LOG_TO_FILE`
-Set it to `yes` to enable rotating file logs to be written under `/var/log/testgen/`.

-default: `no`
+Enables generation of rotating file logs.
+
+default: `yes`
+
+#### `TESTGEN_LOG_FILE_PATH`
+
+File path under which to generate rotating file logs, when `TESTGEN_LOG_TO_FILE` is turned on.
+
+default: `/var/lib/testgen/log`
+
+#### `TESTGEN_LOG_FILE_MAX_QTY`
+
+Maximum log files to keep (one file per day), when `TESTGEN_LOG_TO_FILE` is turned on.
+
+default: `90`

 #### `TG_DECRYPT_SALT`

diff --git a/docs/local_development.md b/docs/local_development.md
index 4954ce2..0687aa0 100644
--- a/docs/local_development.md
+++ b/docs/local_development.md
@@ -10,7 +10,14 @@ This document describes how to set up your local environment for TestGen develop

 ### Clone repository

-Login to your GitHub account. Follow [GitHub's guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo) to fork the [dataops-testgen](https://github.com/DataKitchen/dataops-testgen) repository, and clone your fork locally.
+Login to your GitHub account.
+
+Fork the [dataops-testgen](https://github.com/DataKitchen/dataops-testgen) repository, following [GitHub's guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).
+
+Clone your forked repository locally.
+```shell
+git clone https://github.com/YOUR-USERNAME/dataops-testgen
+```

 ### Set up virtual environment

@@ -33,7 +40,10 @@ pip install -e .[dev]

 # On Mac
 pip install -e .'[dev]'
-# [optional]
+```
+
+On Mac, you can optionally install [watchdog](https://github.com/gorakhargosh/watchdog) for better performance of the [file watcher](https://docs.streamlit.io/develop/api-reference/configuration/config.toml) used for local development.
+```shell
 xcode-select --install
 pip install watchdog
 ```
@@ -42,8 +52,8 @@ pip install watchdog

 Create a `local.env` file with the following environment variables, replacing the `` placeholders with appropriate values. Refer to the [TestGen Configuration](configuration.md) document for other supported values.
 ```shell
-export TESTGEN_LOG_FILE_PATH=var/lib/testgen
 export TESTGEN_DEBUG=yes
+export TESTGEN_LOG_TO_FILE=no
 export TESTGEN_USERNAME=
 export TESTGEN_PASSWORD=
 export TG_DECRYPT_SALT=
@@ -68,6 +78,15 @@ Initialize the application database for TestGen.
 testgen setup-system-db --yes
 ```

+Seed the demo data.
+```shell
+testgen quick-start --delete-target-db
+testgen run-profile --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
+testgen run-test-generation --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
+testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1
+testgen quick-start --simulate-fast-forward
+```
+
 ### Patch and run Streamlit
 Patch the Streamlit package with our custom files.
 ```shell
@@ -77,4 +96,4 @@ testgen ui patch-streamlit -f
 Run the local Streamlit-based TestGen application. It will open the browser at [http://localhost:8501](http://localhost:8501).
 ```shell
 testgen ui run
-```
\ No newline at end of file
+```