docs(dev): update local development and config docs
aarthy-dk committed Aug 6, 2024
1 parent b690bed commit 6145ca0
Showing 3 changed files with 60 additions and 17 deletions.
27 changes: 16 additions & 11 deletions README.md
@@ -1,11 +1,11 @@
# DataOps Data Quality TestGen
![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F&query=results%5B0%5D.name&label=latest%20version&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F&query=pull_count&style=flat&label=docker%20pulls&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/#!dataops-testgen-help/dataops-testgen-help) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat&logo=slack)](https://data-observability-slack.datakitchen.io/join)

*<p style="text-align: center;">DataOps Data Quality TestGen can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling,  new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. DataOps TestGen is part of DataKitchen's Open Source Data Observability.</p>*
*<p style="text-align: center;">DataOps Data Quality TestGen, or "TestGen" for short, can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. TestGen is part of DataKitchen's Open Source Data Observability.</p>*

## Features

What does DataKitchen's DataOps Data Quality TestGen do? It helps you understand and <b>find data issues in new data</b>.
What does DataKitchen's DataOps Data Quality TestGen do? It helps you understand and <b>find data issues in new data</b>.
<p align="center">
<img alt="DatKitchen Open Source Data Quality TestGen Features - New Data" src="https://datakitchen.io/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-2.22.57 PM.png" width="70%">
</p>
@@ -39,7 +39,7 @@ On Unix-based operating systems, use the following command to download it to the
curl -o dk-installer.py 'https://raw.githubusercontent.com/DataKitchen/data-observability-installer/main/dk-installer.py'
```

* Alternatively, you can manually download the [`dk-installer.py`](https://github.com/DataKitchen/data-observability-installer/blob/main/dk-installer.py) file from the [data-observability-installer](https://github.com/DataKitchen/data-observability-installer) repo.
* Alternatively, you can manually download the [`dk-installer.py`](https://github.com/DataKitchen/data-observability-installer/blob/main/dk-installer.py) file from the [data-observability-installer](https://github.com/DataKitchen/data-observability-installer) repository.
* All commands listed below should be run from the folder containing this file.
* For usage help and command options, run `python3 dk-installer.py --help` or `python3 dk-installer.py <command> --help`.

@@ -50,6 +50,7 @@ The installation downloads the latest Docker images for TestGen and deploys a ne
```shell
python3 dk-installer.py tg install
```

The `--port` option may be used to set a custom localhost port for the application (default: 8501).

To enable SSL for HTTPS support, use the `--ssl-cert-file` and `--ssl-key-file` options to specify local file paths to your SSL certificate and key files.
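As a minimal sketch of how the `--port` and SSL options might be combined in one install command: the port `8443` and the certificate paths under `./certs/` are hypothetical placeholders, and it is an assumption that the options can be passed together.

```shell
# Hypothetical paths: replace with the locations of your own certificate and key.
CERT_FILE="./certs/testgen.crt"
KEY_FILE="./certs/testgen.key"

# Only attempt the install when both files are present.
if [ -f "$CERT_FILE" ] && [ -f "$KEY_FILE" ]; then
  python3 dk-installer.py tg install \
    --port 8443 \
    --ssl-cert-file "$CERT_FILE" \
    --ssl-key-file "$KEY_FILE"
else
  echo "Certificate or key file not found; skipping install." >&2
fi
```

Checking for the files first avoids starting an install that cannot find its certificate or key.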
@@ -63,16 +64,16 @@ The [Data Observability quickstart](https://docs.datakitchen.io/articles/open-so
```shell
python3 dk-installer.py tg run-demo
```

In the TestGen UI, you will see that new data profiling and test results have been generated.

## Product Documentation

[DataOps TestGen](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help)
[DataOps Data Quality TestGen](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help)

## Useful Commands

The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the installed TestGen application. All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yaml` files used by the installation.

The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the installed TestGen application. All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yml` files used by the installation.

### Remove demo data

@@ -93,25 +94,29 @@ New releases of TestGen are announced on the `#releases` channel on [Data Observ
### Uninstall the application

The following command uninstalls the Docker Compose application and removes all data, containers, and images related to TestGen from your machine.

```shell
python3 dk-installer.py tg delete
```

### Access the _testgen_ CLI

The [_testgen_ command line](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-commands-and-details) can be accessed within the running container.
```

```shell
docker compose exec engine bash
```

Use `exit` to return to the regular terminal.

### Stop the application

```shell
docker compose down
```

### Restart the application

```shell
docker compose up -d
```
@@ -122,13 +127,13 @@ docker compose up -d
We recommend you start by going through the [Data Observability Overview Demo](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview).

### Support
For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and ask post on #support channel.
For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and post on the `#support` channel.

### Connect to your database
Follow [these instructions](https://docs.datakitchen.io/articles/#!dataops-testgen-help/connect-your-database) to improve the quality of data in your database.

### Community
Talk and Learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.
Talk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.

Join our community here:

@@ -150,7 +155,7 @@ Join our community here:


### Contributing
For details on contributing or running the project for development, check out our contributing guide.
For details on contributing or running the project for development, check out our [contributing guide](CONTRIBUTING.md).

### License
DataKitchen DataOps TestGen is Apache 2.0 licensed.
DataKitchen's DataOps Data Quality TestGen is Apache 2.0 licensed.
23 changes: 21 additions & 2 deletions docs/configuration.md
@@ -2,6 +2,12 @@

This document describes the environment variables supported by TestGen.

#### `TESTGEN_DEBUG_LOG_LEVEL`

Sets logs to the debug level.

default: `no`

#### `TESTGEN_DEBUG`

Invalidates the cache with the bootstrapped application causing the changes to the routing and plugins to take effect
@@ -12,9 +18,22 @@ Also, changes the logging level for the `testgen.ui` logger from `INFO` to `DEBU
default: `no`

#### `TESTGEN_LOG_TO_FILE`
Set it to `yes` to enable rotating file logs to be written under `/var/log/testgen/`.

default: `no`
Enables generation of rotating file logs.

default: `yes`

#### `TESTGEN_LOG_FILE_PATH`

File path under which to generate rotating file logs, when `TESTGEN_LOG_TO_FILE` is turned on.

default: `/var/lib/testgen/log`

#### `TESTGEN_LOG_FILE_MAX_QTY`

Maximum log files to keep (one file per day), when `TESTGEN_LOG_TO_FILE` is turned on.

default: `90`
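
As a rough illustration of how the three file-logging variables above might be set together in a shell session: the directory `/tmp/testgen-logs` and the retention value `30` are illustrative assumptions, not documented or recommended values.

```shell
# Illustrative values only: the path and retention count are assumptions.
export TESTGEN_LOG_TO_FILE=yes                    # enable rotating file logs
export TESTGEN_LOG_FILE_PATH=/tmp/testgen-logs    # hypothetical log directory
export TESTGEN_LOG_FILE_MAX_QTY=30                # keep at most 30 daily log files

# Create the directory up front so the path exists.
mkdir -p "$TESTGEN_LOG_FILE_PATH"
```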

#### `TG_DECRYPT_SALT`

27 changes: 23 additions & 4 deletions docs/local_development.md
@@ -10,7 +10,14 @@ This document describes how to set up your local environment for TestGen develop

### Clone repository

Login to your GitHub account. Follow [GitHub's guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo) to fork the [dataops-testgen](https://github.com/DataKitchen/dataops-testgen) repository, and clone your fork locally.
Login to your GitHub account.

Fork the [dataops-testgen](https://github.com/DataKitchen/dataops-testgen) repository, following [GitHub's guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).

Clone your forked repository locally.
```shell
git clone https://github.com/YOUR-USERNAME/dataops-testgen
```

### Set up virtual environment

@@ -33,7 +40,10 @@ pip install -e .[dev]

# On Mac
pip install -e .'[dev]'
# [optional]
```

On Mac, you can optionally install [watchdog](https://github.com/gorakhargosh/watchdog) for better performance of the [file watcher](https://docs.streamlit.io/develop/api-reference/configuration/config.toml) used for local development.
```shell
xcode-select --install
pip install watchdog
```
@@ -42,8 +52,8 @@ pip install watchdog

Create a `local.env` file with the following environment variables, replacing the `<value>` placeholders with appropriate values. Refer to the [TestGen Configuration](configuration.md) document for other supported values.
```shell
export TESTGEN_LOG_FILE_PATH=var/lib/testgen
export TESTGEN_DEBUG=yes
export TESTGEN_LOG_TO_FILE=no
export TESTGEN_USERNAME=<username>
export TESTGEN_PASSWORD=<password>
export TG_DECRYPT_SALT=<decrypt_salt>
@@ -68,6 +78,15 @@ Initialize the application database for TestGen.
testgen setup-system-db --yes
```

Seed the demo data.
```shell
testgen quick-start --delete-target-db
testgen run-profile --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
testgen run-test-generation --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1
testgen quick-start --simulate-fast-forward
```

### Patch and run Streamlit
Patch the Streamlit package with our custom files.
```shell
@@ -77,4 +96,4 @@ testgen ui patch-streamlit -f
Run the local Streamlit-based TestGen application. It will open the browser at [http://localhost:8501](http://localhost:8501).
```shell
testgen ui run
```
```
