add sanity.py script for sanity tests capabilities #120

83 changes: 83 additions & 0 deletions integ-test/src/test/python/README.md
@@ -0,0 +1,83 @@
# Sanity Script for OpenSearch Queries

This script is designed to perform sanity checks on OpenSearch queries by executing a series of predefined queries against an OpenSearch cluster and validating the responses.
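
A minimal sketch of the check performed for each query is shown below. It assumes the PPL endpoint `_plugins/_ppl`, the `requests` library, and the `http_logs/results/<name>.json` layout used in this directory; the actual `sanity_script.py` may use a different API or comparison logic.

```python
# Sketch only: the endpoint, payload shape, and comparison are assumptions,
# not a description of the real sanity_script.py internals.
import json
import os

import requests


def run_sanity_check(query_file: str, expected_file: str) -> bool:
    """POST a PPL query to the cluster and compare it with the stored expected result."""
    base_url = os.environ["OPENSEARCH_URL"]
    with open(query_file) as f:
        query = f.read().strip().rstrip(";")

    response = requests.post(
        f"{base_url}/_plugins/_ppl",
        json={"query": query},
        headers={"Content-Type": "application/json"},
    )
    response.raise_for_status()

    with open(expected_file) as f:
        expected = json.load(f)
    # Expected files store the payload under data.resp (see http_logs/results/*.json).
    return response.json() == expected["data"]["resp"]


if __name__ == "__main__":
    ok = run_sanity_check("http_logs/ppl/ppl1.sql", "http_logs/results/ppl1.json")
    print("PASS" if ok else "FAIL")
```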

## Requirements

- Python 3.x
- `requests` library (Install using `pip install requests`)

## Configuration

Before running the script, ensure that:

- The S3 location `s3://flint/-data/-dp/-eu/-west/-1/-beta/data/http_log/` points to the http_logs S3 bucket; for additional information, see [data-preparation](data-preparation.md).
- The `OPENSEARCH_URL` environment variable is set to your OpenSearch cluster's URL.
- The datasource name (in this sample `mys3`) matches the datasource associated with your EMR Spark cluster.
- The catalog name (in this sample `default`) matches the corresponding schema name within the AWS Glue catalog.
- The table name (in this sample `http_logs_plain`) matches an existing table ([or create the table using this script](./http_logs/tables/create_table.sql)).

Example:
```bash
export OPENSEARCH_URL="http://localhost:9200"
```
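
Inside the script, this variable would typically be read with `os.environ`. A small sketch, assuming the script fails fast when the variable is missing:

```python
# Sketch: read OPENSEARCH_URL and abort early if it is not set.
import os

opensearch_url = os.environ.get("OPENSEARCH_URL")
if not opensearch_url:
    raise SystemExit("OPENSEARCH_URL is not set; export it before running the script.")
```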

## Running the Script

The script requires Python 3. To run it, use the following command:

```bash
python sanity_script.py
```

You can also use the provided bash script `run_sanity.sh` to run the Python script with parameters.

```bash
./run_sanity.sh --run-tables --run-queries --use-date 20230101
```

Make sure to give execution permission to the bash script:

```bash
chmod +x run_sanity.sh
```

### Parameters

The script accepts several optional parameters that control its behavior (a minimal parsing sketch follows this list):

- `--run-tables`: Execute the queries related to table operations (table creation).
- `--run-queries`: Execute the general data queries that are not related to table operations.
- `--date`: A specific date in `YYYYMMDD` format that replaces the `{date}` placeholder in queries (`run_sanity.sh` passes this via its `--use-date` option).
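
A minimal parsing sketch, assuming `argparse` and a plain string substitution for the `{date}` placeholder; the query template below is hypothetical and only illustrates the substitution, and the real `sanity_script.py` may wire these flags differently:

```python
# Sketch of the flag handling described above; not the actual implementation.
import argparse

parser = argparse.ArgumentParser(description="Sanity checks for OpenSearch queries")
parser.add_argument("--run-tables", action="store_true",
                    help="execute the table-related queries")
parser.add_argument("--run-queries", action="store_true",
                    help="execute the non-table data queries")
parser.add_argument("--date",
                    help="date in YYYYMMDD format, replaces the {date} placeholder in queries")
args = parser.parse_args()

raw = "source = mys3.default.http_logs_{date} | head 5"  # hypothetical placeholder query
query = raw.replace("{date}", args.date) if args.date else raw

if args.run_tables:
    print("would run table-creation queries")
if args.run_queries:
    print("would run data queries, e.g.:", query)
```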

### Examples

1. Run only table queries:
```bash
python sanity_script.py --run-tables
```

2. Run only non-table queries:
```bash
python sanity_script.py --run-queries
```

3. Run all queries with a specific date:
```bash
./run_sanity.sh --run-tables --run-queries --use-date 20231102
```

4. Run both table (creation) queries and data queries:
```bash
python sanity_script.py --run-tables --run-queries
```

## Output

The script will generate a log file with a timestamp in its name (e.g., `sanity_report_2023-11-02_12-00-00.log`) that contains the results of the sanity checks, including any errors encountered during execution.
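
A small sketch of how such a timestamped report file might be produced, assuming Python's `logging` module; the exact format of the real log may differ:

```python
# Sketch: write the sanity results to a timestamped log file.
import logging
from datetime import datetime

log_file = f"sanity_report_{datetime.now():%Y-%m-%d_%H-%M-%S}.log"
logging.basicConfig(
    filename=log_file,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("sanity run started")
logging.error("a failed query would be recorded like this")
```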


## Support

For any questions or issues, please open an issue in the repository or contact the maintainers.
3 changes: 3 additions & 0 deletions integ-test/src/test/python/data-preparation.md
@@ -0,0 +1,3 @@
## Data Preparation
This document explains how to set up the S3 bucket data used by the sanity tests.
...
1 change: 1 addition & 0 deletions integ-test/src/test/python/http_logs/ppl/ppl1.sql
@@ -0,0 +1 @@
source = mys3.default.http_logs_plain | sort @timestamp | head 5;
2 changes: 2 additions & 0 deletions integ-test/src/test/python/http_logs/ppl/ppl2.sql
@@ -0,0 +1,2 @@
source = mys3.default.http_logs_plain |
where status >= 400 | sort - @timestamp | head 5
3 changes: 3 additions & 0 deletions integ-test/src/test/python/http_logs/ppl/ppl3.sql
@@ -0,0 +1,3 @@
source = mys3.default.http_logs_plain |
where status = 200 | stats count(status) by clientip, status |
sort - clientip | head 10
96 changes: 96 additions & 0 deletions integ-test/src/test/python/http_logs/results/ppl1.json
@@ -0,0 +1,96 @@
{
"data": {
"ok": true,
"resp": {
"status": "SUCCESS",
"schema": [
{
"name": "@timestamp",
"type": "date"
},
{
"name": "clientip",
"type": "string"
},
{
"name": "request",
"type": "string"
},
{
"name": "status",
"type": "integer"
},
{
"name": "size",
"type": "integer"
},
{
"name": "year",
"type": "integer"
},
{
"name": "month",
"type": "integer"
},
{
"name": "day",
"type": "integer"
}
],
"datarows": [
[
"1998-06-14T19:59:55.000Z",
"185.163.25.0",
"GET /images/comp_bg2_hm.gif HTTP/1.1",
404,
340,
1998,
6,
14
],
[
"1998-06-14T19:59:55.000Z",
"161.62.26.0",
"GET /images/comp_bg2_hm.gif HTTP/1.0",
404,
343,
1998,
6,
14
],
[
"1998-06-14T19:59:55.000Z",
"63.158.15.0",
"GET /images/comp_bg2_hm.gif HTTP/1.1",
404,
335,
1998,
6,
14
],
[
"1998-06-14T19:59:55.000Z",
"190.10.13.0",
"GET /images/comp_bg2_hm.gif HTTP/1.1",
404,
335,
1998,
6,
14
],
[
"1998-06-14T19:59:53.000Z",
"28.87.6.0",
"GET /images/comp_bg2_hm.gif HTTP/1.0",
404,
349,
1998,
6,
14
]
],
"total": 5,
"size": 5
}
}
}
96 changes: 96 additions & 0 deletions integ-test/src/test/python/http_logs/results/ppl2.json
@@ -0,0 +1,96 @@
{
"data": {
"ok": true,
"resp": {
"status": "SUCCESS",
"schema": [
{
"name": "@timestamp",
"type": "date"
},
{
"name": "clientip",
"type": "string"
},
{
"name": "request",
"type": "string"
},
{
"name": "status",
"type": "integer"
},
{
"name": "size",
"type": "integer"
},
{
"name": "year",
"type": "integer"
},
{
"name": "month",
"type": "integer"
},
{
"name": "day",
"type": "integer"
}
],
"datarows": [
[
"1998-06-14T19:59:55.000Z",
"185.163.25.0",
"GET /images/comp_bg2_hm.gif HTTP/1.1",
404,
340,
1998,
6,
14
],
[
"1998-06-14T19:59:55.000Z",
"161.62.26.0",
"GET /images/comp_bg2_hm.gif HTTP/1.0",
404,
343,
1998,
6,
14
],
[
"1998-06-14T19:59:55.000Z",
"63.158.15.0",
"GET /images/comp_bg2_hm.gif HTTP/1.1",
404,
335,
1998,
6,
14
],
[
"1998-06-14T19:59:55.000Z",
"190.10.13.0",
"GET /images/comp_bg2_hm.gif HTTP/1.1",
404,
335,
1998,
6,
14
],
[
"1998-06-14T19:59:53.000Z",
"28.87.6.0",
"GET /images/comp_bg2_hm.gif HTTP/1.0",
404,
349,
1998,
6,
14
]
],
"total": 5,
"size": 5
}
}
}
76 changes: 76 additions & 0 deletions integ-test/src/test/python/http_logs/results/ppl3.json
@@ -0,0 +1,76 @@
{
"data": {
"ok": true,
"resp": {
"status": "SUCCESS",
"schema": [
{
"name": "count(status)",
"type": "long"
},
{
"name": "clientip",
"type": "string"
},
{
"name": "status",
"type": "integer"
}
],
"datarows": [
[
78,
"99.99.9.0",
200
],
[
133,
"99.99.8.0",
200
],
[
542,
"99.99.6.0",
200
],
[
15,
"99.99.5.0",
200
],
[
4,
"99.99.4.0",
200
],
[
71,
"99.99.3.0",
200
],
[
143,
"99.99.20.0",
200
],
[
39,
"99.99.2.0",
200
],
[
156,
"99.99.19.0",
200
],
[
64,
"99.99.18.0",
200
]
],
"total": 10,
"size": 10
}
}
}