GitHub - hackstrap/soda-sql: Data profiling, testing, and monitoring for SQL accessible data.

Data testing, monitoring, and profiling for SQL-accessible data.

✔ Install from the command-line

✔ Access comprehensive documentation

✔ Compatible with Snowflake, Amazon Redshift, BigQuery, and more

✔ Write tests in a YAML file

✔ Run programmatic scans to test data quality

Got 5 minutes? Try the interactive demo!

Example scan YAML file

table_name: breakdowns
metrics:
  - row_count
  - missing_count
  - missing_percentage
  # ...
# Validates that a table has rows
tests:
  - row_count > 0

# Tests that numbers in the column are entered in a valid format as whole numbers
columns:
  incident_number:
    valid_format: number_whole
    tests:
      - invalid_percentage == 0

# Tests that no values in the column are missing
  school_year:
    tests:
      - missing_count == 0

# Tests for duplicates in a column
  bus_no:
    tests:
      - duplicate_count == 0

# Compares row count between datasets
sql_metric: 
  sql: |
    SELECT COUNT(*) as other_row_count
    FROM other_table
  tests:
    - row_count == other_row_count
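
To make the metrics and tests in the scan file above more concrete, here is a minimal sketch of how such checks could be computed with plain SQL. It is an illustration only and not soda-sql's implementation: the helper names `compute_metrics` and `run_tests`, the sample rows, the in-memory SQLite database, and the definition of `duplicate_count` as the number of distinct values that occur more than once are all assumptions made for this example.

```python
import sqlite3

# Hypothetical sample rows (incident_number, school_year, bus_no); not real data.
ROWS = [
    ("1001", "2020-2021", "B12"),
    ("1002", "2020-2021", "B47"),
    ("1003", None, "B12"),
]


def compute_metrics(conn, table, column):
    """Compute a few scan-style metrics for one column using plain SQL."""
    # Identifiers are interpolated directly, so pass trusted names only.
    row_count, missing_count = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({column} IS NULL), 0) FROM {table}"
    ).fetchone()
    # Assumption: count distinct non-null values that appear more than once.
    (duplicate_count,) = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1)"
    ).fetchone()
    missing_percentage = 100.0 * missing_count / row_count if row_count else 0.0
    return {
        "row_count": row_count,
        "missing_count": missing_count,
        "missing_percentage": missing_percentage,
        "duplicate_count": duplicate_count,
    }


def run_tests(metrics, tests):
    """Evaluate each test expression against the metrics dictionary."""
    # eval is tolerable here only because the expressions are trusted literals.
    return {t: bool(eval(t, {"__builtins__": {}}, dict(metrics))) for t in tests}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE breakdowns (incident_number TEXT, school_year TEXT, bus_no TEXT)"
)
conn.executemany("INSERT INTO breakdowns VALUES (?, ?, ?)", ROWS)
conn.execute("CREATE TABLE other_table (id INTEGER)")
conn.executemany("INSERT INTO other_table VALUES (?)", [(1,), (2,), (3,)])

school_year = compute_metrics(conn, "breakdowns", "school_year")
bus_no = compute_metrics(conn, "breakdowns", "bus_no")

# Mirror the custom SQL metric: add a value from a second query to the metrics.
(other_row_count,) = conn.execute(
    "SELECT COUNT(*) AS other_row_count FROM other_table"
).fetchone()
table_metrics = {"row_count": school_year["row_count"], "other_row_count": other_row_count}

school_year_results = run_tests(school_year, ["row_count > 0", "missing_count == 0"])
bus_no_results = run_tests(bus_no, ["duplicate_count == 0"])
table_results = run_tests(table_metrics, ["row_count == other_row_count"])

for name, results in [
    ("school_year", school_year_results),
    ("bus_no", bus_no_results),
    ("table", table_results),
]:
    for expression, passed in results.items():
        print(f"{name}: {expression} -> {'passed' if passed else 'failed'}")
```

With these sample rows, the `school_year` column has one missing value and `bus_no` has one repeated value, so the corresponding tests fail while the row-count tests pass. A real scan would run against your warehouse and use the tool's own metric definitions, which may differ from the assumptions here.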

Play

Install

Collaborate

Contributors ✨

Thanks go to these wonderful people! (emoji key)

Vijay Kiran 💻	abhishek khare 💻	Jelte Hoekstra 💻 📖	Cor 💻 📖	Milan Aleksić 🚇	Ayoub Fakir 💻	Alex Tonkonozhenko 💻
Todd de Quincey 💻	Antonin Jousson 💻	Jonas 🚇	cwouter 💻	Janet R 📖	Bastien Boutonnet 💻	Tom Baeyens 💻
AlessandroLollo 💻	mmigdiso 💻	ericmuijs 💻	Lieven Govaerts 💻	Milan Lukac 💻

This project follows the all-contributors specification. Contributions of any kind are welcome!

Open Telemetry Tracking

Soda-sql collects usage and performance statistics through the OpenTelemetry framework to help the Soda Core development team proactively track performance issues and understand how users interact with the tool. The information is strictly limited to usage and performance and does not contain personally identifiable information. It is used for internal purposes only. Soda keeps the data in its raw form for a maximum of 5 years. Information that needs to be kept longer is retained in aggregated form only.

To learn more about what is tracked, or to opt out of tracking, consult the reference section of docs.soda.io.
