Data testing, monitoring, and profiling for SQL-accessible data.
✔ Install from the command line
✔ Access comprehensive documentation
✔ Compatible with Snowflake, Amazon Redshift, BigQuery, and more
✔ Write tests in a YAML file
✔ Run programmatic scans to test data quality
Got 5 minutes? Try the interactive demo!
table_name: breakdowns
metrics:
  - row_count
  - missing_count
  - missing_percentage
  ...
# Validates that a table has rows
tests:
  - row_count > 0
# Tests that numbers in the column are entered in a valid format as whole numbers
columns:
  incident_number:
    valid_format: number_whole
    tests:
      - invalid_percentage == 0
# Tests that no values in the column are missing
  school_year:
    tests:
      - missing_count == 0
# Tests for duplicates in a column
  bus_no:
    tests:
      - duplicate_count == 0
# Compares row count between datasets
sql_metric:
  sql: |
    SELECT COUNT(*) as other_row_count
    FROM other_table
  tests:
    - row_count == other_row_count
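To make the scan file above more concrete, here is a minimal Python sketch of the idea behind it: compute a few metrics with SQL, then evaluate test expressions such as `row_count > 0` against them. It is not how soda-sql itself is implemented. The helpers `compute_metrics` and `evaluate_test`, the in-memory SQLite table, the sample rows, and the definition of `duplicate_count` used here are all assumptions made for illustration.

```python
import operator
import sqlite3

# Comparison operators supported by this sketch's tiny test-expression parser.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def compute_metrics(conn, table, column):
    """Compute a few metrics for one column using plain SQL.

    `table` and `column` are interpolated directly, so they must be
    trusted identifiers (hypothetical helper, not part of soda-sql).
    """
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    row_count = scalar(f"SELECT COUNT(*) FROM {table}")
    missing_count = scalar(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL")
    # Assumption: duplicate_count is the number of distinct non-null
    # values that occur more than once.
    duplicate_count = scalar(
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    missing_percentage = 100.0 * missing_count / row_count if row_count else 0.0
    return {
        "row_count": row_count,
        "missing_count": missing_count,
        "missing_percentage": missing_percentage,
        "duplicate_count": duplicate_count,
    }


def evaluate_test(expression, metrics):
    """Evaluate '<operand> <operator> <operand>'.

    An operand is either a metric name or a numeric literal.
    """
    left, op, right = expression.split()

    def resolve(token):
        return metrics[token] if token in metrics else float(token)

    return OPS[op](resolve(left), resolve(right))


# Illustrative data only: a tiny stand-in for the `breakdowns` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE breakdowns (bus_no TEXT, school_year TEXT)")
conn.executemany(
    "INSERT INTO breakdowns VALUES (?, ?)",
    [("101", "2015-2016"), ("102", "2015-2016"), ("101", None)],
)

bus_no_metrics = compute_metrics(conn, "breakdowns", "bus_no")
for test in ("row_count > 0", "missing_count == 0", "duplicate_count == 0"):
    result = "passed" if evaluate_test(test, bus_no_metrics) else "failed"
    print(f"{test} -> {result}")
```

With this sample data, the `duplicate_count == 0` test on `bus_no` fails because the value `101` appears twice, which is the kind of result a scan would surface.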
Thanks go to these wonderful people! (emoji key)
This project follows the all-contributors specification. Contributions of any kind are welcome!
Soda SQL collects statistical usage and performance information via the OpenTelemetry framework to help the Soda Core development team proactively track performance issues and understand how users interact with the tool. The information is strictly limited to usage and performance and does not contain Personally Identifiable Information. It will be used for internal purposes only. Soda will keep the data in its raw form for a maximum of 5 years. Any information that needs to be kept longer will be retained in aggregated form only.
Users can find more details about what is tracked, and opt out of tracking, by consulting the reference section of docs.soda.io.