We should create a framework for checking the merged data before running the reports. It should log to stdout, but also generate an HTML report table that helps identify the error.

The report tables should be something like this, maybe with a high-level and a low-level version:
High level:

| | test description | expected result | actual result | pass/fail | priority |
|---|---|---|---|---|---|
| 0 | check that input data columns match | expected columns should be found | expected `rred_example_column` not found | FAIL | HIGH |
| 1 | check that no extra columns are present | no novel columns should exist | unexpected column `RRED_example_column` present | PASS | |
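
As a rough idea of how the high-level rows above could be generated rather than typed by hand, here is a minimal sketch. The function name `check_columns`, the `expected` argument and the example data are hypothetical. It also assumes an unexpected column counts as a FAIL with LOW priority, which differs from the example table, where that row is marked PASS.

```python
import pandas as pd


def check_columns(data: pd.DataFrame, expected: list[str]) -> pd.DataFrame:
    """Compare the columns of `data` with `expected`; return one row per check."""
    missing = [col for col in expected if col not in data.columns]
    extra = [col for col in data.columns if col not in expected]

    def names(cols):
        # Wrap each column name in backticks so it stands out in the report.
        return ", ".join(f"`{col}`" for col in cols)

    rows = [
        {
            "test description": "check that input data columns match",
            "expected result": "expected columns should be found",
            "actual result": (
                f"expected {names(missing)} not found"
                if missing
                else "all expected columns found"
            ),
            "pass/fail": "FAIL" if missing else "PASS",
            "priority": "HIGH" if missing else "",
        },
        {
            "test description": "check that no extra columns are present",
            "expected result": "no novel columns should exist",
            "actual result": (
                f"unexpected column {names(extra)} present"
                if extra
                else "no unexpected columns"
            ),
            # Assumption: an extra column is a low-priority failure.
            "pass/fail": "FAIL" if extra else "PASS",
            "priority": "LOW" if extra else "",
        },
    ]
    return pd.DataFrame(rows)


# Made-up merged data: one expected column is missing, one column is unexpected.
merged = pd.DataFrame({"pupil_no": ["AS82827_1"], "RRED_example_column": ["x"]})
top_level_checks = check_columns(merged, expected=["pupil_no", "rred_example_column"])
print(top_level_checks.to_string())
```

Keeping each check as a plain dictionary means new checks can be added without touching the reporting code.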
Low level:

| | pupil_no | school | field | issue |
|---|---|---|---|---|
| 0 | AS82827_1 | Walden Road Primary | date_of_birth | improbable value of `2022-01-18` |
| 1 | JK92817_2 | Bigginson Primary School | exit_date | missing value |
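
The low-level rows above could come from per-pupil checks along these lines. This is only a sketch: `find_row_issues`, the `dob_min`/`dob_max` bounds and the example dates other than `2022-01-18` are hypothetical, and "improbable" is assumed to mean a date of birth outside a plausible range.

```python
import pandas as pd


def find_row_issues(data: pd.DataFrame, dob_min: str, dob_max: str) -> pd.DataFrame:
    """Return one row per problem value found in the merged data."""
    issues = []

    # Assumption: a date of birth outside [dob_min, dob_max] is improbable.
    dob = pd.to_datetime(data["date_of_birth"], errors="coerce")
    in_range = dob.between(pd.Timestamp(dob_min), pd.Timestamp(dob_max))
    for row in data[dob.notna() & ~in_range].itertuples():
        issues.append({
            "pupil_no": row.pupil_no,
            "school": row.school,
            "field": "date_of_birth",
            "issue": f"improbable value of `{dob.loc[row.Index].date()}`",
        })

    # A missing exit date is reported as its own issue.
    for row in data[data["exit_date"].isna()].itertuples():
        issues.append({
            "pupil_no": row.pupil_no,
            "school": row.school,
            "field": "exit_date",
            "issue": "missing value",
        })

    return pd.DataFrame(issues, columns=["pupil_no", "school", "field", "issue"])


# Made-up merged data mirroring the two example rows.
merged = pd.DataFrame({
    "pupil_no": ["AS82827_1", "JK92817_2"],
    "school": ["Walden Road Primary", "Bigginson Primary School"],
    "date_of_birth": ["2022-01-18", "2015-06-01"],
    "exit_date": ["2022-07-01", None],
})
low_level_checks = find_row_issues(merged, dob_min="2010-01-01", dob_max="2018-12-31")
print(low_level_checks.to_string())
```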
We may be able to just use pandas with an HTML template rather than messing around with jinja2 templating. Here is a prototype, which we would make more production ready, but it gives an idea:
```python
import pandas as pd

# High-level checks: one row per test run against the merged data.
top_level_checks = pd.DataFrame({
    "test description": [
        "check that input data columns match",
        "check that no extra columns are present"
    ],
    "expected result": [
        "expected columns should be found",
        "no novel columns should exist"
    ],
    "actual result": [
        "expected `rred_example_column` not found",
        "unexpected column `RRED_example_column` present"
    ],
    "pass/fail": [
        "FAIL",
        "PASS"
    ],
    "priority": [
        "HIGH",
        ""
    ]
})

# Low-level checks: one row per problem value, identified by pupil and field.
low_level_checks = pd.DataFrame({
    "pupil_no": [
        "AS82827_1",
        "JK92817_2",
    ],
    "school": [
        "Walden Road Primary",
        "Bigginson Primary School"
    ],
    "field": [
        "date_of_birth",
        "exit_date"
    ],
    "issue": [
        "improbable value of `2022-01-18`",
        "missing value"
    ],
})

# Bootstrap page skeleton; the {placeholders} are filled in by str.format below.
html_template = """<!doctype html>
<html lang="en">
  <head>
    <!-- Required meta tags -->
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <!-- Bootstrap CSS -->
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@4.3.1/dist/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
    <title>{header}</title>
  </head>
  <body>
    <h1>Top level issues</h1>
    {top_level}
    <h1>Low level issues</h1>
    {low_level}
  </body>
</html>"""

# Render both tables as Bootstrap-styled HTML and write the report to disk.
with open("text.html", "w") as handle:
    handle.write(html_template.format(
        header="RRED UAT report",
        top_level=top_level_checks.to_html(classes="table table-striped"),
        low_level=low_level_checks.to_html(classes="table table-striped")
    ))
```
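
The prototype covers the HTML side; the stdout side could be handled with the standard `logging` module. A minimal sketch, assuming a hypothetical logger name `rred_checks` and a hypothetical helper `log_checks` that takes a table shaped like `top_level_checks`:

```python
import logging
import sys

import pandas as pd

# Hypothetical logger that writes check results to stdout.
logger = logging.getLogger("rred_checks")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))


def log_checks(checks: pd.DataFrame) -> int:
    """Log every check (failures at ERROR level) and return the number of failures."""
    failures = 0
    for row in checks.to_dict("records"):
        passed = row["pass/fail"] == "PASS"
        level = logging.INFO if passed else logging.ERROR
        logger.log(
            level,
            "%s: %s (%s)",
            row["pass/fail"],
            row["test description"],
            row["actual result"],
        )
        if not passed:
            failures += 1
    return failures


# Small stand-in for the top-level table from the prototype.
checks = pd.DataFrame({
    "test description": [
        "check that input data columns match",
        "check that no extra columns are present",
    ],
    "actual result": [
        "expected `rred_example_column` not found",
        "unexpected column `RRED_example_column` present",
    ],
    "pass/fail": ["FAIL", "PASS"],
})
n_failed = log_checks(checks)
```

Returning the failure count lets the caller decide whether to stop before the reports are run.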
stefpiatek changed the title from "Data validation test with tests cases" to "Framework for reporting data issues" on Mar 1, 2023