Improve SQL queries: case aggregation #11
Comments
I suggest creating a view on the database like this (untested code):
CREATE VIEW casos AS SELECT
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND classi_fin <> '') AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita <= {lastday} AND dt_digita >= {firstday} AND municipio_geocodigo IN ({sqlcity}) AND cid10_codigo IN ({sqlcid});
@claudia-codeco, is this the query you need? |
Almost there. We need to group by epiweek, computed from dt_sin_pri. |
It seems that first creating this aggregation by @luabida can you finalize this, please? |
Yes, the final table should have the following columns municipio_geocodigo |
So it seems that the view has to be something like this:
CREATE VIEW casos AS SELECT
municipio_geocodigo,
year,
SE,
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND classi_fin <> '') AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita <= {lastday} AND dt_digita >={firstday} AND municipio_geocodigo IN ({sqlcity}) AND cid10_codigo IN({sqlcid})
GROUP by municipio_geocodigo, year, SE; |
@fccoelho @claudia-codeco this is the result from the query with minor adjustments in the column names:
SELECT
municipio_geocodigo,
ano_notif,
se_notif,
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5) AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita BETWEEN '2023-01-01' AND '2023-01-03'
AND municipio_geocodigo IN (3304557)
AND cid10_codigo IN ('A90')
GROUP BY municipio_geocodigo, ano_notif, se_notif;
The only thing I didn't quite understand is the purpose of having the column
|
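To make the meaning of the three FILTER counts concrete, here is a minimal pure-Python sketch of the same aggregation over an in-memory list of rows. The function name `aggregate_cases` and the sample values are hypothetical and only for illustration; the column names follow the query above. It assumes SQL semantics where a NULL `classi_fin` or `criterio` never satisfies a comparison.

```python
from collections import defaultdict


def aggregate_cases(rows):
    """Group rows by (municipio_geocodigo, ano_notif, se_notif) and count cases.

    Mirrors the three count(classi_fin) FILTER(...) columns of the query:
    rows whose classi_fin is None (SQL NULL) are never counted.
    """
    totals = defaultdict(
        lambda: {"casos_suspeitos": 0, "casos_provaveis": 0, "casos_lab": 0}
    )
    for row in rows:
        key = (row["municipio_geocodigo"], row["ano_notif"], row["se_notif"])
        group = totals[key]  # the group exists even if every count stays 0
        classi_fin = row["classi_fin"]
        if classi_fin is None:
            continue
        group["casos_suspeitos"] += 1
        if classi_fin != 5:
            group["casos_provaveis"] += 1
            if row["criterio"] == 1:
                group["casos_lab"] += 1
    return dict(totals)


# Arbitrary sample rows (illustrative values, not real data).
sample_rows = [
    {"municipio_geocodigo": 3304557, "ano_notif": 2023, "se_notif": 1,
     "classi_fin": 10, "criterio": 1},
    {"municipio_geocodigo": 3304557, "ano_notif": 2023, "se_notif": 1,
     "classi_fin": 5, "criterio": 1},
    {"municipio_geocodigo": 3304557, "ano_notif": 2023, "se_notif": 1,
     "classi_fin": None, "criterio": None},
    {"municipio_geocodigo": 3304557, "ano_notif": 2023, "se_notif": 2,
     "classi_fin": 11, "criterio": 2},
]

result = aggregate_cases(sample_rows)
```

Doing this in the database instead of in application code is the point of the issue: only one row per municipality and week has to leave the server.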
There is only one issue: we cannot use So unless we find a way to calculate the epiweek using SQL, we may need to call a pl-python function in this view. The definition of EpiWeek is this (according to PAHO):
|
Adding the epiweek in copernicus table is important to allow for grouping by epiweek. |
I tried to keep the exact logic as the SQL Function with plpython3u: CREATE OR REPLACE FUNCTION extract_SE(date DATE)
RETURNS INT AS $$
from datetime import date as dt

def _system_adjustment(system: str) -> int:
    systems = ("iso", "cdc")  # Monday, Sunday
    return systems.index(system.lower())

def _year_start(year: int, system: str) -> int:
    # Ordinal of the first day of week 1 of the given year.
    adjustment = _system_adjustment(system)
    mid_weekday = 3 - adjustment  # Sun is 6 .. Mon is 0
    jan1 = dt(year, 1, 1)
    jan1_ordinal = jan1.toordinal()
    jan1_weekday = jan1.weekday()
    week1_start_ordinal = jan1_ordinal - jan1_weekday - adjustment
    if jan1_weekday > mid_weekday:
        week1_start_ordinal += 7
    return week1_start_ordinal

def fromdate(date: dt, system: str = "cdc") -> int:
    # PL/Python passes DATE arguments as strings, so parse them first.
    if isinstance(date, str):
        date = dt.fromisoformat(date)
    year = date.year
    date_ordinal = date.toordinal()
    year_start_ordinal = _year_start(year, system)
    week = (date_ordinal - year_start_ordinal) // 7
    if week < 0:
        # The date belongs to the last week of the previous year.
        year -= 1
        year_start_ordinal = _year_start(year, system)
        week = (date_ordinal - year_start_ordinal) // 7
    elif week >= 52:
        # The date may already belong to week 1 of the next year.
        year_start_ordinal = _year_start(year + 1, system)
        if date_ordinal >= year_start_ordinal:
            year += 1
            week = 0
    week += 1
    # Encode as YYYYWW, e.g. 202301.
    return int(str(year) + f"{week:02d}")

return fromdate(date, "cdc")
$$ LANGUAGE plpython3u;
Updated query:
SELECT
municipio_geocodigo,
ano_notif,
extract_SE(dt_sin_pri) as SE,
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5) AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita BETWEEN '2023-01-01' AND '2023-01-03'
AND municipio_geocodigo IN (3304557)
AND cid10_codigo IN ('A90')
GROUP by municipio_geocodigo, ano_notif, extract_SE(dt_sin_pri); |
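As a cross-check for the "cdc" system used in the function above (weeks start on Sunday, and week 1 is the first week with at least four days in the new year), the same YYYYWW value can be derived from the Wednesday of the date's week. The sketch below is a hypothetical helper, `epiweek_cdc`, not part of the thread's code. It assumes the input is a `datetime.date` and relies on the fact that a Sunday-to-Saturday week belongs to the year that contains its Wednesday.

```python
from datetime import date, timedelta


def epiweek_cdc(day: date) -> int:
    """Return the epiweek of `day` as YYYYWW (Sunday-start weeks)."""
    # date.weekday() is Mon=0 .. Sun=6, so shift to count days since Sunday.
    days_since_sunday = (day.weekday() + 1) % 7
    week_start = day - timedelta(days=days_since_sunday)
    # The Wednesday is the 4th day of the week; its year owns the week.
    midweek = week_start + timedelta(days=3)
    # Week number = how many Wednesdays of that year have occurred so far.
    week = (midweek.timetuple().tm_yday - 1) // 7 + 1
    return midweek.year * 100 + week


if __name__ == "__main__":
    for sample in (date(2022, 12, 31), date(2023, 1, 1), date(2014, 12, 31)):
        print(sample.isoformat(), epiweek_cdc(sample))
```

Because this uses only date arithmetic (day of week, day of year, integer division), the same idea might be expressible in plain SQL without PL/Python, though that translation is untested here.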
We should add this Python function to the setup script for the infodengue database so that we always have access to it in case of a redeploy somewhere. |
Problem
Currently, the line list of cases is queried first and then aggregated. To move the aggregation into the SQL query with GROUP BY, we need to
compute the epiweek.
Why do that
To optimize the queries; this is also useful for the webpage.
current code (AlertTools::getCases.R)