-
Notifications
You must be signed in to change notification settings - Fork 5
Basic SQL Overview
Why should I access my output data using SQL with SQLite Studio?
ADSM uses SQLite, a database management software, throughout the entire simulation to keep the input and output datasets in working order. At the end of a simulation, ADSM provides you with a number of visual representations of the output data. However, there exists a wealth of data that remain not visualized within ADSM. In order to access the full output data, SQL (structured query language) must be used, as it is where the output data exist within ADSM. Once you access the full results using SQL, you can execute queries to pare down the data to the specifications you want. Then you will be able to export the data as a smaller and more manageable CSV file. Step-by-step directions for how to access, manage, and export ADSM output data using the tool SQLite Studio can be found here. SQLITE instructions The SQLITEStudio download can be found at SQLITEStudio.
The ADSM SQLite database contains the same fields for every scenario. This allows users to write one query, and easily run it in any database.
A helpful hint: the characters “- -“allow a comment to be inserted into the text that won’t be parsed by the query. Note that Microsoft Word may auto convert these characters into a single dash in this document, but SQL is expecting 2 separate dashes.
SYNTAX – Simple Select
SELECT <Column_name_1, Column_name_2, Column_name_3 >
FROM <table_name>
WHERE <search_conditions>
ORDER BY <Column_name_1, Column_name_2, Column_name_3 >asc, desc
SELECT specifies the data or table columns desires
FROM specifies the source table name from where to retrieve data
WHERE specifies the search conditions that must be met
ORDER BY specifies the sort order, and can be ascending or descending
A simple way to get a first look at a table’s content is
SELECT * FROM <table_name>
ADSM Example
SELECT * FROM Results_DailyByProductionType
Notice that returning all 223 columns in no particular order is chaotic. Using one of the queries in the Example Database Queries directory can be a first step towards an organized approach. Here’s a smaller subset
SELECT --*
iteration,
day,
last_day,
Production_type_id, -- this is not easy to understand, we will fix later
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_DailyByProductionType
Even with a smaller subset, there is still some possibilities for improvement. Results_DailyByProductionType is written by day, by production type, as the name implies. Production_type_id is NULL (empty) on some records, and has values on other records. Where Production_tpe_id = null is the record that summarizes of the production types. We can use the WHERE clause to clarify further.
SELECT
iteration,
day,
last_day,
Production_type_id, -- this is not easy to understand, fix later
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_DailyByProductionType
WHERE 1=1 -- using a clause here that evaluate to TRUE makes query writing easier
AND Production_type_id is null
The order of data in a database can never be assumed. Data is written into the tables in the order the simulations finish, so iteration 1 may have been a longer duration while iteration 2 may have been shorter, was completed first and saved into the database
Using ORDER BY to force a sort on your data
SELECT
iteration,
day,
last_day,
Production_type_id, -- this is not easy to understand, fix later
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_DailyByProductionType
WHERE 1=1 -- using a clause here that evaluate to TRUE makes query writing easier
AND Production_type_id is null
ORDER BY Iteration, Day desc -- I can also simplify by using the number of the column eg. 1,2,3
Ascending order is the default, use desc to indicate a descending order. Using a WHERE, the dataset can be limited. The dataset is returning every day, and the final day is desired.
SELECT
iteration,
day,
last_day,
Production_type_id, -- this is not easy to understand, fix later
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_DailyByProductionType
WHERE 1=1 -- using a clause that evaluate to TRUE makes query writing easier
AND Production_type_id is null
AND last_day <> 0 -- this could be also be done as last_day = 1 or last_day !=0
ORDER BY 1, 2 desc
The Production_type_id is not very useful as a number. Let’s resolved that with a table join.
SYNTAX – Simple Select
SELECT <Column_name_1, Column_name_2, Column_name_3 >
FROM <table_name>
JOIN <table_name>
ON <table_name>.column_name = <table_name>.column_name
WHERE <search_conditions>
ORDER BY <Column_name_1, Column_name_2, Column_name_3 > asc, desc
A helpful hint – An alias or rename can be used, both at the table level and the field level to clarify data fields and shorten the amount of SQL needed.
A CASE statement is used like an IF THEN ELSE. Using the JOIN, the textual description of the production type can be pulled in instead of the numeric production_type_id. Recall that the The CASE allows changing the name of a NULL column to a value that makes more sense for common reading. Recall that Production_tpe_id is NULL on the record that summarizes all of the production types.
SELECT
iteration,
day,
last_day,
Production_type_id, -- this is not easy to understand, fix in next line
CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_dailybyproductiontype r -- alias to r
LEFT JOIN -- left join here because the NULL production type indicates ALL, note case statement to manage this
ScenarioCreator_productiontype pt -- Alias to pt to use in next step
ON r.production_type_id = pt.id
WHERE 1=1
AND r.ROWID < 100 --This where clause can return a limited set of data
ORDER BY 1 desc, 2 asc
Helpful hint – When a join happens, and both tables have fields with the same name, the database may gives an error about names being ambiguous. This can be quickly resolved by giving an alias to clarify which field comes from which table. Another method that can be used to deal with an unwanted NULL value is IFNULL().
SELECT
r.id as results_Identifier,
pt.id as ProductionType_Identifier,
iteration,
day,
last_day,
Production_type_id, -- this is not easy to understand, fix in next line
CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,
IFNUll(Production_type_id, "CompliedResults") as anotherway,
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_dailybyproductiontype r -- alias to r
LEFT JOIN -- left join here because NULL production type indicates ALL, note case statement to manage this
ScenarioCreator_productiontype pt -- Alias to pt to use in next step
ON r.production_type_id = pt.id
WHERE 1=1
AND r.ROWID < 100 --This where clause can return a limited set of data more quickly
ORDER BY 1 desc, 2 asc
The CASE command is a flexible tool. Case can categorize a set of values into another value.
SELECT
iteration,
day,
last_day,
Production_type_id, -- this is not easy to understand, fix in next line
CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,
IFNUll(Production_type_id, "CompiledResults") as anotherway,
CASE day WHEN 1 THEN "First"
WHEN 2 THEN "Second"
WHEN 3 THEN "Third"
ELSE "OtherDays"
END as exampleDecode,
CASE WHEN day > 5 THEN day+1
WHEN day = 1 THEN "First"
WHEN day = 2 THEN day
WHEN day = 3 THEN "Third"
ELSE day
END as differentdecode,
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_dailybyproductiontype r -- alias to r
LEFT JOIN
ScenarioCreator_productiontype pt -- Alias to pt to use in next step
ON r.production_type_id = pt.id
WHERE 1=1
AND r.ROWID < 100 – This where clause can return a limited set of data more quickly
ORDER BY 1 desc, 2 asc
Here’s an example using a subselect or nested query with an equal join. Using just JOIN attaches only where records match. This example also has an aggregate (COUNT) and a GROUP BY to describe how the aggregation needs to be made.
SELECT
iteration,
day,
Totaldays,
last_day,
CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,
expcU,
expcUAir,
expcUDir,
expcUInd
FROM Results_dailybyproductiontype r
LEFT JOIN -- left join here because the NULL production type indicates ALL
ScenarioCreator_productiontype pt
ON r.production_type_id = pt.id
JOIN (
Select iteration,
Count(day) TotalDays
from Results_dailybyproductiontype
GROUP BY iteration) total
ON r.iteration = total.iteration
WHERE 1=1
AND Production_type_id is null
ORDER BY 1, 2
Examples of complex aggregate queries
SELECT
DC.iteration,
detectionlastday,
day, -- last_day,
diseaseduration,
outbreakduration,
diseaseended,
zonefocicreated,
deswUMax,
deswUMaxDay,
deswUTimeMax,
deswUTimeAvg,
deswUDaysInQueue,
deswAMax,
deswAMaxDay,
deswATimeMax,
deswATimeAvg ,
deswADaysInQueue
FROM results_dailycontrols DC
JOIN
(SELECT iteration, MAX(last_day) as diseaseended, MAX(lastdetection) AS detectionlastday
FROM results_dailybyproductiontype
WHERE production_type_id IS NULL
GROUP BY iteration ) AS D
ON DC.iteration = D.iteration
LEFT JOIN
(SELECT iteration, CASE WHEN SUM(zonearea) IS 0 THEN 0 ELSE 1 END as zonefocicreated
FROM results_dailybyzone WHERE last_day = 1 GROUP BY iteration ) AS z
ON DC.iteration = Z.iteration
WHERE last_day = 1 -- THIS NEEDS TO STAY IN PLACE, ADD AN AND IF YOU WANT TO LIMIT TO ONE ITERATION
ORDER BY 1
Example of zone queries
SELECT iteration, day, last_day,
ifnull(production_type_id, 'ALL') as production_type,
--ifnull(zone_id, 'NOZONE'),
ifnull(z.name||' ' ||radius, 'No Zone') as zone_name,
unitsinzone, unitdaysinzone,animaldaysinzone
FROM results_dailybyzoneandproductiontype d
LEFT JOIN scenarioCreator_zone z on d.zone_id = z.id -- left join needed to manage nulls!
LEFT JOIN scenarioCreator_productiontype p on p.id = d.production_type_id -- left join to manage nulls!
WHERE 1=1 AND iteration = 1
AND production_type_id is null
ORDER BY 2,3,4,5
Epi Curve across all iterations
SELECT day, Sum(infnU), Sum(detnU)
FROM Results_dailybyproductiontype r
WHERE 1=1
AND production_type_id is null
AND last_day < 1
GROUP BY day
ORDER BY 1, 2
More examples are provided in the Example Database Queries directory that was installed with ADSM.