Skip to content

Basic SQL Overview

Missy Schoenbaum edited this page Jul 30, 2018 · 10 revisions

Why should I access my output data using SQL with SQLite Studio?

ADSM uses SQLite, a database management software, throughout the entire simulation to keep the input and output datasets in working order. At the end of a simulation, ADSM provides you with a number of visual representations of the output data. However, there exists a wealth of data that remain not visualized within ADSM. In order to access the full output data, SQL (structured query language) must be used, as it is where the output data exist within ADSM. Once you access the full results using SQL, you can execute queries to pare down the data to the specifications you want. Then you will be able to export the data as a smaller and more manageable CSV file. Step-by-step directions for how to access, manage, and export ADSM output data using the tool SQLite Studio can be found here. SQLITE instructions The SQLITEStudio download can be found at SQLITEStudio.

The ADSM SQLite database contains the same fields for every scenario. This allows users to write one query, and easily run it in any database.

A helpful hint: the characters “- -“allow a comment to be inserted into the text that won’t be parsed by the query. Note that Microsoft Word may auto convert these characters into a single dash in this document, but SQL is expecting 2 separate dashes.

SYNTAX – Simple Select

SELECT <Column_name_1, Column_name_2, Column_name_3 >

FROM  <table_name>

WHERE <search_conditions>

ORDER BY <Column_name_1, Column_name_2, Column_name_3 >asc, desc

SELECT specifies the data or table columns desires

FROM specifies the source table name from where to retrieve data

WHERE specifies the search conditions that must be met

ORDER BY specifies the sort order, and can be ascending or descending

A simple way to get a first look at a table’s content is

SELECT * FROM <table_name>

ADSM Example

SELECT * FROM Results_DailyByProductionType

Notice that returning all 223 columns in no particular order is chaotic. Using one of the queries in the Example Database Queries directory can be a first step towards an organized approach.   Here’s a smaller subset

SELECT --*

iteration, 

day, 

last_day, 

Production_type_id,  -- this is not easy to understand, we will fix later

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM Results_DailyByProductionType

Even with a smaller subset, there is still some possibilities for improvement. Results_DailyByProductionType is written by day, by production type, as the name implies. Production_type_id is NULL (empty) on some records, and has values on other records. Where Production_tpe_id = null is the record that summarizes of the production types. We can use the WHERE clause to clarify further.

SELECT

iteration, 

day, 

last_day, 

Production_type_id, -- this is not easy to understand, fix later

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM Results_DailyByProductionType

WHERE 1=1 --  using a clause here that evaluate to TRUE makes query writing easier

AND Production_type_id is null

The order of data in a database can never be assumed. Data is written into the tables in the order the simulations finish, so iteration 1 may have been a longer duration while iteration 2 may have been shorter, was completed first and saved into the database

Using ORDER BY to force a sort on your data

SELECT

iteration, 

day, 

last_day, 

Production_type_id, -- this is not easy to understand, fix later

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM Results_DailyByProductionType
WHERE 1=1 		-- using a clause here that evaluate to TRUE makes query writing easier

AND Production_type_id is null

ORDER BY Iteration, Day desc -- I can also simplify by using the number of the column eg. 1,2,3

Ascending order is the default, use desc to indicate a descending order. Using a WHERE, the dataset can be limited. The dataset is returning every day, and the final day is desired.

SELECT

iteration, 

day, 

last_day, 

Production_type_id, -- this is not easy to understand, fix later

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM Results_DailyByProductionType

WHERE 1=1 -- using a clause that evaluate to TRUE makes query writing easier

AND Production_type_id is null

AND last_day <> 0 --  this could be also be done as last_day = 1 or last_day !=0

ORDER BY 1, 2 desc

The Production_type_id is not very useful as a number. Let’s resolved that with a table join.

SYNTAX – Simple Select

SELECT <Column_name_1, Column_name_2, Column_name_3 >
FROM  <table_name>
JOIN <table_name>
ON <table_name>.column_name = <table_name>.column_name
WHERE <search_conditions>
ORDER BY <Column_name_1, Column_name_2, Column_name_3 > asc, desc

A helpful hint – An alias or rename can be used, both at the table level and the field level to clarify data fields and shorten the amount of SQL needed.

A CASE statement is used like an IF THEN ELSE. Using the JOIN, the textual description of the production type can be pulled in instead of the numeric production_type_id. Recall that the The CASE allows changing the name of a NULL column to a value that makes more sense for common reading. Recall that Production_tpe_id is NULL on the record that summarizes all of the production types.

SELECT

iteration, 

day, 

last_day, 

Production_type_id, -- this is not easy to understand, fix in next line

CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM  Results_dailybyproductiontype r -- alias to r

LEFT JOIN    -- left join here because the NULL production type indicates ALL, note case statement to manage this

ScenarioCreator_productiontype pt  -- Alias to pt to use in next step

ON r.production_type_id = pt.id

WHERE 1=1

AND r.ROWID < 100 --This where clause can return a limited set of data

ORDER BY 1 desc, 2 asc

Helpful hint – When a join happens, and both tables have fields with the same name, the database may gives an error about names being ambiguous. This can be quickly resolved by giving an alias to clarify which field comes from which table. Another method that can be used to deal with an unwanted NULL value is IFNULL().

SELECT

r.id as results_Identifier,

pt.id as ProductionType_Identifier,

iteration, 

day, 

last_day, 

Production_type_id, -- this is not easy to understand, fix in next line

CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,

IFNUll(Production_type_id, "CompliedResults") as anotherway,

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM  Results_dailybyproductiontype r -- alias to r

LEFT JOIN  -- left join here because NULL production type indicates ALL, note case statement to manage this

ScenarioCreator_productiontype pt  -- Alias to pt to use in next step

ON r.production_type_id = pt.id

WHERE 1=1

AND r.ROWID < 100 --This where clause can return a limited set of data more quickly

ORDER BY 1 desc, 2 asc

The CASE command is a flexible tool. Case can categorize a set of values into another value.

SELECT

iteration, 

day, 

last_day, 

Production_type_id, -- this is not easy to understand, fix in next line

CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,

IFNUll(Production_type_id, "CompiledResults") as anotherway,

CASE day WHEN 1 THEN "First" 

WHEN 2 THEN "Second" 

WHEN 3 THEN "Third" 

ELSE "OtherDays" 

END as exampleDecode,

CASE WHEN day > 5 THEN day+1

WHEN day = 1 THEN "First"

WHEN day = 2 THEN day

WHEN day = 3 THEN "Third" 

ELSE day

END as differentdecode,

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM  Results_dailybyproductiontype r -- alias to r

LEFT JOIN

ScenarioCreator_productiontype pt  -- Alias to pt to use in next step

ON r.production_type_id = pt.id

WHERE 1=1

AND r.ROWID < 100 – This where clause can return a limited set of data more quickly

ORDER BY 1 desc, 2 asc

Here’s an example using a subselect or nested query with an equal join. Using just JOIN attaches only where records match. This example also has an aggregate (COUNT) and a GROUP BY to describe how the aggregation needs to be made.

SELECT

iteration, 

day, 

Totaldays,

last_day, 

CASE WHEN name IS NULL THEN "ALL" ELSE name END as productiontype,

expcU, 

expcUAir, 

expcUDir, 

expcUInd

FROM Results_dailybyproductiontype r 

LEFT JOIN -- left join here because the NULL production type indicates ALL

ScenarioCreator_productiontype pt

ON r.production_type_id = pt.id

JOIN (

Select iteration, 

Count(day) TotalDays

from Results_dailybyproductiontype

GROUP BY iteration) total

ON r.iteration = total.iteration

WHERE 1=1

AND Production_type_id is null

ORDER BY 1, 2

Examples of complex aggregate queries

SELECT

DC.iteration,  

detectionlastday, 

day, -- last_day,

diseaseduration, 

outbreakduration, 

diseaseended,  

zonefocicreated,

deswUMax, 

deswUMaxDay,

deswUTimeMax, 

deswUTimeAvg,

deswUDaysInQueue, 

deswAMax,

deswAMaxDay, 

deswATimeMax,

deswATimeAvg , 

deswADaysInQueue

FROM results_dailycontrols DC

JOIN

(SELECT iteration, MAX(last_day) as diseaseended, MAX(lastdetection) AS detectionlastday

FROM results_dailybyproductiontype

WHERE production_type_id IS NULL

GROUP BY iteration ) AS D 

ON DC.iteration = D.iteration

LEFT JOIN

(SELECT  iteration, CASE WHEN SUM(zonearea) IS 0  THEN  0 ELSE  1 END as zonefocicreated

FROM results_dailybyzone WHERE last_day = 1 GROUP BY iteration ) AS z

ON DC.iteration = Z.iteration

WHERE last_day = 1 -- THIS NEEDS TO STAY IN PLACE, ADD AN AND IF YOU WANT TO LIMIT TO ONE ITERATION

ORDER BY 1

Example of zone queries

SELECT iteration, day, last_day,
ifnull(production_type_id, 'ALL') as production_type,
--ifnull(zone_id, 'NOZONE'), 
ifnull(z.name||' ' ||radius, 'No Zone') as zone_name, 
unitsinzone, unitdaysinzone,animaldaysinzone
FROM results_dailybyzoneandproductiontype d
LEFT JOIN scenarioCreator_zone z on d.zone_id = z.id -- left join needed to manage nulls!
LEFT JOIN scenarioCreator_productiontype p on p.id = d.production_type_id -- left join to manage nulls!
WHERE 1=1 AND iteration = 1
AND production_type_id is null
ORDER BY 2,3,4,5

Epi Curve across all iterations

SELECT day, Sum(infnU), Sum(detnU)

FROM Results_dailybyproductiontype r

WHERE 1=1 

AND production_type_id is null

AND last_day < 1

GROUP BY day

ORDER BY 1, 2

More examples are provided in the Example Database Queries directory that was installed with ADSM.

Clone this wiki locally