This file explains how to use the Python module and the CLI, as well as their intended use cases.
Basically, PyOdourCollect is a tool for advanced Citizen Scientists and/or data analysts who want to obtain useful information about odour pollution episodes in a given area of interest. Nothing prevents any user from downloading all the observations to date from the OdourCollect platform, but the most insightful analysis can only be performed when data is filtered by criteria such as odour type and distance from a suspicious, previously identified point of interest.
If you first want to know whether citizens have provided odour observations in your area of interest, the best way is to visit OdourCollect.eu directly and take a look at the map. OdourCollect shares all the collected data for free, the same way Citizen Scientists worldwide provide it.
If you find out that there is no odour observation data for the zone you wanted to analyse, you can always lead a local movement in your community and start recruiting neighbours, so they download the OdourCollect app on their phones and start mapping odour observations!
You can get pyodourcollect from PyPI:
pip install pyodourcollect
The provided command line interface (CLI) tool simplifies downloading data from OdourCollect in CSV or XLSX format and is the go-to choice if you plan to analyse the data:
- Manually
- With Microsoft's Excel or PowerBI
- With other data analysis tools that can import CSV or XLSX files. For example, Orange data mining (which we strongly recommend!)
On the other hand, the bare Python module is a more suitable choice if you are a more advanced scientist planning to:
- Put OdourCollect's data in a Pandas DataFrame to analyse and visualise it using your own code.
- Integrate OdourCollect's data into a Python program of yours. In that case, PyOdourCollect will provide you with the data you need in the simplest way: a Pandas DataFrame.
Whether you use the CLI to download OdourCollect's data in CSV/XLSX format, or you use the Python module directly to obtain a Pandas DataFrame, the structure of the data is the same:
field | description |
---|---|
user | OdourCollect's user ID of the citizen that registered the observation. |
date | Observation date in yyyy-mm-dd format. |
time | Observation time in HH:mm (24h) format, UTC timezone. |
week_day | Observation day of week. This field is extra data calculated by PyOdourCollect to help the analyst in finding patterns. Please bear in mind that this calculation is based on UTC, not local time, so it could be misleading in some edge cases. |
category | First tier of odour classification. In OdourCollect webapp, this is called "type". It provides complementary classification nuances that can be safely ignored for basic analysis. See the full table below for better understanding. |
type | Second tier of odour classification. In OdourCollect webapp, this is called "subtype". It provides the richest odour classification criteria. See the full table below for better understanding. |
hedonic_tone_n | Hedonic tone of odour observation (numeric representation). Hedonic tone is the subjective measurement of how annoying an odour is, from -4 (Extremely unpleasant) to +4 (Extremely pleasant). Zero is used to report neither annoyance nor pleasure. This scale is based on the VDI3882-1 and VDI3882-2 standards for odour impact assessment. |
hedonic_tone_t | Text description version of the former metric. |
intensity_n | Intensity of odour observation (numeric representation). Intensity is the measurement of how intense and noticeable an odour is, from 1 (Very weak) to 6 (Extremely strong). Zero (Not perceptible) is also used, but only to report absence of odour in observations. This scale is based on the VDI 3940:2006 standard for odour impact assessment. |
intensity_t | Text description version of the former metric. |
duration | How long the odour has been perceived by the reporter. Categorical text data with the following self-explanatory options: (No odour), Punctual, Continuous in the last hour and Continuous throughout the day. |
latitude | GPS coordinates of observation. Latitude. |
longitude | GPS coordinates of observation. Longitude. |
distance | Distance in km (with an accuracy of 0.01 km) between the point of observation and a configurable Point of Interest (POI). This extra data is calculated by PyOdourCollect when the data analyst provides the coordinates of a suspicious activity that motivates the analysis. If no POI coordinates are provided, this field is missing. |
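The UTC caveat on the week_day field is easier to see with a concrete case. The sketch below only illustrates the effect: the helper name, the fixed UTC offset and the sample observation are assumptions made for this example, and the exact representation PyOdourCollect uses for week_day may differ.

```python
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def utc_and_local_week_day(date_str, time_str, utc_offset_hours):
    """Return (UTC week day, local week day) for the date and time fields.

    date_str uses yyyy-mm-dd and time_str uses HH:mm, both in UTC, as in
    the table above. utc_offset_hours is the local offset from UTC.
    """
    utc_dt = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    local_dt = utc_dt.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return DAY_NAMES[utc_dt.weekday()], DAY_NAMES[local_dt.weekday()]


# Made-up observation: late Sunday evening in UTC is already Monday at UTC+2.
print(utc_and_local_week_day("2021-06-06", "23:30", 2))
```

If weekly patterns matter for your analysis, recomputing the day of the week in local time from the date and time fields is a reasonable precaution.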
OdourCollect.eu provides some additional data regarding odour observations. However, this data is only available by sending one query to the OdourCollect API for each observation. That would be much less practical and could put the OdourCollect.eu server in jeopardy, so it will not be implemented. The OdourCollect data that is not going to be provided by PyOdourCollect is:
- Written comments made by the citizen about the odour itself, or hypotheses about its possible origin at the time of observation.
- Full address that was inferred at the time of observation (it can be inferred from GPS coordinates anyway).
This module includes a command line interface tool that is installed as `odourcollect` (GNU/Linux, MacOS) or `odourcollect.exe` (Windows).
You can simply run the CLI with no parameters. It will automatically download all data from OdourCollect into an `odourcollect.csv` file.
Now you are ready to start analysing the data with your tool of choice.
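If your tool of choice is plain Python, the downloaded CSV can be read with the standard library alone. The following is a minimal sketch: the two sample rows are made up, and the column order, the text descriptions and the week_day representation are assumptions based on the field table above, so check them against a real file.

```python
import csv
import io

# Two made-up rows using the column names from the field table above.
SAMPLE_CSV = """user,date,time,week_day,category,type,hedonic_tone_n,hedonic_tone_t,intensity_n,intensity_t,duration,latitude,longitude
101,2021-06-06,23:30,Sunday,Waste,Ammonia,-3,Very unpleasant,4,Distinct,Punctual,41.41,2.22
102,2021-06-07,08:15,Monday,Nice,Flowers,3,Very pleasant,2,Weak,Punctual,41.40,2.21
"""


def load_observations(text_stream):
    """Read observations from a CSV text stream, converting numeric fields."""
    rows = []
    for row in csv.DictReader(text_stream):
        row["hedonic_tone_n"] = int(row["hedonic_tone_n"])
        row["intensity_n"] = int(row["intensity_n"])
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        rows.append(row)
    return rows


observations = load_observations(io.StringIO(SAMPLE_CSV))
unpleasant = [o for o in observations if o["hedonic_tone_n"] < 0]
print(len(observations), len(unpleasant))
```

With a real download, replace the in-memory sample with something like `open("odourcollect.csv", newline="", encoding="utf-8")`; the encoding is an assumption here.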
However, if you prefer to prefilter data or enrich it with some additional fields not provided by OdourCollect, keep reading.
This is the full help and usage output of the tool:
usage: OdourCollect [-h] [--startdate STARTDATE] [--enddate ENDDATE] [--category CATEGORY] [--odourtype ODOURTYPE]
[--hedonic [{pleasant,unpleasant,neutral,all}]] [--minintensity MININTENSITY] [--maxintensity MAXINTENSITY] [--gps lat long]
[--output path to output file] [--odourlist]
pyOdourCollect: A tool that obtains odour observations made by citizens at odourcollect.eu Citizen Obsevatory.
optional arguments:
-h, --help show this help message and exit
Observation filter options:
--startdate STARTDATE, -s STARTDATE
Earliest date (yyyy-mm-dd).
--enddate ENDDATE, -e ENDDATE
Latest date (yyyy-mm-dd).
--category CATEGORY, -c CATEGORY
Category of odours (known as 'type' in OdourCollect) From 1 to 9. 0 means all.
--odourtype ODOURTYPE, -t ODOURTYPE
Type of odour (known as 'subtype' in OdourCollect). From 1 to 89. 0 means all.
--hedonic [{pleasant,unpleasant,neutral,all}]
Hedonic tones. "unpleasant" (from -1 to -4), "pleasant" (from +1 to +4), "neutral" (tone zero only) or "all" (get everything,
default
--minintensity MININTENSITY
Minimum intensity of odour, from 0 to 6.
--maxintensity MAXINTENSITY
Maximum intensity of odour, from 0 to 6.
Analysis options:
--gps lat long If specified, observations will include the distance in Km. from the specified GPS coordinates. Useful to analize data against a
suspicious point of odour emission in a specific area of interest. Takes two arguments in the form of decimal numbers for lat/long
gps coordenates i.e.: --gps 41.409032 2.222619
Output options:
--output path to output file
Dump the results in xlsx/csv format to the specified file If not specified, defaults to "odourcollect.csv". File format is
autodetected based on extension.
List of odour categories and types:
--odourlist Prints the full list of odour categories and types used in OdourCollect.eu web and OdourCollect App
Run this program with --odourlist parameter for a full list of odour categories and types.
More detail on CLI params:
`--startdate` and `--enddate`: download only data between the start and end dates.
When the start date is not specified, 2019-01-01 is used (OdourCollect launched in 2019).
When the end date is not specified, the current date is used, which effectively means no end date.
Dates must be in `yyyy-mm-dd` format.
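The date defaults described above can be sketched in a few lines. This is an illustration rather than PyOdourCollect's actual code: the function name, the `today` parameter (added to make the example reproducible) and the start-before-end check are assumptions.

```python
from datetime import date, datetime

DEFAULT_START = date(2019, 1, 1)  # OdourCollect launched in 2019


def resolve_date_range(startdate=None, enddate=None, today=None):
    """Turn optional yyyy-mm-dd strings into a (start, end) pair of dates."""
    today = today or date.today()
    start = (datetime.strptime(startdate, "%Y-%m-%d").date()
             if startdate else DEFAULT_START)
    end = (datetime.strptime(enddate, "%Y-%m-%d").date()
           if enddate else today)
    if start > end:  # assumed sanity check, not documented behaviour
        raise ValueError("start date can't be later than end date")
    return start, end


print(resolve_date_range("2020-01-31", "2020-02-29"))
```

A string that does not follow `yyyy-mm-dd`, such as `31-01-2020`, makes `strptime` raise a `ValueError`.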
`--category` and `--type` (listed as `--odourtype`, short form `-t`, in the help output above): download only observations of certain odours, based on a two-tier classification system provided by OdourCollect.
Please note that "category" and "type" in PyOdourCollect are known as "type" and "subtype" in OdourCollect respectively.
While the first tier of classification provides only 8 options and gives a general idea for preliminary analysis, the second tier provides almost 90 classifiers and is the most precise information that citizens can provide in an odour observation.
`--category` accepts numbers from 0 to 9. `--type` accepts numbers from 0 to 89. In both cases, 0 means "all" and is the default if not specified, so you can safely ignore these parameters if you prefer to download the full data.
The full list of odour categories and odour types can be explored below.
`--hedonic`: The hedonic tone provides an observation filter based on how pleasant the citizen perceived the odour to be at the time of reporting.
Internally, OdourCollect allows up to 9 degrees of pleasantness, from -4 (most annoying) to +4 (most pleasant), with 0 being neutral.
In order to simplify CLI usage and avoid managing ranges of negative numbers (which could be counterintuitive),
we decided to provide just shorthand keywords: `pleasant` (from +1 to +4), `unpleasant` (from -1 to -4),
`neutral` (observations with tone zero only), and `all`
(which is the default, so you can safely omit the parameter if you plan to download the full data).
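The keyword shorthand amounts to a small lookup table from keyword to a range of hedonic tones. The sketch below shows the idea; the names `HEDONIC_RANGES` and `hedonic_to_range` are hypothetical and not part of PyOdourCollect.

```python
# Keyword -> (minimum tone, maximum tone), following the ranges described above.
HEDONIC_RANGES = {
    "pleasant": (1, 4),
    "unpleasant": (-4, -1),
    "neutral": (0, 0),
    "all": (-4, 4),
}


def hedonic_to_range(keyword="all"):
    """Translate a hedonic keyword into a (min, max) pair of hedonic tones."""
    try:
        return HEDONIC_RANGES[keyword]
    except KeyError:
        valid = ", ".join(sorted(HEDONIC_RANGES))
        raise ValueError(f"unknown hedonic keyword {keyword!r}; use one of: {valid}") from None


print(hedonic_to_range("unpleasant"))
```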
`--minintensity` and `--maxintensity`: Filter observations by the minimum/maximum level of intensity reported by the citizen at the time of observation.
Values range from 0 (no odour) to 6 (extremely strong). Defaults are 0 for minimum and 6 for maximum, so you can safely omit these parameters if you plan to download the full data.
`--gps`: This parameter expects two numbers: the GPS coordinates of a given point (for example `--gps 41.409032 2.222619`).
When specified, the downloaded data will include an extra column called `distance` at the end.
This column will provide the distance (with an accuracy of 0.01 km) between the point where the observation was made and the point specified with `--gps`.
This is extremely useful when you are analysing a specific area of interest around a suspicious point from which odour emissions allegedly come.
You can then start your analysis by filtering out all the observations that are beyond a given distance (say, 10 km) to focus on a local area
(which is the most useful and logical use case). In future versions we plan to provide an option to filter out the observations that are past a given distance,
so you do not have to filter them yourself at a later stage. For the time being, please manually filter out observations that are 999 km away from your area of interest 😉.
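Until a built-in distance filter exists, the manual filtering step could look like the sketch below. It assumes a great-circle (haversine) distance with a mean Earth radius of 6371 km, which is a common choice but not necessarily the formula PyOdourCollect uses; the helper names and the sample observations are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points, rounded to 0.01 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return round(2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a)), 2)


def add_distance(observations, poi_lat, poi_lon):
    """Return copies of the observations with an extra 'distance' key."""
    return [
        dict(o, distance=haversine_km(o["latitude"], o["longitude"], poi_lat, poi_lon))
        for o in observations
    ]


# Made-up observations: one close to the example POI, one far away.
observations = [
    {"user": 1, "latitude": 41.41, "longitude": 2.22},
    {"user": 2, "latitude": 40.4168, "longitude": -3.7038},
]
with_distance = add_distance(observations, 41.409032, 2.222619)
nearby = [o for o in with_distance if o["distance"] <= 10]
print([o["user"] for o in nearby])
```

When the data was downloaded with `--gps`, the `distance` column already exists, so only the last filtering line is needed.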
`--output`: Dumps the results obtained from OdourCollect directly to the specified file.
The format is autodetected based on the extension. `.csv` and `.xlsx` files are allowed.
The default value is `odourcollect.csv`, which will write a CSV file in the working folder.
Please note that we are very conservative here:
Absolute and relative paths are allowed, but ensure that the destination folder exists (by default, the current working folder) and that the file does not already exist.
Otherwise, the CLI tool will complain and stop with exit code 2.
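The conservative checks described above can be pictured with the following sketch. It is not the CLI's own code: the function name is hypothetical, the case-insensitive extension match is an assumption, and exceptions are raised where the real tool exits with code 2.

```python
from pathlib import Path

SUPPORTED_FORMATS = {".csv": "csv", ".xlsx": "xlsx"}


def check_output_path(output="odourcollect.csv"):
    """Validate an output path and return (path, detected format)."""
    path = Path(output)
    fmt = SUPPORTED_FORMATS.get(path.suffix.lower())
    if fmt is None:
        raise ValueError(f"unsupported extension {path.suffix!r}; use .csv or .xlsx")
    if not path.parent.is_dir():
        raise FileNotFoundError(f"destination folder {path.parent} does not exist")
    if path.exists():
        raise FileExistsError(f"{path} already exists and will not be overwritten")
    return path, fmt
```

Calling `check_output_path("results/ammonia.xlsx")` would return the path together with `"xlsx"`, provided that a `results` folder exists and does not yet contain that file.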
`--odourlist`: Dumps the full list of odour categories and types to use with `--category` and `--type`.
See the following section for more details.
If you run the CLI tool with the parameter `--odourlist`, it will provide the following list:
Odour categories (to use in --category parameter:
Code Category
------ ----------------------------------------------------------------------------
1 Waste related odours
2 Waste water related odours
3 Agriculture and livestock related odours
4 Food Industries related odours
5 Industry related odours
6 Urban odours
7 Nice odours
8 Other odours not fitting elsewhere
9 No odour observations (for testing, for reporting the end of an odour, etc.)
Odour types (to use in --type parameter:
Code Category Odour Type
------ ----------------------- -----------------------------------
1 Waste Fresh waste
2 Waste Decomposed waste
3 Waste Leachate
4 Waste Biogas
5 Waste Biofilter
6 Waste Ammonia
7 Waste Amines
8 Waste Other
9 Waste I don't know
10 Waste Water Waste water
11 Waste Water Rotten eggs
12 Waste Water Sludge
13 Waste Water Chlorine
14 Waste Water Other
15 Waste Water I don't know
16 Agriculture / Livestock Dead animal
17 Agriculture / Livestock Cooked meat
18 Agriculture / Livestock Organic fertilizers (manure/slurry)
19 Agriculture / Livestock Animal feed
20 Agriculture / Livestock Cabbage soup
21 Agriculture / Livestock Rotten eggs
22 Agriculture / Livestock Ammonia
23 Agriculture / Livestock Amines
24 Agriculture / Livestock Other
25 Agriculture / Livestock I don't know
26 Food Industries Fat / Oil
27 Food Industries Coffee
28 Food Industries Cocoa
29 Food Industries Milk / Dairy
30 Food Industries Animal food
31 Food Industries Ammonia
32 Food Industries Malt / Hop
33 Food Industries Fish
34 Food Industries Bakeries
35 Food Industries Raw meat
36 Food Industries Ammines
37 Food Industries Cabbage soup
38 Food Industries Rotten eggs
39 Food Industries Bread / Cookies
40 Food Industries Alcohol
41 Food Industries Aroma / Flavour
42 Food Industries Other
43 Food Industries I don't know
44 Industrial Cabbage soup
45 Industrial Oil / Petrochemical
46 Industrial Gas
47 Industrial Asphalt / Rubber
48 Industrial Chemical
49 Industrial Ammonia
50 Industrial Leather
51 Industrial Metal
52 Industrial Plastic
53 Industrial Sulphur
54 Industrial Alcohol
55 Industrial Ketone / Ester / Acetate / Ether
56 Industrial Amines
57 Industrial Glue / Adhesive
58 Urban Urine
59 Urban Traffic
60 Urban Sewage
61 Urban Waste bin
62 Urban Waste truck
63 Urban Sweat
64 Urban <not used>
65 Urban Fresh grass
66 Urban Humidity / Wet soil
67 Urban Flowers
68 Urban Food
69 Urban Chimney (burnt wood)
70 Urban Paint
71 Urban Fuel
72 Urban Other
73 Urban I don't know
74 Nice Flowers
75 Nice Food
76 Nice Bread / Cookies
77 Nice Fruit
78 Nice Fresh grass
79 Nice Forest / Trees / Nature
80 Nice Mint / Rosemary / Lavander
81 Nice Sea
82 Nice Perfume
83 Nice Chimney (burnt wood)
84 Nice Wood
85 Nice New book
86 Nice Other
87 Nice I don't know
88 No Odour No Odour
89 Other NA
Please note that each odour type code actually represents a combination of category + type. Some odour types appear more than once because they are associated with more than one category. This can be confusing sometimes, but it lets the analyst capture more nuances based on the category and better understand what the citizen perceived. That is why we suggest two ways to filter odour observations by type: the easiest one and the more detailed one.
This filter strategy is usually more than enough to perform interesting and relevant analysis. It has the advantage of not having to deal with numeric codes while being specific enough. If you are getting started with OdourCollect data analysis, we suggest you stick to the following:
- Just download the data with no category/type filters.
- Open it in your analysis tool (e.g. Microsoft Excel).
- Ignore the column `category` and proceed directly to the column `type`, selecting the description of the odour you are interested in (e.g. `ammonia`).
Technically, this translates to the following odour codes:
Category | Type | Type code |
---|---|---|
Waste | Ammonia | 6 |
Agriculture / Livestock | Ammonia | 22 |
Food industries | Ammonia | 31 |
Industrial | Ammonia | 49 |
But you will treat them as the same. This way, you will get all the odour observations that have been reported as ammonia, no matter the nuances.
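In code, treating the four ammonia codes as one simply means filtering on the type description and ignoring the category. Here is a minimal sketch with made-up rows; it assumes that the `type` column of the downloaded data holds the descriptions listed above.

```python
def filter_by_type(observations, type_name):
    """Keep observations whose type description matches, whatever the category."""
    wanted = type_name.strip().lower()
    return [o for o in observations if o["type"].strip().lower() == wanted]


# Made-up rows covering three of the four ammonia codes plus an unrelated odour.
observations = [
    {"category": "Waste", "type": "Ammonia"},
    {"category": "Agriculture / Livestock", "type": "Ammonia"},
    {"category": "Industrial", "type": "Ammonia"},
    {"category": "Urban", "type": "Traffic"},
]
ammonia = filter_by_type(observations, "ammonia")
print(len(ammonia))
```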
Let's imagine that you are still interested in ammonia, but you want to focus on the odour observations in which the citizen suspects some agricultural/livestock activity nearby. So you want to know which category the user selected when reporting the odour, because this is a way of capturing such nuances.
After typing `odourcollect.exe --odourlist` in your command line and checking the type codes table,
you realise that the odour type you are interested in is 22 (category Agriculture / Livestock, type Ammonia). So you type the following command:
odourcollect.exe --type 22
This command will save `odourcollect.csv` in your current directory.
The file will only contain observations of category `Agriculture / Livestock` and type `Ammonia`.
We suggest the following tools to process the data you download with PyOdourCollect's CLI. They are in no particular order, and it's mostly a matter of choice. Also, there are many more not listed here that can perfectly suit your needs (feel free to let us know, so we can add them here):
- Orange data mining. A free, Python-based, intuitive, graphical application designed to ease the process of data ingestion, transformation, analysis and visualisation. You can use the mouse and a few clicks without knowing anything about code. It can also plot maps based on GPS coordinates. We recommend it!
- Microsoft PowerBI. Microsoft's standard technology for data connectors and data analysis. It features interactive dashboards and graphics, including geographic maps, that can be extremely useful to illustrate and demonstrate what's happening in a community affected by odour pollution, but it requires more practice than other tools like Orange or Excel.
- Microsoft Excel. The good old option for hobbyists and amateur Citizen Scientists. Its features are more than enough to filter, order, obtain statistics, detect some basic patterns and plot explanatory graphics based on the data.
Want to see your favourite tool here? Let us know.
Using PyOdourCollect is very easy. It only takes four steps:
- Load module and helpers.
- Prepare a request.
- (Optional) prepare GPS coordinates of a suspicious Point Of Interest, if you have any.
- Send the request to OdourCollect, getting a DataFrame in exchange.
The examples here will assume that you loaded the pyodourcollect module and helpers this way:
import pyodourcollect.ochelpers as ochelpers
import pyodourcollect.occore as occore
import pyodourcollect.ocmodels as ocmodels
Compared to PyOdourCollect's CLI, using the PyOdourCollect module directly just requires a bit more knowledge about how the OdourCollect.eu API works and what parameters it expects to receive. The parameters are self-explanatory and have similar counterparts in the command line interface tool.
We provide a Pydantic model through pyodourcollect.ocmodels, making the process easier:
requestparams = ocmodels.OCRequest(
date_init=date_init,
date_end=date_end,
minAnnoy=min_annoy,
maxAnnoy=max_annoy,
minIntensity=min_intensity,
maxIntensity=max_intensity,
type=odour_type,
subtype=odour_subtype)
Explanation of the aforementioned parameters (all of them are optional):
`date_init`: The equivalent of `--startdate` in the CLI. Expects a date object.
`date_end`: The equivalent of `--enddate` in the CLI. Expects a date object.
`minAnnoy`: Minimum degree of pleasantness/annoyance, from -4 (Extremely annoying) to 4 (Extremely pleasant).
`maxAnnoy`: Maximum degree of pleasantness/annoyance, from -4 (Extremely annoying) to 4 (Extremely pleasant).
When using the CLI, `minAnnoy` and `maxAnnoy` are not specified directly. Instead, you use the `--hedonic` parameter as a shorthand for pleasant (from 1 to 4), unpleasant (from -4 to -1) or neutral (just 0) odours. This was decided to avoid the hassle of handling negative numbers in command line arguments, which can be problematic.
Please note that `minAnnoy` can't be greater than `maxAnnoy`, and that ranges always have to be expressed starting from the smaller number: for a range from -4 to 0, -4 is `minAnnoy` and 0 is `maxAnnoy`.
`minIntensity`: Minimum degree of intensity, from 0 (not perceptible) to 6 (extremely strong).
`maxIntensity`: Maximum degree of intensity, from 0 (not perceptible) to 6 (extremely strong).
Again, please note that min values can't be greater than max values. Internal request validators will complain otherwise.
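To picture what such validators check, here is a minimal sketch built on a plain dataclass. It is not the Pydantic model shipped in pyodourcollect.ocmodels: the class name, the field names and the default values are assumptions made for this example.

```python
from dataclasses import dataclass


def _check_range(name, low, high, floor, ceiling):
    """Ensure floor <= low <= high <= ceiling, raising ValueError otherwise."""
    if not (floor <= low <= ceiling and floor <= high <= ceiling):
        raise ValueError(f"{name} values must be between {floor} and {ceiling}")
    if low > high:
        raise ValueError(f"min {name} ({low}) can't be greater than max {name} ({high})")


@dataclass(frozen=True)
class RangeFilter:
    """Hypothetical stand-in for the annoyance and intensity filters."""
    min_annoy: int = -4
    max_annoy: int = 4
    min_intensity: int = 0
    max_intensity: int = 6

    def __post_init__(self):
        _check_range("annoy", self.min_annoy, self.max_annoy, -4, 4)
        _check_range("intensity", self.min_intensity, self.max_intensity, 0, 6)


print(RangeFilter(min_annoy=-4, max_annoy=0))
```

Swapping the two values, as in `RangeFilter(min_annoy=0, max_annoy=-4)`, raises a `ValueError` in this sketch.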
`type`: Tier 1 classification of odour observations (known as "Type" in OdourCollect, "Category" in PyOdourCollect).
`subtype`: Tier 2 classification of odour observations (known as "Subtype" in OdourCollect, "Type" in PyOdourCollect).
If you have the GPS coords of a suspicious Point Of Interest and you want to add them to the data, you can do it this way:
suspicious_coords = ocmodels.GPScoords(latitude, longitude)
where `latitude` and `longitude` are the GPS coords of the facility, industry, farm, plant, etc. that you are investigating.
The GPScoords helper provides instant validation of the GPS coordinates.
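The exact rules applied by the helper are not listed here, but a coordinate check of this kind typically enforces the usual geographic bounds. The sketch below assumes those bounds (latitude from -90 to 90, longitude from -180 to 180) and uses a hypothetical function name; it is not the GPScoords implementation.

```python
def validate_gps(latitude, longitude):
    """Return (lat, lon) as floats, or raise ValueError if out of bounds."""
    lat, lon = float(latitude), float(longitude)
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude {lat} is outside the range -90 to 90")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude {lon} is outside the range -180 to 180")
    return lat, lon


print(validate_gps(41.409032, 2.222619))
```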
my_ocdata = occore.get_oc_data(requestparams, suspicious_coords)
where `requestparams` and `suspicious_coords` are the OCRequest and GPScoords objects built in the previous steps.
If you don't want to add distances from a given point, you can skip creating a GPScoords object,
but you have to explicitly pass `None` as the second parameter to `occore.get_oc_data`.
After that, `my_ocdata` will contain a Pandas DataFrame object with all the data ready to use.
So, a typical Python script to gather data from OdourCollect specifying all request parameters would be as follows:
import pyodourcollect.ochelpers as ochelpers
import pyodourcollect.occore as occore
import pyodourcollect.ocmodels as ocmodels
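from datetime import datetime
# The date strings must match the format passed to strptime; .date() yields the date object that OCRequest expects.
date_init = datetime.strptime('2020-01-31', '%Y-%m-%d').date()  # just a random example
date_end = datetime.strptime('2020-02-29', '%Y-%m-%d').date()  # just a random example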
requestparams = ocmodels.OCRequest(
date_init=date_init,
date_end=date_end,
minAnnoy=-4,
maxAnnoy=4,
minIntensity=0,
maxIntensity=6,
type=0,
subtype=0)
suspicious_coords = ocmodels.GPScoords(41.409032, 2.222619) # Example. A waste water processing plant in Barcelona
my_ocdata = occore.get_oc_data(requestparams, suspicious_coords)
BONUS TRACK: One liner to get all data from OdourCollect quickly and easily:
my_ocdata = occore.get_oc_data(ocmodels.OCRequest(), None) # Requests with no filter parameters cause OdourCollect to yield all data.