DSSG 2024 Geospatial Analysis Tutorial

A brief tour of geospatial analysis, mostly focused on open source software.

Organization

DSSG 2024 has groups working in Python and R with a variety of geospatial data! Fortunately, there are a lot of great public tutorials, textbooks, and resources out there to introduce data scientists to geospatial data analysis.

This repository is intended as a snapshot to point groups toward a selection of resources that might be useful for projects over the summer. We're going to assume no prior experience working with geospatial data, but some experience with Python or R, and GitHub.

The intent is to provide a guided 3-hour tour with a few coding examples and experiments along the way.

Interactive Computing

You can use Pangeo Docker Images + GitHub Codespaces to run Python code examples during this tutorial!

https://codespaces.new/uwescience/dssg2024_geospatial_tutorial?quickstart=1

Be sure to stop the Codespace from this page (https://github.com/codespaces) when you're not using it. A "codespace" is a server running in Microsoft Azure, and you have a quota of free minutes per month.

Start Here

"Geospatial" data is simply data that can be located in space (usually the surface of the Earth) and time. You'll often hear people analyzing geospatial data talk about "Geographic Information System" (GIS), which is essentially software to manage and process a number of independent geospatial datasets.

If you are new to geospatial analysis and like to start with history and theory, I recommend this online course material from Penn State University (e.g. maps, GIS).

The Carpentries provide instructor-led workshops with well-designed public lessons in different programming languages:

R https://datacarpentry.org/geospatial-workshop/
Python https://carpentries-incubator.github.io/geospatial-python/

Exercise

Take 5 minutes to explore the above resources or others. Can you find information on "vector" and "raster" geospatial data?

Concept 1: Understand the landscape of open source software tools (GDAL, QGIS, PostGIS, Python libraries, R libraries)

The Geospatial Data Abstraction Library (GDAL) is a foundational software library written in C/C++ and first released in 2000. It is a dependency of most major GIS software today! Read more and gain experience with GDAL in this excellent series of blog posts by Robert Simmon: https://medium.com/planet-stories/a-gentle-introduction-to-gdal-part-1-a3253eb96082

QGIS is an excellent free and open-source Desktop GIS application to work with (see also https://uwgda-jupyterbook.readthedocs.io/en/latest/resources/qgis.html)

PostGIS is a spatial extension for the open source PostgreSQL database. As you amass a large amount of data, a database makes it convenient to run efficient searches, enforce format schemas, and optionally manage permissions and simultaneous users.
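
To make the database idea a little more concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL/PostGIS. The table name, column names, and coordinates are made up for illustration. PostGIS itself provides real geometry types, spatial indexes, and spatial functions, none of which are used here.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# NOT NULL constraints are a small example of a schema enforcing data format.
conn.execute(
    "CREATE TABLE stations (name TEXT NOT NULL, lon REAL NOT NULL, lat REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO stations VALUES (?, ?, ?)",
    [
        ("Seattle", -122.33, 47.61),
        ("Portland", -122.68, 45.52),
        ("Spokane", -117.43, 47.66),
    ],
)


def stations_in_bbox(conn, west, south, east, north):
    """Return the names of stations inside a lon/lat bounding box, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM stations "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
        "ORDER BY name",
        (west, east, south, north),
    )
    return [name for (name,) in rows]


# Rough bounding box around western Washington (illustrative values).
print(stations_in_bbox(conn, -125.0, 46.5, -121.0, 49.0))
```

A simple bounding-box filter like this works for points, but polygons, distance queries, and large tables are where a spatial database such as PostGIS becomes worthwhile.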

Python is a popular programming language due to its versatility and huge ecosystem of libraries. It can be overwhelming trying to find the right one! Here is a slightly more curated list. In general there are a few foundational libraries, and many more that depend on these. You'll often hear references to a software 'stack' and indeed, it is layer upon layer of source code all the way down!

Vector: GeoPandas
2D Raster: Rasterio
nD Arrays: Xarray
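
The data types listed above correspond to different data models. As a rough, library-free sketch: a vector feature is a geometry plus attributes, while a raster is a grid of values plus the information needed to place that grid on the Earth. The dictionary layout and the `sample_raster` helper below are hypothetical and only illustrate the idea; in practice GeoPandas and Rasterio handle these details for you.

```python
# A vector feature: a geometry (here a single point) plus attributes.
point = {"lon": -122.30, "lat": 47.65, "name": "sample site"}

# A raster: a grid of values plus georeferencing.
# The top-left corner sits at (-123.0, 48.0) and each pixel spans 0.5 degrees.
raster = {
    "values": [
        [10, 11, 12, 13],
        [20, 21, 22, 23],
        [30, 31, 32, 33],
    ],
    "west": -123.0,
    "north": 48.0,
    "pixel_size": 0.5,
}


def sample_raster(raster, lon, lat):
    """Return the value of the pixel containing (lon, lat), or None if outside the grid."""
    # Columns count eastward from the west edge, rows count southward from the north edge.
    col = int((lon - raster["west"]) // raster["pixel_size"])
    row = int((raster["north"] - lat) // raster["pixel_size"])
    n_rows = len(raster["values"])
    n_cols = len(raster["values"][0])
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return raster["values"][row][col]
    return None


print(sample_raster(raster, point["lon"], point["lat"]))
```

Looking up a raster value at a vector location like this is only meaningful when both datasets use the same coordinate reference system.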

R is a programming language that is very popular among statisticians. Similar to Python, there is a large ecosystem of packages with various specializations. Core libraries can be found here https://github.com/r-spatial.

Are there other software libraries you recommend?

Open Data

Unfortunately, there is no single 'Google Search' for geospatial data (...yet), so each data provider often has its own idiosyncratic search utilities. The SpatioTemporal Asset Catalog (STAC) is a recent effort to standardize metadata across data providers.
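
As a sketch of what standardized search looks like, the snippet below builds the JSON body of a STAC API item search using only the standard library. The `collections`, `bbox`, `datetime`, and `limit` fields come from the STAC API specification. The endpoint URL, the collection ID, the bounding box, and the `build_stac_search` helper are assumptions chosen for illustration, and no request is actually sent.

```python
import json

# Assumption: the Microsoft Planetary Computer STAC API search endpoint.
STAC_SEARCH_URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"


def build_stac_search(collection, bbox, start, end, limit=10):
    """Build the JSON body for a STAC API item search (POST /search).

    bbox is (west, south, east, north) in longitude/latitude degrees;
    start and end are RFC 3339 timestamps.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have four values: west, south, east, north")
    return {
        "collections": [collection],
        "bbox": list(bbox),
        "datetime": f"{start}/{end}",
        "limit": limit,
    }


body = build_stac_search(
    "sentinel-2-l2a",              # assumed collection ID
    (-122.6, 47.4, -122.1, 47.8),  # rough box around Seattle (illustrative)
    "2024-06-01T00:00:00Z",
    "2024-06-30T23:59:59Z",
)
print(f"POST {STAC_SEARCH_URL}")
print(json.dumps(body, indent=2))
```

Because the request shape is the same across STAC-compliant catalogs, switching providers is mostly a matter of changing the URL and collection ID. Client libraries such as pystac-client wrap this kind of request for you.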

Civilian Satellite Remote Sensing

NASA EarthData https://www.earthdata.nasa.gov
ESA Copernicus Open Data https://dataspace.copernicus.eu/explore-data

Climate Models

ECMWF https://www.ecmwf.int/en/forecasts/datasets
NCAR https://www.cesm.ucar.edu/models

Census Data

United States https://data.census.gov

What other data providers do you regularly use?

Cloud computing and big data

Geospatial data is experiencing exponential growth in both size and complexity. As a result, traditional data access methods, such as file downloads, have become increasingly impractical for achieving scientific objectives (https://guide.cloudnativegeo.org).

Microsoft Planetary Computer https://planetarycomputer.microsoft.com/catalog
Google Cloud / Earth Engine https://cloud.google.com/datasets?hl=en
AWS Open Data https://aws.amazon.com/earth/

Major Cloud Providers host mirrored archives of public data from NASA, ESA, and other agencies, which have been converted to "cloud-optimized" formats. Google Earth Engine is a unique Cloud-computing platform focused on geospatial imagery analysis that is free for non-commercial use.

Skill 1 / Exercise: Find and analyze imagery and GIS datasets with open-source software

Take 20 minutes to find public geospatial data from one of the above and use QGIS or another tool to view it on a map!

Concept 2: Combining raster and vector data (reprojection, sampling, subsetting, interpolation)

Skill 2 / Exercise: Reproject data to have a common coordinate reference system for joint analysis

60 minutes. Go over intro-geopandas.ipynb in this repository focused on vector data and the Microsoft Planetary Computer quickstart https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/
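
To give a feel for what reprojection does under the hood, here is a minimal sketch of one specific transformation: WGS84 longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857), written with only the standard library. The function name and sample coordinates are illustrative assumptions. For real work, a library such as pyproj or the GeoPandas `to_crs` method is the safer choice, since most transformations are far more involved than this one.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius in Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees (EPSG:4326) to x/y in meters (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates for Seattle (illustrative).
x, y = lonlat_to_web_mercator(-122.33, 47.61)
print(f"x = {x:.0f} m, y = {y:.0f} m")
```

The same location is described by very different numbers in the two systems, which is why datasets must share a coordinate reference system before they are overlaid or compared.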

Concept 3: Geostatistics

"Geostats" is the collection of methods for analyzing and predicting values and uncertainties associated with spatiotemporal phenomena. In other words, given measurements at known locations, how well can we predict unknown values at different locations? There are many methods! But a key thing to remember is that geospatial observations close in time and space tend to be more closely related (correlated) compared to measurements that are far apart. Geostatistical methods come up with sophisticated ways to deal with this. Below are two online comprehensive textbooks that have practical code examples.

Python

Spatial Data Science for Social Geography https://github.com/martinfleis/sds

R

Geocomputation with R, a book on geographic data analysis, visualization and modeling https://r.geocompx.org
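
The idea that nearby observations are more closely related than distant ones can be illustrated with inverse distance weighting (IDW), one of the simplest spatial interpolation methods. This is a minimal sketch with made-up sample points, not a method taken from the textbooks above. Unlike geostatistical methods such as kriging, IDW does not model spatial correlation explicitly and does not provide uncertainty estimates.

```python
import math


def idw_predict(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (x, y, value) tuples in a projected coordinate
    system, so that distances are in consistent units.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # Exactly on a sample location: return the measured value.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical measurements: (x, y, value).
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
print(idw_predict(samples, 1.0, 1.0))  # close to the first sample
print(idw_predict(samples, 5.0, 5.0))  # farther from every sample
```

Raising `power` makes the nearest samples dominate the estimate, while lowering it moves the result toward a plain average.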

Skill 3 / Exercise: Be able to visualize and share results of your analysis efficiently among your team

The easiest way to share a small amount of data is to put it on GitHub. GitHub will render maps for you!

Use https://geojson.io to create a GeoJSON file, put it in a GitHub repository, and share a link with your colleague!
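
If you would rather generate the file from code, the sketch below builds a small GeoJSON FeatureCollection with the standard library. The point names and coordinates are made up for illustration, and `point_feature` is a hypothetical helper. Note that GeoJSON stores coordinates in longitude, latitude order.

```python
import json


def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature for a point; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(-122.3035, 47.6553, {"name": "Example site A"}),
        point_feature(-122.3321, 47.6062, {"name": "Example site B"}),
    ],
}

# Serialize to text; saving this under a name ending in .geojson is one
# way to get a file you can commit to a repository.
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Pasting the printed text into https://geojson.io is a quick way to check that the points land where you expect before sharing the file.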

Additional Resources

Explore the Cutting Edge

Formal coursework

Programming
