GEOG696F Python for Spatial and Temporal Data Science

Overview

This is a graduate statistics and programming course taught as GEOG696F, the 'Advanced Methods and Techniques' seminar in the School of Geography, Development, and Environment at the University of Arizona. The class was first taught in Fall 2024. The full syllabus is available here.

This course is designed as a graduate-level class in a workshop format. It gives students a theoretical framework, practical experience, expert knowledge, and statistical tools for analyzing datasets that have temporal and/or spatial dimensions. It is fundamentally about building tools and practical understanding so that students can knowledgeably apply these techniques to their own research and data. Topics include: basic and intermediate tools and procedures in Python, correlation, regression, Monte Carlo methods, time series analysis, spectral analysis, reduced space empirical orthogonal function/principal components analysis, interpolation, Gaussian processes, and Bayesian statistics. The course includes instruction and training in Python and in the use and manipulation of large multidimensional datasets.

The major outcome of the class for each student will be a new and independent analysis of a substantial dataset using one or, preferably, more techniques from the course materials; a formal manuscript describing the motivation, methods, and results of this analysis; and a professional oral presentation. Students are encouraged to bring with them or seek out data relevant to their research to use for their final project. Ideally, students' final projects will provide the material for a thesis chapter and/or peer-reviewed article.

General Schedule

August 26 to September 4 - Python fundamentals

September 9 to September 11 - Correlation

September 16 to September 18 - Monte Carlo and stochastic simulation

September 23 to September 25 - Regression

September 30 to October 2 - Time series analysis

October 7 to October 9 - Spectral analysis

October 14 to October 16 - Reduced space methods

October 21 to October 23 - Interpolation and Gaussian Processes

October 28 to October 30 - Introduction to Bayesian statistics

November 4 to November 29 - Student project work

December 2 to December 4 - Final presentations

December 13 - Final project and paper due

Why Python?

My own programming career started in FORTRAN and moved to MATLAB, a language I've now spent almost 25 years using effectively and (mostly) without complaint. But with an increasing number of jobs for graduate students outside academia, and with the rise of Python as the de facto language of data science, I've started to teach my data analysis classes in Python. This has involved some growing pains (for me!), but in the end I hope that the chance to learn statistical techniques in a language so widely used across so many fields will be worth the extra trouble for the students who take the class. Particularly for earth and environmental scientists relatively new to Python, Martin Trauth's book Python Recipes for Earth Sciences provides a useful and broad introduction solidly grounded in various types of analyses.

Recommended Software Installation

Anaconda is a Python distribution and package manager that downloads a large number of packages for data analysis and exploration (including base Python), which makes it quite large. Since not all packages are always required, a 'lite' version of Anaconda called Miniconda is also available. Miniconda gives you base Python and all of Anaconda's package management functions, but it installs only a few packages, so the initial download is much smaller and installation is much faster. The trade-off is that you'll need to install some packages yourself. Once installed, either Anaconda or Miniconda is referred to (and called from the shell, terminal, or command line) simply as conda. A cheatsheet of conda commands can be found here.

I personally use Anaconda, but instructions for installing via either are available in the following links:

This page from DataCamp contains useful and straightforward information on getting Python installed on both Windows and Mac.

Here is an installation narrative we developed for a coding bootcamp. It includes step-by-step instructions for installing Miniconda, as well as for setting up a virtual conda environment for this class, if you so choose. A 17-minute YouTube video accompanies it and walks through the same steps (note that you won't need to do everything in the video).

This YouTube video from Visual Studio Code (the integrated development environment we'll use in this class) can get you up and running quickly. It shows installation on Windows, so macOS will be slightly different. If necessary, we'll also go over this live in class the week of August 26th.

Here are the basic steps from the video:

Environment

An environment.yml file is provided in this repository. It lets users create a conda virtual environment that includes the libraries, packages, and dependencies required for the Fall 2024 version of the course.
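To confirm the environment is working, you might create and activate it from the terminal with `conda env create -f environment.yml`, followed by `conda activate` and the environment name defined in that file, and then run a short check from Python. The sketch below is not part of the course repository: the package list is a hypothetical stand-in (the authoritative list is whatever environment.yml specifies), and it assumes conda sets the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
"""Minimal environment sanity check (a sketch; not part of the course repository)."""
import importlib.util
import os
import sys

# Hypothetical package list: replace with the dependencies listed in environment.yml.
EXPECTED_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "xarray"]


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    # find_spec returns None (without importing anything) when a top-level module is absent.
    return {name: importlib.util.find_spec(name) is not None for name in names}


def describe_interpreter():
    """Report which Python is running and, if detected, the active conda environment."""
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(none detected)"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:>10}: {value}")
    for name, found in check_packages(EXPECTED_PACKAGES).items():
        print(f"{name:>10}: {'ok' if found else 'MISSING'}")
```

If you save this as, say, `check_env.py` (a hypothetical file name) and run `python check_env.py` from the activated environment, the `executable` line should point inside your conda environment; any package reported as MISSING can then be added with `conda install`.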

Integrated Development Environments (IDE)

Unlike MATLAB and R (via RStudio), there is no single software package used for Python development. Indeed, you could develop Python with just text files and the command line. In this class I will use VS Code, a free, multi-language IDE. You can also develop your code entirely in Jupyter notebooks in your browser if you wish. Another popular IDE is PyCharm, which is excellent, though its full-featured version is not free. Finally, there is now Positron, from the makers of RStudio; it is still in development but looks promising.

It isn't important which IDE you choose.

Running notebooks in your browser

You can find step-by-step instructions for running Jupyter notebooks locally in your browser here.

Getting Started with Python

If you'd like some additional materials for getting started with Python, here are some possibilities:

GitHub

Although not strictly required for this course, I encourage you to use Git and GitHub to streamline your access to and use of the notebooks created for this class, as well as to advance your own development of reproducible, readily shareable code. Here are some good places to start:

Interested in more?

I offer GEOG 696C 'Spatiotemporal Data Analysis' next in Fall 2025, which goes into more depth on linear algebra for statistics, empirical orthogonal function (EOF) analysis, and singular spectrum analysis. It is also taught in Python.

Contact

Did you find this course material useful? Want to share ideas? Find some bugs? Feel free to contact me at [email protected]