Prof. Michael J. Pyrcz, @GeostatsGuy, Resources
Howdy Folks, I'm Michael Pyrcz, an Associate Professor at The University of Texas. I teach and conduct research on data analytics, geostatistics and machine learning. I'm appointed in the Hildebrand Department of Petroleum and Geosystems Engineering, the Jackson School of Geosciences and the Bureau of Economic Geology. I'm also a principal investigator in the College of Natural Sciences Energy Analytics Freshman Research Initiative and Inventors' Program and a core faculty member in the Machine Learning Laboratory in Computer Sciences, all at The University of Texas at Austin.
I feel that the role of professor is a role of service, so I post all my lectures and supporting content online, resulting in evergreen content that outlasts the semester and reaches beyond campus. I hope this content supports:
- my students, with ongoing learning content long after they finish my courses
- working professionals facing the digital transformation and interested in learning new skills
- potential students, by breaking down barriers and making our university a welcoming place for all who are interested in learning
Here's an inventory of the online resources I have made to help people learn about spatial data analytics, geostatistics and machine learning. I produced them to support my students, and I expect they will stay useful to my students after the class ends (an evergreen resource), as well as to other students and working professionals interested in these topics.
Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions
With over 17 years of experience in spatial, subsurface data analytics consulting, research and development, and leadership, Michael has returned to academia, driven by his passion for teaching and his enthusiasm for enhancing engineers' and (geo)scientists' impact in spatial, subsurface resource development.
For more about me, my research group (15 PhDs), my consortium (DiReCT), my publications, my background, my education startup and more, check out these links:
Twitter | GitHub | Website | GoogleScholar | Book | YouTube | LinkedIn | DiReCT Website | DiReCT GitHub | daytum
Want to learn more about my story, my publications and my other contributions to open source? Check these out:
- My story of how I got started in engineering and ended up as a professor at The University of Texas at Austin My Story
- My research, my approach to research and my views on building an inclusive and diverse team My Research
- Nothing is possible without awesome graduate students My Students
- I've written a bit; here are my books My Books
- My peer-reviewed publications My Papers
- My other contributions My Other Contributions
- I wrote an open source Python package for spatial data analytics and geostatistics. Much of it is a translation of GSLIB (Deutsch and Journel, 1998) from the original Fortran to Python for 2D geostatistical methods. I did this to support my students in my Spatial Data Analytics and Geostatistics courses. Check it out, and consider contributing and becoming a coauthor: GeostatsPy on the PyPI Repository and GitHub.
- NOTE: GeostatsPy relies on the Numba package for code acceleration, and Numba does not yet support Python >= 3.9, so please use Python < 3.9 with GeostatsPy.
- I do quite a bit on social media; here's why I do it: My Social Media Efforts.
- Check out my TEDx talk on 'A Professor's Secret Weapon' TED Talk
- Check out my Twitter feed, where I'm the GeostatsGuy, for resources, ideas and positivity most days Twitter.
- I post a lot of code, demonstration workflows and course material to support anyone who wants to learn My GitHub
- I partnered with Prof. John Foster (UT Austin) and Bazean, a technology-enabled energy investment firm, to start the energy-focused data science education company, daytum. We are currently offering short courses in Energy Data Science.
Online Resources on Spatial Data Analytics, Geostatistics and Machine Learning
I record all my university lectures and post them on YouTube. You are welcome to join my classes!
- Open Source Spatial Data Analytics in Python with GeostatsPy
- Tutorial: Open Source Spatial Data Analytics in Python with GeostatsPy
- Geostatistical Workflows for Unconventional Reservoirs at BEG
- What Does a Geoscientist Need to Know About Geostatistics? And Why It Would Be Helpful?
- Michael's Unsolicited Advice and Ideas for a Successful and Happy Career in Our Industry
- My interview on AAPG's Digging Deeper podcast with the awesome host Vern Stefanic.
I wrote a Python package called GeostatsPy for spatial data analytics and geostatistics. Here's a set of demonstration workflows in Python Jupyter Notebooks for many of the fundamental workflow steps, from data preparation and statistical inference to spatial prediction with uncertainty. They go along with the recorded lectures from my courses on my YouTube channels:
Here's the workflows:
- GeostatsPy: Reimplementation of GSLIB in Python
- Data Distributions with GeostatsPy
- Feature Ranking with GeostatsPy
- Volume Variance Relations with GeostatsPy
- Confidence Intervals and Hypothesis Testing with GeostatsPy
- Monte Carlo Simulation with GeostatsPy
- Bootstrap with GeostatsPy
- Data Distributions
- Data Distribution Transformations with GeostatsPy
- Declustering with GeostatsPy
- Ensemble Declustering with GeostatsPy
- Inverse Distance Interpolation with GeostatsPy
- Indicator Kriging with GeostatsPy
- Kriging with GeostatsPy
- Multivariate Analysis with GeostatsPy
- Overfitting Models with GeostatsPy
- Plotting Spatial Data with GeostatsPy
- Directional Spatial Continuity with GeostatsPy
- Spatial Updating with GeostatsPy
- Spatial Trend Modeling with GeostatsPy
- Multivariate Feature Ranking with GeostatsPy
- Variogram Calculation with GeostatsPy
- Variogram Modeling with GeostatsPy
- Spatial Bootstrap with GeostatsPy
- Spatial Simulation with GeostatsPy
- Spatial Indicator Simulation with GeostatsPy
- Spatial Simulation Post-processing with GeostatsPy
I think interactive workflows are excellent tools to support education. For data analytics and machine learning, turning a dial and watching a system or machine change is a great method to gain intuition and experience. I started to put together interactive workflows with ipywidgets and matplotlib. Check them out here:
- General Bootstrap
- Parametric Distributions
- Monte Carlo Simulation
- Bootstrap Colored Balls in a Cowboy Hat
- Norms
- Optimization
- Overfit
- DIY Central Limit Theorem
- Confidence Interval by Bootstrap and Analytical
- Sivia's Bayesian Coin
- Spurious Correlation
- Correlation Coefficient
- LASSO Regression
- Principal Components Analysis
- Ridge Regression
- Simple Kriging
- String Effect
- Stochastic Simulation
- Uncertainty with Spatial Aggregation
- Kriging String Effect
- Uncertainty Model Checking
- Variogram Calculation
- Variogram Modeling
- Combined Variogram Calculation and Modeling
- Spectral Clustering
- Artificial Neural Networks
- Checking Uncertainty Models
- Shapley Values
- Probability Theory – my undergraduate lecture
- Statistics – undergraduate lecture
- Marginal, Joint & Conditional Probability – slides
Parametric distributions are fundamental to inferential and predictive workflows in statistics and data analytics. Sometimes they are required by theory, and often they arise from nature. Many students struggle with them, so I made simple demonstrations in Microsoft Excel that cover how to make them from scratch and how to work with them:
- How to make them in Excel
- Poisson distribution in Excel
- Gaussian transform in Excel and Python
- Log normal distribution in Excel
- Interactive parametric distributions in Python
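As a minimal sketch of building these distributions from scratch, here in Python with only the standard library rather than in Excel, the functions below write out the Gaussian, Poisson and log normal formulas directly. The function names are illustrative only; they are not part of GeostatsPy or any other package.

```python
import math

def gaussian_pdf(x, mean=0.0, stdev=1.0):
    """Gaussian probability density written out from its formula."""
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))

def gaussian_cdf(x, mean=0.0, stdev=1.0):
    """Gaussian cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stdev * math.sqrt(2.0))))

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Log normal density: ln(x) is Gaussian with mean mu and stdev sigma."""
    if x <= 0.0:
        return 0.0  # the log normal has no probability at or below zero
    return gaussian_pdf(math.log(x), mu, sigma) / x

if __name__ == "__main__":
    print(round(gaussian_cdf(1.96), 3))  # 0.975
    print(round(sum(poisson_pmf(k, 3.0) for k in range(50)), 6))  # 1.0
```

Checking that the CDF reaches the familiar 0.975 at 1.96 and that the Poisson probabilities sum to one is a quick way to confirm a from-scratch implementation before using it in a workflow.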
Hypothesis testing is all about recognizing the difference that makes a difference. These tests protect us from the belief in small numbers and from our bias to see patterns in random phenomena.
- Difference in means in Excel and in Python
- Difference in variances in Excel and in Python
- Difference in distributions in Excel
- Interactive hypothesis testing in Python
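For a difference in means, one option that needs only the Python standard library is a permutation test, swapped in here as an alternative to the parametric tests in the demonstrations above: pool the two samples, randomly relabel them many times, and count how often the relabeled difference is at least as large as the observed one. This is a minimal sketch; the function name is hypothetical and the porosity values in the usage example are made up for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test_means(a, b, n_perm=10000, seed=73073):
    """Two-sided permutation test for a difference in means between samples a and b.

    Returns the observed difference in means and the fraction of random
    relabelings with an absolute difference at least as large, which
    estimates the p-value under the null hypothesis of no difference.
    """
    rng = random.Random(seed)  # seeded for repeatable results
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the samples to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

if __name__ == "__main__":
    # hypothetical porosity samples from two facies, for illustration only
    facies_1 = [0.10, 0.12, 0.11, 0.13, 0.12]
    facies_2 = [0.15, 0.16, 0.14, 0.17, 0.15]
    diff, p_value = permutation_test_means(facies_1, facies_2)
    print(f"difference in means = {diff:.3f}, p-value ~ {p_value:.4f}")
```

A small p-value says that random relabeling rarely produces a difference this large, so the difference is unlikely to be a pattern we imagined in random noise.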
Bayesian approaches are powerful. They integrate prior belief with new observations, provide explicit uncertainty models and give more intuitive credible intervals for uncertainty in model parameters. Here are some accessible demonstrations to get you started thinking like a Bayesian statistician.
- The Coin Problem from Sivia (1996) in Excel
- Bayesian updating with Gaussian in Excel
- Probability given a positive test in Excel
- Sivia's Bayesian Coin in Interactive Python
- Bayesian Regression in Python
- Naive Bayes Regression and Classification in Python
- Bootstrap in Excel, in Python and in R
- Spatial Bootstrap in Python
- Linear regression in Excel and in R
- Loss functions in Excel
- Multivariate Analysis
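To give a feel for the coin problem from Sivia (1996) listed above, here is a minimal grid-based sketch in plain Python. It assumes a uniform prior by default on the probability of heads, H, and a binomial likelihood, and it works in log space so that thousands of tosses do not underflow. The function names and the `prior` argument are hypothetical, not taken from the linked Excel or interactive Python demonstrations.

```python
import math

def log_likelihood(h, n_heads, n_tails):
    """Log of the binomial likelihood h^heads * (1 - h)^tails (constant dropped)."""
    total = 0.0
    for count, p in ((n_heads, h), (n_tails, 1.0 - h)):
        if count > 0:
            if p <= 0.0:
                return float("-inf")  # this outcome is impossible at this value of H
            total += count * math.log(p)
    return total

def coin_posterior(n_heads, n_tosses, prior=lambda h: 1.0, n_grid=101):
    """Posterior for the probability of heads H on a regular grid over [0, 1].

    posterior(H) is proportional to prior(H) * likelihood(data | H);
    the default prior is uniform.
    """
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    log_post = []
    for h in grid:
        weight = prior(h)
        if weight <= 0.0:
            log_post.append(float("-inf"))
        else:
            log_post.append(math.log(weight) + log_likelihood(h, n_heads, n_tosses - n_heads))
    peak = max(log_post)  # subtract the peak before exponentiating to avoid underflow
    unnormalized = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]

if __name__ == "__main__":
    grid, post = coin_posterior(n_heads=7, n_tosses=10)
    best = grid[post.index(max(post))]
    print(f"most probable H = {best:.2f}")  # most probable H = 0.70
```

Rerunning with more tosses shows the posterior narrowing around the observed fraction of heads, which is the prior-plus-observations updating described above.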
Our subsurface systems are heterogeneous and heterogeneity matters in many subsurface prediction problems. Here are some accessible demonstrations to help you get started quantifying heterogeneity.
- Making an example well in Excel
- Lorenz coefficient in Excel
- Hurst coefficient in R
- Ripley Cross K in R
- Ripley K-function in Python
- Lorenz coefficient in Python
- Lorenz coefficient functions in Python
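The Lorenz coefficient is one of the heterogeneity measures listed above. Below is a minimal sketch in plain Python, assuming the common definition: sort layers by decreasing k/φ, accumulate normalized flow capacity (k·h) against normalized storage capacity (φ·h), and take twice the area between that curve and the 45° line of a homogeneous system. It is not a transcription of the linked Excel or Python demonstrations, the function name is hypothetical, and the example layer values are made up.

```python
def lorenz_coefficient(perm, poro, thick):
    """Lorenz coefficient from layer permeability, porosity and thickness.

    Returns 0 for a homogeneous interval and approaches 1 as most of the
    flow capacity is concentrated in a small fraction of the storage capacity.
    """
    # sort layers from the highest to the lowest flow-to-storage ratio k / phi
    layers = sorted(zip(perm, poro, thick), key=lambda layer: layer[0] / layer[1], reverse=True)
    total_flow = sum(k * h for k, _, h in layers)
    total_storage = sum(phi * h for _, phi, h in layers)
    cum_flow, cum_storage = [0.0], [0.0]
    for k, phi, h in layers:
        cum_flow.append(cum_flow[-1] + k * h / total_flow)
        cum_storage.append(cum_storage[-1] + phi * h / total_storage)
    # trapezoidal area under the cumulative flow versus cumulative storage curve
    area = sum(
        (cum_storage[i + 1] - cum_storage[i]) * (cum_flow[i + 1] + cum_flow[i]) / 2.0
        for i in range(len(layers))
    )
    # a homogeneous system follows the 45 degree line, whose area is 0.5
    return 2.0 * (area - 0.5)

if __name__ == "__main__":
    # hypothetical layers: permeability (mD), porosity (fraction), thickness (m)
    print(lorenz_coefficient([100.0, 100.0, 100.0], [0.2, 0.2, 0.2], [1.0, 1.0, 1.0]))
    print(lorenz_coefficient([1000.0, 50.0, 5.0], [0.25, 0.18, 0.10], [2.0, 3.0, 5.0]))
```

The first call is a homogeneous interval and returns approximately zero; the second, with one high-permeability layer carrying most of the flow, returns a value well above zero.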
I have a new Subsurface Machine Learning Course that builds from fundamental probability to artificial neural networks. The recorded lectures are available here:
You are welcome to follow along! The demonstration workflows from the lectures are here:
- Feature Imputation in Python
- Feature Ranking in Python
- Feature Transformations in Python
- Feature Uncertainty in Python
- Dimensional Reduction in Python and in R
- Clustering in Python
- Principal Components Analysis in Python
- Multidimensional Scaling and Random Projection in Python
- Linear Regression in Python
- Ridge Regression in Python
- LASSO Regression in Python
- Isotonic Regression in Python
- Bayesian Regression in Python
- Polynomial Regression in Python
- Naive Bayes Regression and Classification in Python
- Time Series Analysis
- k Nearest Neighbour
- Decision tree in Python, in Python (advanced) and in R
- Gradient Boosting in Python and Advanced Gradient Boosting in Python
- Support Vector Machines in Python
- Neural Networks in Python
- Convolution Operators in Python
- Convolutional Neural Networks in Python
- Convolutional Neural Networks Classifier in Python
- Generative Adversarial Networks in Python
- Conditional Generative Adversarial Network in Python
- Course Conclusion
- scikit-learn Overview
- GeostatsPy: Reimplementation of GSLIB in Python
- Introduction to Data Analytics, Geostatistics and Machine Learning Undergraduate Lectures (Lec00-Lec21)
- What Does a Geoscientist Need to Know About Geostatistics? And Why It Would Be Helpful? and PPT
- Exercises, hands-on and demonstrations PPT Inventory
- Functions that reimplement or call GSLIB exes in Python
- Demo of the functions in Python
- Declustering in Python and with PyGSLIB Package
- Declustering and Debiasing in Excel
- Variogram calculation in Excel and in R
- Full variogram Calculation and Modeling in Excel and in PyGSLIB Package
- Facies criteria in PPT
- Value of quantification in PPT
- Stationarity in PPT
- Uncertainty in PPT
- Suggested books in PPT
- Simple kriging in Excel and in R
- Uncertainty Away from Data in Excel
- Convolution methods in Python
- LU Simulation in Python
- Sequential Gaussian simulation in Excel and in R
- Truncated Gaussian simulation in Excel
- Spatial uncertainty in Excel
- Volume-variance relations in Excel
- Working with realizations in R
- Lecture on value in industry in PPT
I hope these resources are helpful to those who want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.
- Want to invite me to visit your company for training, mentoring, project review, workflow design or consulting? I'd be happy to drop by and work with you!
- Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
- I can be reached at [email protected].
I'm always happy to discuss,
Michael
Michael Pyrcz, Ph.D., P.Eng.
Associate Professor
The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin