- The Pragmatic Programmer (Book)
- Clean Code (Book)
- Architecture Playbook (Online guide)
- A Whirlwind Tour of Python (Book)
- Python Data Science Handbook (Book)
- Python Tricks (Book)
- Learning Python (Book)
- Effective Python (Book)
- R for Data Science (Book)
- Advanced R (Book)
- R Markdown: The Definitive Guide (Book)
- bookdown: Authoring Books and Technical Documents with R Markdown (Book)
- Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving (Book)
- Automated Data Collection with R (Book)
- Introduction to Data Science (Book)
- Spark: The Definitive Guide: Big Data Processing Made Simple (Book)
- Learning Spark: Lightning-Fast Big Data Analysis (Book)
- Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling (Book)
- The Missing Semester of Your CS Education (Online course)
- Learning the bash Shell (Book)
- The Art of the Command Line (GitHub resources)
- explainshell.com (Online help)
- Docker tips & tricks or just useful commands (Online article)
- Rocker: R configurations for Docker (GitHub resources)
- Docker and Python: making them play nicely and securely for Data Science and ML (PyCon Talk)
- An Introduction to the Basic Principles of Functional Programming (Online article)
- R for Data Science, Ch. 21 (Book chapter)
- Advanced R, Ch. 9 (Book chapter)
- Jenny Bryan's purrr tutorials (Online tutorial)
- Foundations of Functional Programming with purrr (DataCamp)
- Intermediate Functional Programming with purrr (DataCamp)
- Excuse me, do you have a moment to talk about version control? (Paper)
- Happy Git and GitHub for the useR (Book)
- Learn Git (Online tutorial)
- Introduction to Git In 16 Minutes (Online tutorial)
- Git Commit Message Style Guide (Online guide)
- The Art of Readable Code (Book)
- The Tidyverse Style Guide (Online book)
- PEP 8 -- Style Guide for Python Code (Online guide)
- Guidelines for code reviews (README)
- Code Review Best Practices (Blog post)
- Testing R Code (Book)
- Python Testing with pytest (Book)
- Multiply your Testing Effectiveness with Parameterized Testing (PyCon Talk)
- Test-Driven Development (Book)
- Introduction to Statistical Learning (Book)
- Applied Predictive Modeling (Book)
- Elements of Statistical Learning (Book)
- Computer Age Statistical Inference (Book)
- Statistical Modeling: The Two Cultures (Paper)
- Deep Learning (Book)
- Hands-On Machine Learning with Scikit-Learn & TensorFlow (Book | GitHub)
- Hands-On Machine Learning with R (Book)
- Google's Machine Learning Crash Course (MOOC)
- Rules of Machine Learning: Best Practices for ML Engineering (Article)
- How to Write Design Docs for Machine Learning Systems (Article)
- ISLR: Ch. 10.3 Clustering Methods (Book chapter)
- A K-Means Clustering Algorithm (Paper)
- Generalized Low Rank Models (Paper)
- Deep Learning Ch. 14 Autoencoders (Book chapter)
- Hands-On Machine Learning with Scikit-Learn & TensorFlow Ch. 15 Autoencoders (Book chapter | GitHub resource)
- Sparse autoencoder (Andrew Ng CS294A lecture notes)
- Lessons from Running Thousands of A/B Tests (Online presentation with many references)
- Online Controlled Experiments at Large Scale (Paper)
- Peeking at A/B Tests (Paper)
- Multi-armed Bandit (Online tutorial)
- A Modern Bayesian Look at the Multi-armed Bandit (Paper behind above online tutorial)
- Predicting Search Satisfaction Metrics with Interleaved Comparisons (Paper)
- Evaluating Retrieval Performance using Clickthrough Data (Paper)
- Multivariate Adaptive Regression Splines (Friedman's original paper)
- APM: Ch. 7.2 Multivariate Adaptive Regression Splines (Book chapter)
- ESL: Ch. 9.4 Multivariate Adaptive Regression Splines (Book chapter)
- Notes on the earth package (Paper)
- k-Nearest neighbour classifiers (Paper)
- APM: Ch. 7.4 & 13.5 K-Nearest Neighbors (Book chapter)
- ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers (Book chapter)
- An Introduction to Recursive Partitioning Using the RPART Routines (Paper)
- Random Forests - Leo Breiman's original research paper (Paper)
- How to explain gradient boosting (Online tutorial)
- Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014 (YouTube)
- Trevor Hastie - Data Science of GBM (2013) (slides)
- Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015 (YouTube)
- Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014 (YouTube)
- Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial (Paper)
- How to Train XGBoost With Spark (Blog)
- Training XGBoost4J-Spark with PySpark (Tutorial notebook)
- Use XGBoost on Databricks (Tutorial notebooks)
- Deep Learning with R (Book)
- Deep Learning with Python (Book)
- Deep Learning Specialization (MOOC)
- keras.rstudio.com (Online articles & tutorials)
- blogs.rstudio.com/tensorflow (Online articles & tutorials)
- Illustrated Guide to Recurrent Neural Networks (Blog)
- Illustrated Guide on Vanishing Gradients (Blog)
- Illustrated Guide to LSTMs and GRUs (Blog)
- Understanding LSTMs (Blog)
- Rohan & Lenny: Recurrent Neural Networks & LSTMs (Blog)
- The Unreasonable Effectiveness of Recurrent Neural Networks (Blog)
- Revisiting Small Batch Training for Deep Neural Networks (Paper)
- On Loss Functions for Deep Neural Networks in Classification (Paper)
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
- Efficient BackProp (Paper)
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (Paper)
- Cyclical Learning Rates for Training Neural Networks (Paper)
- A Disciplined Approach to Neural Network Hyperparameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay (Paper)
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Paper)
- Ensemble Methods in Machine Learning (Paper)
- Stacked Regressions (Paper)
- Super Learner (Paper)
- Text Mining with R (Book)
- Probabilistic Topic Models (Paper)
- The Illustrated Word2vec (Online tutorial)
- Sebastian Ruder's series on Word Embeddings (Online articles & tutorials)
- Neural Models for Information Retrieval (Paper)
- Why do we use word embeddings in NLP? (Blog)
- Collaborative Filters for Recommendation Systems (Fast.ai Deep Learning Lesson, starts at 1:25:00)
- How to Measure and Mitigate Position Bias (Blog)
- Counterfactual Evaluation for Recommendation Systems (Blog)
- Deep Learning Tuning Playbook (GitHub repo README)
- Hyperparameters and Tuning Strategies for Random Forest (Paper)
- Tunability: Importance of Hyperparameters of Machine Learning Algorithms (Paper)
- Machine Learning Benchmarks and Random Forest Regression (Paper)
- Random Search for Hyperparameter Optimization (Paper)
- Algorithms for Hyper-Parameter Optimization (Paper)
- Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures (Paper)
- Feature Engineering for Machine Learning (Book)
- Feature Engineering and Selection: A Practical Approach for Predictive Models (Book)
- Feature Stores - A Hierarchy of Needs (Article)
- Feature Selection with the Boruta Package (Paper)
- APM: Ch. 19 An Introduction to Feature Selection (Book chapter)
- Scott Lundberg's presentation on SHAP
- H2O.ai Machine Learning Interpretability Resources (GitHub resources)
- Patrick Hall's Awesome Machine Learning Interpretability Resources (GitHub resources)
- Interpretable Machine Learning (Book)
- Visualizing the Feature Importance for Black Box Models (Paper)
- A Simple and Effective Model-Based Variable Importance Measure (Paper)
- Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (Paper)
- pdp: An R Package for Constructing Partial Dependence Plots (Paper)
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Paper)
- A Unified Approach to Interpreting Model Predictions (Paper)
- Consistent Individualized Feature Attribution for Tree Ensembles (Paper)
- On the Art and Science of Machine Learning Explanations (Paper)
- Explanation in artificial intelligence: Insights from the social sciences (Paper)
- Please Stop Permuting Features: An Explanation and Alternatives (Paper)
- A Stratification Approach to Partial Dependence for Codependent Variables (Paper)
- Explaining Machine Learning Classifiers through Diverse Counterfactual Examples (Paper)
- A Review of Automatic Selection Methods for Machine Learning Algorithms and Hyperparameter Values (Paper)
- Learning Multiple Defaults for Machine Learning Algorithms (Paper)
- The Design and Analysis of Benchmark Experiments (Paper)
- Szilard Pafka's ML Benchmarking Research (GitHub resources)
- Data-driven advice for applying machine learning to bioinformatics problems (Paper)
- Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning (Paper)
- Futility Analysis in the Cross-Validation of Machine Learning Models (Paper)
- Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out, and Bootstrap (Paper)
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
- Hidden Technical Debt in Machine Learning Systems (Paper)
- Deep Learning in Production (GitHub resources)
- Building Riviera: A Declarative Real-Time Feature Engineering Framework (Blog - DoorDash)
- Software Engineering for Machine Learning: A Case Study (Paper - Microsoft)
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Paper - Google)
- Designing Machine Learning Systems (Book)
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture (Paper)
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist (Blog - Stitch Fix)
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department (Blog - Stitch Fix)
- The Engineer/Manager Pendulum (Blog)
- Lessons learned managing the GitLab Data team (Blog - GitLab)
- The Manager's Path: A Guide for Tech Leaders Navigating Growth and Change (Book)
- Who is fit to lead data science? (Blog - KDnuggets)
- Platform Revolution (Book)
- No Rules Rules: Netflix and the Culture of Reinvention (Book)
- Living by the Code (Book)
- The Best of Both Worlds: Unlocking the Potential of Hybrid Work for Software Engineers (Paper)
- The Cost of Cloud, a Trillion Dollar Paradox (Blog - Andreessen Horowitz)
- From Cloud Computing to Sky Computing (Paper)
- The Influential Product Manager: How to Lead and Launch Successful Technology Products (Book)
- Mastering Product Management: A Step-by-Step Guide (Book)
- Preparing for performance reviews ahead of time (Blog)
- Get your work recognized: write a brag document (Blog + template)
- Don't do invisible work (Presentation)
- Work log template for Software Engineers (Template)
- Sending weekly 5-15 updates (Blog)