
tweaked the text and added all cross-references
jpivarski committed Nov 28, 2024
1 parent 3afa08a commit a2a856d
Showing 22 changed files with 285 additions and 176 deletions.
6 changes: 3 additions & 3 deletions deep-learning-intro-for-hep/00-intro.md
@@ -13,9 +13,9 @@ The course materials include some inline problems, intended for active learning

## Software for the course

This course uses [Scikit-Learn](https://scikit-learn.org/) and [PyTorch](https://pytorch.org/) for examples and problem sets. [TensorFlow](https://www.tensorflow.org/) is also a popular machine learning library, but its functionality is mostly the same as PyTorch, and I didn't want to hide the concepts behind incidental differences in software interfaces. (I _did_ include Scikit-Learn because its interface is much simpler than PyTorch. When I want to emphasize issues that surround fitting in general, I'll use Scikit-Learn because the fit itself is just two lines of code, and when I want to emphasize the details of the machine learning model, I'll use PyTorch, which expands the fit into tens of lines of code and allows for more control of this part.)
This course uses [Scikit-Learn](https://scikit-learn.org/) and [PyTorch](https://pytorch.org/) for examples and problem sets. [TensorFlow](https://www.tensorflow.org/) is also a popular machine learning library, but its functionality is mostly the same as PyTorch, and I didn't want to hide the concepts behind incidental differences in software interfaces. (I _did_ include Scikit-Learn because its interface is much simpler than PyTorch. When I want to emphasize issues that surround fitting in general, I'll use Scikit-Learn because the fit itself is just two lines of code, and when I want to emphasize the details of the machine learning model, I'll use PyTorch, which expands the fit into tens of lines of code and allows for more control.)
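
To make the contrast concrete, here is a minimal sketch of how short a Scikit-Learn fit can be (the toy dataset and the choice of `MLPRegressor` are illustrative placeholders, not taken from the course materials); the fit itself really is about two lines:

```python
# A rough sketch of the Scikit-Learn workflow described above.
# X and y are hypothetical stand-ins for a real training dataset.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1))          # 100 samples, 1 feature
y = np.sin(2 * np.pi * X[:, 0])          # 1 target value per sample

# The fit itself: construct a model, then train it.
model = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000)
model.fit(X, y)

predictions = model.predict(X)           # apply the trained model
```

In PyTorch, the same fit would be written out as an explicit module, loss function, optimizer, and training loop, which is the extra control referred to above.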

I didn't take the choice of PyTorch over TensorFlow lightly (since I'm a newcomer to both). I verified that PyTorch is about as popular as TensorFlow among CMS physicists using the plot below (derived using the methodology in [this GitHub repo](https://github.com/jpivarski-talks/2023-05-09-chep23-analysis-of-physicists) and [this talk](https://indico.jlab.org/event/459/contributions/11547/)). Other choices, such as [JAX](https://jax.readthedocs.io/), would be a mistake because a reader of this tutorial would not be prepared to collaborate with machine learning as it is currently practiced in particle physics.
I didn't take the choice of PyTorch over TensorFlow lightly. I verified that PyTorch is about as popular as TensorFlow among CMS physicists using the plot below (derived using the methodology in [this GitHub repo](https://github.com/jpivarski-talks/2023-05-09-chep23-analysis-of-physicists) and [this talk](https://indico.jlab.org/event/459/contributions/11547/)). Other choices, such as [JAX](https://jax.readthedocs.io/), would be a mistake because a reader of this tutorial would not be prepared to collaborate with machine learning as it is currently practiced in particle physics.

![](img/github-ml-package-cmsswseed.svg){. width="100%"}

@@ -37,7 +37,7 @@ and navigate to the `notebooks` directory in the left side-bar. The pages are fo

## To run everything on your own computer

Make sure that you have the following packages installed with [conda](https://scikit-hep.org/user/installing-conda), pip, uv, pixi, etc.
Make sure that you have the following packages installed with [conda](https://scikit-hep.org/user/installing-conda) (or pip, uv, pixi, etc.):

```{include} ../environment.yml
:literal: true
```
8 changes: 3 additions & 5 deletions deep-learning-intro-for-hep/01-overview.md
@@ -37,11 +37,9 @@ All of these terms describe the following general procedure:
2. vary those parameters ("train the model") until the algorithm returns expected results on a labeled dataset ("supervised learning") or until it finds patterns according to some desired metric ("unsupervised learning");
3. either apply the trained model to new data, to describe the new data in the same terms as the training dataset ("predictive"), or use the model to generate new data that is plausibly similar to the training dataset ("generative"; AI and ML only).

Apart from the word "huge," this procedure also describes curve-fitting, a ubiquitous analysis technique that most experimental physicists use on a daily basis. Consider a dataset with two observables (called "features" in ML), $x$ and $y$, and suppose that they have an approximate, but not exact, linear relationship. There is [an exact algorithm](https://en.wikipedia.org/wiki/Linear_regression#Formulation) to compute the best fit of $y$ as a function of $x$, and this linear fit is a model with two parameters: the slope and intercept of the line. If $x$ and $y$ have a non-linear relationship expressed by $N$ parameters, a non-deterministic optimizer like [MINUIT](https://en.wikipedia.org/wiki/MINUIT) can be used to search for the best fit.
Apart from the word "huge," this procedure also describes curve-fitting, a ubiquitous analysis technique that most experimental physicists use on a daily basis. ML fits differ from curve-fitting in the number of parameters used and their interpretation—or rather, their lack of interpretation. In curve fitting, the values of the parameters and their uncertainties are regarded as the final product, often quoted in a table as the result of the data analysis. In ML, the parameters are too numerous to present this way and wouldn't be useful if they were, since the calculation of predicted values from these parameters is complex. Instead, the ML model is used as a machine, trained on "features" $x$ and "targets" $y$, to predict new $y$ for new $x$ values (inference) or to randomly generate new $x$, $y$ pairs with the same distribution as the training set (generation). In fact, most ML models don't even have a unique minimum in parameter space—different combinations of parameters would result in the same predictions.

ML fits differ from curve-fitting in the number of parameters used and their interpretation—or rather, their lack of interpretation. In curve fitting, the values of the parameters and their uncertainties are regarded as the final product, often quoted in a table as the result of the data analysis. In ML, the parameters are too numerous to present this way and wouldn't be useful if they were, since the calculation of predicted values from these parameters is complex. Instead, the ML model is used as a machine to predict $y$ for new $x$ values (prediction) or to randomly generate new $x$, $y$ pairs with the same distribution as the training set (generation). In fact, most ML models don't even have a unique minimum in parameter space—different combinations of parameters would result in the same predictions.
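
For a concrete reference point, here is a minimal sketch of the exact two-parameter linear fit described above, using synthetic data for illustration: the closed-form least-squares slope and intercept, cross-checked against a library routine.

```python
# Closed-form least-squares fit of y as a function of x (slope and intercept).
# The data are synthetic and approximately, but not exactly, linear.
import numpy as np

rng = np.random.default_rng(12345)
x = rng.uniform(0, 10, 1000)
y = 3.0 * x + 1.5 + rng.normal(0, 2, size=1000)   # true slope 3.0, intercept 1.5, plus noise

# Exact solution of the least-squares problem for a straight line:
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# The same two parameters from a library routine, for comparison:
slope_fit, intercept_fit = np.polyfit(x, y, 1)
```

An ML fit generalizes this setup to far more parameters, where the individual parameter values are no longer the quantity of interest.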

Today, the most accurate and versatile class of ML models are "deep" Neural Networks (NN), where "deep" means having a large number of internal layers. I will describe this type of model in much more detail, since this course will focus exclusively on them. However, it's worth pointing out that NNs are just one type of ML model; others include:
Today, the most accurate and versatile class of ML models are "deep" Neural Networks (NN), where "deep" means having a large number of internal layers. I will describe this type of model in much more detail, since this course will focus on them. However, it's worth pointing out that NNs are just one type of ML model; others include:

* [Naive Bayes classifiers](https://en.wikipedia.org/wiki/Naive_Bayes_classifier),
* [k-Nearest Neighbors (kNN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm),
@@ -67,6 +65,6 @@ This course focuses on NNs for several reasons.
At the end of the course, I want students to

1. understand neural networks at a deep level, to know _why_ they work and therefore _when_ to apply them, and
2. be sufficiently familiar with the tools and techniques of model-building to be able to start writing code (in PyTorch). The two hour exercise asks students to do exactly this, for a relevant particle physics problem.
2. be sufficiently familiar with the tools and techniques of model-building to be able to start writing code (in PyTorch). The [Main Project](20-main-project.md) asks students to do exactly this, for a relevant particle physics problem.

In particular, I want to dispel the notion that ML is a black box or a dark art. Like the scientific method itself, ML development requires tinkering and experimentation, but it is possible to debug and it is possible to make informed decisions about it.
14 changes: 7 additions & 7 deletions deep-learning-intro-for-hep/02-history.md
@@ -1,6 +1,6 @@
# History of HEP and ML

The purpose of this section is to show that particle physics, specifically High Energy Physics (HEP), has always needed ML. It is not a fad: ML (in a usable form) would have brought major benefits to HEP at any stage in its history. It's being applied to many problems in HEP now because it's just becoming _possible_ now.
The purpose of this section is to show that particle physics, specifically High Energy Physics (HEP), has always needed ML. It's not a fad: ML (in a usable form) would have brought major benefits to HEP at any stage in its history. It's being applied to many problems in HEP now because it's just becoming _possible_ now.

![](img/hep-plus-ml.jpg){. width="40%"}

@@ -30,7 +30,7 @@ Below, I extended the plot to the present day: the number of events per second h

These event rates have been too fast for humans since the 1970's, when human scanners were replaced by heuristic track-finding routines, usually by hand-written algorithms that iterate through all combinations within plausible windows (which are now a bottleneck in high track densities).

Although many computing tasks in particle physics are suitable for hand-written algorithms, the field also has and has always had tasks that are a natural fit for ML and artificial intelligence, to such an extent that human intelligence was enlisted to solve them. While ML would have been beneficial to HEP from the very beginning of the field, algorithms and computational resources have only recently made it possible.
Although many computing tasks in particle physics are suitable for hand-written algorithms, the field also has and has always had tasks that are a natural fit for artificial intelligence, to such an extent that human intelligence was enlisted to solve them. While ML would have been beneficial to HEP from the very beginning of the field, algorithms and computational resources have only recently made it possible.

## Symbolic AI and connectionist AI

@@ -45,9 +45,9 @@ Symbolic AI consists of hand-written algorithms, which today wouldn't be called

Symbolic AI is called "symbolic" because the starting point is a system of abstract symbols and rules—like programming in general. An associated philosophical idea is that this is the starting point for intelligence itself: human and artificial intelligence consists in manipulating propositions like an algebra. Mathematician-philosophers like George Boole, Gottlob Frege, and Bertrand Russell formulated these as "the laws of thought".

Connectionist AI makes a weaker claim about what happens in an intelligent system (natural or artificial): only that the system's inputs and outputs are correlated appropriately, and an intricate network of connections can implement that correlation. As we'll see, neural networks are an effective way to implement it, and they were (loosely) inspired by the biology of human brains. The idea that we can only talk about the inputs and outputs of human systems, without proposing symbols as entities in the mind, was a popular trend in psychology called Behaviorism in the first half of the 20th century. Today, Cognitive Psychologists can measure the scaling time and other properties of algorithms in human minds, so Behaviorism is out of favor. But it's ironic that large language models like ChatGPT are an implementation of what Behaviorists proposed as a model of human intelligence a century ago.
Connectionist AI makes a weaker claim about what happens in an intelligent system (natural or artificial): only that the system's inputs and outputs are correlated appropriately, and an intricate network of connections can implement that correlation. As we'll see, neural networks are an effective way to implement it, and they were (loosely) inspired by the biology of human brains. The idea that we can only talk about the inputs and outputs of human systems, without proposing symbols as entities in the mind, was a popular trend in psychology called Behaviorism in the first half of the 20th century. Today, Cognitive Psychologists can measure the scaling time and other properties of algorithms in human minds, so Behaviorism is out of favor. But it's ironic that large language models like ChatGPT are an implementation of what Behaviorists proposed as a model of human intelligence a century ago!

Although connectionist systems like neural networks don't start with propositions and symbols, something like these structures may form among the connections as the most effective way to produce the desired outputs, similar to emergent behavior in dynamical systems. Practitioners of explainable AI (xAI) try to find patterns like these in trained neural networks—far from treating a trained model as a black box, they treat it as a natural system to study!
Although connectionist systems like neural networks don't start with propositions and symbols, something like these structures may form among the connections as the most effective way to produce the desired outputs, similar to emergent behavior in dynamical systems. Practitioners of explainable AI (xAI) try to find patterns like these in trained neural networks—far from treating a trained model as a black box, they treat it as a natural system to study.

## AI's summers and winters

@@ -82,13 +82,13 @@ More importantly for us, AI was adopted in experimental particle physics, starti

![](img/chep-acat-2024-ml.svg){. width="90%"}

Clearly, the physicists' interests are following the developments in computer science, including the latest "winter." From personal experience, I know that Boosted Decision Trees (BDTs) were popular in the gap between 2000 and 2015, but they rarely appear in CHEP conference talks. (Perhaps they were less exciting?)
Clearly, the physicists' interests are following the developments in computer science, starting before the most recent winter. From personal experience, I know that Boosted Decision Trees (BDTs) were popular in the gap between 2000 and 2015, but they rarely appear in CHEP conference talks. (Perhaps they were less exciting?)

The 2024 Nobel Prize in Physics has drawn attention to connections between the theoretical basis of neural networks and statistical mechanics—particularly John Hopfield's memory networks (which have the same mathematical structure as Ising spin-glass systems in physics) and Geoffrey Hinton's application of simulated annealing to training Hopfield and restricted Boltzmann networks (gradually reducing random noise in training to find a more global optimum). However, this connection is distinct from the _application_ of neural networks to solve experimental physics problems—which seemed promising in the 1990's and is yielding significant results today.
The 2024 Nobel Prize in Physics has drawn attention to connections between the theoretical basis of neural networks and statistical mechanics—particularly John Hopfield's memory networks (which have the same mathematical structure as Ising spin-glass systems in physics) and Geoffrey Hinton's application of simulated annealing to training Hopfield and restricted Boltzmann networks (gradually reducing random noise in training to find a more global optimum). However, this connection is distinct from the _application_ of neural networks to solve experimental physics problems—which is what I'll be focusing on.

## Conclusion

HEP has always needed ML. Since the beginning of the HEP as we know it, high energy physicists have invested heavily in computing, but their problems could not be solved without human intelligence in the workflow, which doesn't scale to large numbers of collision events. Today, we're finding that many of the hand-written algorithms from the decades in which AI was not ready are less efficient and less capable than connectionist AI solutions, especially deep learning.
HEP has always needed ML. Since the beginning of HEP as we know it, high energy physicists have invested heavily in computing, but their problems could not be solved without human intelligence in the workflow, which doesn't scale to large numbers of collision events. Today, we're finding that many of the hand-written algorithms from the decades in which AI was not ready are less efficient and less capable than connectionist AI solutions, especially deep learning.

Meanwhile, the prospect of connectionist AI has been unclear until very recently. Interest and funding vacillated throughout its history (including a brief dip in 2020‒2022 ([ref](https://doi.org/10.1109/AIMLA59606.2024.10531477)), before ChatGPT) as hype alternated with pessimism. Given this history, one could find examples to justify either extreme: it has been heavily oversold and undersold.
