---
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
execution:
  timeout: 120
---

# Convolutional Neural Networks (CNNs)
We can do the same in an artificial neural network by only connecting spatially nearby neurons.

Remember that the "connections" are components of a linear transformation from one layer to the next. Excluding connections is equivalent to forcing components of the linear transformation to zero, which reduces the set of parameters that need to be varied.

But apart from eliminating probably-unnecessary calculations, this kind of restriction has meaning for images: it is a [convolution](https://en.wikipedia.org/wiki/Convolution) of the image with a small [kernel](https://en.wikipedia.org/wiki/Kernel_(image_processing)), which is a well-known way to extract higher-level information from images. That's why a neural network built this way is called a Convolutional Neural Network (CNN).

To demonstrate the action of kernel convolution, consider this image (taken by me before the CMS experiment was lowered underground):

```{code-cell} ipython3
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn, optim
from scipy.signal import convolve2d

image = mpl.image.imread("data/sun-shines-in-CMS.png")

fig, ax = plt.subplots()
ax.imshow(image)
plt.show()
```
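
A minimal first kernel (an illustrative choice: uniform weights that average each pixel with its 7×7 neighborhood, blurring the image):

```{code-cell} ipython3
fig, ax = plt.subplots()

convolved_image = convolve2d(
    # grayscale version of the image (sum over the RGB axis)
    np.sum(image, axis=-1),
    # uniform 7×7 kernel: every output pixel is an average over a 7×7 neighborhood
    np.full((7, 7), 1/49),
)

ax.imshow(convolved_image, cmap="gray")
plt.show()
```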

Each pixel in the above image is the sum of an element-wise product of a 7×7 section of the original image with the 7×7 kernel. Different kernels bring out different higher-level features of the image, such as this one for vertical edges:

```{code-cell} ipython3
fig, ax = plt.subplots()

convolved_image = convolve2d(
    # grayscale version of the image (sum over the RGB axis)
    np.sum(image, axis=-1),
    # https://stackoverflow.com/a/41065243/1623645
    np.array([
        [-3/18, -3/13, -3/10, -3/9, -3/10, -3/13, -3/18],
        [-2/13, -2/8 , -2/5 , -2/4, -2/5 , -2/8 , -2/13],
        [-1/10, -1/5 , -1/2 , -1/1, -1/2 , -1/5 , -1/10],
        [ 0   ,  0   ,  0   ,  0  ,  0   ,  0   ,  0   ],
        [ 1/10,  1/5 ,  1/2 ,  1/1,  1/2 ,  1/5 ,  1/10],
        [ 2/13,  2/8 ,  2/5 ,  2/4,  2/5 ,  2/8 ,  2/13],
        [ 3/18,  3/13,  3/10,  3/9,  3/10,  3/13,  3/18],
    ]),
)

ax.imshow(convolved_image, cmap="gray")
plt.show()
```

We could keep custom-building these kernels to find a variety of features in the image, or we could let them be discovered by the neural network. If we wire the first two layers of the network with 7×7 overlaps (terms in the linear transformation that are _not_ forced to be 0), then the network will fill in the 49 components of this matrix with a kernel that is useful for identifying the image. And not just one kernel, but several, like the two above, which the network will combine in various ways.
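
In fact, this restricted linear transformation is exactly what PyTorch's convolutional layer (used in the next section) computes. Below is a sketch of the correspondence; the Laplacian-like kernel is an arbitrary choice for illustration, and the flip accounts for the fact that PyTorch computes a cross-correlation rather than a true convolution:

```{code-cell} ipython3
kernel = np.array([
    [0.0,  1.0, 0.0],
    [1.0, -4.0, 1.0],
    [0.0,  1.0, 0.0],
])   # an arbitrary (Laplacian-like) kernel, just for this comparison

conv = nn.Conv2d(1, 1, 3, bias=False)
with torch.no_grad():
    conv.weight[0, 0] = torch.tensor(np.flip(kernel).copy())   # flip both axes

grayscale = np.sum(image[:100, :100], axis=-1)   # a small patch, for speed
from_torch = conv(torch.tensor(grayscale[np.newaxis, np.newaxis], dtype=torch.float32))
from_scipy = convolve2d(grayscale, kernel, mode="valid")   # "valid": no padding, like Conv2d

np.allclose(from_torch[0, 0].detach().numpy(), from_scipy, atol=1e-3)
```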

Incidentally, a very prominent kernel in the retina of mouse eyes is custom-fitted to identify the shape of hawks ([ref](https://doi.org/10.1073/pnas.1211547109)).

+++

## CNN in PyTorch

+++

To build a CNN in PyTorch, we need to organize the tensor into the axis order that it expects.

```{code-cell} ipython3
jet_images_tensor = torch.tensor(jet_images[:, np.newaxis, :, :], dtype=torch.float32)
jet_labels_tensor = torch.tensor(jet_labels, dtype=torch.int64)
```

PyTorch wants this shape: (number of images, number of channels, height in pixels, width in pixels), so we make the 1 channel explicit with `np.newaxis`.

```{code-cell} ipython3
jet_images_tensor.shape
```

(An RGB image would have 3 channels, and if we use $n$ kernels to make $n$ convolutions of the image, the next layer would have $n$ channels.)
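
To see the channel bookkeeping concretely, here is a shape check on random data (not our jet images): a 5×5 convolution taking 3 input channels to 10 output channels.

```{code-cell} ipython3
rgb_batch = torch.rand(4, 3, 32, 32)   # 4 random RGB "images": (batch, channels, height, width)

nn.Conv2d(3, 10, 5)(rgb_batch).shape   # spatial size shrinks to 32 - 5 + 1 = 28
```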

A PyTorch [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) has exactly enough tunable parameters to describe a fixed-size convolution kernel (3×3 below) from a number of input channels (1 below) to a number of output channels (1 below): one weight for each kernel element and input-output channel pair, plus one bias per output channel.

The number of parameters _does not_ scale with the size of the image.

```{code-cell} ipython3
list(nn.Conv2d(1, 1, 3).parameters())
```
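
That's $1 \times 1 \times 3 \times 3 = 9$ weights plus $1$ bias, which we can confirm by counting:

```{code-cell} ipython3
# out_channels × in_channels × kernel_height × kernel_width weights, plus out_channels biases
sum(parameter.numel() for parameter in nn.Conv2d(1, 1, 3).parameters())
```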

A PyTorch [nn.MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) scales down an image by a fixed factor, by taking the maximum value in every $n \times n$ block. It has _no_ tunable parameters. Although not strictly necessary, it's a generally useful practice to pool convolutions, to reduce the total number of parameters and sensitivity to noise.

```{code-cell} ipython3
list(nn.MaxPool2d(2).parameters())
```
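
Its effect is easiest to see on a small hand-written example: each non-overlapping 2×2 block is replaced by its maximum.

```{code-cell} ipython3
example = torch.tensor([[[[1.0, 2.0, 5.0, 3.0],
                          [4.0, 0.0, 1.0, 2.0],
                          [7.0, 8.0, 2.0, 0.0],
                          [3.0, 5.0, 4.0, 9.0]]]])   # shape: (1 image, 1 channel, 4, 4)

nn.MaxPool2d(2)(example)   # shape: (1, 1, 2, 2)
```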

The general strategy is to reduce the size of the image with each convolution (and max-pooling) while increasing the number of channels, so that the spatial grid gradually becomes an abstract vector.

Then do a normal fully-connected network to classify the vectors. The hardest part is getting all of the indexes right.
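
It helps to trace the image size through each step before writing the model: a 5×5 convolution without padding shrinks each side by 4, and 2×2 max-pooling halves it.

```{code-cell} ipython3
size = 20            # input jet images are 20×20
size = size - 5 + 1  # after Conv2d(1, 5, 5): 16×16
size = size // 2     # after MaxPool2d(2): 8×8
size = size - 5 + 1  # after Conv2d(5, 10, 5): 4×4
size = size // 2     # after MaxPool2d(2): 2×2
size
```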

```{code-cell} ipython3
class ConvolutionalClassifier(nn.Module):
    def __init__(self):
        super().__init__()   # let PyTorch do its initialization first

        self.convolutional1 = nn.Sequential(
            nn.Conv2d(1, 5, 5),   # 1 input channel → 5 output channels, 5×5 convolution...
            nn.ReLU(),            # input image: 20×20, convolved image: 16×16 (because of edges)
            nn.MaxPool2d(2),      # scales down by taking the max in 2×2 squares, output is 8×8
        )
        self.convolutional2 = nn.Sequential(
            nn.Conv2d(5, 10, 5),  # 5 input channels → 10 output channels, 5×5 convolution...
            nn.ReLU(),            # input image: 8×8, convolved image: 4×4 (because of edges)
            nn.MaxPool2d(2),      # scales down by taking the max in 2×2 squares, output is 2×2
        )
        self.fully_connected = nn.Sequential(
            nn.Linear(10 * 2*2, 30),   # flattened input: 10 channels × 2×2 pixels
            nn.ReLU(),
            nn.Linear(30, 20),
            nn.ReLU(),
            nn.Linear(20, 10),
            nn.ReLU(),
            nn.Linear(10, 5),          # 5 outputs, one per jet category
        )

    def forward(self, step0):
        step1 = self.convolutional1(step0)
        step2 = self.convolutional2(step1)
        step3 = self.fully_connected(torch.flatten(step2, 1))   # flatten all but the batch axis
        return step3
```
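
A quick check of the wiring: a dummy batch of 7 blank images should come out as 7 rows of 5 numbers (one per jet category).

```{code-cell} ipython3
ConvolutionalClassifier()(torch.zeros(7, 1, 20, 20)).shape
```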

Although this has a lot of parameters ($3\,505$), it's less than the number of images ($80\,000$). Thus, this model is not likely to overfit. (If it _does_ [overfit](15-under-overfitting.md), use [regularization](16-regularization.md) to correct it.)

```{code-cell} ipython3
num_model_parameters = 0
for tensor_parameter in ConvolutionalClassifier().parameters():
    num_model_parameters += tensor_parameter.detach().numpy().size

num_model_parameters, len(jet_images_tensor)
```

Now we train the model in the usual way.

```{code-cell} ipython3
NUM_EPOCHS = 15
BATCH_SIZE = 1000

model_without_softmax = ConvolutionalClassifier()

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_without_softmax.parameters(), lr=0.03)

loss_vs_epoch = []
for epoch in range(NUM_EPOCHS):
    total_loss = 0
    for start_batch in range(0, len(jet_images_tensor), BATCH_SIZE):
        stop_batch = start_batch + BATCH_SIZE

        optimizer.zero_grad()

        predictions = model_without_softmax(jet_images_tensor[start_batch:stop_batch])
        loss = loss_function(predictions, jet_labels_tensor[start_batch:stop_batch])
        total_loss += loss.item()

        loss.backward()
        optimizer.step()

    loss_vs_epoch.append(total_loss)
    print(f"{epoch = } {total_loss = }")
```

```{code-cell} ipython3
fig, ax = plt.subplots()
ax.plot(range(len(loss_vs_epoch)), loss_vs_epoch)
ax.set_xlim(-1, len(loss_vs_epoch))
ax.set_ylim(0, np.max(loss_vs_epoch) * 1.1)
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
plt.show()
```

Let's see the accuracy as a confusion matrix.

```{code-cell} ipython3
model_with_softmax = nn.Sequential(
    model_without_softmax,
    nn.Softmax(dim=1),
)
```
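
The softmax turns the model's 5 raw outputs (logits) into probabilities, so each row sums to 1:

```{code-cell} ipython3
model_with_softmax(jet_images_tensor[:1]).sum().item()   # expect 1.0 (up to rounding)
```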

```{code-cell} ipython3
with torch.no_grad():   # gradients aren't needed for evaluation
    predictions_tensor = model_with_softmax(jet_images_tensor)

confusion_matrix = np.array([
    [
        (
            predictions_tensor[jet_labels_tensor == true_class].argmax(axis=1)
            == prediction_class
        ).sum().item()
        for prediction_class in range(5)
    ]
    for true_class in range(5)
])
confusion_matrix
```

Most images are on the diagonal of the confusion matrix—the predicted category is the true category—so this CNN is correctly identifying jets!

The biggest confusion (the largest off-diagonal entries) is between gluons and light quarks and between $W$ and $Z$ bosons, as expected.
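
To put one number on it, the overall accuracy is the fraction of images on the diagonal:

```{code-cell} ipython3
confusion_matrix.trace() / confusion_matrix.sum()
```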

```{code-cell} ipython3
fig, ax = plt.subplots(figsize=(6, 6))

heatmap = ax.imshow(confusion_matrix, vmin=0)
fig.colorbar(heatmap, ax=ax, label="number of jet images", shrink=0.8)

ax.set_xticks(range(5), jet_label_order)
ax.set_yticks(range(5), jet_label_order)
ax.set_xlabel("predicted jet category")
ax.set_ylabel("true jet category")

plt.show()
```

## Discussion

+++

The first layer mostly uses kernels to find edges and other small-scale patterns, and the next layer combines these small patterns into larger patterns. You can add more convolution layers to build up larger and larger scale spatial patterns before handing them off to the fully connected network to do generic (non-spatial) classification. The idea is that the network builds from low-level features to high-level features with each layer—which is why _deep_ neural networks are so successful.

![](img/DWTBQ2-2018-ev-fig3.jpg){width="100%"}

This idea was already proposed in 1980 in the paper that introduced CNNs, Fukushima's Neocognitron ([ref](https://doi.org/10.1007/BF00344251)):

![](img/higher-order-features.png){width="100%"}

The "[grandmother cell](https://en.wikipedia.org/wiki/Grandmother_cell)" refers to an old hypothesis that, somewhere in the human brain, _one cell_ encodes a very high-level concept like "grandmother." (Or it could have been meant as hyperbole!) However, when a large unsupervised image clustering network was built at Google, trained on YouTube videos ([ref](https://dl.acm.org/doi/10.5555/3042573.3042641)), one of the neurons projected back onto image space like this:

![](img/cat-neuron.png){width="50%"}

Apparently, there _is_ a cat cell.