adding podcasts
JacobReynolds committed Mar 21, 2024
1 parent 27426b3 commit 5bdd076
Showing 4 changed files with 131 additions and 1 deletion.
9 changes: 9 additions & 0 deletions neural-networks/splash-pad/index.md
@@ -0,0 +1,9 @@
---
title: splash pad
icon: square
description: thoughts to be categorized
---

## Named Entity Recognition (NER)

NER is a common task in Natural Language Processing (NLP) that extracts named entities (people, organizations, locations, and so on) from bodies of text.
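
The note doesn't name a library, but as a minimal sketch, spaCy can do this out of the box (assuming the `en_core_web_sm` model has been downloaded):

```python
# Minimal NER sketch with spaCy (library choice is an assumption, not from the note).
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Each recognized entity span carries its text and a label like ORG, GPE, or MONEY
for ent in doc.ents:
    print(ent.text, ent.label_)
```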
97 changes: 96 additions & 1 deletion neural-networks/transformers/attention.md
@@ -5,7 +5,9 @@ description: listen up, class

Attention, or self-attention, is the mechanism that makes current transformer models so powerful. It lets a neural network take earlier tokens in the context into account when computing the values for the current token.

## Elementary, my dear Watson
## Simple averaging

### Batched average

A simple example is looking at your current batch and averaging the embedding values of all tokens that precede the current one.

@@ -40,6 +42,8 @@ for time in range(T):
xbow[time] = x
```
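
That loop is mostly collapsed in the diff; a self-contained sketch of the same idea (my own reconstruction, using the `T, C = 8, 2` setup from the later snippets) looks roughly like this:

```python
import torch

T, C = 8, 2  # time steps, embedding channels
torch.manual_seed(1)
x = torch.randn(T, C)

# "bag of words" average: row t becomes the mean of rows 0..t
xbow = torch.zeros((T, C))
for time in range(T):
    xprev = x[: time + 1]                  # everything up to and including this token
    xbow[time] = torch.mean(xprev, dim=0)  # average over the time dimension
```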

### Using `torch.tril`

===
And because math is math, we can actually express this as a simple matrix multiplication using `torch.tril`. Building a lower-triangular matrix with `tril` and normalizing its rows ensures that, when we perform the matrix multiplication, each row only averages the values up to and including that position.

@@ -87,3 +91,94 @@ print(torch.allclose(xbow, c))
```

===
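
For reference, a minimal standalone sketch of the matrix-multiplication version the collapsed block builds up to (same toy shapes; variable names are my own):

```python
import torch

T, C = 8, 2
torch.manual_seed(1)
x = torch.randn(T, C)

# lower-triangular ones, then normalize each row so it sums to 1
tril = torch.tril(torch.ones(T, T))
wei = tril / tril.sum(dim=1, keepdim=True)

c = wei @ x  # (T, T) @ (T, C) -> (T, C): the same running averages, no Python loop
```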

### Using `softmax`

One more alternative is to use `softmax` instead: fill the future positions with `-inf`, and `softmax` turns each row into the same uniform averaging weights.

==- Code example

```python
T, C = 8, 2
torch.manual_seed(1)

x = torch.randn(T, C)
tril = torch.tril(torch.ones(T, T))

wei = torch.zeros((T,T))
wei = wei.masked_fill(tril == 0, -torch.inf)
print(wei)
# tensor([[0., -inf, -inf, -inf, -inf, -inf, -inf, -inf],
# [0., 0., -inf, -inf, -inf, -inf, -inf, -inf],
# [0., 0., 0., -inf, -inf, -inf, -inf, -inf],
# [0., 0., 0., 0., -inf, -inf, -inf, -inf],
# [0., 0., 0., 0., 0., -inf, -inf, -inf],
# [0., 0., 0., 0., 0., 0., -inf, -inf],
# [0., 0., 0., 0., 0., 0., 0., -inf],
# [0., 0., 0., 0., 0., 0., 0., 0.]])

wei = torch.softmax(wei, dim=1)
print(wei)
# tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
# [0.5000, 0.5000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
# [0.3333, 0.3333, 0.3333, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
# [0.2500, 0.2500, 0.2500, 0.2500, 0.0000, 0.0000, 0.0000, 0.0000],
# [0.2000, 0.2000, 0.2000, 0.2000, 0.2000, 0.0000, 0.0000, 0.0000],
# [0.1667, 0.1667, 0.1667, 0.1667, 0.1667, 0.1667, 0.0000, 0.0000],
# [0.1429, 0.1429, 0.1429, 0.1429, 0.1429, 0.1429, 0.1429, 0.0000],
# [0.1250, 0.1250, 0.1250, 0.1250, 0.1250, 0.1250, 0.1250, 0.1250]])

xbow = wei @ x
print(xbow)
# tensor([[-1.5256, -0.7502],
# [-1.0898, -1.1799],
# [-0.7599, -0.9896],
# [-0.8149, -1.1445],
# [-0.7943, -0.8549],
# [-0.7915, -0.7543],
# [-0.7102, -0.4055],
# [-0.5929, -0.2964]])
```

===

## Self-attention

I really don't understand this yet, but here's some of the code and specific notes from Andrej's video.

==- Code example

```python
B, T, C = 4, 8, 32
torch.manual_seed(1)
x = torch.randn(B, T, C)

head_size = 16
key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)
k = key(x) # (B, T, 16)
q = query(x) # (B, T, 16)
wei = q @ k.transpose(-2, -1) # (B, T, 16) @ (B, 16, T) ---> (B, T, T)

tril = torch.tril(torch.ones((T, T)))
wei = wei.masked_fill(tril == 0, float(-torch.inf))
wei = torch.nn.functional.softmax(wei, dim=-1)
v = value(x)
out = wei @ v
```

===

- Attention is a communication mechanism. Can be seen as nodes in a directed graph looking at each other and aggregating information with a weighted sum from all nodes that point to them, with data-dependent weights.
- There is no notion of space. Attention simply acts over a set of vectors. This is why we need to positionally encode tokens.
- Each example across the batch dimension is of course processed completely independently; the examples never "talk" to each other.
- In an "encoder" attention block, just delete the single line that does masking with `tril`, allowing all tokens to communicate. The block here is called a "decoder" attention block because it has triangular masking, and is usually used in autoregressive settings, like language modeling.
- "Self-attention" just means that the keys and values are produced from the same source as the queries. In "cross-attention", the queries still get produced from `x`, but the keys and values come from some other, external source (e.g. an encoder module).
- "Scaled" attention additionally scales `wei` by `1/sqrt(head_size)`. This makes it so that when the inputs Q and K are unit variance, `wei` will be unit variance too, and softmax will stay diffuse and not saturate too much. See the snippets below.

The scaling from the final note can be done with:

```python
wei = q @ k.transpose(-2, -1) * head_size**-0.5  # scale by 1/sqrt(head_size)
```
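
A quick standalone way to see the variance claim from that last bullet (fresh unit-variance `q` and `k`, not the tensors from the snippet above):

```python
import torch

torch.manual_seed(1)
B, T, head_size = 4, 8, 16
q = torch.randn(B, T, head_size)  # unit variance
k = torch.randn(B, T, head_size)  # unit variance

wei = q @ k.transpose(-2, -1)
print(wei.var())                      # roughly head_size (~16)
print((wei * head_size**-0.5).var())  # roughly 1 after scaling
```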
22 changes: 22 additions & 0 deletions podcast.md
@@ -0,0 +1,22 @@
---
title: podcast
description: notes from different podcasts
icon: unmute
order: 899
---

# Podcast

Various snippets and notes from podcasts I've listened to that I'd like to remember.

## Business Made Simple with Donald Miller

### #142: Marlee Joseph - The Secret to Making Better Marketing Decisions

Marlee talks about the ICE framework:

- I | Impact
- C | Confidence
- E | Ease

She also mentioned the HPPO framework (highest paid person's opinion).
4 changes: 4 additions & 0 deletions start-ups/snitch/index.md
@@ -14,3 +14,7 @@ A [recent SEC ruling](https://www.sec.gov/news/press-release/2023-139) now requi
Luckily it's the government, and in _some_(ish) cases their data is public. You can search all public SEC filings via [EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch), the Electronic Data Gathering, Analysis, and Retrieval system. You can go even deeper by using their [full-text search](https://www.sec.gov/edgar/search/#) to dive into all the filings.

Unfortunately, an 8-K is a common form used to notify investors and the SEC about a variety of things, only one of which is a data breach. Searching EDGAR for [8-K filings](https://www.sec.gov/edgar/search/#/category=custom&forms=8-K) returns a ton of documents, mostly about changes in executive staffing or compensation. But thanks to their (likely [Lucene](https://lucene.apache.org)) backend, we can add a full-text match for the phrase "Item 1.05" and get [more specific results](https://www.sec.gov/edgar/search/#/q=%2522item%25201.05%2522&category=custom&forms=8-K). At the time of writing there are 5 results across 3 companies. When I originally performed this research there was only 1 filing, so I wasn't sure how often new filings would come out or how their formats would differ, and it wasn't an appropriate time to start writing automation around them. However, as more filings get created and we can start to measure their velocity, this could be a fun data feed or subscription service for those who like others' dirty cyber laundry.
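
If that ever becomes worth automating, a rough sketch might poll EDGAR's full-text search. Note that the JSON endpoint (`efts.sec.gov/LATEST/search-index`), its parameters, and the response shape are my assumptions based on what the search UI appears to call, not a documented API:

```python
# Rough sketch only: the efts.sec.gov endpoint, its parameters, and the response
# fields are assumptions inferred from the EDGAR search UI, not a documented API.
import requests

resp = requests.get(
    "https://efts.sec.gov/LATEST/search-index",
    params={"q": '"Item 1.05"', "forms": "8-K"},
    headers={"User-Agent": "research script your-email@example.com"},  # SEC asks for a descriptive UA
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

# Assumed Elasticsearch-style response: total hit count plus per-filing hits
print(data.get("hits", {}).get("total", {}).get("value"))
```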

## Further thoughts

Reading the SEC press release more closely, I now realize it also requires all publicly traded companies to include a statement about their cybersecurity processes in their annual filings. Reading some of them, though, they all seem to be written by lawyers and lack any substantial information.
