# Bookmark

In no particular order, some things I've found interesting or that I simply enjoy.

- [Universe Today](https://www.universetoday.com/), a website about space and astronomy
- [Universe Today YouTube channel](https://www.youtube.com/@frasercain) (same as above, but in video/podcast format)
- [On being a Hydra with, and without, a nervous system: what do neurons add?](https://link.springer.com/article/10.1007/s10071-023-01816-8)
- [Apollo Guidance Computer Restoration](https://www.youtube.com/playlist?list=PL-_93BVApb59FWrLZfdlisi_x7-Ut_-w7) (YouTube playlist from CuriousMarc)
- [Mechanical calculators](https://www.youtube.com/playlist?list=PL-_93BVApb58cdHy3Z2sUWtd6q2LsmO2Z) (YouTube playlist from CuriousMarc again) and, yes, I own a few of them.
# LLM on SSD

I have a general feeling that an LLM may not absolutely need a large amount of fast memory to run.

There is a [paper about it](https://arxiv.org/abs/2312.11514) named "LLM in a Flash".
I need to read it first.

> Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters in flash memory, but bringing them on demand to DRAM. Our method involves constructing an inference cost model that takes into account the characteristics of flash memory, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this hardware-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.

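The paper's core idea (keep the weights in flash, page them into DRAM only when needed) can be roughly sketched with a memory-mapped array. Everything below is a made-up illustration, not the paper's implementation: the file, the sizes, and the tiny row cache standing in for "windowing" are all hypothetical.

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: a weight matrix too large to keep in RAM is stored on
# disk and memory-mapped, so the OS pages rows in from flash only when touched.
rows, cols = 1024, 256
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
np.save(path, np.random.default_rng(0).standard_normal((rows, cols)).astype(np.float32))

weights = np.load(path, mmap_mode="r")  # nothing is read into RAM yet

# "Windowing", loosely sketched: cache recently used neuron rows and only
# fetch rows from flash when they become newly active.
cache: dict[int, np.ndarray] = {}

def fetch_rows(active: list[int]) -> np.ndarray:
    for i in active:
        if i not in cache:                     # only newly active rows hit flash
            cache[i] = np.asarray(weights[i])  # one contiguous row read
    return np.stack([cache[i] for i in active])

out = fetch_rows([3, 7, 3])  # the second 3 is served from the cache
print(out.shape)  # (3, 256)
```

"Row-column bundling" would go further and store matching rows and columns adjacently so each flash read is larger and sequential; that part is not shown here.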
To be honest, I expected more than "up to twice the size of the available DRAM".
What about 10x? 100x? What's the point of using an LLM if you can't use a large one?

The 4-5x and 20-25x increase in inference speed is interesting, though. But that's not the point.

## The cost of running an LLM

LLMs can't be forever limited by memory, and can't always live on large, expensive cloud servers.
The public doesn't understand the insane ecological and economic cost of running an LLM.
It really is a shame to use an LLM for simple requests like "what is the weather today?" or "what is the capital of France?".
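One cheap mitigation for those trivial requests can be sketched as a router that answers from a static table and only falls back to a model when needed. Everything here is hypothetical: `CHEAP_ANSWERS` and `call_llm` are made-up names, and the fallback is just a placeholder, not a real API.

```python
# Hypothetical router: trivial queries are answered from a static table,
# so the expensive model path is only taken when actually needed.
CHEAP_ANSWERS = {
    "what is the capital of france?": "Paris",
}

def call_llm(query: str) -> str:
    # Placeholder standing in for a real (costly) model call.
    return f"[an LLM would answer: {query!r}]"

def answer(query: str) -> str:
    key = query.strip().lower()
    if key in CHEAP_ANSWERS:  # near-zero cost, no model involved
        return CHEAP_ANSWERS[key]
    return call_llm(query)

print(answer("What is the capital of France?"))  # Paris
```

A real system would need a smarter matcher than exact string lookup, but the principle stands: not every request deserves a datacenter.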
And yes, I'm aware of how ironic it is to use an LLM to write about the ecological cost of using an LLM.
I'm not sure if it's funny or sad.
# On using AI to write about writing

One thing is for sure: I'm using GitHub Copilot and it's massively helpful.
But it's also kind of weird, as it influences the way I write.

It makes me write things

![write_thing.png](write_thing.png)

But it also writes things I would have written anyway. Which is even weirder.
Am I that predictable? Or is it just that good?

![predicatable.png](predicatable.png)

Am I predictable because I'm using it and it's influencing the way I write?
It is, of course, "that good", and I'm not worried about that.
I'm more worried about the fact that it's influencing the way I write.

![worthy.png](worthy.png)

Talk about weird. [It's like a feedback loop.](Web-Enshitification.md)