
Google Gemma 7B 2B OSS models are available on Hugging Face as of 20240221 #13

obriensystems opened this issue Feb 22, 2024 · 22 comments

@obriensystems commented Feb 22, 2024

see #27
https://ai.google.dev/gemma/docs?hl=en
https://www.kaggle.com/models/google/gemma

Gemma on Vertex AI Model Garden:
https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?_ga=2.34476193.-1036776313.1707424880&hl=en

https://obrienlabs.medium.com/google-gemma-7b-and-2b-llm-models-are-now-available-to-developers-as-oss-on-hugging-face-737f65688f0d

https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
https://blog.google/technology/developers/gemma-open-models/

https://huggingface.co/google/gemma-7b
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/l4/PB-11316-001_v01.pdf

pull and remake the latest llama.cpp (see the previous article on running Llama 70B in #7)

abetlen/llama-cpp-python#1207
ggerganov/llama.cpp@580111d

7B (the 32 GB f32 model needs 64 GB of RAM on a CPU, or an RTX-A6000/RTX-5000 Ada) and 2B (on a MacBook M1 Max with 32 GB unified RAM) are both working perfectly.
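As a sanity check on those memory figures, here is a back-of-the-envelope footprint estimate. This is an illustrative sketch: the 8.54B parameter count is printed by the llama.cpp loader below, while the ~2.5B count for gemma-2b and the ~4.5 bits-per-weight for q4_0 are assumptions.

```python
# Back-of-the-envelope GGUF footprint estimate (illustrative only).
# 8.54B params for gemma-7b is from the llama.cpp load logs below;
# the ~2.5B count for gemma-2b and ~4.5 bpw for q4_0 are assumptions.
GIB = 1024**3

def model_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk/in-memory model size in GiB."""
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

for name, params in [("gemma-7b", 8.54), ("gemma-2b", 2.51)]:
    for bpw in (32, 16, 4.5):  # f32, f16, ~q4_0
        print(f"{name} @ {bpw:>4} bpw: {model_gib(params, bpw):6.2f} GiB")
```

The f32 7B case lands on the 31.81 GiB the loader reports below, which is why a 64 GB host is comfortable for CPU inference, while f16 or 4-bit quantizations would fit much smaller cards.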

```
obrien@mbp7 llama.cpp % ./main -m models/gemma-2b.gguf -p "Describe how gold is made in collapsing stars" -t 24 -n 1000 -e --color
Log start
main: build = 2234 (973053d8)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.3.0
main: seed  = 1708573311
llama_model_loader: loaded meta data with 19 key-value pairs and 164 tensors from models/gemma-2b.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
...
ggml_backend_metal_buffer_from_ptr: allocated buffer, size =  9561.31 MiB, ( 9561.38 / 21845.34)
llm_load_tensors: offloading 18 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 19/19 layers to GPU
llm_load_tensors:      Metal buffer size =  9561.30 MiB
llm_load_tensors:        CPU buffer size =  2001.00 MiB
.............................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil


llama_print_timings:        load time =   10956.18 ms
llama_print_timings:      sample time =     650.20 ms /  1000 runs   (    0.65 ms per token,  1537.98 tokens per second)
llama_print_timings: prompt eval time =      55.43 ms /     9 tokens (    6.16 ms per token,   162.36 tokens per second)
llama_print_timings:        eval time =   32141.38 ms /   999 runs   (   32.17 ms per token,    31.08 tokens per second)
llama_print_timings:       total time =   33773.63 ms /  1008 tokens
ggml_metal_free: deallocating
```

https://cloud.google.com/blog/products/ai-machine-learning/performance-deepdive-of-gemma-on-google-cloud

@obriensystems commented:

i9-13900KS running dual RTX-A4500 (20+20 GB) Ampere, and i9-14900K running dual RTX-4090 (24+24 GB) Ada

CPU first

```
C:/wse_github/llama.cpp $  ./main.exe -m g:/models/gemma-7b.gguf  -p "what partion of gold is made in exploding stars" -n 2000 -e --color -t 24

Log start
main: build = 2234 (973053d8)
main: built with cc (GCC) 13.2.0 for x86_64-w64-mingw32
main: seed  = 1708573388
llama_model_loader: loaded meta data with 19 key-value pairs and 254 tensors from g:/models/gemma-7b.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma
llama_model_loader: - kv   1:                               general.name str              = gemma-7b
llama_model_loader: - kv   2:                       gemma.context_length u32              = 8192
llama_model_loader: - kv   3:                          gemma.block_count u32              = 28
llama_model_loader: - kv   4:                     gemma.embedding_length u32              = 3072
llama_model_loader: - kv   5:                  gemma.feed_forward_length u32              = 24576
llama_model_loader: - kv   6:                 gemma.attention.head_count u32              = 16
llama_model_loader: - kv   7:              gemma.attention.head_count_kv u32              = 16
llama_model_loader: - kv   8:                 gemma.attention.key_length u32              = 256
llama_model_loader: - kv   9:               gemma.attention.value_length u32              = 256
llama_model_loader: - kv  10:     gemma.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  14:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  15:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,256128]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,256128]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,256128]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - type  f32:  254 tensors
llm_load_vocab: mismatch in special tokens definition ( 544/256128 vs 388/256128 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = gemma
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 256128
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_layer          = 28
llm_load_print_meta: n_rot            = 192
llm_load_print_meta: n_embd_head_k    = 256
llm_load_print_meta: n_embd_head_v    = 256
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 24576
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = all F32 (guessed)
llm_load_print_meta: model params     = 8.54 B
llm_load_print_meta: model size       = 31.81 GiB (32.00 BPW)
llm_load_print_meta: general.name     = gemma-7b
llm_load_print_meta: BOS token        = 2 '<bos>'
llm_load_print_meta: EOS token        = 1 '<eos>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: PAD token        = 0 '<pad>'
llm_load_print_meta: LF token         = 227 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.10 MiB
llm_load_tensors:        CPU buffer size = 32570.17 MiB
......................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   224.00 MiB
llama_new_context_with_model: KV self size  =  224.00 MiB, K (f16):  112.00 MiB, V (f16):  112.00 MiB
llama_new_context_with_model:        CPU input buffer size   =     8.01 MiB
llama_new_context_with_model:        CPU compute buffer size =   506.25 MiB
llama_new_context_with_model: graph splits (measure): 1

system_info: n_threads = 24 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 512, n_predict = 2000, n_keep = 1


 what partion of gold is made in exploding stars.

We have seen that there are very strong indications, both from meteoritic data and terrestrial rocks (see Chapter 8), that the Earth has experienced a bombardment by large bodies (planetesimals) during the first few million years after its formation $460 \times$ $\left(1-\delta_{\mathrm{E}}\right)$ ago. These planetensimals were largely made of silicates and iron, but they contained less metal than is found in meteorites today: their meteoritic equivalent was the enstatite chondrite with a low amount ( $9 \%$ by mass) metallic FeNi alloy; this type has been identified from both lunar rock fragments and martian samples. Thus these projectiles were largely made of silicates, but they contained 15\% metal compared to $\sim 30-42$ vol.\% for the present terrestrial core: therefore their impact on a young Earth must have produced an important amount (at least $~ \frac{8}{6}$ ) metallic FeNi alloy.

The total mass of these projectiles, which is estimated from both lunar rocks and martian samples to be $\sim 05$ M $_{E}$, implies that the fraction ejected into space by their impact was between a half $(1 /:)$ or two thirds (2/3) depending on the initial amount in FeNi alloy. Thus our Moon represents $7 \%$ of these projectiles, while Mars may have formed as much as one third $(\sim 4 \%)$. This last figure is to be compared with that found for planetary formation by dynamical processes which indicates a mass ratio between Earth and its moon at most equal

<h1>CHAPTER $\mathrm{X</h1>
to unity. The fraction ejected into space in FeNi alloy was therefore of the order $02(1 / 3)$, i e., it represents about half $(\sim \delta)$ the total amount of metallic iron, or an average abundance for meteoritic chondrites (see Table X-5). It is interesting to note that this fraction is very similar both to what we have estimated in Chapter VIII from present terrestrial core data and also from meteorites. Thus it appears more probable that a large part of the Earth's metallic FeNi alloy was formed by impact, rather than due solely or mainly because of differentiation during planetary formation (see Fig 10-2).

The other half $(\sim \delta)$ is probably made in stellar interiors where we have seen above the Moon. at as: to to to to to to to once more than $45 on both by by over Earth's surface,,,,,,,,,,,,,,,,, but but but when it came be expected from during its formation is is may indicate that purely theoretical reason

 once. discretely and stimated to to to to to to to to in in in in in in in in in in in once more than 554 on on over Earth's a fine dust, mass the size of once they are small at any amount about $234 and as formed from above one- purely theoretical estimate is forこの日 twice its possible that hereto form $\quad$ formation. strictly speaking which was first very much larger in under an extremely large estimated fraction

 on over Earth's slightly greater than a mass may have been the average once or at at any more of just barely bigger $0 within 1 to to amongst it seems highly about one another small amount formed during its possible origin for each and (a great deal less is not yet further our very much larger. Thus far above all $\quad$ strictly speaking as we still greater than most probably in the average while a mes quite big. once a mere theory ofRDONLY more... an an example, thereunder weight half that hereto form 1mipmap or on favourably huge new super- once andANNES ( over its structure size thatpelier one extremely heavy duty earth large beyond where it is almost double how very much larger than life itself the outside world above all but at (()one forividuated several times more slightly heavier.この日, thereunder $\left(((( ( ( ( ( ( ( ( ( ( ( ( ( once again highly probably a just half- purely because ofcreateServer size one and hereto form earth large beyond which had it is mesquite the almost double duty above all mass that within its structure weight very much larger than life itself outside world over twice more heavy slightly heavier. thereunder, (()one was formed somewhat smaller at $0 on once again highly probably greater in forming just about a half- once more heavily inside out whose outer size one greatly beyond which might be the almost double duty earth quite large above all mass formation weight within its structure very much larger than life itself outside world would have (somehow it is slightly heavier. The amount of discretely under(and somewhat formed on had twice more heavy that was in from $\quad$ some amount greater below average (that it is just occurred halfway through and once but not, meselfromore the release . [end of text]
llama_print_timings:        load time =    4247.23 ms
llama_print_timings:      sample time =     859.49 ms /  1030 runs   (    0.83 ms per token,  1198.38 tokens per second)
llama_print_timings: prompt eval time =     693.44 ms /    11 tokens (   63.04 ms per token,    15.86 tokens per second)
llama_print_timings:        eval time =  412311.19 ms /  1029 runs   (  400.69 ms per token,     2.50 tokens per second)
llama_print_timings:       total time =  415226.57 ms /  1040 tokens
Log end

```
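The same GGUF files can also be driven from Python via llama-cpp-python (the abetlen/llama-cpp-python#1207 PR linked above). A minimal sketch, assuming a Gemma-capable build of the package installed with GPU offload enabled; the model path and settings mirror the ./main runs above:

```python
# Minimal llama-cpp-python sketch (assumption: requires a build that
# includes Gemma support, i.e. abetlen/llama-cpp-python#1207 or later).
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-2b.gguf",  # same GGUF file as the ./main run above
    n_gpu_layers=-1,                    # offload all layers, as in the Metal run
    n_ctx=512,                          # matches the n_ctx printed by ./main
)
out = llm("Describe how gold is made in collapsing stars", max_tokens=1000)
print(out["choices"][0]["text"])
```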


@obriensystems changed the title from "Google Gemma 7B 2B OSS models are available as of 20240221" to "Google Gemma 7B 2B OSS models are available on Hugging Face as of 20240221" on Feb 22, 2024
@obriensystems commented Feb 22, 2024

Team, thank you for integrating Gemma support into llama.cpp yesterday. This was an extremely fast and efficient alignment with a model that had come out only a couple of hours earlier. I am personally very grateful for your efforts, and a wider community thank-you is in order for ggerganov/llama.cpp#5631.


@obriensystems commented:

Investigate TensorFlow 2 / Keras support:
https://developers.googleblog.com/2024/02/gemma-models-in-keras.html
https://ai.google.dev/gemma/docs/distributed_tuning

pip install -U torch
pip install -U transformers

@obriensystems commented Feb 24, 2024

gemma-7b model on dual RTX-4090 Suprim Liquid with 800 W max and 2 x 24 GB = 48 GB VRAM

The model runs at ~20% TDP (100 W + 100 W) because the weights are shared across the PCIe bus at x8, which saturates up to 75% of the 8 x 2 GB/s = 16 GB/s link, i.e. about 12 GB/s, as opposed to NVLink on the Ampere cards at 112 GB/s.
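To make the interconnect numbers concrete, here is the arithmetic behind that claim (a sketch; the ~2 GB/s per-lane figure assumes PCIe 4.0, and the 112 GB/s NVLink figure is the one quoted above):

```python
# Interconnect bandwidth arithmetic behind the claim above (illustrative).
pcie4_lane_gbps = 2.0            # ~2 GB/s usable per PCIe 4.0 lane (assumption)
lanes = 8                        # each card runs at x8 when two cards share the bus
link = pcie4_lane_gbps * lanes   # ~16 GB/s per direction
observed = 0.75 * link           # the run saturates roughly 75% of the link
print(f"PCIe 4.0 x8: {link:.0f} GB/s; ~75% saturated: {observed:.0f} GB/s")
print(f"NVLink (Ampere): 112 GB/s, i.e. {112 / observed:.0f}x the observed PCIe path")
```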

checking context length: without an explicit limit, generate() warns "Using the model-agnostic default max_length (=20) to control the generation length", so the generation length is set explicitly:

outputs = model.generate(**input_ids, max_new_tokens=1000)

```python
# gemma-gpu.py: run google/gemma-7b across both GPUs via transformers.
from transformers import AutoTokenizer, AutoModelForCausalLM

access_token = 'hf_cfT...QqH'  # Hugging Face access token (redacted)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b", token=access_token)
# device_map="auto" shards the weights across the available GPUs
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto", token=access_token)

input_text = "how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=3000)
print(tokenizer.decode(outputs[0]))
```
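One quick way to confirm that device_map="auto" really sharded the weights across both cards is a per-device allocation check after loading (a small sketch using standard torch calls):

```python
# Verify the model is split across both GPUs after
# from_pretrained(..., device_map="auto") has finished loading.
import torch

for i in range(torch.cuda.device_count()):
    gib = torch.cuda.memory_allocated(i) / 1024**3
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): {gib:.1f} GiB allocated")
```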


python pip summary:

```
  332  cd machine-learning/
  335  mkdir gemma
  337  vi gemma-cpu.py
  339  pip install -U transformers
  352  pip install -U torch
  353  python gemma-cpu.py
  355  nvcc --version
  364  pip install accelerate
  366  pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  368  python gemma-gpu.py
```

run

michael@13900b MINGW64 /c/wse_github/obrienlabsdev/machine-learning/gemma (main)
$ python gemma-gpu.py
Loading checkpoint shards: 100%|████████████████████████████████████| 4/4 [00:06<00:00,  1.72s/it]
C:\Users\michael\AppData\Roaming\Python\Python311\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>


@obriensystems commented Feb 25, 2024

7b testing on CUDA 12.3 on dual NVIDIA RTX-A4500 Ampere with NVLink

7b testing on CUDA 12.3 on dual RTX-4090 Ada MSI Suprim Liquid 24Gx2 without NVLink - on PCIe x8

2b testing on CUDA 12.3 on RTX-A4000 Ampere desktop 16G

michael@14900c MINGW64 ~
$ cd /c/wse_github/ObrienlabsDev/machine-learning/

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning (main)
$ pip install -U transformers
Collecting transformers
  Downloading transformers-4.38.1-py3-none-any.whl.metadata (131 kB)
     -------------------------------------- 131.1/131.1 kB 1.5 MB/s eta 0:00:00
Collecting filelock (from transformers)
  Downloading filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Collecting huggingface-hub<1.0,>=0.19.3 (from transformers)
  Downloading huggingface_hub-0.20.3-py3-none-any.whl.metadata (12 kB)
Collecting numpy>=1.17 (from transformers)
  Downloading numpy-1.26.4-cp312-cp312-win_amd64.whl.metadata (61 kB)
     ---------------------------------------- 61.0/61.0 kB 3.4 MB/s eta 0:00:00
Collecting packaging>=20.0 (from transformers)
  Downloading packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pyyaml>=5.1 (from transformers)
  Downloading PyYAML-6.0.1-cp312-cp312-win_amd64.whl.metadata (2.1 kB)
Collecting regex!=2019.12.17 (from transformers)
  Downloading regex-2023.12.25-cp312-cp312-win_amd64.whl.metadata (41 kB)
     ---------------------------------------- 42.0/42.0 kB ? eta 0:00:00
Collecting requests (from transformers)
  Downloading requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting tokenizers<0.19,>=0.14 (from transformers)
  Downloading tokenizers-0.15.2-cp312-none-win_amd64.whl.metadata (6.8 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Downloading safetensors-0.4.2-cp312-none-win_amd64.whl.metadata (3.9 kB)
Collecting tqdm>=4.27 (from transformers)
  Downloading tqdm-4.66.2-py3-none-any.whl.metadata (57 kB)
     ---------------------------------------- 57.6/57.6 kB 3.2 MB/s eta 0:00:00
Collecting fsspec>=2023.5.0 (from huggingface-hub<1.0,>=0.19.3->transformers)
  Downloading fsspec-2024.2.0-py3-none-any.whl.metadata (6.8 kB)
Collecting typing-extensions>=3.7.4.3 (from huggingface-hub<1.0,>=0.19.3->transformers)
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting colorama (from tqdm>=4.27->transformers)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting charset-normalizer<4,>=2 (from requests->transformers)
  Downloading charset_normalizer-3.3.2-cp312-cp312-win_amd64.whl.metadata (34 kB)
Collecting idna<4,>=2.5 (from requests->transformers)
  Downloading idna-3.6-py3-none-any.whl.metadata (9.9 kB)
Collecting urllib3<3,>=1.21.1 (from requests->transformers)
  Downloading urllib3-2.2.1-py3-none-any.whl.metadata (6.4 kB)
Collecting certifi>=2017.4.17 (from requests->transformers)
  Downloading certifi-2024.2.2-py3-none-any.whl.metadata (2.2 kB)
Downloading transformers-4.38.1-py3-none-any.whl (8.5 MB)
   ---------------------------------------- 8.5/8.5 MB 10.7 MB/s eta 0:00:00
Downloading huggingface_hub-0.20.3-py3-none-any.whl (330 kB)
   --------------------------------------- 330.1/330.1 kB 20.0 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp312-cp312-win_amd64.whl (15.5 MB)
   ---------------------------------------- 15.5/15.5 MB 32.8 MB/s eta 0:00:00
Downloading packaging-23.2-py3-none-any.whl (53 kB)
   ---------------------------------------- 53.0/53.0 kB 2.7 MB/s eta 0:00:00
Downloading PyYAML-6.0.1-cp312-cp312-win_amd64.whl (138 kB)
   ---------------------------------------- 138.7/138.7 kB 8.0 MB/s eta 0:00:00
Downloading regex-2023.12.25-cp312-cp312-win_amd64.whl (268 kB)
   ---------------------------------------- 268.9/268.9 kB ? eta 0:00:00
Downloading safetensors-0.4.2-cp312-none-win_amd64.whl (270 kB)
   ---------------------------------------- 270.7/270.7 kB ? eta 0:00:00
Downloading tokenizers-0.15.2-cp312-none-win_amd64.whl (2.2 MB)
   ---------------------------------------- 2.2/2.2 MB 46.2 MB/s eta 0:00:00
Downloading tqdm-4.66.2-py3-none-any.whl (78 kB)
   ---------------------------------------- 78.3/78.3 kB 4.3 MB/s eta 0:00:00
Downloading filelock-3.13.1-py3-none-any.whl (11 kB)
Downloading requests-2.31.0-py3-none-any.whl (62 kB)
   ---------------------------------------- 62.6/62.6 kB ? eta 0:00:00
Downloading certifi-2024.2.2-py3-none-any.whl (163 kB)
   ---------------------------------------- 163.8/163.8 kB 9.6 MB/s eta 0:00:00
Downloading charset_normalizer-3.3.2-cp312-cp312-win_amd64.whl (100 kB)
   ---------------------------------------- 100.4/100.4 kB ? eta 0:00:00
Downloading fsspec-2024.2.0-py3-none-any.whl (170 kB)
   ---------------------------------------- 170.9/170.9 kB ? eta 0:00:00
Downloading idna-3.6-py3-none-any.whl (61 kB)
   ---------------------------------------- 61.6/61.6 kB 3.4 MB/s eta 0:00:00
Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Downloading urllib3-2.2.1-py3-none-any.whl (121 kB)
   ---------------------------------------- 121.1/121.1 kB 6.9 MB/s eta 0:00:00
Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: urllib3, typing-extensions, safetensors, regex, pyyaml, packaging, numpy, idna, fsspec, filelock, colorama, charset-normalizer, certifi, tqdm, requests, huggingface-hub, tokenizers, transformers
Successfully installed certifi-2024.2.2 charset-normalizer-3.3.2 colorama-0.4.6 filelock-3.13.1 fsspec-2024.2.0 huggingface-hub-0.20.3 idna-3.6 numpy-1.26.4 packaging-23.2 pyyaml-6.0.1 regex-2023.12.25 requests-2.31.0 safetensors-0.4.2 tokenizers-0.15.2 tqdm-4.66.2 transformers-4.38.1 typing-extensions-4.9.0 urllib3-2.2.1

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning (main)
$ pip install -U torch
Collecting torch
  Downloading torch-2.2.1-cp312-cp312-win_amd64.whl.metadata (26 kB)
Requirement already satisfied: filelock in c:\optpython312\lib\site-packages (from torch) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\optpython312\lib\site-packages (from torch) (4.9.0)
Collecting sympy (from torch)
  Downloading sympy-1.12-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch)
  Downloading networkx-3.2.1-py3-none-any.whl.metadata (5.2 kB)
Collecting jinja2 (from torch)
  Downloading Jinja2-3.1.3-py3-none-any.whl.metadata (3.3 kB)
Requirement already satisfied: fsspec in c:\optpython312\lib\site-packages (from torch) (2024.2.0)
Collecting MarkupSafe>=2.0 (from jinja2->torch)
  Downloading MarkupSafe-2.1.5-cp312-cp312-win_amd64.whl.metadata (3.1 kB)
Collecting mpmath>=0.19 (from sympy->torch)
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Downloading torch-2.2.1-cp312-cp312-win_amd64.whl (198.5 MB)
   --------------------------------------- 198.5/198.5 MB 46.9 MB/s eta 0:00:00
Downloading Jinja2-3.1.3-py3-none-any.whl (133 kB)
   ---------------------------------------- 133.2/133.2 kB ? eta 0:00:00
Downloading networkx-3.2.1-py3-none-any.whl (1.6 MB)
   ---------------------------------------- 1.6/1.6 MB 102.3 MB/s eta 0:00:00
Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
   ---------------------------------------- 5.7/5.7 MB 122.0 MB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp312-cp312-win_amd64.whl (17 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
   ---------------------------------------- 536.2/536.2 kB ? eta 0:00:00
Installing collected packages: mpmath, sympy, networkx, MarkupSafe, jinja2, torch
Successfully installed MarkupSafe-2.1.5 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 sympy-1.12 torch-2.2.1

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning (main)
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:51:05_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning (main)
$ pip install accelerate
Collecting accelerate
  Downloading accelerate-0.27.2-py3-none-any.whl.metadata (18 kB)
Requirement already satisfied: numpy>=1.17 in c:\optpython312\lib\site-packages (from accelerate) (1.26.4)
Requirement already satisfied: packaging>=20.0 in c:\optpython312\lib\site-packages (from accelerate) (23.2)
Collecting psutil (from accelerate)
  Downloading psutil-5.9.8-cp37-abi3-win_amd64.whl.metadata (22 kB)
Requirement already satisfied: pyyaml in c:\optpython312\lib\site-packages (from accelerate) (6.0.1)
Requirement already satisfied: torch>=1.10.0 in c:\optpython312\lib\site-packages (from accelerate) (2.2.1)
Requirement already satisfied: huggingface-hub in c:\optpython312\lib\site-packages (from accelerate) (0.20.3)
Requirement already satisfied: safetensors>=0.3.1 in c:\optpython312\lib\site-packages (from accelerate) (0.4.2)
Requirement already satisfied: filelock in c:\optpython312\lib\site-packages (from torch>=1.10.0->accelerate) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\optpython312\lib\site-packages (from torch>=1.10.0->accelerate) (4.9.0)
Requirement already satisfied: sympy in c:\optpython312\lib\site-packages (from torch>=1.10.0->accelerate) (1.12)
Requirement already satisfied: networkx in c:\optpython312\lib\site-packages (from torch>=1.10.0->accelerate) (3.2.1)
Requirement already satisfied: jinja2 in c:\optpython312\lib\site-packages (from torch>=1.10.0->accelerate) (3.1.3)
Requirement already satisfied: fsspec in c:\optpython312\lib\site-packages (from torch>=1.10.0->accelerate) (2024.2.0)
Requirement already satisfied: requests in c:\optpython312\lib\site-packages (from huggingface-hub->accelerate) (2.31.0)
Requirement already satisfied: tqdm>=4.42.1 in c:\optpython312\lib\site-packages (from huggingface-hub->accelerate) (4.66.2)
Requirement already satisfied: colorama in c:\optpython312\lib\site-packages (from tqdm>=4.42.1->huggingface-hub->accelerate) (0.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\optpython312\lib\site-packages (from jinja2->torch>=1.10.0->accelerate) (2.1.5)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\optpython312\lib\site-packages (from requests->huggingface-hub->accelerate) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\optpython312\lib\site-packages (from requests->huggingface-hub->accelerate) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\optpython312\lib\site-packages (from requests->huggingface-hub->accelerate) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\optpython312\lib\site-packages (from requests->huggingface-hub->accelerate) (2024.2.2)
Requirement already satisfied: mpmath>=0.19 in c:\optpython312\lib\site-packages (from sympy->torch>=1.10.0->accelerate) (1.3.0)
Downloading accelerate-0.27.2-py3-none-any.whl (279 kB)
   ---------------------------------------- 280.0/280.0 kB 2.2 MB/s eta 0:00:00
Downloading psutil-5.9.8-cp37-abi3-win_amd64.whl (255 kB)
   ---------------------------------------- 255.1/255.1 kB 2.6 MB/s eta 0:00:00
Installing collected packages: psutil, accelerate
Successfully installed accelerate-0.27.2 psutil-5.9.8

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning (main)
$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Looking in indexes: https://download.pytorch.org/whl/cu121
Requirement already satisfied: torch in c:\optpython312\lib\site-packages (2.2.1)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.17.1%2Bcu121-cp312-cp312-win_amd64.whl (5.7 MB)
     ---------------------------------------- 5.7/5.7 MB 25.8 MB/s eta 0:00:00
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.2.1%2Bcu121-cp312-cp312-win_amd64.whl (4.0 MB)
     ---------------------------------------- 4.0/4.0 MB 87.8 MB/s eta 0:00:00
Requirement already satisfied: filelock in c:\optpython312\lib\site-packages (from torch) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\optpython312\lib\site-packages (from torch) (4.9.0)
Requirement already satisfied: sympy in c:\optpython312\lib\site-packages (from torch) (1.12)
Requirement already satisfied: networkx in c:\optpython312\lib\site-packages (from torch) (3.2.1)
Requirement already satisfied: jinja2 in c:\optpython312\lib\site-packages (from torch) (3.1.3)
Requirement already satisfied: fsspec in c:\optpython312\lib\site-packages (from torch) (2024.2.0)
Requirement already satisfied: numpy in c:\optpython312\lib\site-packages (from torchvision) (1.26.4)
Collecting torch
  Downloading https://download.pytorch.org/whl/cu121/torch-2.2.1%2Bcu121-cp312-cp312-win_amd64.whl (2454.8 MB)
     ---------------------------------------- 2.5/2.5 GB 6.1 MB/s eta 0:00:00
Collecting pillow!=8.3.*,>=5.3.0 (from torchvision)
  Downloading https://download.pytorch.org/whl/pillow-10.2.0-cp312-cp312-win_amd64.whl (2.6 MB)
     ---------------------------------------- 2.6/2.6 MB 84.2 MB/s eta 0:00:00
Requirement already satisfied: MarkupSafe>=2.0 in c:\optpython312\lib\site-packages (from jinja2->torch) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in c:\optpython312\lib\site-packages (from sympy->torch) (1.3.0)
Installing collected packages: pillow, torch, torchvision, torchaudio
  Attempting uninstall: torch
    Found existing installation: torch 2.2.1
    Uninstalling torch-2.2.1:
      Successfully uninstalled torch-2.2.1
Successfully installed pillow-10.2.0 torch-2.2.1+cu121 torchaudio-2.2.1+cu121 torchvision-0.17.1+cu121


michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
tokenizer_config.json: 100%|##########| 1.11k/1.11k [00:00<00:00, 2.21MB/s]
C:\optpython312\Lib\site-packages\huggingface_hub\file_download.py:149: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\michael\.cache\huggingface\hub\models--google--gemma-2b. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
tokenizer.model: 100%|##########| 4.24M/4.24M [00:00<00:00, 24.0MB/s]
tokenizer.json: 100%|##########| 17.5M/17.5M [00:00<00:00, 98.7MB/s]
special_tokens_map.json: 100%|##########| 555/555 [00:00<?, ?B/s]
config.json: 100%|##########| 627/627 [00:00<?, ?B/s]
model.safetensors.index.json: 100%|##########| 13.5k/13.5k [00:00<00:00, 27.0MB/s]
model-00001-of-00002.safetensors: 100%|##########| 4.95G/4.95G [00:44<00:00, 112MB/s]
model-00002-of-00002.safetensors: 100%|##########| 67.1M/67.1M [00:00<00:00, 114MB/s]
Downloading shards: 100%|##########| 2/2 [00:45<00:00, 22.52s/it]
Loading checkpoint shards: 100%|##########| 2/2 [00:02<00:00,  1.05s/it]
generation_config.json: 100%|##########| 137/137 [00:00<?, ?B/s]
C:\optpython312\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
genarate start:  08:36:18
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 08:36:32

rerun:

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Loading checkpoint shards: 100%|##########| 2/2 [00:01<00:00,  1.09it/s]
C:\optpython312\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
genarate start:  08:38:53
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 08:39:07

2b testing on CUDA 12.3 on RTX-5000 Turing mobile 16G

2b testing on CUDA 12.3 on RTX-3500 Ada mobile 12G - cold


generate start:  18:48:21
end 18:48:35

2b testing on CUDA 12.3 on RTX-3500 Ada mobile 12G - thermal throttling


micha@p1gen6 MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Loading checkpoint shards: 100%|##########| 2/2 [00:02<00:00,  1.07s/it]
C:\opt\python312\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
genarate start:  17:57:19
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 17:57:33


2b testing on Metal 2 on M2 Pro 16G

2b testing on Metal 2 on M1 Max 32G

2b testing on CPU 13800H 65G mobile Lenovo P1Gen6 - with thermal throttling


```python
# gemma-cpu.py / gemma-gpu.py: toggle between CPU and GPU inference for google/gemma-2b.
from transformers import AutoTokenizer, AutoModelForCausalLM
from datetime import datetime

access_token = 'hf_cfTP...XCQqH'  # Hugging Face access token (redacted)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token=access_token)
# GPU
#model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", token=access_token)
# CPU
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", token=access_token)

input_text = "how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process."
print("genarate start: ", datetime.now().strftime("%H:%M:%S"))

# GPU
#input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
# CPU
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=10000)
print(tokenizer.decode(outputs[0]))

print("end", datetime.now().strftime("%H:%M:%S"))
```
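The script only prints wall-clock timestamps; a small helper (a sketch, reusing the model and input_ids objects from the script above) reports the elapsed generation time directly instead of requiring a manual subtraction:

```python
# Measure generation wall time directly instead of comparing printed timestamps.
from datetime import datetime

def timed_generate(model, input_ids, **kwargs):
    """Run model.generate and report elapsed wall time in seconds."""
    start = datetime.now()
    outputs = model.generate(**input_ids, **kwargs)
    print(f"generate took {(datetime.now() - start).total_seconds():.1f}s")
    return outputs

# usage with the objects from the script above:
# outputs = timed_generate(model, input_ids, max_new_tokens=10000)
```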

micha@p1gen6 MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Loading checkpoint shards: 100%|##########| 2/2 [00:01<00:00,  1.24it/s]
genarate start:  17:48:58
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 17:50:40

7b testing on CPU 13800H 65G mobile Lenovo P1Gen6 - with thermal throttling


generate srt:  18:57:24
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
generate end:  19:00:11

@obriensystems commented Feb 25, 2024

L4 on GCP running gemma 7b

cached model on

C:\Users\michael\.cache\huggingface\hub\models--google--gemma-7b\blobs

Running on G2

 --machine-type=g2-standard-24 
 --accelerator=count=2,type=nvidia-l4-vws
 --image=projects/nvidia-vgpu-public/global/images/nv-windows-server-2022-vws-536-25-v202306270722

60%/69% GPU saturation
PS C:\Users\michael> Invoke-WebRequest -Uri "https://www.python.org/ftp/python/3.10.2/python-3.10.2-amd64.exe" -OutFile "python-3.10.2-amd64.exe"
PS C:\Users\michael> .\python-3.10.2-amd64.exe /quiet InstallAllUsers=1 PrependPath=1 Include_test=0
PS C:\Users\michael> python --version

PS C:\Users\michael> nvidia-smi
Sun Feb 25 21:52:51 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.25                 Driver Version: 536.25       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                    WDDM  | 00000000:00:03.0 Off |                    0 |
| N/A   48C    P8              14W /  72W |    237MiB / 23034MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                    WDDM  | 00000000:00:04.0 Off |                    0 |
| N/A   45C    P8              12W /  72W |      0MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+



Finops (monthly estimate):

| Item | Cost |
| --- | --- |
| 24 vCPU + 96 GB memory | $640.84 |
| 2 × NVIDIA L4 | $814.98 |
| NVIDIA GRID license fee | $292.00 |
| Premium image usage fee | Unknown |
| 50 GB balanced persistent disk | $5.50 |
| Total | $1,753.33 |

Finish installing software - or just use a docker container

   4 pip install -U transformers
   5 pip install -U torch
   6 pip install accelerate
   7 pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Download code
https://github.com/ObrienlabsDev/machine-learning
https://github.com/ObrienlabsDev/machine-learning/archive/refs/heads/main.zip

Extract the zip and add your Hugging Face token.
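Rather than hard-coding the access token in the script, the huggingface_hub login helper (already installed as a transformers dependency) can store it once per machine; a sketch:

```python
# One-time token setup so scripts don't need a hard-coded access_token.
# login() is part of huggingface_hub, which transformers already installs.
from huggingface_hub import login

login(token="hf_...")  # paste your Hugging Face token; it is cached locally
```

After this, from_pretrained() finds the cached token and the token= argument can be dropped from the scripts.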

PS C:\Windows\system32> cd C:\wse_github\machine-learning\environments\windows\src\google-gemma\
PS C:\wse_github\machine-learning\environments\windows\src\google-gemma> dir
    Directory: C:\wse_github\machine-learning\environments\windows\src\google-gemma
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         2/25/2024  10:04 PM            861 gemma-gpu.py

Run the model - download it first at 3.5 Gbps

GPU throughput is limited by the interconnect, whether NVLink or straight PCIe

PS C:\Users\michael> nvidia-smi
Sun Feb 25 22:09:50 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.25                 Driver Version: 536.25       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                    WDDM  | 00000000:00:03.0 Off |                    0 |
| N/A   64C    P0              38W /  72W |  19706MiB / 23034MiB |     62%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                    WDDM  | 00000000:00:04.0 Off |                    0 |
| N/A   62C    P0              37W /  72W |  16312MiB / 23034MiB |     70%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

results: 3:22 at 50% GPU saturation

genarate start:  22:07:46
C:\Program Files\Python310\lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
end 22:11:08

2nd run: GPU 0 at 60%, GPU 1 at 70% utilization
genarate start:  22:21:05
C:\Program Files\Python310\lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
end 22:24:33

gemma 2B on 2 x L4 on GCP

PS C:\Users\michael> nvidia-smi
Sun Feb 25 22:27:40 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.25                 Driver Version: 536.25       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                    WDDM  | 00000000:00:03.0 Off |                    0 |
| N/A   65C    P0              38W /  72W |   7661MiB / 23034MiB |     67%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                    WDDM  | 00000000:00:04.0 Off |                    0 |
| N/A   61C    P0              34W /  72W |   5114MiB / 23034MiB |     68%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

genarate start:  22:27:26
C:\Program Files\Python310\lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 22:31:23
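The ~7.6G / ~5.1G memory split across the two L4s above is accelerate sharding the layers when the model is loaded with device_map="auto". A minimal sketch to inspect the placement (assumes the same google/gemma-2b load as gemma-gpu.py):

from transformers import AutoModelForCausalLM

# load sharded across all visible GPUs, then dump the layer -> device mapping
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")
print(model.hf_device_map)  # e.g. {'model.embed_tokens': 0, ..., 'lm_head': 1}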

however, running without NVIDIA GRID, as below:

gcloud compute instances create nvidia-rtx-virtual-workstation-window-7-vm-20240225-215824 --project=cuda-old --zone=us-east4-a --machine-type=g2-standard-24 --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default --metadata=^,@^google-monitoring-enable=0,@google-logging-enable=0,@windows-keys=\{\"expireOn\":\"2023-08-12T00:35:23.193242Z\",\"userName\":\"michael\",\"email\":\"mic...ienlabs.dev\",\"modulus\":\"k7R8sljAONIAZoMUuQ\+/KR7\+BH03q52QYhYT8yDWM4tAcveUC\+xjPhQ/LRhQG1GPY/yIOXp1zWKF7V87v0Ffi1xTUghkctLXXRRuqUjqC3L2JSuB7eHYijDfk5XUkaIoZq\+VMjHRBo7bw2dq3JSs0Czfv/BhNzGPrd0tI/UoBIFt7CZ3oxwqC5b5w0NAL9NdqD1LkEmqN56aMbVd9f9rnmEFlENySRbZXIeq61MT9qnDkfMm6Iq0eMY3g8vBYSplYGxbCxETOIvAU/5uh5gkjupX9A01O9DtJpTHoN98X6QHtED8xgYrwneMbtRvwgdRjNzFnH5mL4j95ZZprrEe3Q==\",\"exponent\":\"AQAB\"\} --maintenance-policy=TERMINATE --provisioning-model=STANDARD --service-account=196717963363-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/devstorage.read_only --accelerator=count=2,type=nvidia-l4-vws --tags=nvidia-rtx-virtual-workstation-window-7-deployment --create-disk=auto-delete=yes,boot=yes,device-name=autogen-vm-tmpl-boot-disk,image=projects/nvidia-vgpu-public/global/images/nv-windows-server-2022-vws-536-25-v202306270722,mode=rw,size=50,type=projects/cuda-old/zones/us-east4-a/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=goog-dm=nvidia-rtx-virtual-workstation-window-7,goog-ec-src=vm_add-gcloud --reservation-affinity=any

@obriensystems
Member Author

Single L4

PS C:\wse_github\machine-learning\environments\windows\src\google-gemma> $Env:CUDA_VISIBLE_DEVICES = 0
PS C:\wse_github\machine-learning\environments\windows\src\google-gemma> python gemma-gpu.py

Screenshot 2024-02-25 at 17 52 03

faster: 0:24 for the Gemma 2B model, which fits within 25G
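A quick check that the mask took effect before generating (a minimal sketch, assuming PyTorch is installed):

import torch

# with CUDA_VISIBLE_DEVICES=0 only one device should be enumerated
print(torch.cuda.device_count())  # expect 1
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))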

genarate start:  22:51:35
C:\Program Files\Python310\lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 22:51:59

PS C:\Users\michael> nvidia-smi
Sun Feb 25 22:51:54 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.25                 Driver Version: 536.25       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                    WDDM  | 00000000:00:03.0 Off |                    0 |
| N/A   78C    P0              72W /  72W |  12725MiB / 23034MiB |     91%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                    WDDM  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P8              12W /  72W |      0MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |

@obriensystems
Member Author

selecting devices in code

import os
# both GPUs visible
#os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
# restrict to GPU 0 only - must be set before torch initializes CUDA
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

@obriensystems
Member Author

RTX-4090 single Gemma 2B

$ python gemma-gpu.py
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:02<00:00,  1.04s/it]
genarate start:  23:33:15
C:\Users\michael\AppData\Roaming\Python\Python311\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 23:33:23

RTX-A4500 single Gemma 2B


michael@13900d MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.34s/it]
genarate start:  23:36:11
C:\opt\Python310\lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, we need to understand what the beta and r process are. The beta process is a type of nuclear reaction that occurs in stars when a neutron is converted into a proton, releasing a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. The r process is a type of nuclear reaction that occurs in supernovae and neutron stars. It involves the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of elements heavier than iron that are not produced by the beta process. Now, let's consider how gold is made in these processes. In the beta process, gold is produced by the conversion of a neutron into a proton, followed by the emission of a positron and an electron neutrino. This process is responsible for the production of most of the elements heavier than iron in the universe. However, the r process is responsible for the production of gold in particular. In supernovae and neutron stars, gold is produced by the capture of a neutron by a nucleus, followed by the emission of a proton and an electron neutrino. This process is responsible for the production of gold in the universe.

Step 2/2
Therefore, the ratio of gold created during the beta and r process depends on the ratio of the number of neutrons to protons in the star. If the ratio is high, more gold will be produced by the beta process, and if the ratio is low, more gold will be produced by the r process. However, the exact ratio of gold created by these processes is not known, as it depends on the specific conditions of the star and the supernova or neutron star.<eos>
end 23:36:21

image

@obriensystems
Member Author

obriensystems commented Feb 27, 2024

No H100 or A100 80/40G available, but V100s (expected at 32G) are available in Amsterdam

V100 16G

image

$2.11 hourly

gcloud compute instances create instance-20240227-021806 --project=cuda-old --zone=europe-west4-a --machine-type=n1-standard-8 --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default --maintenance-policy=TERMINATE --provisioning-model=STANDARD --service-account=196717963363-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/cloud-platform --accelerator=count=1,type=nvidia-tesla-v100 --tags=http-server,https-server --create-disk=auto-delete=yes,boot=yes,device-name=instance-20240227-021806,image=projects/ml-images/global/images/c0-deeplearning-common-cu113-v20230925-debian-10,mode=rw,size=200,type=projects/cuda-old/zones/europe-west4-a/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=goog-ec-src=vm_add-gcloud --reservation-affinity=any

image

image

image

======================================
Welcome to the Google Deep Learning VM
======================================

Version: common-cu113.m112
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-25-cloud-amd64 x86_64\n)

Resources:
 * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
 * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
 * Google Group: https://groups.google.com/forum/#!forum/google-dl-platform

To reinstall Nvidia driver (if needed) run:
sudo /opt/deeplearning/install-driver.sh
Linux instance-20240227-v100cuda32 4.19.0-25-cloud-amd64 #1 SMP Debian 4.19.289-2 (2023-08-08) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

This VM requires Nvidia drivers to function correctly.   Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n] y
Waiting for security updates to finish...-Installing Nvidia driver.
+ main
+ wait_apt_locks_released
+ echo 'wait apt locks released'
wait apt locks released
+ sudo fuser /var/lib/dpkg/lock /var/lib/apt/lists/lock /var/cache/apt/archives/lock
+ sudo fuser /var/lib/dpkg/lock-frontend
+ install_linux_headers
++ uname -r
+ echo 'install linux headers: linux-headers-4.19.0-25-cloud-amd64'
install linux headers: linux-headers-4.19.0-25-cloud-amd64
++ uname -r
+ sudo apt-get -o DPkg::Lock::Timeout=120 install -y linux-headers-4.19.0-25-cloud-amd64
Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-headers-4.19.0-25-cloud-amd64 is already the newest version (4.19.289-2).
0 upgraded, 0 newly installed, 0 to remove and 11 not upgraded.
+ source /opt/deeplearning/driver-version.sh
++ export DRIVER_VERSION=510.47.03
++ DRIVER_VERSION=510.47.03
++ export DRIVER_UBUNTU_DEB=nvidia-driver-local-repo-ubuntu1804-510.47.03_1.0-1_amd64.deb
++ DRIVER_UBUNTU_DEB=nvidia-driver-local-repo-ubuntu1804-510.47.03_1.0-1_amd64.deb
++ export DRIVER_UBUNTU_CUDA_VERSION=11.3.1
++ DRIVER_UBUNTU_CUDA_VERSION=11.3.1
++ export DRIVER_UBUNTU_PKG=nvidia-driver-510
++ DRIVER_UBUNTU_PKG=nvidia-driver-510
+ export DRIVER_GCS_PATH
++ get_attribute_value nvidia-driver-gcs-path
++ get_metadata_value instance/attributes/nvidia-driver-gcs-path
++ curl --retry 5 -s -f -H 'Metadata-Flavor: Google' http://metadata/computeMetadata/v1/instance/attributes/nvidia-driver-gcs-path
+ DRIVER_GCS_PATH=
+ install_nvidia_linux_drivers
+ echo 'DRIVER_VERSION: 510.47.03'
DRIVER_VERSION: 510.47.03
+ local driver_installer_file_name=driver_installer.run
+ local nvidia_driver_file_name=NVIDIA-Linux-x86_64-510.47.03.run
+ custom_driver=false
+ local driver_gcs_file_path
+ [[ -z '' ]]
+ DRIVER_GCS_PATH=gs://nvidia-drivers-us-public/tesla/510.47.03
+ driver_gcs_file_path=gs://nvidia-drivers-us-public/tesla/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run
+ echo 'Downloading driver from GCS location and install: gs://nvidia-drivers-us-public/tesla/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run'
Downloading driver from GCS location and install: gs://nvidia-drivers-us-public/tesla/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run
+ set +e
+ gsutil -q cp gs://nvidia-drivers-us-public/tesla/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run driver_installer.run
+ set -e
+ [[ ! -f driver_installer.run ]]
+ [[ ! -f driver_installer.run ]]
+ local open_kernel_module_arg=-m=kernel-open
+ IFS=.
+ read -r major minor patch
++ get_metadata_value instance/machine-type
++ curl --retry 5 -s -f -H 'Metadata-Flavor: Google' http://metadata/computeMetadata/v1/instance/machine-type
+ local -r machine_type_full=projects/196717963363/machineTypes/n1-standard-8
+ local machine_type=n1-standard-8
+ [[ 510 -lt 525 ]]
+ open_kernel_module_arg=
+ chmod +x driver_installer.run
+ sudo ./driver_installer.run --dkms -a -s --no-drm --install-libglvnd ''
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 510.47.03..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this
         installation of the NVIDIA driver.


WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path
         '/usr/lib64/xorg/modules'; these paths were not queryable from the system.  If X fails to find the
         NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package
         for your distribution and reinstall the driver.

+ rm -rf driver_installer.run
+ exit 0
Nvidia driver installed.

The V100 here is only 16G - less than the L4 at 24G, and not the expected 32G -
but it is 300W active-cooled, instead of 75W passive-cooled

(base) michael@instance-20240227-v100cuda32:~$ nvidia-smi
Tue Feb 27 02:34:23 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    38W / 300W |      0MiB / 16384MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(base) michael@instance-20240227-v100cuda32:~$ python --version
Python 3.7.12

install libraries

Run Google Gemma 2B from the Hugging Face repo

First clone this repo
https://github.com/ObrienlabsDev/machine-learning.git

     nvidia-smi
     pip install -U transformers
     pip install -U torch
     pip install accelerate
     git clone https://github.com/ObrienlabsDev/machine-learning.git
     cd machine-learning/
     cd environments/windows/src/google-gemma/
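After the installs, a quick sanity check that the stack is ready (a minimal sketch; Gemma support needs a recent transformers, 4.38+):

import torch
import transformers

print("transformers", transformers.__version__)  # 4.38+ ships the Gemma classes
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))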

image

Fix the Hugging Face token first - use yours
image

or add the following (not needed on the Windows L4 image - just the Linux V100):

from huggingface_hub import login
login()

and log in on the fly:
image
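A non-interactive alternative to the prompt above (a sketch; HF_TOKEN is a hypothetical env var name you export yourself):

import os
from huggingface_hub import login

# reads the token from the environment instead of prompting
login(token=os.environ["HF_TOKEN"])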

(base) michael@instance-20240227-v100cuda32:~/machine-learning/environments/windows/src/google-gemma$ python gemma-gpu.py 

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /home/michael/.cache/huggingface/token
Login successful
Traceback (most recent call last):
  File "gemma-gpu.py", line 18, in <module>
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")#, token=access_token)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 689, in from_pretrained
    f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.


use the interpreter
(base) michael@instance-20240227-v100cuda32:~/machine-learning/environments/windows/src/google-gemma$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) 
>>> from huggingface_hub import login
>>> login()

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /home/michael/.cache/huggingface/token
Login successful
>>> 

Linux specific - the GemmaTokenizer error means the installed transformers predates Gemma support, so transformers needs to be updated - post Hugging Face login.

The error to fix:

(base) michael@instance-20240227-v100cuda32:~/machine-learning/environments/windows/src/google-gemma$ python gemma-gpu.py 
Traceback (most recent call last):
  File "gemma-gpu.py", line 18, in <module>
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token=access_token)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 689, in from_pretrained
    f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.
pip install -U transformers

does not fix

switching to use_auth_token=True

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", use_auth_token=True) #token=access_token)
# GPU
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", use_auth_token=True) #token=access_token)
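For reference, recent transformers releases deprecate use_auth_token in favor of token=, so an equivalent sketch on an up-to-date install is:

from transformers import AutoTokenizer, AutoModelForCausalLM

# token=True reuses the token cached by huggingface-cli login / login()
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token=True)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", token=True)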


trying a transformers install from source (a commit from 16h ago):

!pip -q install git+https://github.com/huggingface/transformers.git

(base) michael@instance-20240227-v100cuda32:~/machine-learning/environments/windows/src/google-gemma$ !pip -q install git+https://github.com/huggingface/transformers.git
pip install -U transformers -q install git+https://github.com/huggingface/transformers.git
ERROR: Ignored the following versions that require a different python version: 0.17.0 Requires-Python >=3.8.0; 0.17.0rc0 Requires-Python >=3.8.0; 0.17.1 Requires-Python >=3.8.0; 0.17.2 Requires-Python >=3.8.0; 0.17.3 Requires-Python >=3.8.0; 0.18.0 Requires-Python >=3.8.0; 0.18.0rc0 Requires-Python >=3.8.0; 0.19.0 Requires-Python >=3.8.0; 0.19.0rc0 Requires-Python >=3.8.0; 0.19.1 Requires-Python >=3.8.0; 0.19.2 Requires-Python >=3.8.0; 0.19.3 Requires-Python >=3.8.0; 0.19.4 Requires-Python >=3.8.0; 0.20.0 Requires-Python >=3.8.0; 0.20.0rc0 Requires-Python >=3.8.0; 0.20.0rc1 Requires-Python >=3.8.0; 0.20.1 Requires-Python >=3.8.0; 0.20.2 Requires-Python >=3.8.0; 0.20.3 Requires-Python >=3.8.0; 4.31.0 Requires-Python >=3.8.0; 4.32.0 Requires-Python >=3.8.0; 4.32.1 Requires-Python >=3.8.0; 4.33.0 Requires-Python >=3.8.0; 4.33.1 Requires-Python >=3.8.0; 4.33.2 Requires-Python >=3.8.0; 4.33.3 Requires-Python >=3.8.0; 4.34.0 Requires-Python >=3.8.0; 4.34.1 Requires-Python >=3.8.0; 4.35.0 Requires-Python >=3.8.0; 4.35.1 Requires-Python >=3.8.0; 4.35.2 Requires-Python >=3.8.0; 4.36.0 Requires-Python >=3.8.0; 4.36.1 Requires-Python >=3.8.0; 4.36.2 Requires-Python >=3.8.0; 4.37.0 Requires-Python >=3.8.0; 4.37.1 Requires-Python >=3.8.0; 4.37.2 Requires-Python >=3.8.0; 4.38.0 Requires-Python >=3.8.0; 4.38.1 Requires-Python >=3.8.0
ERROR: Could not find a version that satisfies the requirement huggingface-hub<1.0,>=0.19.3 (from transformers) (from versions: 0.0.1, 0.0.2, 0.0.3rc1, 0.0.3rc2, 0.0.5, 0.0.6, 0.0.7, 0.0.8, 0.0.9, 0.0.10, 0.0.11, 0.0.12, 0.0.13, 0.0.14, 0.0.15, 0.0.16, 0.0.17, 0.0.18, 0.0.19, 0.1.0, 0.1.1, 0.1.2, 0.2.0, 0.2.1, 0.4.0, 0.5.0, 0.5.1, 0.6.0rc0, 0.6.0, 0.7.0rc0, 0.7.0, 0.8.0rc0, 0.8.0rc1, 0.8.0rc2, 0.8.0rc3, 0.8.0rc4, 0.8.0, 0.8.1, 0.9.0.dev0, 0.9.0rc0, 0.9.0rc2, 0.9.0rc3, 0.9.0, 0.9.1, 0.10.0rc0, 0.10.0rc1, 0.10.0rc3, 0.10.0, 0.10.1, 0.11.0rc0, 0.11.0rc1, 0.11.0, 0.11.1, 0.12.0rc0, 0.12.0, 0.12.1, 0.13.0rc0, 0.13.0rc1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.13.4, 0.14.0rc0, 0.14.0rc1, 0.14.0, 0.14.1, 0.15.0rc0, 0.15.0, 0.15.1, 0.16.0rc0, 0.16.1, 0.16.2, 0.16.3, 0.16.4)
ERROR: No matching distribution found for huggingface-hub<1.0,>=0.19.3

Issue is I need a newer Python - the errors above require >=3.8 (the Windows box runs 3.11) - this image is running 3.7

(base) michael@instance-20240227-v100cuda32:~/machine-learning/environments/windows/src/google-gemma$ python --version
Python 3.7.12

installing python 3.12

(base) michael@instance-20240227-v100cuda32:~/machine-learning/environments/windows/src/google-gemma$ sudo apt install python 3.12

(base) michael@instance-20240227-v100cuda32:~/machine-learning/environments/windows/src/google-gemma$ sudo apt-get install --only-upgrade python3
Reading package lists... Done
Building dependency tree       
Reading state information... Done
python3 is already the newest version (3.7.3-1).
0 upgraded, 0 newly installed, 0 to remove and 12 not upgraded.


Need Python 3.11 - switching to later image

image

 Deep Learning VM with CUDA 12.1 M116

Debian 11, Python 3.10. With CUDA 12.1 preinstalled.

gcloud compute instances create instance-2024022-v100b16 --project=cuda-old --zone=europe-west4-a --machine-type=n1-standard-8 --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default --maintenance-policy=TERMINATE --provisioning-model=STANDARD --service-account=196717963363-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/cloud-platform --accelerator=count=1,type=nvidia-tesla-v100 --tags=http-server,https-server --create-disk=auto-delete=yes,boot=yes,device-name=instance-2024022-v100b16,image=projects/ml-images/global/images/c0-deeplearning-common-cu121-v20240128-debian-11-py310,mode=rw,size=200,type=projects/cuda-old/zones/europe-west4-a/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=goog-ec-src=vm_add-gcloud --reservation-affinity=any

we will see later if we are OK with 3.10 rather than 3.11 - the pip errors above only demanded >=3.8, so 3.10 should satisfy transformers
(base) michael@instance-2024022-v100b16:~$ python3 --version
Python 3.10.13

@obriensystems
Member Author

obriensystems commented Mar 27, 2024

Google Gemma 7B on RTX-A6000

Screenshot 2024-04-20 at 13 34 43

image

image

image

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning (main)
$ ./build.sh
2024/03/26 22:30:47 http2: server: error reading preface from client //./pipe/docker_engine: file has already been closed
#0 building with "default" instance using docker driver

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 485B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/tensorflow/tensorflow:latest-gpu
#3 DONE 0.5s

#4 [1/3] FROM docker.io/tensorflow/tensorflow:latest-gpu@sha256:4ab9ffddd6ffacc9251ac6439f431eb38d66200d3f52397b5d77f9bc3298c4e9
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 57B done
#5 DONE 0.0s

#6 [2/3] WORKDIR /src
#6 CACHED

#7 [3/3] COPY /src/tflow.py .
#7 CACHED

#8 exporting to image
#8 exporting layers done
#8 writing image sha256:8ed644da2ebba91f78a1a769325adc43c153536365b2aa857da1a7628136faeb done
#8 naming to docker.io/library/ml-tensorflow-win done
#8 DONE 0.0s

What's Next?
  1. Sign in to your Docker account → docker login
  2. View a summary of image vulnerabilities and recommendations → docker scout quickview
2024-03-27 02:30:48.886114: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-27 02:30:48.920618: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-27 02:30:49.840948: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:49.844225: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:49.844270: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:49.848333: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:49.848358: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:49.848366: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:50.770778: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:50.770819: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:50.770825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2019] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2024-03-27 02:30:50.770944: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-27 02:30:50.771087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 45757 MB memory:  -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:01:00.0, compute capability: 8.6
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
169001437/169001437 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step
Epoch 1/50
2024-03-27 02:31:14.860698: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8906
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 403ms/step - accuracy: 0.0193 - loss: 5.92562024-03-27 02:31:28.659203: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:31:28.659288: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 33s 628ms/step - accuracy: 0.0205 - loss: 5.8367
Epoch 2/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 403ms/step - accuracy: 0.0669 - loss: 4.20572024-03-27 02:31:36.654512: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:31:36.654549: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 382ms/step - accuracy: 0.0686 - loss: 4.1907
Epoch 3/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 405ms/step - accuracy: 0.1232 - loss: 3.81262024-03-27 02:31:41.708714: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:31:41.708885: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 382ms/step - accuracy: 0.1249 - loss: 3.8003
Epoch 4/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 413ms/step - accuracy: 0.1859 - loss: 3.43082024-03-27 02:31:46.860052: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:31:46.860109: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 393ms/step - accuracy: 0.1870 - loss: 3.4237
Epoch 5/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 407ms/step - accuracy: 0.2577 - loss: 3.06172024-03-27 02:31:51.956526: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:31:51.956592: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 385ms/step - accuracy: 0.2583 - loss: 3.0574
Epoch 6/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 408ms/step - accuracy: 0.3328 - loss: 2.66182024-03-27 02:31:57.061256: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:31:57.061301: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 386ms/step - accuracy: 0.3333 - loss: 2.6603
Epoch 7/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 405ms/step - accuracy: 0.4023 - loss: 2.32382024-03-27 02:32:02.117569: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:02.117644: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 385ms/step - accuracy: 0.4027 - loss: 2.3221
Epoch 8/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 405ms/step - accuracy: 0.4754 - loss: 2.00592024-03-27 02:32:07.214123: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:07.214181: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 385ms/step - accuracy: 0.4749 - loss: 2.0077
Epoch 9/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 408ms/step - accuracy: 0.5448 - loss: 1.71792024-03-27 02:32:12.314759: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:12.314809: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 388ms/step - accuracy: 0.5432 - loss: 1.7226
Epoch 10/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 406ms/step - accuracy: 0.5986 - loss: 1.50712024-03-27 02:32:17.391075: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:17.391118: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 384ms/step - accuracy: 0.5971 - loss: 1.5099
Epoch 11/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 412ms/step - accuracy: 0.6382 - loss: 1.30302024-03-27 02:32:22.558514: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:22.558562: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 389ms/step - accuracy: 0.6373 - loss: 1.3070
Epoch 12/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 407ms/step - accuracy: 0.6792 - loss: 1.16252024-03-27 02:32:27.620701: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:27.620755: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 385ms/step - accuracy: 0.6771 - loss: 1.1689
Epoch 13/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 410ms/step - accuracy: 0.7221 - loss: 0.99232024-03-27 02:32:32.731691: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:32.731732: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 387ms/step - accuracy: 0.7202 - loss: 0.9970
Epoch 14/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 410ms/step - accuracy: 0.7718 - loss: 0.80202024-03-27 02:32:37.839133: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:37.839181: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 387ms/step - accuracy: 0.7716 - loss: 0.7997
Epoch 15/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 409ms/step - accuracy: 0.8208 - loss: 0.62872024-03-27 02:32:42.915554: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:42.915601: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 387ms/step - accuracy: 0.8197 - loss: 0.6313
Epoch 16/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 408ms/step - accuracy: 0.8508 - loss: 0.51042024-03-27 02:32:48.022487: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:48.022564: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 386ms/step - accuracy: 0.8493 - loss: 0.5163
Epoch 17/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 410ms/step - accuracy: 0.8664 - loss: 0.45142024-03-27 02:32:53.150208: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:53.150257: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 388ms/step - accuracy: 0.8657 - loss: 0.4540
Epoch 18/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 414ms/step - accuracy: 0.8464 - loss: 0.59282024-03-27 02:32:58.335544: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:32:58.335582: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 391ms/step - accuracy: 0.8441 - loss: 0.5975
Epoch 19/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 414ms/step - accuracy: 0.8598 - loss: 0.48782024-03-27 02:33:03.497457: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:03.497506: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 391ms/step - accuracy: 0.8591 - loss: 0.4901
Epoch 20/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 414ms/step - accuracy: 0.9000 - loss: 0.34322024-03-27 02:33:08.674424: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:08.674492: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 392ms/step - accuracy: 0.8992 - loss: 0.3495
Epoch 21/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 415ms/step - accuracy: 0.8625 - loss: 0.44792024-03-27 02:33:13.844758: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:13.844808: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 391ms/step - accuracy: 0.8596 - loss: 0.4574
Epoch 22/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 415ms/step - accuracy: 0.8769 - loss: 0.41382024-03-27 02:33:18.990575: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:18.990633: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 391ms/step - accuracy: 0.8764 - loss: 0.4148
Epoch 23/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 412ms/step - accuracy: 0.9143 - loss: 0.29442024-03-27 02:33:24.094218: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:24.094496: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 390ms/step - accuracy: 0.9138 - loss: 0.2950
Epoch 24/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 411ms/step - accuracy: 0.9403 - loss: 0.21402024-03-27 02:33:34.650706: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:34.650759: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 11s 393ms/step - accuracy: 0.9399 - loss: 0.2150
Epoch 25/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 409ms/step - accuracy: 0.9603 - loss: 0.14942024-03-27 02:33:39.821568: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:39.821618: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 387ms/step - accuracy: 0.9599 - loss: 0.1508
Epoch 26/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 408ms/step - accuracy: 0.9683 - loss: 0.12572024-03-27 02:33:44.905229: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:44.905540: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 387ms/step - accuracy: 0.9678 - loss: 0.1270
Epoch 27/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 411ms/step - accuracy: 0.9456 - loss: 0.19372024-03-27 02:33:50.058238: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:50.058289: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 389ms/step - accuracy: 0.9443 - loss: 0.1976
Epoch 28/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 413ms/step - accuracy: 0.9503 - loss: 0.16942024-03-27 02:33:55.204099: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:33:55.204145: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 390ms/step - accuracy: 0.9498 - loss: 0.1709
Epoch 29/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 408ms/step - accuracy: 0.9561 - loss: 0.14962024-03-27 02:34:00.296730: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:34:00.296778: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 387ms/step - accuracy: 0.9555 - loss: 0.1513
Epoch 30/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 407ms/step - accuracy: 0.9656 - loss: 0.11772024-03-27 02:34:05.420959: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:34:05.421080: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 388ms/step - accuracy: 0.9649 - loss: 0.1194
Epoch 31/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 408ms/step - accuracy: 0.9681 - loss: 0.10822024-03-27 02:34:10.570976: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:34:10.571048: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 385ms/step - accuracy: 0.9677 - loss: 0.1115
Epoch 32/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 412ms/step - accuracy: 0.9665 - loss: 0.11612024-03-27 02:34:15.688882: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:34:15.688937: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 389ms/step - accuracy: 0.9658 - loss: 0.1191
Epoch 33/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 412ms/step - accuracy: 0.9611 - loss: 0.13662024-03-27 02:34:20.806693: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
2024-03-27 02:34:20.806731: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
         [[RemoteCall]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 389ms/step - accuracy: 0.9607 - loss: 0.1399
Epoch 34/50
12/13 ━━━━━━━━━━━━━━━━━━━━ 0s 414ms/step - accuracy: 0.9641 - loss: 0.13682024-03-27 02:34:25.956910: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
         [[{{node MultiDeviceIteratorGetNextFromShard}}]]
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 390ms/step - accuracy: 0.9637 - loss: 0.1377
Epoch 35/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 391ms/step - accuracy: 0.9657 - loss: 0.1384
Epoch 36/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 391ms/step - accuracy: 0.9484 - loss: 0.1721
Epoch 37/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 391ms/step - accuracy: 0.9479 - loss: 0.1954
Epoch 38/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 387ms/step - accuracy: 0.9258 - loss: 0.2473
Epoch 39/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 395ms/step - accuracy: 0.8907 - loss: 0.3609
Epoch 40/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 390ms/step - accuracy: 0.9145 - loss: 0.2747
Epoch 41/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 388ms/step - accuracy: 0.9363 - loss: 0.2042
Epoch 42/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 389ms/step - accuracy: 0.9583 - loss: 0.1474
Epoch 43/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 390ms/step - accuracy: 0.9711 - loss: 0.1036
Epoch 44/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 389ms/step - accuracy: 0.9809 - loss: 0.0956
Epoch 45/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 397ms/step - accuracy: 0.9840 - loss: 0.0587
Epoch 46/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 394ms/step - accuracy: 0.9889 - loss: 0.0407
Epoch 47/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 394ms/step - accuracy: 0.9933 - loss: 0.0351
Epoch 48/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 385ms/step - accuracy: 0.9958 - loss: 0.0213
Epoch 49/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 388ms/step - accuracy: 0.9967 - loss: 0.0155
Epoch 50/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 392ms/step - accuracy: 0.9958 - loss: 0.0388

(per-epoch TensorFlow warnings trimmed above for readability - each epoch boundary logs "local_rendezvous.cc:404 ... Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence" twice, with [[{{node MultiDeviceIteratorGetNextFromShard}}]] / [[RemoteCall]] frames, as the finite tf.data iterator is exhausted)
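Those end-of-sequence warnings are benign: the MultiDeviceIterator simply runs off the end of a finite dataset at each epoch boundary. A minimal sketch of one way to silence them, assuming a tf.data pipeline feeds model.fit (features, labels, and model here are placeholders, not taken from this run):

```python
import tensorflow as tf

# hypothetical pipeline - repeat() keeps the iterator alive across epochs,
# so each epoch must then be bounded explicitly with steps_per_epoch
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))  # placeholder tensors
    .shuffle(buffer_size=25_000)
    .batch(2048)   # batch size as noted below for this run
    .repeat()      # no end-of-sequence, so no OUT_OF_RANGE rendezvous warning
)

model.fit(dataset, epochs=50, steps_per_epoch=13)  # 13 steps/epoch as in the log
```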

Thermal throttling observed during the run.

@obriensystems
Copy link
Member Author

(screenshots: GPU monitoring at idle, at full power, and additional sensor readings)

Capacitor squeak (coil whine) for about 1 second at epoch 49/50 with the 2048 batch size.

@obriensystems
Copy link
Member Author

Gemma 7B on the RTX A6000 - using 32 GB of 48 GB VRAM
(screenshot)

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
tokenizer_config.json: 100%|##########| 1.11k/1.11k [00:00<?, ?B/s]
C:\optpython312\Lib\site-packages\huggingface_hub\file_download.py:149: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\michael\.cache\huggingface\hub\models--google--gemma-7b. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
tokenizer.model: 100%|##########| 4.24M/4.24M [00:00<00:00, 29.2MB/s]
tokenizer.json: 100%|##########| 17.5M/17.5M [00:00<00:00, 87.2MB/s]
special_tokens_map.json: 100%|##########| 555/555 [00:00<?, ?B/s]
config.json: 100%|##########| 629/629 [00:00<?, ?B/s]
model.safetensors.index.json: 100%|##########| 20.9k/20.9k [00:00<00:00, 42.0MB/s]
model-00001-of-00004.safetensors: 100%|##########| 5.00G/5.00G [00:54<00:00, 91.2MB/s]
model-00002-of-00004.safetensors: 100%|##########| 4.98G/4.98G [00:55<00:00, 90.4MB/s]
model-00003-of-00004.safetensors: 100%|##########| 4.98G/4.98G [00:54<00:00, 90.8MB/s]
model-00004-of-00004.safetensors: 100%|##########| 2.11G/2.11G [00:23<00:00, 89.8MB/s]
Downloading shards: 100%|##########| 4/4 [03:08<00:00, 47.22s/it]
Loading checkpoint shards: 100%|##########| 4/4 [00:10<00:00,  2.57s/it]
generation_config.json: 100%|##########| 137/137 [00:00<00:00, 275kB/s]
C:\optpython312\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
generate start:  22:53:09
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
end 22:53:24

36 GB VRAM

michael@14900c MINGW64 /c/wse_github/ObrienlabsDev/machine-learning/environments/windows/src/google-gemma (main)
$ nvidia-smi
Tue Mar 26 23:00:04 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.99                 Driver Version: 537.99       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000             WDDM  | 00000000:01:00.0 Off |                  Off |
| 30%   42C    P8               6W / 300W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+


@obriensystems
Copy link
Member Author

obriensystems commented Apr 20, 2024

Rerun of the A6000 (48 GB VRAM) on the rebuilt machine

michael@14900c MINGW64 ~
$ cd /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma/

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python --version
Python 3.12.3

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ cat gemma-gpu.py
import os
# default: dual GPU - traffic over the PCIe bus or NVLink bridge can cause slowdowns
#os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
# specific GPU - the model must fit entirely in VRAM: RTX-3500 Ada = 12G, A4000 = 16G, A4500 = 20G, A6000 = 48G, 4000 Ada = 20G, 5000 Ada = 32G, 6000 Ada = 48G
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from transformers import AutoTokenizer, AutoModelForCausalLM
from datetime import datetime

access_token='hf_cfTP...QqH'



tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b", token=access_token)
# GPU
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto", token=access_token)
# CPU
#model = AutoModelForCausalLM.from_pretrained("google/gemma-2b",token=access_token)

input_text = "how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process."
time_start = datetime.now().strftime("%H:%M:%S")
print("generate start: ", datetime.now().strftime("%H:%M:%S"))

# GPU
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
# CPU
#input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids,
                         max_new_tokens=10000)
print(tokenizer.decode(outputs[0]))

print("end", datetime.now().strftime("%H:%M:%S"))
time_end = datetime.now().strftime("%H:%M:%S")

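Since the script's behaviour hinges on CUDA_VISIBLE_DEVICES, a quick hedged sanity check (plain torch.cuda calls, not part of the session above) confirms the mask before the ~17 GB of shards start loading:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before torch initializes CUDA

import torch

print(torch.cuda.is_available())   # False would mean a CPU-only torch build
print(torch.cuda.device_count())   # 1 when masked to a single GPU
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```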
michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Traceback (most recent call last):
  File "C:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\google-gemma\gemma-gpu.py", line 7, in <module>
    from transformers import AutoTokenizer, AutoModelForCausalLM
ModuleNotFoundError: No module named 'transformers'

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ vi gemma-gpu.py

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ pip install -U torch
Collecting torch
  Downloading torch-2.2.2-cp312-cp312-win_amd64.whl.metadata (26 kB)
Collecting filelock (from torch)
  Downloading filelock-3.13.4-py3-none-any.whl.metadata (2.8 kB)
Collecting typing-extensions>=4.8.0 (from torch)
  Downloading typing_extensions-4.11.0-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy (from torch)
  Downloading sympy-1.12-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch)
  Downloading networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Collecting jinja2 (from torch)

  Downloading Jinja2-3.1.3-py3-none-any.whl.metadata (3.3 kB)
Collecting fsspec (from torch)
  Downloading fsspec-2024.3.1-py3-none-any.whl.metadata (6.8 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch)
  Downloading MarkupSafe-2.1.5-cp312-cp312-win_amd64.whl.metadata (3.1 kB)
Collecting mpmath>=0.19 (from sympy->torch)
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Downloading torch-2.2.2-cp312-cp312-win_amd64.whl (198.5 MB)
   --------------------------------------- 198.5/198.5 MB 46.7 MB/s eta 0:00:00
Downloading typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Downloading filelock-3.13.4-py3-none-any.whl (11 kB)
Downloading fsspec-2024.3.1-py3-none-any.whl (171 kB)
   ---------------------------------------- 172.0/172.0 kB ? eta 0:00:00
Downloading Jinja2-3.1.3-py3-none-any.whl (133 kB)
   ---------------------------------------- 133.2/133.2 kB 7.7 MB/s eta 0:00:00
Downloading networkx-3.3-py3-none-any.whl (1.7 MB)
   ---------------------------------------- 1.7/1.7 MB 105.8 MB/s eta 0:00:00
Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
   ---------------------------------------- 5.7/5.7 MB 122.0 MB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp312-cp312-win_amd64.whl (17 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
   ---------------------------------------- 536.2/536.2 kB ? eta 0:00:00
Installing collected packages: mpmath, typing-extensions, sympy, networkx, MarkupSafe, fsspec, filelock, jinja2, torch
Successfully installed MarkupSafe-2.1.5 filelock-3.13.4 fsspec-2024.3.1 jinja2-3.1.3 mpmath-1.3.0 networkx-3.3 sympy-1.12 torch-2.2.2 typing-extensions-4.11.0

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ pip install -U transformers
Collecting transformers
  Downloading transformers-4.40.0-py3-none-any.whl.metadata (137 kB)
     -------------------------------------- 137.6/137.6 kB 1.6 MB/s eta 0:00:00
Requirement already satisfied: filelock in c:\opt\python312\lib\site-packages (from transformers) (3.13.4)
Collecting huggingface-hub<1.0,>=0.19.3 (from transformers)
  Downloading huggingface_hub-0.22.2-py3-none-any.whl.metadata (12 kB)
Collecting numpy>=1.17 (from transformers)
  Downloading numpy-1.26.4-cp312-cp312-win_amd64.whl.metadata (61 kB)
     ---------------------------------------- 61.0/61.0 kB 3.2 MB/s eta 0:00:00
Collecting packaging>=20.0 (from transformers)
  Downloading packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting pyyaml>=5.1 (from transformers)
  Downloading PyYAML-6.0.1-cp312-cp312-win_amd64.whl.metadata (2.1 kB)
Collecting regex!=2019.12.17 (from transformers)
  Downloading regex-2024.4.16-cp312-cp312-win_amd64.whl.metadata (41 kB)
     ---------------------------------------- 42.0/42.0 kB 2.1 MB/s eta 0:00:00
Collecting requests (from transformers)
  Downloading requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting tokenizers<0.20,>=0.19 (from transformers)
  Downloading tokenizers-0.19.1-cp312-none-win_amd64.whl.metadata (6.9 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Downloading safetensors-0.4.3-cp312-none-win_amd64.whl.metadata (3.9 kB)
Collecting tqdm>=4.27 (from transformers)
  Downloading tqdm-4.66.2-py3-none-any.whl.metadata (57 kB)
     ---------------------------------------- 57.6/57.6 kB 3.2 MB/s eta 0:00:00
Requirement already satisfied: fsspec>=2023.5.0 in c:\opt\python312\lib\site-packages (from huggingface-hub<1.0,>=0.19.3->transformers) (2024.3.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\opt\python312\lib\site-packages (from huggingface-hub<1.0,>=0.19.3->transformers) (4.11.0)
Collecting colorama (from tqdm>=4.27->transformers)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting charset-normalizer<4,>=2 (from requests->transformers)
  Downloading charset_normalizer-3.3.2-cp312-cp312-win_amd64.whl.metadata (34 kB)
Collecting idna<4,>=2.5 (from requests->transformers)
  Downloading idna-3.7-py3-none-any.whl.metadata (9.9 kB)
Collecting urllib3<3,>=1.21.1 (from requests->transformers)
  Downloading urllib3-2.2.1-py3-none-any.whl.metadata (6.4 kB)
Collecting certifi>=2017.4.17 (from requests->transformers)
  Downloading certifi-2024.2.2-py3-none-any.whl.metadata (2.2 kB)
Downloading transformers-4.40.0-py3-none-any.whl (9.0 MB)
   ---------------------------------------- 9.0/9.0 MB 16.9 MB/s eta 0:00:00
Downloading huggingface_hub-0.22.2-py3-none-any.whl (388 kB)
   --------------------------------------- 388.9/388.9 kB 25.2 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp312-cp312-win_amd64.whl (15.5 MB)
   ---------------------------------------- 15.5/15.5 MB 50.4 MB/s eta 0:00:00
Downloading packaging-24.0-py3-none-any.whl (53 kB)
   ---------------------------------------- 53.5/53.5 kB 2.7 MB/s eta 0:00:00
Downloading PyYAML-6.0.1-cp312-cp312-win_amd64.whl (138 kB)
   ---------------------------------------- 138.7/138.7 kB ? eta 0:00:00
Downloading regex-2024.4.16-cp312-cp312-win_amd64.whl (268 kB)
   --------------------------------------- 268.4/268.4 kB 17.2 MB/s eta 0:00:00
Downloading safetensors-0.4.3-cp312-none-win_amd64.whl (289 kB)
   --------------------------------------- 289.4/289.4 kB 18.6 MB/s eta 0:00:00
Downloading tokenizers-0.19.1-cp312-none-win_amd64.whl (2.2 MB)
   ---------------------------------------- 2.2/2.2 MB 71.1 MB/s eta 0:00:00
Downloading tqdm-4.66.2-py3-none-any.whl (78 kB)
   ---------------------------------------- 78.3/78.3 kB ? eta 0:00:00
Downloading requests-2.31.0-py3-none-any.whl (62 kB)
   ---------------------------------------- 62.6/62.6 kB ? eta 0:00:00
Downloading certifi-2024.2.2-py3-none-any.whl (163 kB)
   ---------------------------------------- 163.8/163.8 kB ? eta 0:00:00
Downloading charset_normalizer-3.3.2-cp312-cp312-win_amd64.whl (100 kB)
   ---------------------------------------- 100.4/100.4 kB ? eta 0:00:00
Downloading idna-3.7-py3-none-any.whl (66 kB)
   ---------------------------------------- 66.8/66.8 kB ? eta 0:00:00
Downloading urllib3-2.2.1-py3-none-any.whl (121 kB)
   ---------------------------------------- 121.1/121.1 kB 7.4 MB/s eta 0:00:00
Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: urllib3, safetensors, regex, pyyaml, packaging, numpy, idna, colorama, charset-normalizer, certifi, tqdm, requests, huggingface-hub, tokenizers, transformers
Successfully installed certifi-2024.2.2 charset-normalizer-3.3.2 colorama-0.4.6 huggingface-hub-0.22.2 idna-3.7 numpy-1.26.4 packaging-24.0 pyyaml-6.0.1 regex-2024.4.16 requests-2.31.0 safetensors-0.4.3 tokenizers-0.19.1 tqdm-4.66.2 transformers-4.40.0 urllib3-2.2.1
michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
C:\opt\Python312\Lib\site-packages\huggingface_hub\file_download.py:148: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\michael\.cache\huggingface\hub\models--google--gemma-7b. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Traceback (most recent call last):
  File "C:\wse_github\obrienlabsdev\machine-learning\environments\windows\src\google-gemma\gemma-gpu.py", line 16, in <module>
    model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto", token=access_token)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\opt\Python312\Lib\site-packages\transformers\models\auto\auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\opt\Python312\Lib\site-packages\transformers\modeling_utils.py", line 3086, in from_pretrained
    raise ImportError(
ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`

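As an aside, the symlink notice in the output above can be silenced with the environment variable it names; a minimal sketch, assuming it is set before any Hugging Face import:

```python
import os

# per the huggingface_hub message above - suppress the degraded-cache warning on Windows
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

from transformers import AutoTokenizer, AutoModelForCausalLM  # import after setting the variable
```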
michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ sudo python gemma-gpu.py


I have not installed Visual Studio yet.
(screenshot)

https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development

Forgot to check the CUDA toolkit version first:
michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ pip install accelerate
Collecting accelerate
  Downloading accelerate-0.29.3-py3-none-any.whl.metadata (18 kB)
Requirement already satisfied: numpy>=1.17 in c:\opt\python312\lib\site-packages (from accelerate) (1.26.4)
Requirement already satisfied: packaging>=20.0 in c:\opt\python312\lib\site-packages (from accelerate) (24.0)
Collecting psutil (from accelerate)
  Downloading psutil-5.9.8-cp37-abi3-win_amd64.whl.metadata (22 kB)
Requirement already satisfied: pyyaml in c:\opt\python312\lib\site-packages (from accelerate) (6.0.1)
Requirement already satisfied: torch>=1.10.0 in c:\opt\python312\lib\site-packages (from accelerate) (2.2.2)
Requirement already satisfied: huggingface-hub in c:\opt\python312\lib\site-packages (from accelerate) (0.22.2)
Requirement already satisfied: safetensors>=0.3.1 in c:\opt\python312\lib\site-packages (from accelerate) (0.4.3)
Requirement already satisfied: filelock in c:\opt\python312\lib\site-packages (from torch>=1.10.0->accelerate) (3.13.4)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\opt\python312\lib\site-packages (from torch>=1.10.0->accelerate) (4.11.0)
Requirement already satisfied: sympy in c:\opt\python312\lib\site-packages (from torch>=1.10.0->accelerate) (1.12)
Requirement already satisfied: networkx in c:\opt\python312\lib\site-packages (from torch>=1.10.0->accelerate) (3.3)
Requirement already satisfied: jinja2 in c:\opt\python312\lib\site-packages (from torch>=1.10.0->accelerate) (3.1.3)
Requirement already satisfied: fsspec in c:\opt\python312\lib\site-packages (from torch>=1.10.0->accelerate) (2024.3.1)
Requirement already satisfied: requests in c:\opt\python312\lib\site-packages (from huggingface-hub->accelerate) (2.31.0)
Requirement already satisfied: tqdm>=4.42.1 in c:\opt\python312\lib\site-packages (from huggingface-hub->accelerate) (4.66.2)
Requirement already satisfied: colorama in c:\opt\python312\lib\site-packages (from tqdm>=4.42.1->huggingface-hub->accelerate) (0.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\opt\python312\lib\site-packages (from jinja2->torch>=1.10.0->accelerate) (2.1.5)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\opt\python312\lib\site-packages (from requests->huggingface-hub->accelerate) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\opt\python312\lib\site-packages (from requests->huggingface-hub->accelerate) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\opt\python312\lib\site-packages (from requests->huggingface-hub->accelerate) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\opt\python312\lib\site-packages (from requests->huggingface-hub->accelerate) (2024.2.2)
Requirement already satisfied: mpmath>=0.19 in c:\opt\python312\lib\site-packages (from sympy->torch>=1.10.0->accelerate) (1.3.0)
Downloading accelerate-0.29.3-py3-none-any.whl (297 kB)
   ---------------------------------------- 297.6/297.6 kB 3.7 MB/s eta 0:00:00
Downloading psutil-5.9.8-cp37-abi3-win_amd64.whl (255 kB)
   --------------------------------------- 255.1/255.1 kB 15.3 MB/s eta 0:00:00
Installing collected packages: psutil, accelerate
Successfully installed accelerate-0.29.3 psutil-5.9.8

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Looking in indexes: https://download.pytorch.org/whl/cu124
Requirement already satisfied: torch in c:\opt\python312\lib\site-packages (2.2.2)
ERROR: Could not find a version that satisfies the requirement torchvision (from versions: none)
ERROR: No matching distribution found for torchvision

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Looking in indexes: https://download.pytorch.org/whl/cu121
Requirement already satisfied: torch in c:\opt\python312\lib\site-packages (2.2.2)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.17.2%2Bcu121-cp312-cp312-win_amd64.whl (5.7 MB)
     ---------------------------------------- 5.7/5.7 MB 40.1 MB/s eta 0:00:00
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.2.2%2Bcu121-cp312-cp312-win_amd64.whl (4.0 MB)
     ---------------------------------------- 4.0/4.0 MB 85.9 MB/s eta 0:00:00
Requirement already satisfied: filelock in c:\opt\python312\lib\site-packages (from torch) (3.13.4)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\opt\python312\lib\site-packages (from torch) (4.11.0)
Requirement already satisfied: sympy in c:\opt\python312\lib\site-packages (from torch) (1.12)
Requirement already satisfied: networkx in c:\opt\python312\lib\site-packages (from torch) (3.3)
Requirement already satisfied: jinja2 in c:\opt\python312\lib\site-packages (from torch) (3.1.3)
Requirement already satisfied: fsspec in c:\opt\python312\lib\site-packages (from torch) (2024.3.1)
Requirement already satisfied: numpy in c:\opt\python312\lib\site-packages (from torchvision) (1.26.4)
Collecting torch
  Downloading https://download.pytorch.org/whl/cu121/torch-2.2.2%2Bcu121-cp312-cp312-win_amd64.whl (2454.8 MB)
     ---------------------------------------- 2.5/2.5 GB 6.2 MB/s eta 0:00:00
Collecting pillow!=8.3.*,>=5.3.0 (from torchvision)
  Downloading https://download.pytorch.org/whl/pillow-10.2.0-cp312-cp312-win_amd64.whl (2.6 MB)
     ---------------------------------------- 2.6/2.6 MB 84.2 MB/s eta 0:00:00
Requirement already satisfied: MarkupSafe>=2.0 in c:\opt\python312\lib\site-packages (from jinja2->torch) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in c:\opt\python312\lib\site-packages (from sympy->torch) (1.3.0)
Installing collected packages: pillow, torch, torchvision, torchaudio
  Attempting uninstall: torch
    Found existing installation: torch 2.2.2
    Uninstalling torch-2.2.2:
      Successfully uninstalled torch-2.2.2
Successfully installed pillow-10.2.0 torch-2.2.2+cu121 torchaudio-2.2.2+cu121 torchvision-0.17.2+cu121
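After swapping in the cu121 wheels, a hedged sanity check (standard torch attributes, not from the original session) verifies the CUDA build is the one that loads:

```python
import torch

print(torch.__version__)           # expect a +cu121 suffix after this install
print(torch.version.cuda)          # "12.1" for the cu121 wheels
print(torch.cuda.is_available())   # False would mean the CPU-only wheel is still active
```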




21:15

(screenshot)

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Downloading shards: 100%|##########| 4/4 [02:50<00:00, 42.56s/it]
Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100%|##########| 4/4 [00:05<00:00,  1.26s/it]
C:\opt\Python312\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:575: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
generate start:  21:18:23
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
end 21:18:36

michael@14900c MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100%|##########| 4/4 [00:05<00:00,  1.28s/it]
C:\opt\Python312\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:575: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
generate start:  21:21:54
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
end 21:22:06

16 seconds
(screenshot)

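The GeLU notice in the two runs above points to huggingface/transformers#29402; a hedged sketch of addressing it explicitly by setting the new config field before loading (access_token as in the script earlier):

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("google/gemma-7b", token=access_token)
config.hidden_activation = "gelu_pytorch_tanh"  # approximate GeLU, as the warning recommends

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b", config=config, device_map="auto", token=access_token
)
```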
michael@14900c MINGW64 ~
$ nvidia-smi
Fri Apr 19 21:23:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.86                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000             WDDM  |   00000000:01:00.0 Off |                  Off |
| 38%   71C    P2            263W /  300W |   34075MiB /  49140MiB |     98%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1676    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A      4420    C+G   ...on\123.0.2420.97\msedgewebview2.exe      N/A      |
|    0   N/A  N/A      7844      C   C:\opt\Python312\python.exe                 N/A      |
|    0   N/A  N/A      8680    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A      9372    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A      9396    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A     10568    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     11124    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A     11820    C+G   ...cal\Microsoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     12232    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     13936    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A     17968    C+G   ...\Docker\frontend\Docker Desktop.exe      N/A      |

@obriensystems
Copy link
Member Author

obriensystems commented Apr 20, 2024

Gemma 7B on dual RTX A4500

michael@13900d MINGW64 ~
$ nvidia-smi
Sat Apr 20 01:12:02 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 546.12                 Driver Version: 546.12       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4500             WDDM  | 00000000:01:00.0 Off |                  Off |
| 34%   65C    P2              97W / 200W |  18597MiB / 20470MiB |     71%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4500             WDDM  | 00000000:02:00.0 Off |                  Off |
| 30%   62C    P2              87W / 200W |  15537MiB / 20470MiB |     99%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     12596      C   C:\opt\Python310\python.exe               N/A      |
|    1   N/A  N/A     12596      C   C:\opt\Python310\python.exe               N/A      |
+---------------------------------------------------------------------------------------+


(screenshot)

michael@13900d MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
tokenizer_config.json: 100%|███████████████████████████████████████████| 33.6k/33.6k [00:00<00:00, 9.59MB/s]
C:\opt\Python310\lib\site-packages\huggingface_hub\file_download.py:149: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\michael\.cache\huggingface\hub\models--google--gemma-7b. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
tokenizer.model: 100%|█████████████████████████████████████████████████| 4.24M/4.24M [00:00<00:00, 30.7MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████| 17.5M/17.5M [00:00<00:00, 111MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████| 636/636 [00:00<00:00, 1.28MB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████| 629/629 [00:00<?, ?B/s]
model.safetensors.index.json: 100%|████████████████████████████████████| 20.9k/20.9k [00:00<00:00, 20.9MB/s]
model-00001-of-00004.safetensors: 100%|█████████████████████████████████| 5.00G/5.00G [00:46<00:00, 108MB/s]
model-00002-of-00004.safetensors: 100%|█████████████████████████████████| 4.98G/4.98G [00:45<00:00, 110MB/s]
model-00003-of-00004.safetensors: 100%|█████████████████████████████████| 4.98G/4.98G [00:45<00:00, 109MB/s]
model-00004-of-00004.safetensors: 100%|█████████████████████████████████| 2.11G/2.11G [00:19<00:00, 109MB/s]
Downloading shards: 100%|█████████████████████████████████████████████████████| 4/4 [02:36<00:00, 39.23s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████| 4/4 [00:07<00:00,  1.95s/it]
generation_config.json: 100%|███████████████████████████████████████████████| 137/137 [00:00<00:00, 273kB/s]
generate start:  01:03:20
C:\opt\Python310\lib\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
end 01:05:09

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from transformers import AutoTokenizer, AutoModelForCausalLM
from datetime import datetime

#access_token='hf_cfTP...XCQqH'
access_token='hf_cfTP....QqH'

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b", token=access_token)
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto", token=access_token)

input_text = "how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process."
time_start = datetime.now().strftime("%H:%M:%S")
print("generate start: ", datetime.now().strftime("%H:%M:%S"))

input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, 
                         max_new_tokens=10000)
print(tokenizer.decode(outputs[0]))

print("end", datetime.now().strftime("%H:%M:%S"))
time_end = datetime.now().strftime("%H:%M:%S")

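With CUDA_VISIBLE_DEVICES="0,1" and device_map="auto", accelerate shards the ~17 GB of weights across both A4500s (matching the ~18.6 GB / ~15.5 GB split in nvidia-smi above). A hedged way to inspect the placement after loading, using the model variable from the script:

```python
import torch

# accelerate records the module-to-device placement when device_map="auto" is used
print(model.hf_device_map)

for i in range(torch.cuda.device_count()):
    gib = torch.cuda.memory_allocated(i) / 2**30
    print(f"cuda:{i} {torch.cuda.get_device_name(i)}: {gib:.1f} GiB allocated")
```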
(screenshot)

@obriensystems
Copy link
Member Author

Gemma 7B on dual RTX 4090

(screenshot)

michael@13900b MINGW64 ~
$ nvidia-smi
Sat Apr 20 09:08:17 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.12                 Driver Version: 552.12         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090      WDDM  |   00000000:01:00.0 Off |                  Off |
|  0%   39C    P2            101W /  480W |   18832MiB /  24564MiB |     87%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090      WDDM  |   00000000:02:00.0  On |                  Off |
|  0%   48C    P2             99W /  480W |   16678MiB /  24564MiB |     97%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     25976      C   C:\opt\miniconda3\python.exe                N/A      |
|    1   N/A  N/A      1288    C+G   ...on\123.0.2420.97\msedgewebview2.exe      N/A      |
|    1   N/A  N/A      3568    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    1   N/A  N/A      7700    C+G   ...8bbwe\SnippingTool\SnippingTool.exe      N/A      |
|    1   N/A  N/A      8504    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    1   N/A  N/A     12632    C+G   C:\Windows\explorer.exe                     N/A      |
|    1   N/A  N/A     12700    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    1   N/A  N/A     16244    C+G   ...US\ArmouryDevice\asus_framework.exe      N/A      |
|    1   N/A  N/A     17392    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    1   N/A  N/A     18056    C+G   ...e5b\Corsair iCUE5 Software\iCUE.exe      N/A      |
|    1   N/A  N/A     18392    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A      |
|    1   N/A  N/A     18632    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    1   N/A  N/A     18844    C+G   ...cal\Microsoft\OneDrive\OneDrive.exe      N/A      |
|    1   N/A  N/A     21116    C+G   ...on\123.0.2420.97\msedgewebview2.exe      N/A      |
|    1   N/A  N/A     22288    C+G   ....5435.0_x64__8j3eq9eme6ctt\IGCC.exe      N/A      |
|    1   N/A  N/A     23536    C+G   ...sair iCUE5 Software\QmlRenderer.exe      N/A      |
|    1   N/A  N/A     24584    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    1   N/A  N/A     24816    C+G   ...\Docker\frontend\Docker Desktop.exe      N/A      |
|    1   N/A  N/A     25976      C   C:\opt\miniconda3\python.exe                N/A      |
|    1   N/A  N/A     27624    C+G   C:\opt\vscode\Code.exe                      N/A      |

michael@13900b MINGW64 /c/wse_github/obrienlabsdev/machine-learning/environments/windows/src/google-gemma (main)
$ python gemma-gpu.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00,  1.60s/it]
generate start:  09:07:39
C:\Users\michael\AppData\Roaming\Python\Python311\site-packages\transformers\models\gemma\modeling_gemma.py:555: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
<bos>how is gold made in collapsing neutron stars - specifically what is the ratio created during the beta and r process.

Answer:

Step 1/2
First, when a neutron star collapses, it undergoes a process called gravitational collapse, which causes the star to rapidly lose mass and density. This process releases a tremendous amount of energy, which can cause the star to explode in a supernova. During the supernova, the star's core undergoes a process called the r-process, which is responsible for creating heavy elements like gold. The r-process occurs when neutrons are added to atomic nuclei, causing them to become unstable and undergo beta decay. This process continues until the nucleus reaches a stable state, which is usually a heavy element like gold.

Step 2/2
The ratio of gold created during the r-process is not well understood, as it depends on a variety of factors, including the mass and density of the star, the amount of energy released during the supernova, and the specific conditions of the r-process. However, it is believed that the r-process is responsible for creating most of the heavy elements in the universe, including gold.<eos>
end 09:09:22
(base)


@obriensystems
Copy link
Member Author

obriensystems commented Apr 21, 2024

Single NVIDIA A6000 - Ampere GA102 (see the L40S equivalent on GCP)
12 seconds for 170 tokens = 14 tokens/sec
98% GPU utilization of 10k cores and 34GB/48GB VRAM @ 85% TDP 250W of 300W
0% memory bus load of 768 GB/s (384 bit)

CPU - 14900K - 6400MHz RAM - overclocked
89 seconds (7.4x A6000) = 1.9 tokens/sec
90% CPU utilization of 32 vCores (24+8) and 33GB/64GB RAM

Dual NVIDIA 4090 - Ada AD102
102 seconds (8.5x A6000) = 1.7 tokens/sec
70% GPU utilization of 2x 16k cores and 34GB/48GB VRAM @ 22% TDP 220W of 900W
60% memory bus load of 1008 GB/s (384 bit)

Dual NVIDIA A4500 - Ampere GA102
119 seconds (10x A6000) = 1.4 tokens/sec
75% GPU utilization of 2x 7k cores and 34GB/40GB VRAM @ 40% TDP 160W of 400W
75% memory bus load of 640 GB/s (320 bit)

CPU - 13900KS
147 seconds (12.3x A6000) = 1.2 tokens/sec
96% CPU utilization of 32 vCores (24+8) and 34GB/64GB RAM

CPU - 13900K
152 seconds (13x A6000) = 1.1 tokens/sec
98% CPU utilization of 32 vCores (24+8) and 35GB/192GB RAM

Dual L4 on GCP - Ada AD104
202 seconds (17x A6000) = 0.85 tokens/sec
65% GPU utilization of 2x 7k cores and 35GB/46GB VRAM @ 50% TDP 70W of 150W
?% memory bus load of 300 GB/s (192 bit)
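The tokens/sec figures above divide the ~170 generated tokens by wall-clock generation time; a minimal sketch of measuring it directly, reusing model, tokenizer, and input_ids from the scripts above:

```python
import time

start = time.perf_counter()
outputs = model.generate(**input_ids, max_new_tokens=10000)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - input_ids["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s = {new_tokens / elapsed:.2f} tokens/sec")
```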

obriensystems added a commit that referenced this issue May 19, 2024
obriensystems added a commit that referenced this issue Aug 26, 2024