Missing Keys in state_dict #172

bjohn22 · 2024-05-06T01:02:36Z

I manually downloaded nvidia/Llama3-ChatQA-1.5-8B from HF to a local directory and ran scripts/convert_hf_checkpoint.py. Then I tried to run generate.py using the local checkpoint dir and got:

    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for Transformer:
    Missing key(s) in state_dict: "tok_embeddings.weight", "layers.0.attention.wqkv.weight", "layers.0.attention.wo.weight", "layers.0.feed_forward.w1.weight", "layers.0.feed_forward.w3.weight", "layers.0.feed_forward.w2.weight", "layers.0.ffn_norm.weight", "layers.0.attention_norm.weight",
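One way to narrow down an error like this is to compare the key names the model expects with the key names actually stored in the checkpoint file. The sketch below is only an illustration: `diff_state_dict_keys` is a hypothetical helper, and the checkpoint-side names are assumed Hugging Face style names, not values read from the reporter's files.

```python
def diff_state_dict_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not use
    """
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


# A few of the key names listed in the error message above.
expected = [
    "tok_embeddings.weight",
    "layers.0.attention.wqkv.weight",
    "layers.0.attention.wo.weight",
]

# Assumption: an unconverted checkpoint with Hugging Face style names.
checkpoint = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
]

missing, unexpected = diff_state_dict_keys(expected, checkpoint)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real checkpoint, the two key lists would typically come from `model.state_dict().keys()` and from the keys of the dictionary returned by `torch.load(path, map_location="cpu")`. If every expected key is missing and the unexpected keys follow a different naming scheme, the file being loaded was probably not the converted one.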

Here is my weight directory:


yanboliang · 2024-09-16T04:31:20Z

Llama3-ChatQA-1.5-8B is not actually supported. Please check the list of supported models at:

gpt-fast/model.py

Lines 60 to 81 in c9f683e

    transformer_configs = {
        "CodeLlama-7b-Python-hf": dict(block_size=16384, vocab_size=32000, n_layer=32, dim = 4096, rope_base=1000000),
        "7B": dict(n_layer=32, n_head=32, dim=4096),
        "13B": dict(n_layer=40, n_head=40, dim=5120),
        "30B": dict(n_layer=60, n_head=52, dim=6656),
        "34B": dict(n_layer=48, n_head=64, dim=8192, vocab_size=32000, n_local_heads=8, intermediate_size=22016, rope_base=1000000), # CodeLlama-34B-Python-hf
        "70B": dict(n_layer=80, n_head=64, dim=8192, n_local_heads=8, intermediate_size=28672),
        "Mistral-7B": dict(n_layer=32, n_head=32, n_local_heads=8, dim=4096, intermediate_size=14336, vocab_size=32000),
        "stories15M": dict(n_layer=6, n_head=6, dim=288),
        "stories110M": dict(n_layer=12, n_head=12, dim=768),
        "llama-3-8b": dict(block_size=8192, n_layer=32, n_head=32, n_local_heads=8, dim=4096, intermediate_size=14336, vocab_size=128256, rope_base=500000),
        "llama-3-70b": dict(block_size=8192, n_layer=80, n_head=64, n_local_heads=8, dim=8192, intermediate_size=28672, vocab_size=128256, rope_base=500000),
        "llama-3.1-8b": dict(block_size=131072, n_layer=32, n_head=32, n_local_heads=8, dim=4096, intermediate_size=14336, vocab_size=128256, rope_base=500000,
            rope_scaling=dict(factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_position_embeddings=8192),
        ),
        "llama-3.1-70b": dict(block_size=131072, n_layer=80, n_head=64, n_local_heads=8, dim=8192, intermediate_size=28672, vocab_size=128256, rope_base=500000,
            rope_scaling=dict(factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_position_embeddings=8192),
        ),
        "llama-3.1-405b": dict(block_size=131072, n_layer=126, n_head=128, n_local_heads=8, dim=16384, intermediate_size=53248, vocab_size=128256, rope_base=500000,
            rope_scaling=dict(factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_position_embeddings=8192),
        ),

But I think you can replace llama-3-8b in the list with Llama3-ChatQA-1.5-8B and play around with it. They should have the same architecture.
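A minimal sketch of that suggestion follows. It registers a Llama3-ChatQA-1.5-8B entry by copying the llama-3-8b values quoted above, then checks the entry against a Hugging Face style config. The `FIELD_MAP` mapping and the `config_mismatches` helper are hypothetical and not part of gpt-fast, and the sample `hf_config` values are illustrative only; the real values should be read from the downloaded checkpoint's config.json.

```python
# Only the llama-3-8b entry from the excerpt above; other entries omitted.
transformer_configs = {
    "llama-3-8b": dict(block_size=8192, n_layer=32, n_head=32, n_local_heads=8,
                       dim=4096, intermediate_size=14336, vocab_size=128256,
                       rope_base=500000),
}

# Assumption (from the comment above): same architecture as llama-3-8b.
transformer_configs["Llama3-ChatQA-1.5-8B"] = dict(transformer_configs["llama-3-8b"])

# Hypothetical mapping: gpt-fast field name -> Hugging Face config.json field name.
FIELD_MAP = {
    "block_size": "max_position_embeddings",
    "n_layer": "num_hidden_layers",
    "n_head": "num_attention_heads",
    "n_local_heads": "num_key_value_heads",
    "dim": "hidden_size",
    "intermediate_size": "intermediate_size",
    "vocab_size": "vocab_size",
    "rope_base": "rope_theta",
}


def config_mismatches(entry, hf_config):
    """Return {field: (entry_value, hf_value)} for fields present in both that differ."""
    out = {}
    for ours, theirs in FIELD_MAP.items():
        if ours in entry and theirs in hf_config and entry[ours] != hf_config[theirs]:
            out[ours] = (entry[ours], hf_config[theirs])
    return out


# Illustrative values only, NOT read from the real checkpoint.
hf_config = {
    "max_position_embeddings": 8192,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "vocab_size": 128256,
    "rope_theta": 500000.0,
}

print(config_mismatches(transformer_configs["Llama3-ChatQA-1.5-8B"], hf_config))
```

In practice `hf_config` would be loaded with `json.load` from the checkpoint directory's config.json. An empty result supports the "same architecture" assumption; any reported field is a value to adjust in the new entry before converting and loading the weights.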

bjohn22 · 2024-09-16T19:46:18Z

Thank you for this comment.
