[Bug Report] Load model problem #800

Open
LiuJinzhe-Keepgoing opened this issue Nov 27, 2024 · 4 comments
Labels: question (Further information is requested)


@LiuJinzhe-Keepgoing

Hello, I have run into a strange problem that puzzles me.
I use the code below to load the GPT2-xl model from a local directory. In one Jupyter notebook it runs and loads normally. When I load the model with the same code from another file, it keeps reporting that it is trying to download the model from the official Hugging Face website, but my machine can't connect to Hugging Face.
Both Jupyter files use the same conda environment. The results are as follows:
[Two screenshots of the notebook results were attached here.]

The file that fails to load reports the following error:


OSError                                   Traceback (most recent call last)
File ~/miniconda3/envs/knowledgecircuit/lib/python3.10/site-packages/urllib3/connection.py:199, in HTTPConnection._new_conn(self)
    198 try:
--> 199     sock = connection.create_connection(
    200         (self._dns_host, self.port),
    201         self.timeout,
    202         source_address=self.source_address,
    203         socket_options=self.socket_options,
    204     )
    205 except socket.gaierror as e:

File ~/miniconda3/envs/knowledgecircuit/lib/python3.10/site-packages/urllib3/util/connection.py:85, in create_connection(address, timeout, source_address, socket_options)
     84 try:
---> 85     raise err
     86 finally:
     87     # Break explicitly a reference cycle

File ~/miniconda3/envs/knowledgecircuit/lib/python3.10/site-packages/urllib3/util/connection.py:73, in create_connection(address, timeout, source_address, socket_options)
     72     sock.bind(source_address)
---> 73 sock.connect(sa)
     74 # Break explicitly a reference cycle

OSError: [Errno 101] Network is unreachable
...
    431 except EntryNotFoundError as e:
    432     if not _raise_exceptions_for_missing_entries:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like gpt2-xl is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

The code that loads the model is as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

device = "cuda:0"
# Despite the variable name, this path points to the local gpt2-xl checkpoint.
gpt2_medium_path = '/data/liujinzhe/model/openai-community/gpt2-xl'
hf_model = AutoModelForCausalLM.from_pretrained(gpt2_medium_path)
tokenizer = AutoTokenizer.from_pretrained(gpt2_medium_path)
# GPT-2 has no pad token, so reuse EOS and pad on the left.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
model = HookedTransformer.from_pretrained(
    model_name="gpt2-xl",
    hf_model=hf_model,
    tokenizer=tokenizer,
    local_path=gpt2_medium_path,
    center_unembed=False,
    center_writing_weights=False,
    fold_ln=True,
    device=device,
    # refactor_factored_attn_matrices=True,
)

model.cfg.use_split_qkv_input = True
model.cfg.use_attn_result = True
model.cfg.use_hook_mlp_in = True
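
The second error above says the library could not find a directory containing config.json. One way to rule out a problem with the local path itself is to check the directory before calling from_pretrained. This is only a minimal sketch: the helper name `missing_model_files` and the default list of expected files are hypothetical (config.json is the only file the error names), and it does not address the case where the library contacts Hugging Face regardless of the local files.

```python
from pathlib import Path


def missing_model_files(model_dir, expected=("config.json",)):
    """Return the names in `expected` that are not regular files in model_dir.

    Path.is_file() follows symlinks, so a broken symlink counts as missing.
    """
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Path taken from the snippet above; adjust it to your own layout.
    path = "/data/liujinzhe/model/openai-community/gpt2-xl"
    print(missing_model_files(path))
```

An empty list means every expected file is present in the directory; any names that are printed are the ones to look for first.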
@bryce13950
Collaborator

Do you have any way to get access to Hugging Face? This is a known issue where we currently download the config from Hugging Face even when a config is passed in (#754). It will be patched at some point, but the easiest solution today is to make sure you have access to Hugging Face. If that is not possible for you, let me know.

@bryce13950 bryce13950 added the question Further information is requested label Nov 27, 2024
@akul-sethi

I am trying to load CodeLlama and am getting the same error. I have logged in using the Hugging Face CLI and have access to the model, since I can use it normally without TransformerLens. Any idea what is happening? Code:

llama = HookedTransformer.from_pretrained(
    model_name="CodeLlama-7b-Python",
)

Error:

OSError: CodeLlama-7b-Python-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

@LiuJinzhe-Keepgoing
Author

Sorry, I can't load models online from Hugging Face because of constraints in my environment. I want to load the model from a local copy instead. Is there a good solution? @bryce13950

@HaThuyAn

HaThuyAn commented Jan 8, 2025

I encountered the same issue and have been able to solve it by inspecting the library code.

The model I used is gpt2-small. When I download it from HuggingFace, the folder (folder X) that contains the config-related files (config.json, generation_config.json, ...) is nested inside other folders.

To solve this, I COPIED folder X (moving it by drag and drop caused an "invalid symlink" issue) and PASTED it into folder Y, the folder that contains the script that loads the model into HookedTransformer. I then renamed the copy to "gpt2-small" (the same as the official model name), so it now sits in the same folder as the script.

If this does not work, another method is to download the necessary config-related files directly from the model repository on HuggingFace, put them all into a folder named after the official model name, and place that folder in folder Y.

model = HookedTransformer.from_pretrained("gpt2-small") (This works without requiring access to huggingface.)

This works in my case and I hope it can help someone facing the same issue.
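
The copy step described above can be sketched in Python. Files in a Hugging Face cache snapshot folder are typically symlinks into a separate blobs directory, which is a likely reason a plain move produces invalid symlinks. `shutil.copytree` with `symlinks=False` copies the files the links point to instead of the links themselves. The function name `copy_snapshot` and the paths in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def copy_snapshot(snapshot_dir, script_dir, model_name):
    """Copy snapshot_dir to script_dir/model_name, resolving symlinks to real files.

    Raises FileExistsError if the destination folder already exists.
    """
    dest = Path(script_dir) / model_name
    # symlinks=False (the default) copies link targets, not the links.
    shutil.copytree(snapshot_dir, dest, symlinks=False)
    return dest


# Hypothetical usage, run from the folder that contains the loading script:
# copy_snapshot("<path to folder X>", ".", "gpt2-small")
```

After the copy, the destination holds ordinary files, so it no longer depends on the cache layout it came from.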
