Better convert. #384

Narsil · 2023-11-17T17:28:03Z

What does this PR do?

Fixes # (issue) or description of the problem this PR solves.

Wauplin

Hey @Narsil, just reviewed the PR. I mostly focused on the huggingface_hub integration and less on the shared_tensors/discard_names logic. The current version won't work on a different revision but will be very quick to fix (see comments).

Wauplin · 2023-11-20T09:21:30Z

bindings/python/convert.py

-        main_commit = api.list_repo_commits(model_id)[0].commit_id
-        discussions = api.get_repo_discussions(repo_id=model_id)
+        main_commit = api.list_repo_commits(model_id, revision=revision)[0].commit_id
+        discussions = api.get_repo_discussions(repo_id=model_id, revision=revision)


get_repo_discussions lists all discussions/PRs in the Community Tab. It doesn't have a revision parameter => previous_pr will always fail (return None)
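To illustrate the point above, here is a minimal sketch of looking up an existing PR without passing a revision to `get_repo_discussions`. The helper names `find_open_pr` and `previous_pr_sketch` are hypothetical, and the matching rule (open, is a pull request, same title) is an assumption about what the converter needs, not the code in this PR. It relies only on the `status`, `is_pull_request` and `title` attributes of the discussion objects.

```python
from typing import Any, Iterable, Optional


def find_open_pr(discussions: Iterable[Any], pr_title: str) -> Optional[Any]:
    """Return the first open pull request whose title matches, else None."""
    for discussion in discussions:
        if (
            discussion.status == "open"
            and discussion.is_pull_request
            and discussion.title == pr_title
        ):
            return discussion
    return None


def previous_pr_sketch(api: Any, model_id: str, pr_title: str) -> Optional[Any]:
    # `api` is expected to be a huggingface_hub.HfApi instance.
    # Discussions belong to the repo as a whole, so no revision is passed here.
    discussions = api.get_repo_discussions(repo_id=model_id)
    return find_open_pr(discussions, pr_title)
```

Any revision-specific check (for example, comparing commits) would have to happen after this lookup, on the discussion that was found.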

Wauplin · 2023-11-20T09:23:55Z

bindings/python/convert.py

@@ -101,12 +157,12 @@ def convert_multi(model_id: str, folder: str, token: Optional[str]) -> Conversio
    return operations, errors


-def convert_single(model_id: str, folder: str, token: Optional[str]) -> ConversionResult:
+def convert_single(model_id: str, *, revision: Optional[str], folder: str, token: Optional[str], discard_names: List[str]) -> ConversionResult:
    pt_filename = hf_hub_download(repo_id=model_id, filename="pytorch_model.bin", token=token, cache_dir=folder)


Use revision parameter to download pt_filename?

Forgot! Thanks.
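A minimal sketch of the fix discussed here, forwarding `revision` to `hf_hub_download` so the checkpoint comes from the requested revision rather than the default branch. The wrapper name `download_checkpoint` and its injectable `download` argument are hypothetical, added only so the call can be exercised without network access.

```python
from typing import Callable, Optional


def download_checkpoint(
    model_id: str,
    *,
    revision: Optional[str],
    folder: str,
    token: Optional[str],
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Download pytorch_model.bin from the given revision and return its local path."""
    if download is None:
        # Imported lazily so the sketch can be tested with a stand-in callable.
        from huggingface_hub import hf_hub_download as download
    return download(
        repo_id=model_id,
        filename="pytorch_model.bin",
        revision=revision,  # None falls back to the repo's default branch
        token=token,
        cache_dir=folder,
    )
```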

Wauplin · 2023-11-20T09:34:45Z

bindings/python/convert.py

    filenames = set(s.rfilename for s in info.siblings)

-    with TemporaryDirectory() as d:
+    with TemporaryDirectory(prefix=os.getenv("HF_HOME", "") + "/") as d:


(nit) you can use huggingface_hub.constants.HF_HOME to retrieve the user hf home (will check for HF_HOME or XDG_CACHE_HOME env variable + default to ~/.cache/huggingface)

(nit 2) I would set a default directory in HF_HOME + "/safetensors_converter" just in case the converter crashes without cleaning the folder afterwards (at least all tmp directories will be in the same place)

Indeed. I removed that part actually to keep everything in a real temporary folder (/tmp). It prevents cache reuse, but should make the OS clean up correctly.

I had set this up for testing to use a real SSD on the machine I was using (/tmp was mounted to a slower disk)
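For reference, a sketch of what the second nit could look like, which is not what ended up in the PR (the author went back to a plain temporary folder). It passes the base folder through `dir=` rather than building a path via `prefix=`. The helper names are hypothetical, and `resolve_hf_home` is a stdlib-only approximation of the lookup order described above for `huggingface_hub.constants.HF_HOME`; in real code, importing that constant would be the simpler choice.

```python
import os
from tempfile import TemporaryDirectory


def resolve_hf_home() -> str:
    # Assumed lookup order: HF_HOME, then XDG_CACHE_HOME/huggingface,
    # then ~/.cache/huggingface.
    default_cache = os.path.join(os.path.expanduser("~"), ".cache")
    xdg_cache = os.getenv("XDG_CACHE_HOME", default_cache)
    return os.path.expanduser(os.getenv("HF_HOME", os.path.join(xdg_cache, "huggingface")))


def converter_tmpdir() -> TemporaryDirectory:
    """Create a temporary directory under <HF_HOME>/safetensors_converter."""
    base = os.path.join(resolve_hf_home(), "safetensors_converter")
    os.makedirs(base, exist_ok=True)
    # dir= selects the parent folder; the directory is removed on context exit.
    return TemporaryDirectory(dir=base)
```

If the process crashes before cleanup, leftover directories would at least all sit under the same `safetensors_converter` folder.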

Wauplin · 2023-11-20T09:42:37Z

bindings/python/convert.py

    try:
-        main_commit = api.list_repo_commits(model_id)[0].commit_id
-        discussions = api.get_repo_discussions(repo_id=model_id)
+        main_commit = api.list_repo_commits(model_id, revision=revision)[0].commit_id


No need to list the whole commit history just to get the latest commit. This information is available as model_info(...).sha

(nit) I would also rename the variable to something like revision_commit instead of main_commit (since pulling from revision and not main)

Suggested change

-        main_commit = api.list_repo_commits(model_id, revision=revision)[0].commit_id
+        revision_commit = api.model_info(model_id, revision=revision).sha

Agreed! (I had sent the change before, adding only the revision.)
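The suggestion can be sketched as a small helper. The function name `get_revision_commit` is hypothetical; `api` is expected to be a `huggingface_hub.HfApi` instance, whose `model_info` accepts a `revision` argument and returns an object with a `sha` attribute.

```python
from typing import Any, Optional


def get_revision_commit(api: Any, model_id: str, revision: Optional[str] = None) -> str:
    """Return the commit sha that `revision` currently points to."""
    # One metadata request instead of fetching the full commit history.
    return api.model_info(model_id, revision=revision).sha
```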

Better convert.

98069ae

Narsil merged commit 1799438 into main Nov 17, 2023
9 of 10 checks passed

Narsil deleted the better_convert branch November 17, 2023 17:28

Wauplin reviewed Nov 20, 2023
