hfutils

Useful utilities for Hugging Face.

Quick Start

To get started with hfutils, install it from PyPI:

pip install hfutils

Alternatively, you can install it from the source code:

git clone https://github.com/deepghs/hfutils.git
cd hfutils
pip install .

Verify the installation by checking the version:

hfutils -v
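The same check can be done from Python without invoking the CLI. This is a minimal sketch that uses the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper written for this example, not part of hfutils.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="hfutils"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("hfutils is not installed in this environment")
    else:
        print(f"hfutils {found}")
```

Returning `None` instead of raising keeps the check usable in setup scripts that decide whether to run `pip install hfutils`.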

If Python is not available in your local environment, we recommend downloading the precompiled executable from the releases page.

Upload Data

Upload data to repositories using the following commands:

# Upload a single file to the repository
hfutils upload -r your/repository -i /your/local/file -f file/in/your/repo

# Upload files in a directory as an archive file to the repository
# More formats of archive files are supported
# See: https://deepghs.github.io/hfutils/main/api_doc/archive/index.html
hfutils upload -r your/repository -i /your/local/directory -a archive/file/in/your/repo.zip

# Upload files in a directory as a directory tree to the repository
hfutils upload -r your/repository -i /your/local/directory -d dir/in/your/repo

You can achieve the same using the Python API:

from hfutils.operate import (
    upload_file_to_file,
    upload_directory_as_archive,
    upload_directory_as_directory,
)

# Upload a single file to the repository
upload_file_to_file(
    local_file='/your/local/file',
    repo_id='your/repository',
    file_in_repo='file/in/your/repo'
)

# Upload files in a directory as an archive file to the repository
# More formats of archive files are supported
# See: https://deepghs.github.io/hfutils/main/api_doc/archive/index.html
upload_directory_as_archive(
    local_directory='/your/local/directory',
    repo_id='your/repository',
    archive_in_repo='archive/file/in/your/repo.zip'
)

# Upload files in a directory as a directory tree to the repository
upload_directory_as_directory(
    local_directory='/your/local/directory',
    repo_id='your/repository',
    path_in_repo='dir/in/your/repo'
)

Explore additional options for uploading:

hfutils upload -h

Download Data

Download data from repositories using the following commands:

# Download a single file from the repository
hfutils download -r your/repository -o /your/local/file -f file/in/your/repo

# Download an archive file from the repository and extract it to the given directory
# More formats of archive files are supported
# See: https://deepghs.github.io/hfutils/main/api_doc/archive/index.html
hfutils download -r your/repository -o /your/local/directory -a archive/file/in/your/repo.zip

# Download files from the repository as a directory tree
hfutils download -r your/repository -o /your/local/directory -d dir/in/your/repo

Use the Python API for the same functionality:

from hfutils.operate import (
    download_file_to_file,
    download_archive_as_directory,
    download_directory_as_directory,
)

# Download a single file from the repository
download_file_to_file(
    local_file='/your/local/file',
    repo_id='your/repository',
    file_in_repo='file/in/your/repo'
)

# Download an archive file from the repository and extract it to the given directory
# More formats of archive files are supported
# See: https://deepghs.github.io/hfutils/main/api_doc/archive/index.html
download_archive_as_directory(
    local_directory='/your/local/directory',
    repo_id='your/repository',
    file_in_repo='archive/file/in/your/repo.zip',
)

# Download files from the repository as a directory tree
download_directory_as_directory(
    local_directory='/your/local/directory',
    repo_id='your/repository',
    dir_in_repo='dir/in/your/repo'
)

Explore additional options for downloading:

hfutils download -h

List Files in Repository

List the files in a repository:

hfutils ls -r your/repository -o /your/local/file -d subdir/in/repo

Supported Formats

By default, we support the zip and tar formats, including .zip, .tar, .tar.gz, .tar.bz2, and .tar.xz.
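Before uploading a directory as an archive, it can help to check that the target name ends in one of these suffixes. The sketch below is only an illustration: the suffix tuple is copied from the sentence above, and `default_archive_suffix` is a hypothetical helper, not a function provided by hfutils.

```python
# Suffixes listed above as supported by default. This tuple is copied from the
# README text and is an assumption of this example, not read from hfutils.
# Longer suffixes come first so ".tar.gz" is matched before ".tar" could be.
DEFAULT_ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz", ".tar", ".zip")


def default_archive_suffix(filename):
    """Return the default-supported suffix that `filename` ends with, or None."""
    for suffix in DEFAULT_ARCHIVE_SUFFIXES:
        if filename.endswith(suffix):
            return suffix
    return None


if __name__ == "__main__":
    for name in ("archive/file/in/your/repo.zip", "data.tar.gz", "data.rar"):
        print(name, "->", default_archive_suffix(name))
```

The match is case-sensitive on purpose, since the README does not say how hfutils treats upper-case extensions.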

If you require support for rar and 7z files, install the extra dependencies using the following command:

pip install hfutils[rar,7z]

NOTE: Creating RAR archive files is not supported. We use the rarfile library, which can read and extract RAR files but cannot create them.

How to Access Private Repositories

Set the HF_TOKEN environment variable to your Hugging Face access token. The token needs write permission if you plan to upload any content.
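When calling the Python API from a script, the variable can also be set inside the process before any transfer starts. The following is a minimal sketch; `ensure_hf_token` is a hypothetical helper written for this example, and the token value shown is a placeholder. Setting the variable before importing hfutils is the safest order.

```python
import os


def ensure_hf_token(token=None):
    """Set HF_TOKEN when a token is given, then return the token in effect.

    Raises RuntimeError if no token is available, so a missing token is
    reported up front instead of during an upload or download.
    """
    if token:
        os.environ["HF_TOKEN"] = token
    current = os.environ.get("HF_TOKEN")
    if not current:
        raise RuntimeError("HF_TOKEN is not set; private repositories will be inaccessible")
    return current


if __name__ == "__main__":
    # Placeholder value for illustration; use your own access token.
    ensure_hf_token("hf_xxx_placeholder")
    print("HF_TOKEN is configured for this process")
```

Avoid hard-coding a real token in source files; reading it from a secret store or the shell environment keeps it out of version control.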

How to Use Hf-Transfer for Acceleration

If you installed hfutils from PyPI, install it with the transfer extra:

pip install hfutils[transfer]

If you are using the precompiled executable, the transfer module is already bundled, so no extra installation is needed.

Enable Hf-Transfer acceleration by setting the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1.
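From a Python script, the variable can be set programmatically. The sketch below assumes that the flag is read when `huggingface_hub` is first imported, so it should run before importing hfutils; `enable_hf_transfer` is a hypothetical helper written for this example, not an hfutils function.

```python
import importlib.util
import os


def enable_hf_transfer():
    """Turn on the hf_transfer flag if the hf_transfer package is importable.

    Returns True when the flag was set, False when hf_transfer is missing
    (in which case the environment is left untouched).
    """
    if importlib.util.find_spec("hf_transfer") is None:
        return False
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return True


if __name__ == "__main__":
    if enable_hf_transfer():
        print("hf_transfer acceleration enabled")
    else:
        print("hf_transfer is not installed; try: pip install hfutils[transfer]")
```

Checking for the package first avoids enabling the flag in an environment where the accelerated backend is not available.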

How to Change the Temporary Directory

The temporary directory stores partially downloaded files, which can take up a considerable amount of disk space.

If your disk (especially the C drive on Windows) does not have enough free space, set the TMPDIR environment variable to use another directory as the temporary directory.
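The effect of TMPDIR can be checked from Python. This sketch assumes that temporary files are created through Python's standard `tempfile` module, which consults TMPDIR and caches the result; `use_temp_dir` is a hypothetical helper written for this example.

```python
import os
import tempfile


def use_temp_dir(path):
    """Point this process's temporary directory at `path` and return the result."""
    os.makedirs(path, exist_ok=True)
    os.environ["TMPDIR"] = path
    # tempfile caches its choice; clearing it forces TMPDIR to be re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()


if __name__ == "__main__":
    # Hypothetical location on a disk with more free space.
    target = os.path.join(os.path.expanduser("~"), "hfutils_tmp")
    print("temporary directory:", use_temp_dir(target))
```

Setting TMPDIR in the shell before launching the program has the same effect and also covers the command-line tool.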