update readme
jstzwj committed Jul 19, 2024
1 parent e962f2c commit 6ec1d8a
Showing 4 changed files with 162 additions and 23 deletions.
72 changes: 72 additions & 0 deletions README.md
@@ -111,6 +111,78 @@ python -m olah.server --host localhost --port 8090 --repos-path ./hf_mirrors

**Note that the cached data between different versions cannot be migrated. Please delete the cache folder before upgrading to the latest version of Olah.**

## More Configurations

Additional settings can be supplied through a configuration file. Pass the file (for example `configs.toml`) with the `-c` option:
```bash
python -m olah.server -c configs.toml
```

The complete content of the configuration file can be found at [assets/full_configs.toml](https://github.com/vtuber-plan/olah/blob/main/assets/full_configs.toml).

### Configuration Details
The first section, `basic`, is used to set up basic configurations for the mirror site:
```toml
[basic]
host = "localhost"
port = 8090
ssl-key = ""
ssl-cert = ""
repos-path = "./repos"
hf-scheme = "https"
hf-netloc = "huggingface.co"
hf-lfs-netloc = "cdn-lfs.huggingface.co"
mirror-scheme = "http"
mirror-netloc = "localhost:8090"
mirror-lfs-netloc = "localhost:8090"
mirrors-path = ["./mirrors_dir"]
```
- `host`: Sets the host address that Olah listens on.
- `port`: Sets the port that Olah listens on.
- `ssl-key` and `ssl-cert`: When enabling HTTPS, specify the file paths for the key and certificate.
- `repos-path`: Specifies the directory for storing cached data.
- `hf-scheme`: Network protocol for the Hugging Face official site (usually no need to modify).
- `hf-netloc`: Network location of the Hugging Face official site (usually no need to modify).
- `hf-lfs-netloc`: Network location for Hugging Face official site's LFS files (usually no need to modify).
- `mirror-scheme`: Network protocol for the Olah mirror site (should be consistent with the settings above; change it to `https` if you provide `ssl-key` and `ssl-cert`).
- `mirror-netloc`: Network location of the Olah mirror site (should match `host` and `port` settings).
- `mirror-lfs-netloc`: Network location for Olah mirror site's LFS (should match `host` and `port` settings).
- `mirrors-path`: Additional mirror file directories. If you have already cloned some Git repositories, you can place them in this directory for downloading. In this example, the directory is `./mirrors_dir`. To add a dataset like `Salesforce/wikitext`, you can place the Git repository in the directory `./mirrors_dir/datasets/Salesforce/wikitext`. Similarly, models can be placed under `./mirrors_dir/models/organization/repository`.
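
The consistency rules and the `mirrors-path` layout described above can be sketched in a few lines of Python. This is only an illustration: the helper names `local_mirror_path` and `check_basic_section` are hypothetical and are not part of Olah, and the check assumes the mirror is reached directly at `host:port` (behind a reverse proxy the public `mirror-netloc` would legitimately differ).

```python
import posixpath
from typing import Dict, List

# Repository types named in the README text above; treated here as an assumption.
KNOWN_REPO_TYPES = ("models", "datasets")


def local_mirror_path(mirrors_dir: str, repo_type: str, org: str, repo: str) -> str:
    """Return the directory where a pre-cloned Git repository would be placed."""
    if repo_type not in KNOWN_REPO_TYPES:
        raise ValueError(f"unsupported repo type: {repo_type!r}")
    return posixpath.join(mirrors_dir, repo_type, org, repo)


def check_basic_section(basic: Dict[str, object]) -> List[str]:
    """Return warnings for `[basic]` values that look inconsistent with each other."""
    warnings: List[str] = []
    expected_netloc = f"{basic['host']}:{basic['port']}"
    for key in ("mirror-netloc", "mirror-lfs-netloc"):
        if basic[key] != expected_netloc:
            warnings.append(f"{key}={basic[key]!r} differs from {expected_netloc!r}")
    # Assumption: HTTPS is active only when both the key and the certificate are set.
    tls_enabled = bool(basic["ssl-key"]) and bool(basic["ssl-cert"])
    expected_scheme = "https" if tls_enabled else "http"
    if basic["mirror-scheme"] != expected_scheme:
        warnings.append(
            f"mirror-scheme={basic['mirror-scheme']!r}, expected {expected_scheme!r}"
        )
    return warnings


# Values copied from the `[basic]` example above, as a plain dict.
example = {
    "host": "localhost", "port": 8090, "ssl-key": "", "ssl-cert": "",
    "mirror-scheme": "http", "mirror-netloc": "localhost:8090",
    "mirror-lfs-netloc": "localhost:8090",
}
print(check_basic_section(example))
print(local_mirror_path("./mirrors_dir", "datasets", "Salesforce", "wikitext"))
```

With the example values the check reports no warnings, and the dataset path matches the `./mirrors_dir/datasets/Salesforce/wikitext` location given in the text.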

The second section, `accessibility`, restricts which repositories can be proxied and cached:
```toml
[accessibility]
offline = false

[[accessibility.proxy]]
repo = "cais/mmlu"
allow = true

[[accessibility.proxy]]
repo = "adept/fuyu-8b"
allow = false

[[accessibility.proxy]]
repo = "mistralai/*"
allow = true

[[accessibility.proxy]]
repo = "mistralai/Mistral.*"
allow = false
use_re = true

[[accessibility.cache]]
repo = "cais/mmlu"
allow = true

[[accessibility.cache]]
repo = "adept/fuyu-8b"
allow = false
```
- `offline`: Puts the Olah mirror site into offline mode. Olah then stops requesting data updates from the official Hugging Face site, but repositories that are already cached can still be downloaded.
- `proxy`: Controls whether a repository can be accessed through the proxy. By default, all repositories are allowed. The `repo` field is the pattern matched against the repository name; it is treated as a wildcard pattern by default, or as a regular expression when `use_re` is `true`. The `allow` field sets whether matching repositories may be proxied.
- `cache`: Controls whether a repository is cached. By default, all repositories are allowed. The `repo` field is the pattern matched against the repository name; it is treated as a wildcard pattern by default, or as a regular expression when `use_re` is `true`. The `allow` field sets whether matching repositories may be cached.
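
To make the two matching modes concrete, here is a minimal Python sketch of how such rules could be evaluated. It is not Olah's implementation: the `Rule` class and `is_allowed` function are hypothetical, the README does not say how overlapping rules are resolved, so "the last matching rule wins" is an assumption, and so is the choice of full-string regular expression matching.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable


@dataclass
class Rule:
    repo: str            # pattern from the `repo` field
    allow: bool          # value of the `allow` field
    use_re: bool = False # wildcard matching unless set to True

    def matches(self, repo_name: str) -> bool:
        if self.use_re:
            # Assumption: the regular expression must match the whole name.
            return re.fullmatch(self.repo, repo_name) is not None
        # Shell-style wildcard matching, case-sensitive.
        return fnmatchcase(repo_name, self.repo)


def is_allowed(rules: Iterable[Rule], repo_name: str, default: bool = True) -> bool:
    """Assumption: later rules override earlier ones; no match means `default`."""
    decision = default
    for rule in rules:
        if rule.matches(repo_name):
            decision = rule.allow
    return decision


# The `accessibility.proxy` rules from the example above.
proxy_rules = [
    Rule("cais/mmlu", allow=True),
    Rule("adept/fuyu-8b", allow=False),
    Rule("mistralai/*", allow=True),
    Rule("mistralai/Mistral.*", allow=False, use_re=True),
]

for name in ("cais/mmlu", "adept/fuyu-8b", "mistralai/Mistral-7B-v0.1", "mistralai/Mixtral-8x7B"):
    print(name, is_allowed(proxy_rules, name))
```

Under these assumptions, `mistralai/Mistral-7B-v0.1` is denied because the later regular expression rule overrides the earlier wildcard rule, while `mistralai/Mixtral-8x7B` only matches the wildcard rule and stays allowed.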

## Future Work

* Authentication
64 changes: 64 additions & 0 deletions README_zh.md
@@ -120,6 +120,70 @@ python -m olah.server -c configs.toml

The complete content of the configuration file can be found at [assets/full_configs.toml](https://github.com/vtuber-plan/olah/blob/main/assets/full_configs.toml).

### Configuration Details
The first section, `basic`, sets up the basic configuration of the mirror site:
```toml
[basic]
host = "localhost"
port = 8090
ssl-key = ""
ssl-cert = ""
repos-path = "./repos"
hf-scheme = "https"
hf-netloc = "huggingface.co"
hf-lfs-netloc = "cdn-lfs.huggingface.co"
mirror-scheme = "http"
mirror-netloc = "localhost:8090"
mirror-lfs-netloc = "localhost:8090"
mirrors-path = ["./mirrors_dir"]
```
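host: Sets the host address that Olah listens on.
port: Sets the port that Olah listens on.
ssl-key and ssl-cert: File paths of the key and certificate, passed in when HTTPS needs to be enabled.
repos-path: Directory used to store cached data.
hf-scheme: Network protocol of the official Hugging Face site (usually no need to change).
hf-netloc: Network location of the official Hugging Face site (usually no need to change).
hf-lfs-netloc: Network location of LFS files on the official Hugging Face site (usually no need to change).
mirror-scheme: Network protocol of the Olah mirror site (should be consistent with the settings above; change it to https when ssl-key and ssl-cert are provided).
mirror-netloc: Network location of the Olah mirror site (should match the host and port settings).
mirror-lfs-netloc: Network location of LFS on the Olah mirror site (should match the host and port settings).
mirrors-path: Additional mirror file directories. If you have already cloned some Git repositories, you can place them in this directory to make them available for download. In this example the directory is `./mirrors_dir`. To add the dataset `Salesforce/wikitext`, place the Git repository in `./mirrors_dir/datasets/Salesforce/wikitext`. Likewise, models go under `./mirrors_dir/models/organization/repository`.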

The second section restricts accessibility:
```toml

[accessibility]
offline = false

[[accessibility.proxy]]
repo = "cais/mmlu"
allow = true

[[accessibility.proxy]]
repo = "adept/fuyu-8b"
allow = false

[[accessibility.proxy]]
repo = "mistralai/*"
allow = true

[[accessibility.proxy]]
repo = "mistralai/Mistral.*"
allow = false
use_re = true

[[accessibility.cache]]
repo = "cais/mmlu"
allow = true

[[accessibility.cache]]
repo = "adept/fuyu-8b"
allow = false
```
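offline: Sets whether the Olah mirror site enters offline mode, in which it no longer sends requests to the official Hugging Face site for data updates; repositories that are already cached can still be downloaded.
proxy: Sets whether a repository can be proxied; all repositories are allowed by default. `repo` is used to match the repository name. Two matching modes are available, regular expressions and wildcards: `use_re` controls whether regular expressions are used, and wildcards are the default. `allow` controls whether the rule allows or denies proxying.
cache: Sets whether a repository will be cached; all repositories are allowed by default. `repo` is used to match the repository name. Two matching modes are available, regular expressions and wildcards: `use_re` controls whether regular expressions are used, and wildcards are the default. `allow` controls whether the rule allows or denies caching.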

## License

Olah is released under the MIT License.
13 changes: 12 additions & 1 deletion olah/errors.py
@@ -1,6 +1,7 @@



+from fastapi import Response
 from fastapi.responses import JSONResponse


@@ -12,4 +13,14 @@ def error_repo_not_found() -> JSONResponse:
             "x-error-message": "Repository not found",
         },
         status_code=401,
-    )
+    )
+
+
+def error_page_not_found() -> Response:
+    return Response(
+        headers={
+            "x-error-code": "RepoNotFound",
+            "x-error-message": "Sorry, we can't find the page you are looking for.",
+        },
+        status_code=404,
+    )
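
Both helpers in this file report the failure through the `x-error-code` and `x-error-message` response headers instead of a response body. As a rough, client-side illustration of reading them, the sketch below uses a hypothetical `describe_mirror_error` function (not part of Olah or any Hugging Face library) and assumes the status code and headers have already been extracted from the HTTP response.

```python
from typing import Mapping, Optional


def describe_mirror_error(status_code: int, headers: Mapping[str, str]) -> Optional[str]:
    """Summarise an error response from its status code and x-error-* headers."""
    if status_code < 400:
        return None  # not an error response
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    code = lowered.get("x-error-code", "Unknown")
    message = lowered.get("x-error-message", "")
    summary = f"{status_code} {code}"
    return f"{summary}: {message}" if message else summary


# Header values copied from the two helpers in this diff.
print(describe_mirror_error(401, {
    "x-error-code": "RepoNotFound",
    "x-error-message": "Repository not found",
}))
print(describe_mirror_error(404, {
    "X-Error-Code": "RepoNotFound",
    "X-Error-Message": "Sorry, we can't find the page you are looking for.",
}))
```

Because both helpers send the same `x-error-code` value, a client following this sketch would need the status code (401 versus 404) or the message to tell the two cases apart.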
36 changes: 14 additions & 22 deletions olah/server.py
@@ -18,7 +18,7 @@
 import httpx
 from pydantic import BaseSettings
 from olah.configs import OlahConfig
-from olah.errors import error_repo_not_found
+from olah.errors import error_repo_not_found, error_page_not_found
 from olah.mirror.repos import LocalMirrorRepo
 from olah.proxy.files import cdn_file_get_generator, file_get_generator
 from olah.proxy.lfs import lfs_get_generator, lfs_head_generator
@@ -85,9 +85,7 @@ class AppSettings(BaseSettings):
 # ======================
 async def meta_proxy_common(repo_type: str, org: str, repo: str, commit: str, request: Request) -> Response:
     if repo_type not in REPO_TYPES_MAPPING.keys():
-        return Response(
-            content="Invalid repository type. ", status_code=403
-        )
+        return error_page_not_found()
     if not await check_proxy_rules_hf(app, repo_type, org, repo):
         return error_repo_not_found()
     # Check Mirror Path
@@ -129,7 +127,7 @@ async def meta_proxy_common(repo_type: str, org: str, repo: str, commit: str, re
 async def meta_proxy(repo_type: str, org_repo: str, request: Request):
     org, repo = parse_org_repo(org_repo)
     if org is None and repo is None:
-        return Response(content="This repository is not accessible.", status_code=404)
+        return error_repo_not_found()
     if not app.app_settings.config.offline:
         new_commit = await get_newest_commit_hf(app, repo_type, org, repo)
     else:
@@ -152,7 +150,7 @@ async def meta_proxy_commit2(
 async def meta_proxy_commit(repo_type: str, org_repo: str, commit: str, request: Request):
     org, repo = parse_org_repo(org_repo)
     if org is None and repo is None:
-        return Response(content="This repository is not accessible.", status_code=404)
+        return error_repo_not_found()

     return await meta_proxy_common(
         repo_type=repo_type, org=org, repo=repo, commit=commit, request=request
@@ -166,9 +164,7 @@ async def file_head_common(
     repo_type: str, org: str, repo: str, commit: str, file_path: str, request: Request
 ) -> Response:
     if repo_type not in REPO_TYPES_MAPPING.keys():
-        return Response(
-            content="Invalid repository type. ", status_code=403
-        )
+        return error_page_not_found()
     if not await check_proxy_rules_hf(app, repo_type, org, repo):
         return error_repo_not_found()

@@ -235,9 +231,7 @@ async def file_head2(
         repo_type: str = org_or_repo_type
         org, repo = parse_org_repo(repo_name)
         if org is None and repo is None:
-            return Response(
-                content="This repository is not accessible.", status_code=404
-            )
+            return error_repo_not_found()
     else:
         repo_type: str = "models"
         org, repo = org_or_repo_type, repo_name
@@ -257,7 +251,7 @@ async def file_head(org_repo: str, commit: str, file_path: str, request: Request
     repo_type: str = "models"
     org, repo = parse_org_repo(org_repo)
     if org is None and repo is None:
-        return Response(content="This repository is not accessible.", status_code=404)
+        return error_repo_not_found()
     return await file_head_common(
         repo_type=repo_type,
         org=org,
@@ -273,10 +267,10 @@ async def file_head(org_repo: str, commit: str, file_path: str, request: Request
 async def cdn_file_head(org_repo: str, hash_file: str, request: Request, repo_type: str = "models"):
     org, repo = parse_org_repo(org_repo)
     if org is None and repo is None:
-        return Response(content="This repository is not accessible.", status_code=404)
+        return error_repo_not_found()

     if not await check_proxy_rules_hf(app, repo_type, org, repo):
-        return Response(content="This repository is forbidden by the mirror. ", status_code=403)
+        return error_repo_not_found()

     try:
         generator = await cdn_file_get_generator(app, repo_type, org, repo, hash_file, method="HEAD", request=request)
@@ -294,9 +288,7 @@ async def file_get_common(
     repo_type: str, org: str, repo: str, commit: str, file_path: str, request: Request
 ) -> Response:
     if repo_type not in REPO_TYPES_MAPPING.keys():
-        return Response(
-            content="Invalid repository type. ", status_code=403
-        )
+        return error_page_not_found()
     if not await check_proxy_rules_hf(app, repo_type, org, repo):
         return error_repo_not_found()
     # Check Mirror Path
@@ -353,7 +345,7 @@ async def file_get2(org_or_repo_type: str, repo_name: str, commit: str, file_pat
         repo_type: str = org_or_repo_type
         org, repo = parse_org_repo(repo_name)
         if org is None and repo is None:
-            return Response(content="This repository is not accessible.", status_code=404)
+            return error_repo_not_found()
     else:
         repo_type: str = "models"
         org, repo = org_or_repo_type, repo_name
@@ -372,7 +364,7 @@ async def file_get(org_repo: str, commit: str, file_path: str, request: Request)
     repo_type: str = "models"
     org, repo = parse_org_repo(org_repo)
     if org is None and repo is None:
-        return Response(content="This repository is not accessible.", status_code=404)
+        return error_repo_not_found()

     return await file_get_common(
         repo_type=repo_type,
@@ -388,10 +380,10 @@ async def file_get(org_repo: str, commit: str, file_path: str, request: Request)
 async def cdn_file_get(org_repo: str, hash_file: str, request: Request, repo_type: str = "models"):
     org, repo = parse_org_repo(org_repo)
     if org is None and repo is None:
-        return Response(content="This repository is not accessible.", status_code=404)
+        return error_repo_not_found()

     if not await check_proxy_rules_hf(app, repo_type, org, repo):
-        return Response(content="This repository is forbidden by the mirror. ", status_code=403)
+        return error_repo_not_found()
     try:
         generator = await cdn_file_get_generator(app, repo_type, org, repo, hash_file, method="GET", request=request)
         status_code = await generator.__anext__()
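
Every handler touched in this file guards on `org is None and repo is None` after calling `parse_org_repo`. The diff does not show that function, so the following is only a guess at a contract that would make the guard meaningful: a name without an organization yields `(None, repo)`, and only a malformed name yields `(None, None)`. The function name `parse_org_repo_sketch` is hypothetical.

```python
from typing import Optional, Tuple


def parse_org_repo_sketch(org_repo: str) -> Tuple[Optional[str], Optional[str]]:
    """Split "org/repo" into its parts; return (None, None) for malformed input."""
    parts = org_repo.split("/")
    if len(parts) == 1 and parts[0]:
        # Repository without an organization, e.g. "gpt2".
        return None, parts[0]
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    # Empty segments or more than one slash: treated as invalid here.
    return None, None


for name in ("cais/mmlu", "gpt2", "a/b/c", ""):
    print(repr(name), parse_org_repo_sketch(name))
```

Under this assumed contract, testing both values with `and` rather than `or` matters: a repository without an organization has `org is None` but is still a valid request.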