Updates for weasel #200

Merged: 6 commits, Nov 9, 2023
3 changes: 2 additions & 1 deletion .github/update_category_docs.py
@@ -1,5 +1,6 @@
from pathlib import Path
from spacy.cli._util import PROJECT_FILE, load_project_config
from weasel.cli.main import PROJECT_FILE
from weasel.util import load_project_config
from wasabi import msg, MarkdownRenderer
import typer
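
The import change above moves `PROJECT_FILE` and `load_project_config` from `spacy.cli._util` to `weasel.cli.main` and `weasel.util`. For a script that may have to run against either install, a fallback along the following lines might help. This is only a sketch: `first_importable` and `load_project_helpers` are hypothetical names that do not appear in this PR, and the module paths are taken from the diff as-is.

```python
import importlib


def first_importable(candidates):
    """Return the first attribute that resolves from a list of (module, attr) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not installed, try the next candidate
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"none of {candidates!r} could be resolved")


def load_project_helpers():
    """Hypothetical helper: prefer the Weasel locations, fall back to the old spaCy ones."""
    project_file = first_importable(
        [("weasel.cli.main", "PROJECT_FILE"), ("spacy.cli._util", "PROJECT_FILE")]
    )
    load_config = first_importable(
        [
            ("weasel.util", "load_project_config"),
            ("spacy.cli._util", "load_project_config"),
        ]
    )
    return project_file, load_config
```

The PR itself simply switches the imports, which is the simpler choice once Weasel is a hard requirement.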

3 changes: 2 additions & 1 deletion .github/update_projects_jsonl.py
@@ -1,5 +1,6 @@
from pathlib import Path
from spacy.cli._util import PROJECT_FILE, load_project_config
from weasel.cli.main import PROJECT_FILE
from weasel.util import load_project_config
from wasabi import msg
import json
import typer
5 changes: 3 additions & 2 deletions .github/workflows/tests.yml
@@ -12,6 +12,7 @@ on:
env:
# Make sure we're exiting training as early as possible
SPACY_CONFIG_OVERRIDES: '--training.max_epochs=1 --training.max_steps=1'
WEASEL_CONFIG_OVERRIDES: '--training.max_epochs=1 --training.max_steps=1'
WASABI_LOG_FRIENDLY: 1

jobs:
@@ -23,9 +24,9 @@ jobs:
matrix:
include:
- os: windows-2019
python_version: "3.7"
python_version: "3.8"
- os: ubuntu-20.04
python_version: "3.7"
python_version: "3.8"
runs-on: ${{ matrix.os }}

steps:
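
The workflow above sets `WEASEL_CONFIG_OVERRIDES` to the same CLI-style string as `SPACY_CONFIG_OVERRIDES`. To illustrate the shape of that value, here is a small sketch that splits such a string into dotted keys and raw values. It is not Weasel's or spaCy's actual parser, and `parse_overrides` is a hypothetical name.

```python
import shlex


def parse_overrides(raw):
    """Split '--a.b=1 --c.d=2' into {'a.b': '1', 'c.d': '2'} (values stay strings)."""
    overrides = {}
    for token in shlex.split(raw):
        if not token.startswith("--") or "=" not in token:
            raise ValueError(f"expected --key=value, got {token!r}")
        key, value = token[2:].split("=", 1)
        overrides[key] = value
    return overrides


# The value used in tests.yml, copied from the diff above
ci_value = "--training.max_epochs=1 --training.max_steps=1"
print(parse_overrides(ci_value))
```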
46 changes: 26 additions & 20 deletions README.md
@@ -2,17 +2,19 @@

# 🪐 Project Templates

[spaCy projects](https://spacy.io/usage/projects) let you manage and share
**end-to-end spaCy workflows** for different **use cases and domains**, and
[Weasel](https://github.com/explosion/weasel), previously
[spaCy projects](https://spacy.io/usage/projects), lets you manage and share
**end-to-end workflows** for different **use cases and domains**, and
orchestrate training, packaging and serving your custom pipelines. You can start
off by cloning a pre-defined project template, adjust it to fit your needs, load
in your data, train a pipeline, export it as a Python package, upload your
outputs to a remote storage and share your results with your team.

> ⚠️ spaCy project templates require [**spaCy v3**](https://spacy.io). You can
> install it from pip with `pip install spacy` or conda with
> `conda install spacy -c conda-forge`. Make sure to use a fresh virtual
> environment.
> ⚠️ Weasel project templates require
> [**Weasel**](https://github.com/explosion/weasel), which is also included by
> default with spaCy v3.7+. You can install it from pip with
> `pip install weasel` or conda with `conda install weasel -c conda-forge`. Make
> sure to use a fresh virtual environment.
>
> See the [`master` branch](https://github.com/explosion/projects/tree/master)
> for the previous version of this repo.
@@ -32,31 +34,35 @@ outputs to a remote storage and share your results with your team.

## 🚀 Quickstart

Projects can be used via the new
[`spacy project`](https://spacy.io/api/cli#project) CLI. To find out more about
a command, add `--help`. For detailed instructions, see the
[usage guide](https://spacy.io/usage/projects).

<!-- TODO: update example -->
Projects can be used via the
[`weasel`](https://github.com/explosion/weasel/blob/main/docs/cli.md) CLI, or
through the [`spacy project`](https://spacy.io/api/cli#project) alias. To find
out more about a command, add `--help`. For detailed instructions, see the
[Weasel documentation](https://github.com/explosion/weasel/tree/main#-documentation)
or [spaCy projects usage guide](https://spacy.io/usage/projects).

1. **Clone** the project template you want to use.
```bash
python -m spacy project clone tutorials/ner_fashion_brands
python -m weasel clone tutorials/ner_fashion_brands
```
2. **Fetch assets** (data, weights) defined in the `project.yml`.
2. **Install** any project requirements.
```bash
cd ner_fashion_brands
python -m spacy project assets
python -m pip install -r requirements.txt
```
3. **Fetch assets** (data, weights) defined in the `project.yml`.
```bash
python -m weasel assets
```
3. **Run a command** defined in the `project.yml`.
4. **Run a command** defined in the `project.yml`.
```bash
python -m spacy project run preprocess
python -m weasel run preprocess
```
4. **Run a workflow** of multiple steps in order.
5. **Run a workflow** of multiple steps in order.
```bash
python -m spacy project run all
python -m weasel run all
```
5. **Adjust** the template for **your specific use case**, load in your own
6. **Adjust** the template for **your specific use case**, load in your own
data, adjust the settings and model and share the result with your team.
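
The updated Quickstart steps can also be driven from Python, for example in a setup script. The sketch below only assembles the same `python -m weasel ...` and `pip` commands shown in the steps and runs them in order. `quickstart_commands` and `run_quickstart` are hypothetical helper names, and the working directory is assumed to be the last path component of the template, as the `cd ner_fashion_brands` step suggests.

```python
import subprocess
import sys
from pathlib import Path


def quickstart_commands(template, workflow="all"):
    """Build (cwd, argv) pairs mirroring the Quickstart steps above."""
    # Assumption: `weasel clone` creates a directory named after the template's last component
    project_dir = Path(template).name
    py = sys.executable
    return [
        (None, [py, "-m", "weasel", "clone", template]),
        (project_dir, [py, "-m", "pip", "install", "-r", "requirements.txt"]),
        (project_dir, [py, "-m", "weasel", "assets"]),
        (project_dir, [py, "-m", "weasel", "run", workflow]),
    ]


def run_quickstart(template, workflow="all"):
    """Run each step in order and stop at the first failure."""
    for cwd, argv in quickstart_commands(template, workflow):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_quickstart("tutorials/ner_fashion_brands")` would then clone the template, install its requirements, fetch the assets and run the `all` workflow.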

## 👷‍♀️ Repository maintenance
14 changes: 7 additions & 7 deletions benchmarks/healthsea_spancat/README.md
@@ -1,19 +1,19 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Healthsea-Spancat
# 🪐 Weasel Project: Healthsea-Spancat

This spaCy project uses the Healthsea dataset to compare the performance of the Spancat and NER architectures.

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -29,7 +29,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -42,11 +42,11 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/annotation.jsonl` | URL | NER annotations exported from Prodigy with 5000 examples and 2 labels |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
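
The generated README above states that commands are only re-run if their inputs have changed. The general idea can be sketched with content hashes, as below. This is an illustration of the technique under stated assumptions, not Weasel's implementation: `file_digest`, `record_inputs` and `needs_rerun` are hypothetical names, and SHA-256 is an arbitrary choice.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Hash a file's contents so that changes can be detected between runs."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_inputs(inputs):
    """Snapshot the current digests of all input files after a successful run."""
    return {str(path): file_digest(path) for path in inputs}


def needs_rerun(inputs, recorded):
    """Return True if any input is missing, unrecorded, or differs from its recorded digest."""
    for path in inputs:
        if not Path(path).exists():
            return True
        if recorded.get(str(path)) != file_digest(path):
            return True
    return False
```

A runner would call `needs_rerun` before a command and `record_inputs` after it succeeds, persisting the snapshot between invocations.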
12 changes: 6 additions & 6 deletions benchmarks/nel/README.md
@@ -1,19 +1,19 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: NEL Benchmark
# 🪐 Weasel Project: NEL Benchmark

Pipeline for benchmarking NEL approaches (incl. candidate generation and entity disambiguation).

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -36,7 +36,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -45,7 +45,7 @@ inputs have changed.
| `all` | `download_mewsli9` &rarr; `download_model` &rarr; `wikid_clone` &rarr; `preprocess` &rarr; `wikid_download_assets` &rarr; `wikid_parse` &rarr; `wikid_create_kb` &rarr; `parse_corpus` &rarr; `compile_corpora` &rarr; `train` &rarr; `evaluate` &rarr; `compare_evaluations` |
| `training` | `train` &rarr; `evaluate` |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->

Notes:
> **Warning**: Parts of this project are currently not platform-agnostic and run only on Linux. Making the entire
1 change: 0 additions & 1 deletion benchmarks/nel/project.yml
@@ -1,6 +1,5 @@
title: 'NEL Benchmark'
description: "Pipeline for benchmarking NEL approaches (incl. candidate generation and entity disambiguation)."
spacy_version: ">=3.0.0,<3.6.0"
vars:
run: "cg-default"
language: "en"
1 change: 1 addition & 0 deletions benchmarks/nel/requirements.txt
@@ -7,3 +7,4 @@ rapidfuzz>=2.0.0
spacyfishing
virtualenv
pysqlite3-binary
spacy>=3.0.0,<3.6.0
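
The two hunks above drop `spacy_version` from `project.yml` and add the same range, `>=3.0.0,<3.6.0`, to `requirements.txt`, so pip enforces the pin at install time. To make the range concrete, the sketch below checks a version string against such a specifier. It is a deliberately simplified stand-in that handles only purely numeric versions, not a full PEP 440 implementation, and `satisfies` is a hypothetical name.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">"
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn '3.5.4' into (3, 5, 4); only purely numeric segments are supported."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated range such as '>=3.0.0,<3.6.0'."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("3.5.4", ">=3.0.0,<3.6.0"))  # inside the pinned range
print(satisfies("3.7.2", ">=3.0.0,<3.6.0"))  # outside the pinned range
```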
14 changes: 7 additions & 7 deletions benchmarks/ner_conll03/README.md
@@ -1,17 +1,17 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Named Entity Recognition (CoNLL-2003)
# 🪐 Weasel Project: Named Entity Recognition (CoNLL-2003)

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -25,7 +25,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -36,7 +36,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -47,4 +47,4 @@ in the project directory.
| `assets/conll2003/train.iob` | Local | Training data (not available publicly so you have to add the file yourself) |
| `assets/orth_variants.json` | URL | A file containing orth variants for data augmentation |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
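
Each README in this PR receives the same two mechanical changes: the auto-generated docs markers switch from `SPACY PROJECT` to `WEASEL`, and the title switches from `spaCy Project:` to `Weasel Project:`. A minimal sketch of that rewrite as plain string replacement follows. `migrate_readme` is a hypothetical helper, and nothing in the diff indicates how the READMEs were actually regenerated.

```python
OLD_MARKER = "<!-- SPACY PROJECT: AUTO-GENERATED DOCS "
NEW_MARKER = "<!-- WEASEL: AUTO-GENERATED DOCS "
OLD_TITLE = "# 🪐 spaCy Project: "
NEW_TITLE = "# 🪐 Weasel Project: "


def migrate_readme(text):
    """Rename the START/END markers and the title prefix; all other text is left alone."""
    return text.replace(OLD_MARKER, NEW_MARKER).replace(OLD_TITLE, NEW_TITLE)


example = (
    "<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->\n"
    "# 🪐 spaCy Project: NEL Benchmark\n"
    "<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->\n"
)
print(migrate_readme(example))
```

The function is idempotent, so running it over an already-migrated README leaves the file unchanged.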
14 changes: 7 additions & 7 deletions benchmarks/ner_embeddings/README.md
@@ -1,6 +1,6 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Comparing embedding layers in spaCy
# 🪐 Weasel Project: Comparing embedding layers in spaCy

This project contains the code to reproduce the results of the
[Multi hash embeddings in spaCy](https://arxiv.org/abs/2212.09255) technical report by Explosion.
@@ -29,12 +29,12 @@ the hash embedding layers. We apologize for the inconvenience.

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -54,7 +54,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -66,7 +66,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -76,4 +76,4 @@ in the project directory.
| `assets/fasttext.nl.gz` | URL | Dutch fastText vectors. |
| `span-labeling-datasets` | Git | |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->
14 changes: 7 additions & 7 deletions benchmarks/parsing_penn_treebank/README.md
@@ -1,17 +1,17 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Dependency Parsing (Penn Treebank)
# 🪐 Weasel Project: Dependency Parsing (Penn Treebank)

## 📋 project.yml

The [`project.yml`](project.yml) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation](https://spacy.io/usage/projects).
[Weasel documentation](https://github.com/explosion/weasel).

### ⏯ Commands

The following commands are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run).
Commands are only re-run if their inputs have changed.

| Command | Description |
@@ -25,7 +25,7 @@ Commands are only re-run if their inputs have changed.
### ⏭ Workflows

The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
can be executed using [`weasel run [name]`](https://github.com/explosion/weasel/tree/main/docs/cli.md#rocket-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

@@ -36,7 +36,7 @@ inputs have changed.
### 🗂 Assets

The following assets are defined by the project. They can
be fetched by running [`spacy project assets`](https://spacy.io/api/cli#project-assets)
be fetched by running [`weasel assets`](https://github.com/explosion/weasel/tree/main/docs/cli.md#open_file_folder-assets)
in the project directory.

| File | Source | Description |
@@ -47,4 +47,4 @@ in the project directory.
| `assets/vectors.zip` | URL | GloVe vectors |
| `assets/orth_variants.json` | URL | A file containing orth variants for data augmentation |

<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->
<!-- WEASEL: AUTO-GENERATED DOCS END (do not remove) -->