diff --git a/CHANGELOG.md b/CHANGELOG.md index ce323d43b..8a3e20e9a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -177,14 +177,12 @@ Nextclade CLI will warn users when input datasets contains extra files which are We added one more build variant to Bioconda distribution channel - for Linux operating system on 64-bit ARM hardware architecture. It uses `nextclade-aarch64-unknown-linux-gnu` executable underneath. This can be useful if you prefer to manage Nextclade CLI installation on your Linux ARM machine or in a Docker ARM container with Conda package manager. However, because Nextclade CLI is a self-contained single-file executable, we still recommend [direct downloads from GitHub Releases](https://github.com/nextstrain/nextclade/releases) rather than Conda or other installation methods. -## Nextclade CLI 3.3.1 +## Nextclade 3.3.1 ### Fix crash when using `--verbosity` option Nextclade was crashing with internal error when `--verbosity` option was present. This has been fixed. -## Nextclade Web 3.3.1 - ### Restrict Safari browser support to >= 16.5 Nextclade reports WebWorker-related errors when analysis is started on Safari browser. The minimum working version of Safari we were able to successfully test Nextclade on is 16.5. We still recommend using Chrome or Firefox for the best experience. @@ -249,7 +247,7 @@ Due to popular demand, we are bringing back `--input-pcr-primers` argument for N Results table stripes are always alternating now, regardless of sorting and filtering applied. This is only a visual change and does not affect any functionality. -## Nextclade CLI 3.0.1 +## Nextclade 3.0.1 #### Bug fixes @@ -261,11 +259,11 @@ Results table stripes are always alternating now, regardless of sorting and filt - Added a section to the v3 migration guide about the renamed default path for translations, a breaking change. The new default output path for translations is `nextclade.cds_translation.{cds}.fasta`. 
Before v3, the default path was `nextclade_gene_{gene}.translation.fasta`. You can emulate the old (default) behavior by passing `--output-translations="nextclade_gene_{cds}.translation.fasta"` to `nextclade3`. -## Nextclade Web 3.0.1 +### Fix links Fixed links on navigation bar: "Docs" and "CLI" -## 3.0.0 +## Nextclade 3.0.0 We are happy to present a major release of Nextclade, containing new features and bug fixes. @@ -544,4 +542,4 @@ The change in genome annotation handling had significant consequences for coordi ## Older versions -For changes in Nextclade v2 and below, see [docs/changes/CHANGELOG.old.md](docs/changes/CHANGELOG.old.md) +For changes in older versions, see [docs/changes/CHANGELOG.old.md](https://github.com/nextstrain/nextclade/blob/master/docs/changes/CHANGELOG.old.md) diff --git a/docs/conf.py b/docs/conf.py index a5e2293c5..7ce787d07 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -13,11 +13,8 @@ import os import sys from datetime import datetime -sys.path.insert(0, os.path.abspath('.')) -# At top on conf.py (with other import statements) -import recommonmark -from recommonmark.transform import AutoStructify +sys.path.insert(0, os.path.abspath('.')) # -- Project information ----------------------------------------------------- @@ -25,21 +22,20 @@ copyright = f'2020-{datetime.now().year}, Trevor Bedford and Richard Neher' author = 'The Nextstrain Team' - # -- General configuration --------------------------------------------------- # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. 
extensions = [ - 'recommonmark', - 'sphinx.ext.intersphinx', - 'sphinx.ext.mathjax', - 'sphinx_markdown_tables', - 'sphinxarg.ext', - 'sphinx.ext.autodoc', - 'sphinx_tabs.tabs', - 'nextstrain.sphinx.theme', + 'myst_parser', + 'sphinx.ext.intersphinx', + 'sphinx.ext.mathjax', + 'sphinx_markdown_tables', + 'sphinxarg.ext', + 'sphinx.ext.autodoc', + 'sphinx_tabs.tabs', + 'nextstrain.sphinx.theme', ] # Add any paths that contain templates here, relative to this directory. @@ -49,10 +45,28 @@ # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path. exclude_patterns = [ - "README.md" - "dev/**/*" + "README.md", + "assets/**", + "build/**", + "changes/CHANGELOG.old.md", + "dev/docs-meta.md", + "dev/old-versions.md", ] +myst_enable_extensions = [ + "amsmath", + "dollarmath", + "linkify", + "strikethrough", +] +myst_heading_anchors = 6 +myst_gfm_only = False # For math to work. GitHub renders this syntax just fine though. +myst_linkify_fuzzy_links = False +myst_url_schemes = ["mailto", "http", "https"] + +suppress_warnings = [ + "myst.header" +] # -- Options for HTML output ------------------------------------------------- @@ -67,30 +81,18 @@ html_static_path = ['_static'] html_css_files = [ - 'css/custom.css', + 'css/custom.css', ] html_favicon = '_static/favicon.ico' html_theme_options = { - 'logo_only': False, - 'collapse_navigation': False, - 'titles_only': True, + 'logo_only': False, + 'collapse_navigation': False, + 'titles_only': True, } - # -- Cross-project references ------------------------------------------------ intersphinx_mapping = { } - - -# At the bottom of conf.py -def setup(app): - app.add_config_value('recommonmark_config', { - 'url_resolver': lambda url: github_doc_root + url, - 'auto_toc_tree_section': 'Contents', - 'enable_math': True, - 'enable_inline_math': True, - }, True) - app.add_transform(AutoStructify) diff --git a/docs/dev/developer-guide.md b/docs/dev/developer-guide.md index 
6c1d593fc..d4b1c11a1 100644 --- a/docs/dev/developer-guide.md +++ b/docs/dev/developer-guide.md @@ -7,7 +7,7 @@ This guide describes: - how to build and run Nextclade CLI and Nextclade Web - how the official distributions are maintained, released and deployed -This is only useful if you know programming at least a little or is curations about how Nextclade is built. +This is only useful if you know programming at least a little or are curious about how Nextclade is built. > ⚠️ If you are Nextclade user or is looking to familiarize yourself with Nextclade usage and features, then refer to [Nextclade user documentation](https://docs.nextstrain.org/projects/nextclade/en/stable/index.html) instead. @@ -21,9 +21,10 @@ This is only useful if you know programming at least a little or is curations ab Nextclade CLI is written in Rust programming language. The usual `rustup` & `cargo` workflow can be used. -If you are not familiar with Rust, please refer to documentation: +If you are not familiar with Rust, please refer to the official documentation: -- [Rust](https://www.rust-lang.org/learn) - the programming language itself +- [Rust](https://www.rust-lang.org) - the programming language itself +- [Rust: learn](https://www.rust-lang.org/learn) - official learning materials: The Rust Book, the Rustlings course, examples. - [Rustup](https://rust-lang.github.io/rustup/) - Rust toolchain installer and version manager - [Cargo](https://doc.rust-lang.org/cargo/) - Rust package manager @@ -42,13 +43,17 @@ as well as to the `--help` text for each tool. cd nextclade ``` - > 💡 We accept pull requests on GitHub. If you want to submit a with new feature or a bug fix, then make a GitHub account, [make a fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo) of the [origin Nextclade repository](https://github.com/nextstrain/nextclade) and clone your forked repository instead.
Refer to [GitHub documentation "Contributing to projects"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) for more details. + > 💡 We accept pull requests on GitHub. If you want to submit a new feature or a bug fix, then create a GitHub account, [make a fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo) of the [origin repository `nextstrain/nextclade`](https://github.com/nextstrain/nextclade) and clone your forked repository instead. Refer to [GitHub documentation "Contributing to projects"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) for more details. > 💡 Make sure you [keep your local code up to date](https://github.com/git-guides/git-pull) with the origin repo, [especially if it's forked](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork). > 💡 If you are a member of Nextstrain team, then you don't need a fork and you can contribute directly to the origin repository. Still, in most cases, please submit pull requests for review, rather than pushing changes to major branches directly. -2. Install Rust if not already (https://www.rust-lang.org/tools/install): + +2. Install Rust if not already installed (once) ([https://www.rust-lang.org/tools/install](https://www.rust-lang.org/tools/install)): + + You can skip this step if you already have Rust installed via Rustup. + + The only supported Rust version is the one declared in [`rust-toolchain.toml`](https://github.com/nextstrain/nextclade/blob/master/rust-toolchain.toml). It may change in the future. ```bash # [once] Install Rustup, the Rust version manager @@ -67,11 +72,11 @@ as well as to the `--help` text for each tool. $ rustup -V ``` - > ⚠️ We don't support Rust installations deviating from the [officially recommended steps](https://doc.rust-lang.org/book/ch01-01-installation.html).
If you install Rust from Linux OS package repositories, Homebrew, Conda etc., things may or may not work, or they may work but produce wrong results. Nextclade team don't have bandwidth to try every platform and config, so if you decide to go unofficial route, then you are on your own. But feel free to open pull requests with fixes, where necessary. + > ⚠️ The Nextclade team doesn't have bandwidth to support Rust installations deviating from the [officially recommended steps](https://doc.rust-lang.org/book/ch01-01-installation.html) or Rust versions other than the one declared in [rust-toolchain.toml](https://github.com/nextstrain/nextclade/blob/master/rust-toolchain.toml). If you install Rust from Linux package repositories, Homebrew, Conda etc., things may or may not work, or they may work but produce wrong results. We cannot test every platform and configuration, so if you decide to go the unofficial route, you are on your own. But feel free to open pull requests if fixes are necessary to make your setup work. - > 💡 Note, Rustup allows to install multiple versions of Rust and to switch between them. This repository contains a [rust-toolchain.toml](../../rust-toolchain.toml) file, which describes which version of Rust is currently in use by Nextclade official build. Cargo and Rustup should be able to [pick it up automatically](https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file), install the required toolchain and use it when you type `cargo` commands. Any other versions of Rust toolchain are not supported. + > 💡 Note, Rustup allows you to install multiple versions of Rust and to switch between them. This repository contains a [rust-toolchain.toml](https://github.com/nextstrain/nextclade/blob/master/rust-toolchain.toml) file, which describes which version of Rust is currently in use by the official Nextclade build.
Cargo and Rustup should be able to [pick it up automatically](https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file), install the required toolchain and use it when you type `cargo` commands. Any other versions of Rust toolchain are not supported. -3. Prepare environment variables which configure Nextclade build-time settings (once). Optionally adjust the variables in the `.env` file to your needs. +3. Prepare environment variables (once). They configure Nextclade build-time settings. Optionally adjust the variables in the `.env` file to your needs. ```bash # [once] Prepare dotenv file with default values @@ -85,21 +90,29 @@ as well as to the `--help` text for each tool. # By default, the resulting executable will be in `target/debug/nextclade`. cargo build --bin=nextclade - # (Re-)build Nextclade in debug mode and run `nextclade --help` to print Nextclade CLI main help screen. The arguments after the `--` are passed to nextclade executable, as if you'd run it directly. You can also refer to Nextclade user documentation (https://docs.nextstrain.org/projects/nextclade/en/stable/index.html) for explanation of arguments. + # (Re-)build Nextclade in debug mode and run `nextclade --help` to print + # Nextclade CLI main help screen. The arguments after the `--` are passed + # to nextclade executable, as if you'd run it directly. + # Refer to Nextclade user documentation for explanation of arguments. cargo run --bin=nextclade -- --help - # (Re-)build Nextclade in debug mode and use it to download a dataset to `data_dev/` directory. + # (Re-)build Nextclade in debug mode and use it to download a dataset to + # `data_dev/` directory. cargo run --bin=nextclade -- dataset get \ --name='sars-cov-2' \ --output-dir='data_dev/sars-cov-2' - # (Re-)build Nextclade in debug mode and run the analysis using the dataset we just downloaded (to `data_dev/`) and output results to the `out/` directory. 
+ # (Re-)build Nextclade in debug mode and run the analysis using the + # dataset we just downloaded (to `data_dev/`) and output results to + # the `out/` directory. cargo run --bin=nextclade -- run \ 'data_dev/sars-cov-2/sequences.fasta' \ --input-dataset='data_dev/sars-cov-2/' \ --output-all='out/' ``` + The `cargo run` command automatically performs the `cargo build` command if there are code changes. + > 💡 Note, depending on your computer hardware and internet speed, your first build can take significant amount of time, because the necessary Rust toolchain version and all dependency packages (crates) will be downloaded and compiled. Next time the existing toolchain and cached packages are used, so the repeated builds should be much faster. > 💡 Add `-v` to Nextclade arguments to make console output more verbose. Add more occurrences, e.g. `-vv`, for even more verbose output. @@ -114,27 +127,21 @@ as well as to the `--help` text for each tool. # Run Nextclade release binary ./target/release/nextclade run \ 'data_dev/sars-cov-2/sequences.fasta' \ - --input-dataset='data_dev/sars-cov-2' \ - --output-fasta='out/nextclade.aligned.fasta' \ - --output-tsv='out/nextclade.tsv' \ - --output-tree='out/nextclade.tree.json' \ - --in-order \ - --include-reference + --input-dataset='data_dev/sars-cov-2/' \ + --output-all='out/' ``` - > 💡 Debug builds are incremental, i.e. only the files that have changed since the last build are compiled. But release builds are not. If you need to quickly iterate on features, then use debug builds. If you are measuring performance, or make a build for daily usage, always use release builds. + > 💡 Debug builds are incremental, i.e. only the files that have changed since the last build are compiled, which is much faster than a full build. But release builds are always full builds, with additional optimization passes, so they take much more time. If you need to quickly iterate on features, then use debug builds.
If you are measuring performance, or building binaries for actual daily use, always use release builds. ### Nextclade Web -Nextclade Web is a React & Typescript application, which relies on Nextclade WebAssembly (wasm) module to perform the computation. This WebAssembly module shares the same Rust code for algorithms as Nextclade CLI. So building Nextclade Web involves 2 steps: - -- building WebAssembly module -- building the web application itself +Nextclade Web is a React & Typescript application, which relies on Nextclade WebAssembly (wasm) modules to perform the computation. These WebAssembly modules share Rust code with Nextclade CLI. So building Nextclade Web involves 2 steps: -Install Node.js version 14+ (latest LTS release is recommended), by either downloading it from the official website: https://nodejs.org/en/download/, or by using [nvm](https://github.com/nvm-sh/nvm). +- building WebAssembly modules (the algorithms "backend") +- building the web application itself (the frontend) -> ⚠️ We don't have bandwidth to support Node.js installations from Linux OS package repositories, Homebrew, Conda and everything else deviating from the officially recommended setup. If you decide to go that route - things may or may not work - you are on your own. But feel free to open pull requests with fixes if necessary. +Note that there is no backend server. Nextclade Web is a static application which can be deployed to any static web hosting. Instead of a backend server, the frontend communicates with the WebAssembly modules, which are deployed into a pool of WebWorkers running directly in the user's browser.
#### Steps @@ -144,10 +151,10 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl Clone Nextclade git repository: - ```bash - git clone https://github.com/nextstrain/nextclade - cd nextclade - ``` + ```bash + git clone https://github.com/nextstrain/nextclade + cd nextclade + ``` > 💡 We accept pull requests on GitHub. If you want to submit a with new feature or a bug fixe, then make a GitHub account, [make a fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo) of the [origin Nextclade repository](https://github.com/nextstrain/nextclade) and clone your forked repository instead. Refer to [GitHub documentation "Contributing to projects"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) for more details. @@ -155,7 +162,26 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl > 💡 If you are a member of Nextstrain team, then you don't need a fork and you can contribute directly to the origin repository. Still, in most cases, please submit pull requests for review, rather than pushing changes to branches directly. -2. Install Rust if not already (https://www.rust-lang.org/tools/install): +2. Install Node.js (once), by either downloading it from the official website: [nodejs.org](https://nodejs.org), or by using [nvm](https://github.com/nvm-sh/nvm). + + The only supported Node.js version is the one that is currently declared in the [`.nvmrc`](https://github.com/nextstrain/nextclade/blob/master/.nvmrc) file. It may change in the future. 
+ + If you have `nvm` installed and configured, you can quickly install and switch to this Node.js version by navigating to the root of the nextclade repository (where the [`.nvmrc`](https://github.com/nextstrain/nextclade/blob/master/.nvmrc) file is) and running: + + ```bash + cd nextclade/ + nvm install + nvm use + node --version + ``` + + > ⚠️ The Nextclade team doesn't have bandwidth to support Node.js installations from Linux package repositories, Homebrew or Conda, Node.js versions other than the one currently declared in the [`.nvmrc`](https://github.com/nextstrain/nextclade/blob/master/.nvmrc) file, or anything else deviating from the recommended setup. If you decide to go that route, things may or may not work, and you are on your own. But feel free to open pull requests if fixes are necessary to make your setup work. + +3. Install Rust if not already installed (once) ([https://www.rust-lang.org/tools/install](https://www.rust-lang.org/tools/install)): + + This step is the same as for Nextclade CLI (see above). You can skip this step if you've done the setup for Nextclade CLI already. + + The only supported Rust version is the one declared in [`rust-toolchain.toml`](https://github.com/nextstrain/nextclade/blob/master/rust-toolchain.toml). It may change in the future. ```bash # [once] Install Rustup, the Rust version manager @@ -174,17 +200,17 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl $ rustup -V ``` - > ⚠️ We don't support Rust installations deviating from the [officially recommended steps](https://doc.rust-lang.org/book/ch01-01-installation.html). If you install Rust from Linux OS package repositories, Homebrew, Conda etc., things may or may not work, or they may work but produce wrong results. Nextclade team don't have bandwidth to try every platform and config, so if you decide to go unofficial route, then you are on your own. But feel free to open pull requests with fixes, where necessary.
+ > ⚠️ The Nextclade team doesn't have bandwidth to support Rust installations deviating from the [officially recommended steps](https://doc.rust-lang.org/book/ch01-01-installation.html) or Rust versions other than the one declared in [rust-toolchain.toml](https://github.com/nextstrain/nextclade/blob/master/rust-toolchain.toml). If you install Rust from Linux package repositories, Homebrew, Conda etc., things may or may not work, or they may work but produce wrong results. We cannot test every platform and configuration, so if you decide to go the unofficial route, you are on your own. But feel free to open pull requests if fixes are necessary to make your setup work. - > 💡 Note, Rustup allows to install multiple versions of Rust and to switch between them. This repository contains a [rust-toolchain.toml](../../rust-toolchain.toml) file, which describes which version of Rust is currently in use by Nextclade official build. Cargo and Rustup should be able to [pick it up automatically](https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file), install the required toolchain and use it when you type `cargo` commands. Any other versions of Rust toolchain are not supported. + > 💡 Note, Rustup allows you to install multiple versions of Rust and to switch between them. This repository contains a [rust-toolchain.toml](https://github.com/nextstrain/nextclade/blob/master/rust-toolchain.toml) file, which describes which version of Rust is currently in use by the official Nextclade build. Cargo and Rustup should be able to [pick it up automatically](https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file), install the required toolchain and use it when you type `cargo` commands. Any other versions of Rust toolchain are not supported. -3. Prepare environment variables which configure Nextclade build-time settings (once). Optionally adjust the variables in the `.env` file to your needs. +4. Prepare environment variables (once).
They configure Nextclade build-time settings. Optionally adjust the variables in the `.env` file to your needs. ```bash cp .env.example .env ``` -4. Install other required tools (once) +5. Install other required tools (once) ```bash cargo install wasm-pack @@ -208,7 +234,7 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl -5. Install NPM dependencies (once) +6. Install NPM dependencies (once) ```bash cd packages/nextclade-web @@ -217,7 +243,7 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl > ⚠️ Nextclade uses `yarn` to manage NPM dependencies. While you could try `npm` or other tools instead, we don't support this. -6. Build the WebAssembly module +7. Build the WebAssembly module ```bash cd packages/nextclade-web @@ -226,9 +252,9 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl This step might take a lot of time. The WebAssembly module and accompanying Typescript code should be been generated into `packages/nextclade-web/src/gen/`. The web application should be able to find it there. - Repeat this step every time you are touching Rust code. + Repeat this step every time you change Rust code. -7. Build and serve the web app +8. Build and serve the development version of the web app locally We are going to run a development web server, which runs continuously (it does not yield terminal prompt until you stop it). It's convenient to do it in a separate terminal instance from WebAssembly module build to allow rebuilding the app and the module independently. @@ -239,9 +265,13 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl yarn dev ``` - Open `http://localhost:3000/` in the browser. Typescript code changes should trigger rebuild and fast refresh of the app. If you rebuild the WebAssembly module (ina separate terminal instance), it should also pick up the changes automatically. + This runs the Next.js dev server (continuously).
Open `http://localhost:3000/` in the browser. Typescript code changes should trigger automatic rebuild and fast refresh of the app in the browser; no dev server restart is typically necessary. + + Note that changes in Rust code (the algorithms) are not picked up automatically and require rebuilding the WebAssembly module manually (as explained above). Once you rebuild the WebAssembly module in a separate terminal instance, the dev server should pick up the changes in the algorithms; no dev server restart is necessary. - Alternatively, the optimized ("production") version of the web app can be built and served with +9. Build and serve the production version of the web app locally + + Alternatively, the optimized ("production") version of the web app can be built and (optionally) served with ```bash yarn prod:build @@ -250,11 +280,11 @@ Install Node.js version 14+ (latest LTS release is recommended), by either downl Open `http://localhost:8080/` in the browser. - The resulting HTML, CSS and JS files should be available under `packages/nextclade-web/.build/production/web`. + The resulting HTML, CSS, JS and WASM files should be available under `packages/nextclade-web/.build/production/web/`. This is the "web root" of the application. All files required to deploy and serve Nextclade Web are there. - Production build does not have automatic rebuild and reload. You need to do full rebuild on every code change. + The production build does not have automatic rebuild and reload. You need to do a full rebuild on every code change: first the WebAssembly module, then the web app.
+ The `yarn prod:serve` command runs Express underneath and it is just an example of a simple (also slow and insecure) local file web server. But the produced files can be served using any static file web server (Apache, Nginx, Caddy, Express, etc.), static file hosting services, or cloud services (AWS S3, Vercel, GitHub Pages, etc.). The official deployment uses AWS S3 + Cloudfront. ### Internationalization (translation) @@ -323,7 +353,7 @@ Automatic fixes can be applied using: cargo clippy --fix ``` -Clippy is configured in `clippy.toml` and in `.cargo/config.toml`. +Clippy is configured in `clippy.toml` and in the root `Cargo.toml`. For routine development, it is recommended to configure your text editor to see the Rust compiler and linter errors. @@ -367,6 +397,8 @@ For routine development, it is recommended to configure your text editor to see +
+ #### Linting Typescript and JavaScript The web app is linted using [eslint](https://github.com/eslint/eslint) and [tsc](https://www.typescriptlang.org/docs/handbook/compiler-options.html) as a part of development command, but the same lints also be run separately: @@ -382,28 +414,36 @@ Modern text editors should be able to display ESLint warnings out of the box as ### Formatting (code style) -Rust: +#### Formatting Rust + +We use `rustfmt` to format Rust code. It is installed during initial setup, along with the rest of the dependencies. The configuration is in `rustfmt.toml`. You can fix the formatting using: ```bash cargo fmt --all ``` -Typescript: +Make sure your text editor is configured to use `rustfmt` for code formatting. + +#### Formatting Typescript and JavaScript + +We use `prettier` to format TS and JS code. It is installed during initial setup, along with the rest of the dependencies. The configuration is in `packages/nextclade-web/.prettierrc` and in `.editorconfig`. You can fix the formatting using: ```bash cd packages/nextclade-web yarn format:fix ``` +Make sure your text editor is [configured](https://prettier.io/docs/en/editors.html) to use `prettier` and to honor [editorconfig](https://editorconfig.org/) settings.
+ ## Maintenance ### Continuous integration (CI) Nextclade build and deployment process is automated using GitHub Actions: -- Nextclade Web build and deployment: [.github/workflows/web.yml](../../.github/workflows/web.yml) -- Nextclade CLI build and GitHub releases: [.github/workflows/cli.yml](../../.github/workflows/cli.yml) -- Nextclade CLI Bioconda release: [.github/workflows/bioconda.yml](../../.github/workflows/bioconda.yml) +- Nextclade Web build and deployment: [.github/workflows/web.yml](https://github.com/nextstrain/nextclade/blob/master/.github/workflows/web.yml) +- Nextclade CLI build and GitHub releases: [.github/workflows/cli.yml](https://github.com/nextstrain/nextclade/blob/master/.github/workflows/cli.yml) +- Nextclade CLI Bioconda release: [.github/workflows/bioconda.yml](https://github.com/nextstrain/nextclade/blob/master/.github/workflows/bioconda.yml) The workflows run on every pull request on GitHub and every push to a major branch. @@ -422,7 +462,7 @@ Here is a list of environments: | master | master.nextstrain.org | data.master.nextstrain.org | Main development branch - accumulates features and bug fixes from pull requests | | other branches | temporary domain on Vercel | branch with the same name in dataset GitHub repo if exists, otherwise data.master.nextstrain.org | Pull requests - development of new features and bug fixes | -Preview versions of Nextclade Web built from pull requests will first try to fetch data from GitHub, from the branch with the same name in the [dataset GitHub repository](https://github.com/nextstrain/nextclade_data), if such branch exists. If not, the it will fetch from `master` environment. This is useful during development, when you need to modify both software and data: if you have branches with the same name in both repos, Nextclade Web will fetch the datasets from that branch. 
+Preview versions of Nextclade Web built from pull requests will first try to fetch data from GitHub, from the branch with the same name in the [dataset GitHub repository](https://github.com/nextstrain/nextclade_data), if such branch exists. If not, then it will fetch from `master` environment. This is useful during development, when you need to modify both software and data: if you have branches with the same name in both repos, Nextclade Web will fetch the datasets from that branch. Nextclade CLI built from pull requests in Nextclade repository is always using `master` deployment. @@ -486,19 +526,19 @@ See Nextclade CLI user documentation for more details about available command in To provide Nextclade with the alternative location of the dataset server, add the `dataset-server` URL parameter with value set to URL of the custom dataset server: -```url +``` https://clades.nextstrain.org?dataset-server=http://example.com ``` Local URLs should also work: -```url +``` https://clades.nextstrain.org?dataset-server=http://localhost:3001 ``` Combining locally built Nextclade Web and local dataset server too: -```url +``` https://localhost:3000?dataset-server=http://localhost:3001 ``` diff --git a/docs/dev/index.rst b/docs/dev/index.rst new file mode 100644 index 000000000..421e23362 --- /dev/null +++ b/docs/dev/index.rst @@ -0,0 +1,12 @@ +================================================================================ +Developer documentation +================================================================================ + +Documentation for developers and maintainers of Nextclade + +.. 
toctree:: + :maxdepth: 3 + :titlesonly: + + developer-guide.md + macos.md diff --git a/docs/environment.yml b/docs/environment.yml index a28349574..87da2ca50 100644 --- a/docs/environment.yml +++ b/docs/environment.yml @@ -2,14 +2,15 @@ name: docs.clades.nextstrain.org channels: - defaults dependencies: + - linkify-it-py - make - - recommonmark + - myst-parser + - pip - requests - sphinx - - pip - pip: - nextstrain-sphinx-theme>=2022.5 - - sphinx-markdown-tables - sphinx-argparse - - sphinx-tabs - sphinx-autobuild + - sphinx-markdown-tables + - sphinx-tabs diff --git a/docs/index.rst b/docs/index.rst index a6beb3850..7352148b4 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -65,4 +65,6 @@ Nextclade is a part of `Nextstrain `_, an open-source pr user/faq user/terminology + ../dev/index + changes/index diff --git a/docs/user/algorithm/01-sequence-alignment.md b/docs/user/algorithm/01-sequence-alignment.md index 1e10fcea6..e759b94ef 100644 --- a/docs/user/algorithm/01-sequence-alignment.md +++ b/docs/user/algorithm/01-sequence-alignment.md @@ -2,7 +2,7 @@ In order for sequences to be analyzed, they need to be arranged in a way that allows for comparing homologous regions. This process is called [sequence alignment](https://en.wikipedia.org/wiki/Sequence_alignment). -Nextclade performs pairwise alignment of the provided (query) sequences against a given reference (root) sequence using a banded local alignment algorithm with affine gap-cost. The band width and rough relative positions of query and reference sequence are determined through seed matching. Seed matching consists of finding several small fragments, *seeds*, where the reference and query sequence match exactly. Nextclade finds these matches using an [FM-index](https://en.wikipedia.org/wiki/FM-index). To improve sensitivity, Nextclade searches for exact matches while ignoring every third base, that is a matching pattern like `XX.XX.XX.XX` where `X` is matched and `.` ignored. 
This pattern allows to ignore the majority of synonymous mutations that happen at the third position in codons. The number of seeds, as well as their length, spacing are configurable in [Nextclade CLI](../nextclade-cli). +Nextclade performs pairwise alignment of the provided (query) sequences against a given reference (root) sequence using a banded local alignment algorithm with affine gap costs. The band width and rough relative positions of query and reference sequence are determined through seed matching. Seed matching consists of finding several small fragments, *seeds*, where the reference and query sequence match exactly. Nextclade finds these matches using an [FM-index](https://en.wikipedia.org/wiki/FM-index). To improve sensitivity, Nextclade searches for exact matches while ignoring every third base, that is, a matching pattern like `XX.XX.XX.XX` where `X` is matched and `.` is ignored. This pattern lets Nextclade ignore most synonymous mutations, which typically occur at the third codon position. The number, length and spacing of seeds are configurable in [Nextclade CLI](../nextclade-cli/index.rst). Seed matches are then extended while allowing for a small number of mismatches in a sliding window (configurable) and pruned to an optimal chain of seeds in ascending order on query and reference sequences. If the resulting chain of seeds covers a sufficient fraction of the query sequence (configurable), the relative positions of these seeds are used to estimate the shift of the query sequence relative to the reference and the amount of insertion/deletions between successive seeds. @@ -14,7 +14,7 @@ To prevent Nextclade from running out of memory during the alignment process, th After alignment, Nextclade strips insertions relative to the reference from the aligned sequences and lists them in a separate file. As a result, each sequence is reported in coordinates of the reference sequence.
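To make the spaced-seed idea concrete, here is a minimal Python sketch, assuming 0-based positions and the `XX.XX.XX.XX` pattern described above. It indexes spaced k-mers in a plain dictionary, a simplification swapped in for readability; Nextclade itself uses an FM-index, and the names `spaced_kmer` and `find_seed_hits` are hypothetical.

```python
from typing import Dict, List, Tuple


def spaced_kmer(seq: str, start: int, pattern: str) -> str:
    """Keep only the characters that fall on the 'X' positions of the pattern."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "X")


def find_seed_hits(ref: str, qry: str, pattern: str = "XX.XX.XX.XX") -> List[Tuple[int, int]]:
    """Return (ref_pos, qry_pos) pairs (0-based) where the spaced pattern matches exactly.

    A dictionary index stands in for the FM-index used by Nextclade; this is
    an illustration of spaced seeds, not Nextclade's implementation.
    """
    width = len(pattern)
    index: Dict[str, List[int]] = {}
    for i in range(len(ref) - width + 1):
        index.setdefault(spaced_kmer(ref, i, pattern), []).append(i)
    hits: List[Tuple[int, int]] = []
    for j in range(len(qry) - width + 1):
        for i in index.get(spaced_kmer(qry, j, pattern), []):
            hits.append((i, j))
    return hits


# Four alanine codons (GCT, GCA, GCC, GCG) that differ only at the third position
ref = "GCTGCAGCCGCG"
qry = "GCAGCTGCGGCC"
print((0, 0) in find_seed_hits(ref, qry))            # spaced seed ignores third codon positions
print((0, 0) in find_seed_hits(ref, qry, "X" * 11))  # contiguous seed needs every base to match
```

In this example both sequences encode the same four amino acids with different third bases, so the spaced seed matches at offset 0 while a contiguous 11-base seed does not.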
-The algorithm aims to be sufficiently fast for running in the internet browser of an average consumer computer, by trading width of the alignment band for improved runtime performance. We found that it works well for most sequences, but for a minority of sequences indel variation not captured by seed matches might result in sub-optimal alignments. +The algorithm aims to be fast enough to run in the web browser of an average consumer computer, trading alignment band width for runtime performance. We found that it works well for most sequences, but for a minority of sequences, indel variation not captured by seed matches might result in suboptimal alignments. By default, alignment is only attempted on sequences longer than 100 nucleotides (configurable), because alignment of shorter sequences may be unreliable. If alignment fails, Nextclade will optionally attempt to align the reverse complemented sequence. @@ -32,10 +32,10 @@ If a genome annotation is provided, Nextclade will use a lower gap-open-penalty Alignment may fail if the query sequence is too divergent from the reference sequence, i.e. if there are many differences between the query and reference sequence. The seed matching step may then not be able to find a sufficient number of similar regions. This may happen due to usage of an incorrect reference sequence (e.g. from a different virus or a virus from a different host organism), if analysed sequences are of very low quality (e.g. containing a lot of missing regions or with a lot of ambiguous nucleotides) or are very short compared to the reference sequence. -> ⚠️ Analysis steps that follow the step alignment will ignore sequence regions before and after the alignment range, as well as unsequenced regions (consecutive gap (`-`) character ranges on the 5' and 3' ends).
The exact alignment range is indicated as "Alignment range" in the analysis results table of [Nextclade Web](../nextclade-web) and `alignmentStart` and `alignmentEnd` in the output files of [Nextclade Web](../nextclade-web) and [Nextclade CLI](../nextclade-cli). +> ⚠️ Analysis steps that follow alignment ignore sequence regions before and after the alignment range, as well as unsequenced regions (runs of consecutive gap (`-`) characters at the 5' and 3' ends). The exact alignment range is indicated as "Alignment range" in the analysis results table of [Nextclade Web](../nextclade-web/index.rst) and as `alignmentStart` and `alignmentEnd` in the output files of [Nextclade Web](../nextclade-web/index.rst) and [Nextclade CLI](../nextclade-cli/index.rst). ### Results The alignment step results in aligned nucleotide sequences, which are being produced in the form of a fasta files. -This file is written by [Nextclade CLI](../nextclade-cli) and can be downloaded in the "Download" dialog of [Nextclade Web](../nextclade-web). +This file is written by [Nextclade CLI](../nextclade-cli/index.rst) and can be downloaded in the "Download" dialog of [Nextclade Web](../nextclade-web/index.rst). diff --git a/docs/user/algorithm/02-translation.md b/docs/user/algorithm/02-translation.md index a65e95660..60ce6b858 100644 --- a/docs/user/algorithm/02-translation.md +++ b/docs/user/algorithm/02-translation.md @@ -1,8 +1,8 @@ # 2. Translation -In order to detect changes in viral proteins, amino acid sequences (peptides) need to be computed from the nucleotide sequence regions corresponding to [coding sequences (CDS)](https://en.wikipedia.org/wiki/Coding_region). This process is called [translation](). Peptide sequences then need to be aligned, in order to make them comparable, similarly to how it's [done](./01-sequence-alignment) with nucleotide sequences.
+In order to detect changes in viral proteins, amino acid sequences (peptides) need to be computed from the nucleotide sequence regions corresponding to [coding sequences (CDS)](https://en.wikipedia.org/wiki/Coding_region). This process is called translation. Peptide sequences then need to be aligned to make them comparable, similarly to how it's [done](./01-sequence-alignment.md) with nucleotide sequences. -Nextclade performs translation separately for every CDS. CDS are specified in a [genome annotation file](../input-files/03-genome-annotation.md), previously called [Gene map](../terminology.html#gene-map), and can consist of multiple segments that correspond to ranges in the genome that are combined into a contiguous CDS. The list of CDS to be considered for translation is configurable in [Nextclade CLI](../nextclade-cli) and if it's not specified, all CDS found in the annotation are translated. +Nextclade performs translation separately for every CDS. CDS are specified in a [genome annotation file](../input-files/03-genome-annotation.md), previously called [Gene map](../terminology.md#gene-map). A CDS can consist of multiple segments, ranges in the genome that are concatenated into a contiguous CDS. The list of CDS to translate is configurable in [Nextclade CLI](../nextclade-cli/index.rst); if it is not specified, all CDS found in the annotation are translated. For each coding sequence in the annotation, Nextclade extracts the corresponding sequence from the nucleotide alignment, and then generates peptides by taking every triplet of nucleotides (codon) and translating it into a corresponding amino acid. It then aligns the resulting peptides against the corresponding reference peptides (translated from reference sequence), using the same alignment algorithm as for nucleotide sequences. @@ -10,6 +10,6 @@ This step only runs if an annotation is provided.
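The segment concatenation and codon-by-codon translation described above can be sketched in Python as follows, assuming CDS segments are given as 1-based inclusive ranges on the aligned sequence and the standard genetic code applies. Rendering a fully deleted codon (`---`) as `-` and any other incomplete or ambiguous codon as `X` are conventions assumed for this illustration. The subsequent peptide alignment step is not shown, and `translate_cds` is a hypothetical name.

```python
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(aligned_seq: str, segments: List[Tuple[int, int]]) -> str:
    """Concatenate CDS segments (1-based, inclusive) and translate them codon by codon."""
    cds = "".join(aligned_seq[start - 1:end] for start, end in segments)
    peptide = []
    # Translate only complete codons; a trailing partial codon is dropped
    for p in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[p:p + 3].upper()
        if codon == "---":
            peptide.append("-")  # fully deleted codon (assumed convention)
        else:
            # Partially gapped codons, N and other ambiguity codes map to X (assumed convention)
            peptide.append(CODON_TABLE.get(codon, "X"))
    return "".join(peptide)


print(translate_cds("ATGNNNGCT", [(1, 3), (7, 9)]))  # two segments joined into one CDS
```

Joining the two segments skips the `NNN` in the middle, so the example translates `ATG` + `GCT` into `MA`.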
### Results -The translation step results in aligned [peptide](../terminology.html#peptide) sequences, which are being [produced](../output-files/03-translations) in the form of fasta files, one file per CDS. +The translation step results in aligned [peptide](../terminology.md#peptide) sequences, which are [produced](../output-files/03-translations.md) as FASTA files, one file per CDS. -These files are written by [Nextclade CLI](../nextclade-cli) and can be downloaded in the "Export" dialog of [Nextclade Web](../nextclade-web). +These files are written by [Nextclade CLI](../nextclade-cli/index.rst) and can be downloaded in the "Export" dialog of [Nextclade Web](../nextclade-web/index.rst). diff --git a/docs/user/algorithm/03-phylogenetic-placement.md b/docs/user/algorithm/03-phylogenetic-placement.md index 24d7427a9..37ff31959 100644 --- a/docs/user/algorithm/03-phylogenetic-placement.md +++ b/docs/user/algorithm/03-phylogenetic-placement.md @@ -5,47 +5,40 @@ After reference alignment and mutation calling, Nextclade places each query sequ > 💡 Learn more about phylogenetic trees: > - [Nextstrain narrative: How to interpret phylogenetic trees](https://nextstrain.org/narratives/trees-background) > - [Wikipedia: Phylogenetic tree](https://en.wikipedia.org/wiki/Phylogenetic_tree) -> - > ⚠️ The root of the input phylogenetic tree **must** correspond to the input reference (root) sequence. If the reference sequence differs from the sequence of the root node, the differences between the two have to be added as mutations ancestral to the root node. Nextclade will error when inconsistencies are between diversity in the tree and the reference sequence are encountered. -> Phylogenetic placement is done by comparing the mutations of the query sequence (relative to the reference) with the mutations of every node and tip in the reference tree, and finding the node which has the most similar set of mutations.
In order to find the nearest reference node, the empirically chosen **distance metric** is calculated between each query sequence and reference node. It is defined as follows: -```math - -D = M_{ref} + M_{query} - 2 M_{agree} - M_{disagree} - M_{unknown} - -``` +$$D = M_{ref} + M_{query} - 2 M_{agree} - M_{disagree} - M_{unknown}$$ where -- ``$ D $`` is the resulting distance metric +- $D$ is the resulting distance metric -- ``$ M_{ref} $`` is the total number of mutations in the reference node +- $M_{ref}$ is the total number of mutations in the reference node -- ``$ M_{query} $`` is the total number of mutations in the query sequence +- $M_{query}$ is the total number of mutations in the query sequence -- ``$ M_{agree} $`` is the number of exact mutations is shared between the reference node and the query sequence +- $M_{agree}$ is the number of exact mutations shared between the reference node and the query sequence -- ``$ M_{disagree} $`` is the number of mutations at the same position in the reference node and the query sequence, but where the states are different. This is where the reference node and the query sequence disagree +- $M_{disagree}$ is the number of positions that are mutated in both the reference node and the query sequence, but to different states. This is where the reference node and the query sequence disagree -- ``$ M_{unknown} $`` is number of undetermined (sites) - sites that are mutated in the reference node but are missing in the query sequence. For these we can't tell whether the reference node agrees with the query sequence +- $M_{unknown}$ is the number of undetermined sites: sites that are mutated in the reference node but missing in the query sequence. For these, we cannot tell whether the reference node agrees with the query sequence -The nearest reference node is then chosen as the one having the lowest distance metric ``$ D $``. +The nearest reference node is then chosen as the one having the lowest distance metric $D$.
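As a worked illustration of the distance metric, the Python sketch below computes D from mutations stored as `{position: state}` dictionaries plus a set of positions that are missing in the query. The data layout and the names `placement_distance` and `nearest_node` are hypothetical, and ties are broken simply by iteration order rather than by a placement prior.

```python
from typing import Dict, Set


def placement_distance(node_muts: Dict[int, str], query_muts: Dict[int, str],
                       query_missing: Set[int]) -> int:
    """D = M_ref + M_query - 2*M_agree - M_disagree - M_unknown."""
    m_ref = len(node_muts)
    m_query = len(query_muts)
    # Same position, same resulting state
    m_agree = sum(1 for pos, state in node_muts.items() if query_muts.get(pos) == state)
    # Same position mutated in both, but to different states
    m_disagree = sum(1 for pos, state in node_muts.items()
                     if pos in query_muts and query_muts[pos] != state)
    # Mutated in the node but not sequenced in the query
    m_unknown = sum(1 for pos in node_muts if pos in query_missing)
    return m_ref + m_query - 2 * m_agree - m_disagree - m_unknown


def nearest_node(nodes: Dict[str, Dict[int, str]], query_muts: Dict[int, str],
                 query_missing: Set[int]) -> str:
    """Name of the node with the lowest D (first one wins on ties; no placement prior)."""
    return min(nodes, key=lambda name: placement_distance(nodes[name], query_muts, query_missing))


node = {100: "T", 200: "G", 300: "A"}
query = {100: "T", 200: "C", 400: "G"}
print(placement_distance(node, query, {300}))
```

Here the query agrees at 100, disagrees at 200, is missing 300 and carries one extra mutation at 400, so D = 3 + 3 - 2*1 - 1 - 1 = 2.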
If multiple candidate attachment nodes with the same distance exist, Nextclade can use a "placement prior" to pick the most likely node based on its prevalence in the overall sequence data. Note that this option exists only when such placement information is coded into the reference tree of the dataset. This operation is repeated for each query sequence, until all of them are placed onto the tree. -Other query sequences are never considered as targets for the initial placement such that information derived from the placement on the reference tree (see for example [clade assignment](06-clade-assignment)) does not depend on other query sequences. Note, however, that Nextclade now supports a greedy type of tree-building performed at the final step of the analysis that will consider relation-ships between query sequences (see [tree building](#tree-building)). +Other query sequences are never considered as targets for the initial placement, so that information derived from the placement on the reference tree (see, for example, [clade assignment](04-clade-assignment.md)) does not depend on other query sequences. Note, however, that Nextclade now supports a greedy type of tree building, performed at the final step of the analysis, that considers relationships between query sequences (see [tree building](#tree-building)). Mutations that separate the query sequence and the nearest node in the reference tree are designated "private mutations". Mutations that are the same is the query sequence and in the nearest node we call "shared mutations". -Sequencing errors and sequence assembly problems are expected to give rise to more private mutations than usual. Thus, an excess of such mutations is a useful [quality control (QC) metric](07-quality-control.md). In addition to the overall number of such private mutations, Nextclade also assesses whether they cluster in specific regions of the genome, as such clusters give more fine-grained indications of potential quality issues.
+Sequencing errors and sequence assembly problems are expected to give rise to more private mutations than usual. Thus, an excess of such mutations is a useful [quality control (QC) metric](06-quality-control.md). In addition to the overall number of such private mutations, Nextclade also assesses whether they cluster in specific regions of the genome, as such clusters give more fine-grained indications of potential quality issues. ### Tree building @@ -71,14 +64,14 @@ The following limitations are inherent to this approach (compared to the Nextstr - While a tree can be built from scratch in principle, a high-quality input reference tree with representative samples is necessary for reliable resolution of phylogenetic structure. -- The clade information is taken from the initial placement on the reference tree (see [Clade assignment](06-clade-assignment) section). Only clades that are present in the input reference tree can be assigned to the new nodes and modifications to the tree structure by tree-refinement are not represented in the initial clade assignment. +- The clade information is taken from the initial placement on the reference tree (see [Clade assignment](04-clade-assignment.md) section). Only clades that are present in the input reference tree can be assigned to the new nodes and modifications to the tree structure by tree-refinement are not represented in the initial clade assignment. ### Results Phylogenetic placement results in creation of a new tree, which is identical to the input tree, but with the new nodes (corresponding to the query sequences) added to it. -It can be viewed on the "Tree" page on [Nextclade Web](../nextclade-web). +It can be viewed on the "Tree" page on [Nextclade Web](../nextclade-web/index.rst). -It is being output as a file by [Nextclade CLI](../nextclade-cli) and can be obtained in the "Download" dialog of [Nextclade Web](../nextclade-web) in Auspice's JSON format or as Newick string. 
+It is written to a file by [Nextclade CLI](../nextclade-cli/index.rst) and can be downloaded from the "Download" dialog of [Nextclade Web](../nextclade-web/index.rst) in Auspice JSON format or as a Newick string. The tree file can be also viewed by dropping it to [auspice.us](https://auspice.us). diff --git a/docs/user/algorithm/04-clade-assignment.md b/docs/user/algorithm/04-clade-assignment.md index ba244e7a2..d188e3ed8 100644 --- a/docs/user/algorithm/04-clade-assignment.md +++ b/docs/user/algorithm/04-clade-assignment.md @@ -1,8 +1,8 @@ # 4. Clade assignment -To simplify discussion of co-circulating virus variants, viral diversity of is often broken down into [Clades](../terminology.html#clade) or lineages which are defined by specific combinations of signature mutations. Clades are groups of related sequences that share a common ancestor. For SARS-CoV-2, Nextclade can assign both broad clades defined by the Nextstrain team as well as more fine-grained lineages defined by the PANGO consortium. +To simplify discussion of co-circulating virus variants, viral diversity is often broken down into [Clades](../terminology.md#clade) or lineages, which are defined by specific combinations of signature mutations. Clades are groups of related sequences that share a common ancestor. For SARS-CoV-2, Nextclade can assign both broad clades defined by the Nextstrain team as well as more fine-grained lineages defined by the PANGO consortium. -Instead of directly using mutational signatures to assign clades, Nextclade assigns your sequences to clades by placing sequences on a phylogenetic tree annotated with clade definitions. More specifically, Nextclade assigns the clade of the nearest reference node found during the [Phylogenetic placement](05-phylogenetic-placement) step. +Instead of directly using mutational signatures to assign clades, Nextclade assigns your sequences to clades by placing sequences on a phylogenetic tree annotated with clade definitions.
More specifically, Nextclade assigns the clade of the nearest reference node found during the [Phylogenetic placement](03-phylogenetic-placement.md) step. > ⚠️ Nextclade only considers those clades which are present in the input reference tree. Only one of these clades, and no others, can be assigned to the analyzed sequences. It is important to make sure that every clade that you expect to find in the results is well represented in the tree. > @@ -39,6 +39,6 @@ To keep the reference tree small, Nextclade does not include all early `Pango li ## Results -Clades are reported in the "Clade" column in the results table of [Nextclade Web](../nextclade-web) as well as in the analysis results JSON, CSV and TSV files generated by [Nextclade CLI](../nextclade-cli) and in the "Download" dialog of [Nextclade Web](../nextclade-web). +Clades are reported in the "Clade" column in the results table of [Nextclade Web](../nextclade-web/index.rst) as well as in the analysis results JSON, CSV and TSV files generated by [Nextclade CLI](../nextclade-cli/index.rst) and in the "Download" dialog of [Nextclade Web](../nextclade-web/index.rst). For SARS-CoV-2, Pango lineages are also displayed in the results. In `tsv` and `csv` files, the column is named `Nextclade_pango`. diff --git a/docs/user/algorithm/05-mutation-calling.md b/docs/user/algorithm/05-mutation-calling.md index e799cd65f..158f8a8ac 100644 --- a/docs/user/algorithm/05-mutation-calling.md +++ b/docs/user/algorithm/05-mutation-calling.md @@ -6,15 +6,15 @@ Nextclade calls nucleotide and aminoacid mutations relative to multiple targets. In order to detect nucleotide mutations, aligned nucleotide sequences are compared with the reference nucleotide sequence, one nucleotide at a time. Mismatches between the query and reference sequences are then noted and reported differently, depending on their nature: -- Nucleotide substitutions: a change from one character to another. 
For example a change from `A` in the reference sequence to `G` in the query sequence. They are shown in sequence views in [Nextclade Web](../nextclade-web) as colored markers, where color signifies the resulting character (in query sequence). +- Nucleotide substitutions: a change from one character to another, for example from `A` in the reference sequence to `G` in the query sequence. They are shown in sequence views in [Nextclade Web](../nextclade-web/index.rst) as colored markers, where the color indicates the resulting character in the query sequence. -- Nucleotide deletions ("gaps"): nucleotide was present in the reference sequence, but is not present in the query sequence. These are indicated by the "`-`" character in the alignment sequence. They are shown in sequence views in [Nextclade Web](../nextclade-web) as dark-grey markers. In the output files deletions are represented as numeric ranges, signifying the start and end of the deleted fragment (for example: `21765-21770`) +- Nucleotide deletions ("gaps"): a nucleotide that is present in the reference sequence but absent from the query sequence. These are indicated by the "`-`" character in the aligned sequence. They are shown in sequence views in [Nextclade Web](../nextclade-web/index.rst) as dark-grey markers. In the output files, deletions are represented as numeric ranges, signifying the start and end of the deleted fragment (for example: `21765-21770`). - Nucleotide insertions: additional nucleotides in the query sequence that were not present in the reference sequence. They are stripped from the alignment and reported separately, showing the position in the reference after which the insertion occurred and the fragment that was inserted. `22030:ACT` would indicate that the query sequence has the three bases `ACT` inserted between position `22030` and `22031` in the reference sequence (the indices are 1-based).
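The deletion and insertion conventions in this list can be illustrated with a short Python sketch, assuming the query is already aligned and stripped of insertions. Internal runs of `-` become 1-based inclusive deletion ranges, and an insertion string such as `22030:ACT` is parsed and placed after the given reference position. Skipping terminal gap runs follows the alignment documentation, which treats them as unsequenced regions; the helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def deletion_ranges(aligned_query: str) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) ranges of internal '-' runs.

    Leading and trailing gap runs are unsequenced regions, not deletions,
    so only the span between the first and last non-gap characters is scanned.
    """
    first = len(aligned_query) - len(aligned_query.lstrip("-"))
    end = len(aligned_query.rstrip("-"))
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(first, end):
        if aligned_query[i] == "-":
            if start is None:
                start = i + 1  # convert 0-based index to 1-based position
        elif start is not None:
            ranges.append((start, i))  # last gap was index i-1, i.e. position i
            start = None
    return ranges


def parse_insertion(text: str) -> Tuple[int, str]:
    """Split an insertion such as '22030:ACT' into (position, inserted fragment)."""
    pos, fragment = text.split(":")
    return int(pos), fragment


def apply_insertion(ref: str, pos: int, fragment: str) -> str:
    """Insert the fragment between 1-based reference positions pos and pos + 1."""
    return ref[:pos] + fragment + ref[pos:]


print(deletion_ranges("--ACG--TA-C--"))
print(apply_insertion("AAAA", 2, "CG"))
```

For `--ACG--TA-C--` only the two internal runs are reported, as (6, 7) and (10, 10); the gaps at both ends are ignored. `apply_insertion("AAAA", 2, "CG")` yields `AACGAA`, with the fragment between positions 2 and 3.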
-Nextclade also gathers and reports other useful statistics, such as the number of contiguous ranges of `N` (missing) and non-ACGTN (ambiguous) nucleotides, as well as the total counts of substituted, deleted, missing and ambiguous nucleotides. You can find this information in the results table of [Nextclade Web](../nextclade-web) and in the output files of [Nextclade CLI](../nextclade-cli). +Nextclade also gathers and reports other useful statistics, such as the number of contiguous ranges of `N` (missing) and non-ACGTN (ambiguous) nucleotides, as well as the total counts of substituted, deleted, missing and ambiguous nucleotides. You can find this information in the results table of [Nextclade Web](../nextclade-web/index.rst) and in the output files of [Nextclade CLI](../nextclade-cli/index.rst). -Similarly, aminoacid mutations and statistics are gathered from the aligned peptides obtained after [translation](./02-translation). This step only runs if a [genome annotation](../input-files/03-genome-annotation) is provided. +Similarly, amino acid mutations and statistics are gathered from the aligned peptides obtained after [translation](./02-translation.md). This step only runs if a [genome annotation](../input-files/03-genome-annotation.md) is provided. ### Private mutations @@ -68,7 +68,7 @@ This could be useful, for example, for comparing sequences to the vaccine strain The mutation calling step results in a set of mutations and various practical metrics for each sequence. -Mutations can be viewed in the last column of the results table in [Nextclade Web](../nextclade-web). +Mutations can be viewed in the last column of the results table in [Nextclade Web](../nextclade-web/index.rst). The "Genetic feature" dropdown allows switching between nucleotide sequence and CDSes (if genome annotation is provided).
The "Relative to" dropdown allows to select the target for comparison: @@ -80,4 +80,4 @@ The "Genetic feature" dropdown allows switching between nucleotide sequence and The "Mut" column shows total number of nucleotide mutations and its mouseover tooltip lists the mutations. -All results are emitted into the output [JSON](../output-files/05-results-json), [CSV and TSV files](../output-files/04-results-tsv) in [Nextclade CLI](../nextclade-cli) and in the "Export" dialog of [Nextclade Web](../nextclade-web). +All results are emitted into the output [JSON](../output-files/05-results-json.md), [CSV and TSV files](../output-files/04-results-tsv.md) in [Nextclade CLI](../nextclade-cli/index.rst) and in the "Export" dialog of [Nextclade Web](../nextclade-web/index.rst). diff --git a/docs/user/algorithm/06-quality-control.md b/docs/user/algorithm/06-quality-control.md index 0ab5891dd..56d0abb60 100644 --- a/docs/user/algorithm/06-quality-control.md +++ b/docs/user/algorithm/06-quality-control.md @@ -16,21 +16,19 @@ For each query sequence each individual QC rule produces a quality score. These | 30 to 99 | "mediocre" quality | yellow | | 100 and above | "bad" quality | red | -After all scores are calculated, the **final QC score** ``$` S `$`` is calculated as follows: +After all scores are calculated, the **final QC score** $S$ is calculated as follows: -```math -S = \sum_i \frac{S_i^2}{100} -``` +$$S = \sum_i \frac{S_i^2}{100}$$ -where ``$` S_i `$`` is the score for an individual QC rule ``$` i `$``. +where $S_i$ is the score for an individual QC rule $i$. With this quadratic aggregation, multiple mildly concerning scores don't result in a bad overall score, but a single bad score guarantees a bad overall score. -The final score has the same thresholds as the the individual scores. +The final score has the same thresholds as the individual scores. 
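The quadratic aggregation can be checked with a few lines of Python. This is a minimal sketch, assuming the thresholds from the QC score table and treating any score below 30 as good; it is not Nextclade's own code, and the function names are hypothetical.

```python
from typing import Iterable


def final_qc_score(rule_scores: Iterable[float]) -> float:
    """S = sum_i S_i^2 / 100 over the individual rule scores S_i."""
    return sum(s * s / 100.0 for s in rule_scores)


def qc_status(score: float) -> str:
    """Map a score to the table thresholds (below 30 is assumed to be 'good')."""
    if score >= 100:
        return "bad"
    if score >= 30:
        return "mediocre"
    return "good"


print(qc_status(final_qc_score([50, 50, 50])))  # three mediocre rules
print(qc_status(final_qc_score([100, 0, 0])))   # one bad rule
```

Three rules scoring 50 each contribute 25 apiece, for a total of 75 (mediocre). A single rule at 100 gives 100 on its own (bad), which matches the behavior described above.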
## Individual QC Rules -For SARS-CoV-2, we currently implement the following QC rules (in parentheses are the one-letter designations used in [Nextclade Web](../nextclade-web)). For other viruses, such as influenza, a subset of the QC rules are used and the parametrization is adjusted. The exact parameters can be found in the `pathogen.json` input file. Datasets provided by Nextclade can be inspected in the GitHub repo [nextstrain/nextclade_data](https://github.com/nextstrain/nextclade_data). +For SARS-CoV-2, we currently implement the following QC rules (in parentheses are the one-letter designations used in [Nextclade Web](../nextclade-web/index.rst)). For other viruses, such as influenza, a subset of the QC rules is used and the parameters are adjusted. The exact parameters can be found in the `pathogen.json` input file. Datasets provided by Nextclade can be inspected in the GitHub repo [nextstrain/nextclade_data](https://github.com/nextstrain/nextclade_data). Parameter values stated below refer to SARS-CoV-2. ### Missing data (N) @@ -43,12 +41,12 @@ Ambiguous nucleotides (such as `R`, `Y`, etc) are often indicative of contaminat ### Private mutations (P) -[Private mutations](05-mutation-calling.md#private-mutations) may indicate sequencing errors or unusual variants. +[Private mutations](./05-mutation-calling.md#private-mutations) may indicate sequencing errors or unusual variants. ### Mutation clusters (C) To be more sensitive for quality problems in a narrow area of a genome, the mutation cluster rule counts the number of private within all possible 100-nucleotide windows (`windowSize`). -If that number exceeds 6 (`clusterCutOff`), this counts as a SNP cluster. +If that number exceeds 6 (`clusterCutOff`), this counts as an SNP cluster. The quality score is the number of clusters times 50 (`scoreWeight`), hence 1 cluster will cause the cluster rule to be mediocre.
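One possible way to implement the mutation cluster rule is sketched below in Python, using the SARS-CoV-2 defaults (`windowSize` 100, `clusterCutOff` 6, `scoreWeight` 50). The text does not specify how overlapping windows are merged into a single cluster; merging them here is an assumption made for this illustration. The function names are hypothetical.

```python
from typing import List


def find_snp_clusters(positions: List[int], window_size: int = 100,
                      cluster_cutoff: int = 6) -> List[List[int]]:
    """Group private-mutation positions into clusters (hypothetical sketch)."""
    clusters: List[List[int]] = []
    window: List[int] = []
    for pos in sorted(positions):
        window.append(pos)
        # Keep only mutations inside the window_size-long window ending at pos
        while window[0] <= pos - window_size:
            window.pop(0)
        if len(window) > cluster_cutoff:
            if clusters and window[0] <= clusters[-1][-1]:
                # Window overlaps the previous cluster (assumption): extend that cluster
                last = clusters[-1][-1]
                clusters[-1].extend(p for p in window if p > last)
            else:
                clusters.append(list(window))
    return clusters


def snp_cluster_score(positions: List[int], window_size: int = 100,
                      cluster_cutoff: int = 6, score_weight: float = 50.0) -> float:
    """Score = number of clusters times score_weight."""
    return len(find_snp_clusters(positions, window_size, cluster_cutoff)) * score_weight


one_group = [10, 20, 30, 40, 50, 60, 70]
print(snp_cluster_score(one_group))
print(snp_cluster_score(one_group + [p + 990 for p in one_group]))
```

Seven private mutations within 70 nucleotides form one cluster and score 50 (mediocre). A second, well-separated group of seven raises the score to 100.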
### Stop codons (S) @@ -75,10 +73,10 @@ Note that there are many additional potential problems Nextclade does not check ## Configuration -QC checks can be enabled or disabled, and their parameters can be changed by modifying `qc` field in the `pathogen.json` file in the Advanced mode of [Nextclade Web](../nextclade-web) or in [Nextclade CLI](../nextclade-cli). +QC checks can be enabled or disabled, and their parameters changed, by modifying the `qc` field in the `pathogen.json` file in the Advanced mode of [Nextclade Web](../nextclade-web/index.rst) or in [Nextclade CLI](../nextclade-cli/index.rst). ## Results -QC results are presented in the "QC" column of the results table in [Nextclade Web](../nextclade-web). More information is included into mouseover tooltips. +QC results are presented in the "QC" column of the results table in [Nextclade Web](../nextclade-web/index.rst). More information is included in mouseover tooltips. -QC results are also included in the analysis results JSON, CSV and TSV files generated by [Nextclade CLI](../nextclade-cli) and in the "Download" dialog of [Nextclade Web](../nextclade-web). +QC results are also included in the analysis results JSON, CSV and TSV files generated by [Nextclade CLI](../nextclade-cli/index.rst) and in the "Download" dialog of [Nextclade Web](../nextclade-web/index.rst). diff --git a/docs/user/algorithm/07-pcr-primer-changes-detection.md b/docs/user/algorithm/07-pcr-primer-changes-detection.md index b62f6b334..b9517cc53 100644 --- a/docs/user/algorithm/07-pcr-primer-changes-detection.md +++ b/docs/user/algorithm/07-pcr-primer-changes-detection.md @@ -8,6 +8,6 @@ This step only runs if a PCR primer table is provided. PCR primers are specific ### Results -PCR primer changes are reported in the tooltip of the "Mut." (short for "Mutations") column in the results table in [Nextclade Web](../nextclade-web). +PCR primer changes are reported in the tooltip of the "Mut."
(short for "Mutations") column in the results table in [Nextclade Web](../nextclade-web/index.rst). -They are a included into the analysis results JSON, CSV and TSV files generated by [Nextclade CLI](../nextclade-cli) and in the "Download" dialog of [Nextclade Web](../nextclade-web). +They are included in the analysis results JSON, CSV and TSV files generated by [Nextclade CLI](../nextclade-cli/index.rst) and in the "Download" dialog of [Nextclade Web](../nextclade-web/index.rst). diff --git a/docs/user/algorithm/index.rst b/docs/user/algorithm/index.rst index 4eaa0b451..457294d5f 100644 --- a/docs/user/algorithm/index.rst +++ b/docs/user/algorithm/index.rst @@ -15,3 +15,4 @@ Internally, Nextclade is implemented as a parallel pipeline which consists of se 05-mutation-calling.md 06-quality-control.md 07-pcr-primer-changes-detection.md + nextclade-pango.md diff --git a/docs/user/datasets.md b/docs/user/datasets.md index c3fdf6c95..6ac79e594 100644 --- a/docs/user/datasets.md +++ b/docs/user/datasets.md @@ -26,7 +26,7 @@ Optionally, a dataset can contain the following additional files: - a changelog file describing changes between versions (`CHANGELOG.md`) - example sequence data for testing and demonstration (`sequences.fasta`) -For in-depth documentation of the input files, see: [Input files](input-files) +For in-depth documentation of the input files, see: [Input files](input-files/index.rst) An instance of a dataset is a directory containing the dataset files or an equivalent zip archive. diff --git a/docs/user/input-files/03-genome-annotation.md b/docs/user/input-files/03-genome-annotation.md index 199f7c37c..3ff07c86b 100644 --- a/docs/user/input-files/03-genome-annotation.md +++ b/docs/user/input-files/03-genome-annotation.md @@ -2,7 +2,7 @@ A tab separated table describing the genes of the virus (name, frame, position, etc.)
-The annotation is required for codon-aware alignment, for translation of CDS (CoDing Sequences), and for calling of amino acid mutations. Without annotation (sometimes called genemap), peptide sequences will not be output and amino acid mutations will not be detected. Without annotation the nucleotide alignment step will not be informed by codon information (see: [Algorithm: Sequence alignment](../algorithm/01-sequence-alignment) and [Algorithm: Translation](../algorithm/02-translation)). +The annotation is required for codon-aware alignment, for translation of CDS (CoDing Sequences), and for calling amino acid mutations. Without an annotation (sometimes called a gene map), peptide sequences are not output and amino acid mutations are not detected. Without an annotation, the nucleotide alignment step is also not informed by codon information (see: [Algorithm: Sequence alignment](../algorithm/01-sequence-alignment.md) and [Algorithm: Translation](../algorithm/02-translation.md)). Accepted formats: [GFF3](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3%2Emd). @@ -22,7 +22,7 @@ When a linked `gene` and `CDS` are present (`CDS`s specify their parents by list Example gene map for SARS-CoV-2: -```tsv +``` # seqname source feature start end score strand frame attribute . . gene 266 21555 . + . gene=ORF1ab;ID=gene-ORF1ab . . CDS 266 13468 . + . gene=ORF1ab;ID=cds-ORF1ab;Parent=gene-ORF1ab @@ -49,4 +49,4 @@ Note: For historical reasons, Nextclade uses _gene name_ when it really means _C It is recommended that the `gene` attribute is used to specify the gene/CDS name. -> 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression) for more details. +> 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression.md) for more details.
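The following Python sketch shows how the recommended `gene` attribute can be read from the ninth GFF3 column. Falling back to `ID` when `gene` is absent is a hypothetical choice for illustration and is not necessarily how Nextclade resolves names. Percent-decoding follows the GFF3 convention of escaping reserved characters in attribute values.

```python
from typing import Dict, List
from urllib.parse import unquote


def parse_gff3_attributes(column: str) -> Dict[str, str]:
    """Parse the 9th GFF3 column, e.g. 'gene=ORF1ab;ID=cds-ORF1ab;Parent=gene-ORF1ab'."""
    attrs: Dict[str, str] = {}
    for field in column.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters such as ';' and '='
        attrs[key.strip()] = unquote(value.strip())
    return attrs


def cds_names(gff_text: str) -> List[str]:
    """List CDS names, preferring `gene` and falling back to `ID` (hypothetical fallback)."""
    names: List[str] = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = parse_gff3_attributes(cols[8])
        names.append(attrs.get("gene") or attrs.get("ID", ""))
    return names
```

Applied to the SARS-CoV-2 example rows, only the `CDS` row is kept, and it is reported under the name `ORF1ab`.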
diff --git a/docs/user/input-files/04-reference-tree.md b/docs/user/input-files/04-reference-tree.md index 12d56e665..9912b9847 100644 --- a/docs/user/input-files/04-reference-tree.md +++ b/docs/user/input-files/04-reference-tree.md @@ -6,13 +6,13 @@ Nextclade CLI argument: `--input-tree`/`-a` Accepted formats: Auspice JSON v2 ([description](https://nextstrain.org/docs/bioinformatics/data-formats), [schema](https://github.com/nextstrain/augur/blob/master/augur/data/schema-export-v2.json)) - this is the same format that is used in Nextstrain. It is produced by [augur export](https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/export.html) and consumed by [Nextstrain Auspice](https://docs.nextstrain.org/projects/auspice/en/stable/). Refer to Nextstrain documentation at [https://docs.nextstrain.org](https://docs.nextstrain.org) and in particular the [`augur` documentation](https://docs.nextstrain.org/projects/augur/en/stable/index.html) on how to build your own trees. Using `augur` to make the reference tree is not a strict requirement, however the output tree must follow the `Auspice JSON v2` schema. -The phylogenetic reference tree which serves as a target for phylogenetic placement (see [Algorithm: Phylogenetic placement](../algorithm/05-phylogenetic-placement)). Nearest neighbor information is used to assign clades (see [Algorithm: Clade Assignment](../algorithm/06-clade-assignment)) and to identify private mutations, including reversions. +The phylogenetic reference tree serves as a target for phylogenetic placement (see [Algorithm: Phylogenetic placement](../algorithm/03-phylogenetic-placement.md)). Nearest-neighbor information is used to assign clades (see [Algorithm: Clade Assignment](../algorithm/04-clade-assignment.md)) and to identify private mutations, including reversions. The tree **must** be rooted at the sample that matches the [reference sequence](../terminology.md#reference-sequence).
A workaround in case one does not want to root the tree to be rooted on the reference is to attach the mutational differences between the tree root and the reference on the branch leading to the root node. This can be accomplished by passing the reference sequence to `augur ancestral`'s `--root-sequence` argument (see the [`augur ancestral` docs](https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/ancestral.html#inputs)). The tree **must** contain a clade definition for every node (including internal): every node must have a value at `node_attrs.clade_membership` (although it can be an empty string). -The tree **should** be sufficiently large and diverse to meet clade assignment expectations of a particular use-case, study or experiment. Only clades present on the reference tree can be assigned to [query sequences](../terminology.html#query-sequence). +The tree **should** be sufficiently large and diverse to meet clade assignment expectations of a particular use-case, study or experiment. Only clades present on the reference tree can be assigned to [query sequences](../terminology.md#query-sequence). > 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression) for more details. diff --git a/docs/user/input-files/05-pathogen-config.md b/docs/user/input-files/05-pathogen-config.md index 18569461e..c9e0f02e4 100644 --- a/docs/user/input-files/05-pathogen-config.md +++ b/docs/user/input-files/05-pathogen-config.md @@ -32,7 +32,7 @@ Example: } ``` -See [Input files](../input-files) section for more details. +See [Input files](../input-files/index.rst) section for more details. ### Optional @@ -54,7 +54,7 @@ Example: #### `qc` -Optional. Quality control (QC) configuration. If not provided, Nextclade does not do any QC checks. Details of the QC algorithms and their parameters are described in [Algorithm: Quality control](../algorithm/07-quality-control). +Optional. 
Quality control (QC) configuration. If not provided, Nextclade does not do any QC checks. Details of the QC algorithms and their parameters are described in [Algorithm: Quality control](../algorithm/06-quality-control.md). > ⚠️ Positions in the input files are 0-indexed and ranges are semi-open (ends are excluded). So `ORF3a:257-276` should be encoded as `{"begin": 256, "end": 276 }`. @@ -179,4 +179,4 @@ TODO TODO -> 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression) for more details. +> 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression.md) for more details. diff --git a/docs/user/migration-v3.md b/docs/user/migration-v3.md index 696698417..b2dabe00e 100644 --- a/docs/user/migration-v3.md +++ b/docs/user/migration-v3.md @@ -104,7 +104,7 @@ It might be that it does not require parameter tuning anymore. If you observe se Nextclade v3 now has the ability to phylogenetically resolve relationships between input sequences, where v2 would only attach sequences to the reference tree. Nextclade v3 thus may produce trees that are different from the trees produced in Nextclade v2. -Please read the [Phylogenetic placement](algorithm/05-phylogenetic-placement) section in the documentation for more details. +Please read the [Phylogenetic placement: Tree building](algorithm/03-phylogenetic-placement.md#tree-building) section in the documentation for more details. ##### Migration paths diff --git a/docs/user/nextclade-cli/usage.md b/docs/user/nextclade-cli/usage.md index a43998e5d..e809af4b8 100644 --- a/docs/user/nextclade-cli/usage.md +++ b/docs/user/nextclade-cli/usage.md @@ -1,6 +1,6 @@ # Usage -> This section assumes you've installed Nextclade CLI, it's available in your system path as `nextclade` and has executable permissions. If not, please refer to [installation](installation) section for more information. 
+> This section assumes that you've installed Nextclade CLI, that it's available in your system path as `nextclade`, and that it has executable permissions. If not, please refer to the [installation](installation/index.rst) section for more information. Refer to the help prompt for usage of Nextclade by running it without any arguments or with `--help`: @@ -29,7 +29,7 @@ nextclade dataset get --help Observe downloaded dataset files in the directory `data/sars-cov-2/` - > 💡️ This command will download the latest SARS-CoV-2 dataset. You should run it periodically to update the dataset, in order to get the latest features, including the most up-to-date clade assignment. Find out more in the [Nextclade datasets](../datasets) section. + > 💡️ This command will download the latest SARS-CoV-2 dataset. You should run it periodically to update the dataset, in order to get the latest features, including the most up-to-date clade assignment. Find out more in the [Nextclade datasets](../datasets.md) section. 2. Run using the downloaded dataset and its example sequences (`data/sars-cov-2/sequences.fasta`): @@ -76,25 +76,25 @@ nextclade dataset get --help There are more advanced arguments to control alignment and other parts of the algorithm. Refer to `nextclade run --help` for more details. - You can learn more about input and output files in sections: [Input files](../input-files), [Output files](../output-files) and [Nextclade datasets](../datasets). Read the built-in help (`nextclade --help`) for a detailed description of each subcommand and each flag. + You can learn more about input and output files in the following sections: [Input files](../input-files/index.rst), [Output files](../output-files/index.rst) and [Nextclade datasets](../datasets.md). Read the built-in help (`nextclade --help`) for a detailed description of each subcommand and each flag.
Find the output files in the `output/` directory: - - `nextclade.aligned.fasta` - aligned input sequences - - `nextclade_cds_.translation.fasta` - aligned peptides corresponding to each coding sequence (CDS) - - `nextclade.tsv` - results of the analysis in TSV format - - `nextclade.csv` - same results, but in CSV format - - `nextclade.json` - detailed results of the analysis in JSON format - - `nextclade.ndjson` - detailed results of the analysis in newline-delimited JSON format - - `nextclade.auspice.json` - same as input tree, but with the input sequences placed onto it and in Auspice v2 JSON format - - `nextclade.tree.nwk` - same as input tree, but with the input sequences placed onto it and in Newick format + - `nextclade.aligned.fasta` - aligned input sequences + - `nextclade_cds_.translation.fasta` - aligned peptides corresponding to each coding sequence (CDS) + - `nextclade.tsv` - results of the analysis in TSV format + - `nextclade.csv` - same results, but in CSV format + - `nextclade.json` - detailed results of the analysis in JSON format + - `nextclade.ndjson` - detailed results of the analysis in newline-delimited JSON format + - `nextclade.auspice.json` - same as input tree, but with the input sequences placed onto it and in Auspice v2 JSON format + - `nextclade.tree.nwk` - same as input tree, but with the input sequences placed onto it and in Newick format ## What's next? Congratulations, You have learned how to use Nextclade CLI! -Going further, you might want to learn about the science behind the Nextclade internals in the [Algorithm](../algorithm) section. The required input data is described in [Input files](../input-files) section. And produced files are described in [Output files](../output-files) section. The datasets are described in more details in the [Nextclade datasets](../datasets) section. +Going further, you might want to learn about the science behind the Nextclade internals in the [Algorithm](../algorithm/index.rst) section. 
The required input data is described in the [Input files](../input-files/index.rst) section, and the output files are described in the [Output files](../output-files/index.rst) section. The datasets are described in more detail in the [Nextclade datasets](../datasets.md) section. -For a more convenient online tool, check out [Nextclade Web](../nextclade-web). +For a more convenient online tool, check out [Nextclade Web](../nextclade-web/index.rst). Nextclade is an open-source project. We welcome ideas and contributions. Head to our [GitHub repository](https://github.com/nextstrain/nextclade) if you want report a bug, suggest a feature, or contribute code. diff --git a/docs/user/nextclade-web/analysis-results-table.md b/docs/user/nextclade-web/analysis-results-table.md index e331600cd..3f265427c 100644 --- a/docs/user/nextclade-web/analysis-results-table.md +++ b/docs/user/nextclade-web/analysis-results-table.md @@ -7,13 +7,13 @@ Nextclade analyzes your sequences locally in your browser. Sequences never leave The analysis pipeline comprises the following steps: 1. Sequence alignment: Sequences are aligned to the reference genome using a banded Waterman-Smith sequence alignment algorithm. -1. Translation: Coding nucleotide segments are extracted and translated to amino acid sequences. -1. Mutation calling: Nucleotide and amino acid changes relative to the reference are identified -1. Phylogenetic placement: Sequences are placed on a reference tree, private mutations are identified -1. Clade assignment: Clades are inferred from the place the sequence attached on the reference tree -1. Quality Control (QC): Quality control metrics are calculated +2. Translation: Coding nucleotide segments are extracted and translated to amino acid sequences. +3. Mutation calling: Nucleotide and amino acid changes relative to the reference are identified +4. Phylogenetic placement: Sequences are placed on a reference tree, private mutations are identified +5.
Clade assignment: Clades are inferred from where the sequence attaches to the reference tree +6. Quality Control (QC): Quality control metrics are calculated -See the [Algorithm](algorithm) section of these docs for more details. +See the [Algorithm](../algorithm/index.rst) section of these docs for more details. You can get a quick overview of the results screen in the screenshot below: ![Results overview](../assets/web_overview.png) @@ -24,7 +24,7 @@ Nextclade implements a variety of quality control metrics to quickly spot proble ![QC hover](../assets/web_QC.png) -Every icon corresponds to a different metric. See [Quality control](algorithm/07-quality-control) section for the detailed explanation of QC metrics. +Every icon corresponds to a different metric. See the [Quality control](../algorithm/06-quality-control.md) section for a detailed explanation of QC metrics. > Bear in mind that QC metrics are heuristics and that good quality sequences can occasionally fail some of the metrics (e.g. due to recombination or absence of close relatives in the reference tree). @@ -72,6 +72,6 @@ In sequence view, one can observe mutations in a particular gene. One of Nextcla ### Next steps -After the analysis is complete, you can view the phylogenetic tree with your sequences placed on it: See [Phylogenetic tree view](phylogenetic-tree-view) for details. +After the analysis is complete, you can view the phylogenetic tree with your sequences placed on it: See [Phylogenetic tree view](phylogenetic-tree-view.md) for details. You can also download the analysis results in a variety of formats: See [Export](export) for details.
diff --git a/docs/user/nextclade-web/export.md b/docs/user/nextclade-web/export.md index 97452b1e2..d20151f52 100644 --- a/docs/user/nextclade-web/export.md +++ b/docs/user/nextclade-web/export.md @@ -6,8 +6,8 @@ Once Nextclade has finished the analysis, you can download the results in a vari > 💡 We recommend to start with the TSV output file for most users. -See detailed description of the available files in the [Output files](../output-files) section. +See detailed description of the available files in the [Output files](../output-files/index.rst) section. -These are the same files as produced by [Nextclade CLI](../nextclade-cli) +These are the same files as produced by [Nextclade CLI](../nextclade-cli/index.rst) For CSV and TSV files, you can choose which columns to include in the output. By default, all columns are included. You can uncheck the columns you don't need. diff --git a/docs/user/nextclade-web/getting-started.md b/docs/user/nextclade-web/getting-started.md index b47ace81f..cfc2c7fa0 100644 --- a/docs/user/nextclade-web/getting-started.md +++ b/docs/user/nextclade-web/getting-started.md @@ -1,7 +1,7 @@ ## Getting started | | -| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Brief video demonstration of Nextclade Web features. A high resolution version is available here. | Open [clades.nextstrain.org](https://clades.nextstrain.org) in your browser. 
@@ -28,9 +28,9 @@ There are a number of options for providing input data to Nextclade, including: - Provide a URL (link) to a file publicly available on the internet: click the "Link" tab and paste the URL - Paste sequence data from clipboard: click the "Paste" tab and paste the fasta data - Select example sequences: click "Examples" and choose a pathogen from the menu. -- Provide an `input-fasta` URL parameter (See [URL parameters](./url-parameters)) +- Provide an `input-fasta` URL parameter (See [URL parameters](./url-parameters.md)) -We recommend to analyze at most a few hundred sequences at a time in Nextclade Web. On high-end hardware, Nextclade web can handle up to around 50 MB of input FASTA data. If you need to analyze more sequences, try the command-line version of Nextclade called [Nextclade CLI](../nextclade-cli) which can handle arbitrarily large datasets (300 GB and more). +We recommend analyzing at most a few hundred sequences at a time in Nextclade Web. On high-end hardware, Nextclade Web can handle up to around 50 MB of input FASTA data. If you need to analyze more sequences, try the command-line version of Nextclade, [Nextclade CLI](../nextclade-cli/index.rst), which can handle arbitrarily large datasets (300 GB and more). In order to allow trying out Nextclade without your own sequences, Nextclade provides example sequences for all supported viruses. You can load example sequences by clicking on "Example" and selecting one of the provided viruses/datasets: @@ -38,7 +38,7 @@ In order to allow trying out Nextclade without your own sequences, Nextclade pro ### 2. Select a dataset -Besides input sequences, Nextclade needs to know which dataset to use to perform the analysis. A dataset is a set of files that configures Nextclade to work with a particular virus or strain. For example, a SARS-CoV-2 dataset contains a SARS-CoV-2 specific reference genome, a genome annotation, a reference tree, and other configuration files.
You can learn more about datasets in the [Datasets](../datasets) section. +Besides input sequences, Nextclade needs to know which dataset to use to perform the analysis. A dataset is a set of files that configures Nextclade to work with a particular virus or strain. For example, a SARS-CoV-2 dataset contains a SARS-CoV-2-specific reference genome, a genome annotation, a reference tree, and other configuration files. You can learn more about datasets in the [Datasets](../datasets.md) section. Most users don't need to worry about the dataset files, because Nextclade provides datasets for a variety of viruses out of the box. The only thing you need to do is to choose an appropriate dataset for your sequences. @@ -57,8 +57,8 @@ On the dataset page you can see the list of all existing datasets. The subset of On that page You can also find some information about dataset in the "Summary" tab. As well as a history of changes in the "History" tab. -Advanced users may override dataset files on the "Customize" tab. This requires good understanding of [Input files](../input-files) and of the [Nextclade algorithms](../algorithm). +Advanced users may override dataset files on the "Customize" tab. This requires a good understanding of [Input files](../input-files/index.rst) and of the [Nextclade algorithms](../algorithm/index.rst). ### 3. Run the analysis -Once you are happy with the set of sequences and with the selected dataset, click "Run" to start the analysis. Nextclade will then automatically navigate to the [analysis results page](analysis-results-table). +Once you are happy with the set of sequences and with the selected dataset, click "Run" to start the analysis. Nextclade will then automatically navigate to the [analysis results page](analysis-results-table.md).
diff --git a/docs/user/nextclade-web/phylogenetic-tree-view.md b/docs/user/nextclade-web/phylogenetic-tree-view.md index 6113b5b56..d55a814d7 100644 --- a/docs/user/nextclade-web/phylogenetic-tree-view.md +++ b/docs/user/nextclade-web/phylogenetic-tree-view.md @@ -1,12 +1,12 @@ ## Phylogenetic tree view -In order to assign clades to sequences, Nextclade [places](../algorithm/05-phylogenetic-placement) all new sequences on a reference tree. You can view the resulting tree by clicking on the tree tab at the top left. +In order to assign clades to sequences, Nextclade [places](../algorithm/03-phylogenetic-placement.md) all new sequences on a reference tree. You can view the resulting tree by clicking on the tree tab at the top left. The tree is visualized by [Nextstrain Auspice](https://docs.nextstrain.org/projects/auspice/en/stable/). By default, only your uploaded sequences are highlighted. ![Tree with new sequences](../assets/web_tree.png) Nextclade runs a greedy parsimony tree builder on user provided sequences. This means that approximate ancestral relationships between your sequences are visible on the tree. Given the simplicity of the tree builder, the tree is not guaranteed to be optimal. In the screenshot below, all but the 3 grey sequences are user provided. Nextclade has grouped related user provided sequences into clusters, based on shared mutations. -![Nextclade tree builder](../assetts/../assets/web_tree-builder.png) +![Nextclade tree builder](../assets/web_tree-builder.png) For a more accurate tree including your sequences, you can use [Usher](https://genome.ucsc.edu/cgi-bin/hgPhyloPlace), which works out of the box with SARS-CoV-2, hMPXV, RSV-A, and RSV-B (as of January 2024). 
diff --git a/docs/user/nextclade-web/url-parameters.md b/docs/user/nextclade-web/url-parameters.md index 367eea5d1..c67804253 100644 --- a/docs/user/nextclade-web/url-parameters.md +++ b/docs/user/nextclade-web/url-parameters.md @@ -1,10 +1,10 @@ ## URL parameters -Nextclade Web can be configured using URL parameters. The names of the parameters match or are similar to the corresponding arguments of [Nextclade CLI](../nextclade-cli). +Nextclade Web can be configured using URL parameters. The names of the parameters match or are similar to the corresponding arguments of [Nextclade CLI](../nextclade-cli/index.rst). These URL parameters allow to construct URLs for navigation to Nextclade Web already preconfigured with certain dataset and to feed input sequence data and other files from remote locations. This might be useful for testing new datasets as well as for third-party integrations. -This section assumes basic knowledge about how Nextclade Web works as well as about input files and datasets. You can learn more about input files and datasets in the dedicated sections: [Input files](../input-files), and [Nextclade datasets](../datasets). +This section assumes basic knowledge about how Nextclade Web works as well as about input files and datasets. You can learn more about input files and datasets in the dedicated sections: [Input files](../input-files/index.rst), and [Nextclade datasets](../datasets.md). All URL parameters are optional. If all parameters are omitted, Nextclade Web behaves normally. 
Adding a parameter configures certain aspect of the application: @@ -25,7 +25,7 @@ If an `input-fasta` URL parameter is provided, Nextclade Web automatically start For example, the file with input sequences hosted at `https://example.com/sequences.fasta` can be specified with: -```url +``` https://clades.nextstrain.org?dataset-name=sars-cov-2 &input-fasta=https://example.com/sequences.fasta ``` @@ -38,13 +38,13 @@ In this case, Nextclade will download the latest SARS-CoV-2 dataset and the prov The special value `&input-fasta=example` will instruct Nextclade to use the example sequences of the dataset (this option is useful for demonstration purposes as users will not need to click anything): -```url +``` https://clades.nextstrain.org?dataset-name=sars-cov-2&input-fasta=example ``` Multiple files can be specified, for example the sequences and the reference tree: -```url +``` https://clades.nextstrain.org ?dataset-name=sars-cov-2 &input-fasta=https://example.com/sequences.fasta @@ -53,7 +53,7 @@ https://clades.nextstrain.org Another dataset can be specified with `dataset-name`: -```url +``` https://clades.nextstrain.org ?dataset-name=flu_h3n2_ha &input-fasta=https://example.com/flu_sequences.fasta @@ -61,25 +61,25 @@ https://clades.nextstrain.org A custom dataset server can be specified using `dataset-server` param. In this case the dataset list (index) will be downloaded from this server instead of the default. 
Example: -```url +``` https://clades.nextstrain.org?dataset-server=http://example.com ``` Local URLs should also work: -```url +``` https://clades.nextstrain.org?dataset-server=http://localhost:8080 ``` All mentioned parameters accept the usual full HTTP URLs -```url +``` https://clades.nextstrain.org?dataset-url=http://example.com/path/to/dataset ``` as well as URLs to GitHub repos: -```txt +```text ?dataset-url=https://github.com/owner/repo ?dataset-url=https://github.com/owner/repo/tree/branch ?dataset-url=https://github.com/owner/repo/blob/branch/path/to/file @@ -87,7 +87,7 @@ as well as URLs to GitHub repos: as well as shortcuts to GitHub repos in the following format: -```txt +```text ?dataset-url=gh:owner/repo ?dataset-url=gh:owner/repo/path/to/file ?dataset-url=gh:owner/repo@branch@ diff --git a/docs/user/output-files/06-tree.md b/docs/user/output-files/06-tree.md index c44066445..99ad60dab 100644 --- a/docs/user/output-files/06-tree.md +++ b/docs/user/output-files/06-tree.md @@ -4,7 +4,7 @@ Nextclade Web: download `nextclade.auspice.json` or `nextclade.nwk` Nextclade CLI flags: `--output-tree`/`-T` or `--output-tree-nwk` -Output phylogenetic tree. This is the input [reference tree](../input-files/04-reference-tree.md), with [query sequences](../input-files/01-sequence-data.md) placed onto it during the [phylogenetic placement step](../algorithm/05-phylogenetic-placement). +Output phylogenetic tree. This is the input [reference tree](../input-files/04-reference-tree.md), with [query sequences](../input-files/01-sequence-data.md) placed onto it during the [phylogenetic placement step](../algorithm/03-phylogenetic-placement.md). The tree comes either in Auspice JSON v2 format or in Newick format. diff --git a/docs/user/terminology.md b/docs/user/terminology.md index 177234167..5049fce8f 100644 --- a/docs/user/terminology.md +++ b/docs/user/terminology.md @@ -124,11 +124,11 @@ A set of entries describing [CDS](#cds) for a particular virus. 
This includes na The process of arranging [Query sequence](#query-sequence) against [Reference sequence](#reference-sequence) (or [Query peptide](#query-peptide) against [Reference peptide](#reference-peptide)) to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. -During alignment, the fragments of the query sequence are compared to the fragments of the reference sequence, the similarities are identified and the fragments are repositioned such that to increase similarity. The resulting [aligned sequences](#alignment-result) allow comparisons on nucleotide (or aminoacid) level and to perform further analysis for example deducing mutations and other features of practical interest). +During alignment, fragments of the query sequence are compared to fragments of the reference sequence, similarities are identified, and the fragments are repositioned so as to increase similarity. The resulting [aligned sequences](#alignment-result) allow comparisons at the nucleotide (or amino acid) level and further analysis, for example deducing mutations and other features of practical interest. (this definition is adapted with modifications from: [wikipedia: Sequence alignment](https://en.wikipedia.org/wiki/Sequence_alignment)) -See [Algorithm: phylogenetic placement](algorithm#alignment) for more details. +See [Algorithm: Sequence alignment](algorithm/01-sequence-alignment.md) for more details. ### Alignment (result) @@ -146,9 +146,9 @@ See also: [Wikipedia: Clade](https://en.wikipedia.org/wiki/Clade) ### Phylogenetic placement -The process of adding [New nodes](#new-node) to the the [Reference](#reference-tree-concept) tree. +The process of adding [New nodes](#new-node) to the [Reference](#reference-tree-concept) tree. -See [Algorithm: phylogenetic placement](algorithm#phylogenetic-placement) for more details.
+See [Algorithm: phylogenetic placement](algorithm/03-phylogenetic-placement.md) for more details. ### Analysis