Results from a model of global SARS-CoV-2 lineage competition
Estimates of the multiplicative growth advantage (per week) are provided for each lineage, relative both to the basal BA.2 and to the recently dominant KP.3.1.1.
Inferred growth advantage mapped onto the Nextclade-curated phylogeny (pruned to keep only relatively competitive variants):
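As a minimal sketch of how a multiplicative weekly advantage can be expressed against two different reference lineages: the function names, the per-day time unit, and the rate values below are illustrative assumptions, not taken from the model. In a Bayesian setting the same arithmetic would be applied to each posterior sample rather than to point estimates.

```python
import math

def weekly_advantage(rate, reference_rate, days_per_week=7):
    """Multiplicative weekly growth advantage from per-day log growth rates.

    Assumes rates are on a per-day log scale (an assumption for this sketch).
    """
    return math.exp((rate - reference_rate) * days_per_week)

def rebase(advantage_vs_old, new_reference):
    """Re-express multiplicative advantages relative to a different lineage."""
    ref = advantage_vs_old[new_reference]
    return {lineage: adv / ref for lineage, adv in advantage_vs_old.items()}

# Hypothetical per-day log growth rates, for illustration only.
rates = {"BA.2": 0.0, "KP.3.1.1": 0.10, "XEC": 0.12}

vs_ba2 = {lin: weekly_advantage(r, rates["BA.2"]) for lin, r in rates.items()}
vs_kp311 = rebase(vs_ba2, "KP.3.1.1")
```

Because the advantages are multiplicative, changing the reference lineage is a division, and the new reference always has an advantage of exactly 1.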
For countries with more than 200 genomes deposited in the last 50 days, we plot the model's estimated trajectories and forecasts. Where a country's sampled genomes tail off before a variant emerged, the forecast for that variant is driven by the globally pooled estimates and should be treated with caution.
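The plotting threshold described above can be sketched as a simple filter. The function name, the record layout, and the use of a single date per genome are assumptions made for illustration; the text does not say whether the deposition date or the collection date defines the 50-day window.

```python
from collections import Counter
from datetime import date, timedelta

def countries_to_plot(records, as_of, window_days=50, min_genomes=200):
    """Return countries with more than `min_genomes` genomes in the window.

    `records` is an iterable of (country, date) pairs (hypothetical layout).
    """
    cutoff = as_of - timedelta(days=window_days)
    recent = Counter(country for country, d in records if cutoff <= d <= as_of)
    # Strictly "more than" the threshold, as stated in the text.
    return sorted(c for c, n in recent.items() if n > min_genomes)

# Toy data: placeholder country names and counts, for illustration only.
records = (
    [("Country A", date(2024, 11, 20))] * 201
    + [("Country B", date(2024, 11, 20))] * 200
    + [("Country C", date(2024, 9, 1))] * 500
)
selected = countries_to_plot(records, as_of=date(2024, 11, 30))
```

In this toy data only "Country A" qualifies: "Country B" sits exactly at the threshold, and all of "Country C"'s genomes fall outside the window.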
Bayesian 95% Credible Intervals are shown for: BA.2, KP.1.1.3, KP.2, KP.2.2, KP.2.3, KP.3, KP.3.1, KP.3.1.1, KP.3.2.3, KP.3.3, KP.3.3.1, KP.3.3.2, KP.3.3.3, KS.1, LB.1, LB.1.3, LB.1.3.1, LB.1.7, LP.1, LP.8.1, LZ.2, MC.1, MC.10, MC.10.1, MC.11, MC.13, MC.13.1, MC.16, MC.19, MC.2, MC.6, MC.9, MV.1, XDV.1, XDY, XEC, XEC.2, XEC.4, XEK, XEL
Variants are colored (from blue to red) in order of the number (low to high) of convergent mutations they exhibit (i.e. those in the mutation plot above).
Averaging out the country-specific growth rate and intercept adjustments:
SARS-CoV-2 sequence data from GISAID EpiCoV (bulk .fasta download, 2024-11-30). We gratefully acknowledge all data contributors, i.e. the Authors and their Originating Laboratories responsible for obtaining the specimens, and their Submitting Laboratories that generated the genetic sequence and metadata and shared via the GISAID Initiative the data on which part of this research is based. Lineage assignments were made by Nextclade.
Countries and regions included in the model: Slovakia, Gansu, Portugal, Austria, Brazil, Liaoning, Taiwan, Hubei, Sichuan, Israel, Chile, Poland, Russia, Slovenia, Luxembourg, Singapore, Ireland, Spain, Scotland, Italy, England, Japan, Wales, Germany, Denmark, Netherlands, Australia, France, Sweden, USA, Canada
SARS-CoV-2 lineages included in the model: BA.2, JN.1.11.1, JN.1.16, JN.1.16.1, JN.1.67.1, KP.1.1, KP.1.1.1, KP.1.1.3, KP.2, KP.2.14, KP.2.15, KP.2.15.1, KP.2.19, KP.2.2, KP.2.2.1, KP.2.3, KP.2.3.12, KP.2.3.4, KP.2.3.6, KP.2.3.8, KP.3, KP.3.1, KP.3.1.1, KP.3.1.4, KP.3.1.6, KP.3.1.7, KP.3.1.8, KP.3.2, KP.3.2.1, KP.3.2.3, KP.3.2.4, KP.3.2.5, KP.3.2.6, KP.3.2.7, KP.3.2.9, KP.3.3, KP.3.3.1, KP.3.3.2, KP.3.3.3, KP.3.3.4, KP.3.5, KP.4.1.3, KP.4.2.4, KS.1, KS.1.1, KS.1.1.2, LB.1, LB.1.2, LB.1.2.1, LB.1.2.2, LB.1.3, LB.1.3.1, LB.1.3.2, LB.1.4, LB.1.4.1, LB.1.5, LB.1.7, LB.1.7.1, LF.1.1.1, LF.3.1.1, LF.7, LF.7.1, LF.7.1.3, LF.7.2.1, LF.7.3, LF.9, LP.1, LP.4, LP.5, LP.7, LP.8.1, LU.2.1.1, LW.1, LY.1, LZ.2, MA.1, MC.1, MC.1.1, MC.10, MC.10.1, MC.11, MC.13, MC.13.1, MC.14, MC.15, MC.16, MC.17, MC.18, MC.19, MC.2, MC.2.1, MC.20, MC.21, MC.21.1, MC.22, MC.3, MC.4, MC.6, MC.8, MC.8.1, MC.9, MK.1, MT.1, MU.1, MU.2, MV.1, MW.1, NA.1, NB.1, ND.1.1, NF.1, NL.2, NL.3, NL.4, NL.5, XDV.1, XDV.1.1, XDV.1.5.1, XDV.1.7, XDY, XEC, XEC.1, XEC.2, XEC.4, XEC.5, XEC.6, XEF, XEJ, XEK, XEL
We show Effective Sample Size (ESS) and plot chains for the global lineage advantage parameters, as well as the inferred frequencies for some time points and some countries.
Inferred growth advantage mapped onto the Nextclade-curated phylogeny (full tree):
Lineage competition is modelled using a Bayesian multinomial regression approach. Briefly, the global GISAID SARS-CoV-2 dataset (downloaded as a bulk .fasta file) is filtered to retain sequences with collection dates within the previous 100 days. Nextclade (with the BA.2.86 reference set, “nextstrain/sars-cov-2/BA.2.86”) is used for lineage assignment, and assignments are compiled into counts per country per lineage per day. If a sub-lineage is too infrequent to be included in the model, its counts are added to those of its most recent included ancestor.
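The roll-up of infrequent sub-lineages into their most recent included ancestor can be sketched as a walk up the lineage tree. The function name, the explicit parent map, and the counts are assumptions made for illustration. An explicit parent map is used because Pango aliases (e.g. MC.1 being a descendant of KP.3.1.1) mean ancestry cannot be read off the lineage name alone.

```python
from collections import Counter

def roll_up_counts(counts, parent, included):
    """Add counts of excluded lineages to their most recent included ancestor."""
    rolled = Counter()
    for lineage, n in counts.items():
        current = lineage
        # Walk towards the root until an included lineage is reached.
        while current is not None and current not in included:
            current = parent.get(current)
        # Assumption: counts with no included ancestor are dropped here;
        # the text does not say how such cases are handled.
        if current is not None:
            rolled[current] += n
    return dict(rolled)

# Toy inputs for one country and one day; the counts are illustrative only.
parent = {"MC.1.2": "MC.1", "MC.1": "KP.3.1.1", "KP.3.1.1": "KP.3.1", "KP.3.1": "KP.3"}
included = {"KP.3", "KP.3.1.1"}
counts = {"KP.3": 20, "KP.3.1": 5, "KP.3.1.1": 40, "MC.1": 10, "MC.1.2": 3}

rolled = roll_up_counts(counts, parent, included)
```

Here the MC.1 and MC.1.2 counts are absorbed by KP.3.1.1, while the KP.3.1 counts are absorbed by KP.3, so the total number of sequences is preserved.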
We model growth rates using a hierarchical approach. The growth rate for a lineage in a country is the sum of two components: a global rate, and a country-specific random effect. The global rate for each lineage is itself a sum of: i) branch-specific terms, one for each branch ancestral to that lineage, plus ii) the sum of terms for "convergent" spike mutations (those occurring on multiple branches of the phylogeny) possessed by that lineage, plus iii) a lineage-specific parameter. The rationale for this parameterization is that the contributions of a set of mutations that only occur on a single branch cannot be separated from each other, so those are simply bundled into the branch-specific term, whereas evidence is shared when mutations occur on multiple branches. Further, growth rates are heritable over the phylogeny, and the model's expectation for a new lineage (which may not yet have large sequence volumes) is strongly informed by its ancestor's growth rate, since all but one of the branch-specific terms are in common, as are (typically) most of the convergent mutation terms. Practically, this is implemented by constructing a large sparse design matrix that encodes which lineages share branches and convergent mutations. Through this, we also allow recombinants to inherit a weighted mixture of their multiple parents' growth terms.
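The following is a minimal sketch of how such a sparse design matrix might be assembled, with each row stored as a dictionary of non-zero columns. The function names, the toy lineages A, B, C and X, the mutation labels m1 and m2, the 0.5/0.5 recombinant weights, and the coefficient values are all hypothetical. The choice to mix only the parents' branch terms for recombinants is also an assumption, since the text does not specify the weighting scheme.

```python
def design_rows(parents, mutations, convergent):
    """Return {lineage: {column: weight}}, one sparse design row per lineage.

    `parents` maps each lineage to a list of (parent, weight) pairs;
    ordinary lineages have one parent with weight 1.0, recombinants several.
    """
    cache = {}

    def branch_terms(lineage):
        # The lineage's own branch plus a weighted copy of each parent's branches.
        if lineage not in cache:
            row = {("branch", lineage): 1.0}
            for parent, weight in parents.get(lineage, []):
                for col, value in branch_terms(parent).items():
                    row[col] = row.get(col, 0.0) + weight * value
            cache[lineage] = row
        return cache[lineage]

    rows = {}
    for lineage in parents:
        row = dict(branch_terms(lineage))
        for mut in mutations.get(lineage, ()):
            if mut in convergent:  # single-branch mutations stay in the branch term
                row[("mutation", mut)] = 1.0
        row[("lineage", lineage)] = 1.0  # lineage-specific parameter
        rows[lineage] = row
    return rows

def global_rate(row, beta):
    """Global growth rate: sparse dot product of a design row with coefficients."""
    return sum(weight * beta.get(col, 0.0) for col, weight in row.items())

# Toy phylogeny: B and C descend from A; X is a recombinant of B and C.
parents = {"A": [], "B": [("A", 1.0)], "C": [("A", 1.0)], "X": [("B", 0.5), ("C", 0.5)]}
mutations = {"B": {"m1"}, "C": {"m1", "m2"}, "X": {"m1"}}
convergent = {"m1"}  # m1 arises on more than one branch; m2 on only one

rows = design_rows(parents, mutations, convergent)
beta = {("branch", "A"): 0.1, ("branch", "B"): 0.2, ("branch", "C"): 0.4, ("mutation", "m1"): 0.05}
rate_x = global_rate(rows["X"], beta)
```

In this toy setup a lineage with no data of its own (X, whose own branch and lineage coefficients are zero) still receives a non-zero global rate from the branch and mutation terms it shares with its parents.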
Introduction times are controlled by lineage-specific intercept terms, each of which combines a shared global term for the lineage with country-specific random effects. Each "kind" of parameter has a Gaussian prior, centered on zero, and we use a Gaussian hyperprior over the log of the standard deviations of these priors. Posterior distributions for all parameters are sampled via Hamiltonian Monte Carlo, using the No-U-Turn Sampler implemented in AdvancedHMC.jl in Julia.
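One way to write down the structure described above is given below, as a sketch rather than the exact specification. All symbols are assumptions introduced here: y and n are the lineage counts and total sequences for country c on day t, alpha and a are the global and country-specific intercepts, beta and b are the global and country-specific growth rates, and the hyperprior constants mu_0 and tau_0 are not given in the text. The softmax link is the standard form for multinomial regression and is assumed rather than stated.

```latex
\begin{aligned}
\mathbf{y}_{c,t} &\sim \mathrm{Multinomial}\left(n_{c,t},\ \mathbf{p}_{c,t}\right), \\
p_{c,t,\ell} &= \frac{\exp\left(\eta_{c,t,\ell}\right)}{\sum_{m} \exp\left(\eta_{c,t,m}\right)}, \\
\eta_{c,t,\ell} &= \left(\alpha_{\ell} + a_{c,\ell}\right) + \left(\beta_{\ell} + b_{c,\ell}\right) t, \\
\theta &\sim \mathcal{N}\left(0,\ \sigma_{k}^{2}\right) \quad \text{for each parameter } \theta \text{ of kind } k, \\
\log \sigma_{k} &\sim \mathcal{N}\left(\mu_{0},\ \tau_{0}^{2}\right).
\end{aligned}
```

Here the global rate beta for a lineage stands for the sum of its branch-specific, convergent-mutation, and lineage-specific terms, each of which is one "kind" of parameter with its own prior standard deviation.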
The "Model average" frequency plot is intended to provide a "global" picture of variant competition. It is produced by taking the posterior mean (over all post-burn-in HMC samples) of the frequency trajectories, with the country-specific terms (both the country-specific growth rate random effects and the country-specific intercepts) set to their averages. The result no longer reflects the details of any specific country, but it is also not too strongly biased by the unevenly distributed sequencing volumes across countries.
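The averaging described above can be sketched as follows. The function names, the layout of a posterior sample (a dictionary of lists), the softmax link, and the reading of "set to their averages" as the mean across countries within each sample are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def model_average_frequencies(samples, t):
    """Posterior-mean lineage frequencies at time t, country effects averaged out."""
    n_lineages = len(samples[0]["global_rate"])
    mean_freq = [0.0] * n_lineages
    for s in samples:
        n_countries = len(s["country_rate"])
        logits = []
        for k in range(n_lineages):
            # Replace each country-specific term by its average over countries.
            avg_rate = sum(c[k] for c in s["country_rate"]) / n_countries
            avg_icpt = sum(c[k] for c in s["country_intercept"]) / n_countries
            intercept = s["global_intercept"][k] + avg_icpt
            rate = s["global_rate"][k] + avg_rate
            logits.append(intercept + rate * t)
        # Average the frequency trajectories, not the parameters.
        for k, f in enumerate(softmax(logits)):
            mean_freq[k] += f / len(samples)
    return mean_freq

# One toy posterior sample: two lineages, two countries (illustrative values).
samples = [{
    "global_intercept": [0.0, 0.0],
    "global_rate": [0.0, 0.1],
    "country_rate": [[0.0, 0.05], [0.0, -0.05]],
    "country_intercept": [[0.0, 1.0], [0.0, -1.0]],
}]
freq_day0 = model_average_frequencies(samples, t=0)
freq_day10 = model_average_frequencies(samples, t=10)
```

Averaging the frequencies computed per sample, rather than plugging posterior-mean parameters into the softmax, matches the description of a posterior mean over trajectories; the two generally differ because the softmax is nonlinear.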