All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Initial release of `fedalgo`, the renamed version of `gwasprs`, which makes this repository general enough to support any federated algorithm.
- Add federated Cox PH regression (`cox.py`).
To help users transition from `gwasprs` to `fedalgo`, follow these steps:
Package Import: Update all your import statements in the codebase.

    # Old import
    import gwasprs
    # New import
    from fedalgo import gwasprs
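If a codebase has to run against both package names during the transition, a fallback import is one option. The sketch below is an assumption, not something the package provides: the helper name `import_first` is hypothetical, and it is exercised with standard-library modules so that it runs without either package installed.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported.

    Hypothetical helper for illustration; it is not part of fedalgo.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {names} could be imported")


# Demonstration with standard-library modules so the snippet runs anywhere.
mod = import_first("no_such_module_xyz", "json")
print(mod.__name__)
```

In a real project the call might look like `gwasprs = import_first("fedalgo.gwasprs", "gwasprs")`, assuming `gwasprs` is importable as a submodule of `fedalgo`; this prefers the new location and falls back to the old package.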
- Use NumPy to perform mean imputation; the output will be either `np.ndarray` or `jax.Array`.
- Add functions for performing federated LD pruning (`ld.py`).
- Add functions for plotting the Manhattan and QQ plots of GWAS results (`gwasplot.py`).
- Fix the precision issue by replacing `cdf` with `sf` to calculate p-values.
- Change `jax_cpu_cores` to `jax_dev_count` for more general usage.
- Format coding style.
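The `cdf` to `sf` change matters because `1 - cdf(z)` loses all precision once `cdf(z)` rounds to 1.0, while the survival function computes the tail directly. The sketch below illustrates this for a two-sided test on a standard normal statistic using only `math.erfc`; the distribution and the helper names `norm_cdf` and `norm_sf` are assumptions chosen for illustration, not the calls used in the package.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_sf(z):
    """Standard normal survival function, P(Z > z), computed directly."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = 10.0
p_via_cdf = 2.0 * (1.0 - norm_cdf(z))  # cdf rounds to 1.0, so the tail is lost
p_via_sf = 2.0 * norm_sf(z)            # tail probability is kept

print(p_via_cdf, p_via_sf)
```

For strongly associated variants, where test statistics are large, the first form reports a p-value of exactly zero, whereas the second still yields a small positive number that can be ranked and plotted.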
- Fix the issue caused by the `orthonormalize` output.
- Fix the mean imputation issue when the matrix is too large.
- Fix the array type under the regression objects.
- Add expected total steps in `GWASDataIterator`.
- Return singular values in `orthonormalize()`.
- Clean up the implementation of logistic regression.
- Update plink2 download URL.
- Return `BlockDiagonalMatrix` when indexing with an interval.
- Adjust the implementation of `create_snp_table`.
- Add `get_qc_metadata` to remove an unnecessary bfile reading step.
- Add array type checking in `array.py` and `aggregations.py`.
- Support iterating an array over rows or columns using `ArrayIterator`.
- Add array type checking in `mask.py`.
- `BlockDiagonalMatrix` supports initializing from a list and a dense matrix, `fromdense`, `fromlist` and `fromindex`.
- Replace `scipy.linalg.block_diag` with a self-defined `block_diag` supporting n-dimensional arrays.
- Add `tolist` to `BlockDiagonalMatrix`.
- Move `drop_missing_samples` to `kwargs` instead.
- Fix logic in `iterator.py`.
- Add block diagonal matrix multiplication assertion.
- Make `drop_missing_samples` optional in `GWASData`.
- Fix the endless iteration of `SNPIterator`.
- Fix the additional step for two-layer iterations.
- `Intersect` supports aggregating a list of lists.
- Covariates filtering API follows the bfile filtering.
- Make QC code easier to read.
- Bug fix.
- Make poetry build plink2.
- Fix the incorrect sizes of the outputs from `GWASDataIterator`.
- Stop keeping dropped information.
- Unit test over Python 3.12.
- Ensure the dimensions of `mse` and `XtX` match; otherwise align them.
- Resolve frequent CI errors due to numerical error.
- Increase numerical stability for `inv` and `InverseSolver`.
- Change `BlockedLinearRegression.sse(X, y, nobs)` to `BlockedLinearRegression.sse(X, y)`.
- Increase numerical stability for `CholeskySolver`.
- `BlockDiagonalMatrix` supports addition only for matrices which possess the same `blockshapes`.
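One way to read the `blockshapes` restriction: when the blocks of two matrices line up, addition reduces to adding matching block pairs, and nothing outside the diagonal blocks needs to be touched. The sketch below stores a block diagonal matrix as a plain list of dense blocks; the function names are hypothetical and do not reflect the `BlockDiagonalMatrix` implementation.

```python
def blockshapes(blocks):
    """Return the (rows, cols) shape of every block."""
    return [(len(b), len(b[0])) for b in blocks]


def add_block_diagonal(a, b):
    """Add two block diagonal matrices stored as lists of dense blocks."""
    if blockshapes(a) != blockshapes(b):
        raise ValueError("block shapes differ; block-wise addition is undefined")
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(blk_a, blk_b)]
        for blk_a, blk_b in zip(a, b)
    ]


a = [[[1.0, 2.0], [3.0, 4.0]], [[5.0]]]
b = [[[10.0, 20.0], [30.0, 40.0]], [[50.0]]]
total = add_block_diagonal(a, b)
print(total)
```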
- `SumUp` supports `BlockDiagonalMatrix`.
- Support `BlockDiagonalMatrix.shape`.
- `CholeskySolver` supports `BlockDiagonalMatrix`.
- Support `BlockDiagonalMatrix.append(BlockDiagonalMatrix)`.
- Deep copy arrays while constructing `BlockDiagonalMatrix`.
- Improve `t_stats` and add mse.
- Adapt `blocked_unnorm_autocovariance` and `blocked_unnorm_covariance` to `BlockDiagonalMatrix`.
- `BlockedLinearRegression` integrates `BlockDiagonalMatrix`.
- Fix `GWASData.subset`.
- Change dependency `polars` to `pandas`.
- Add `BlockDiagonalMatrix`, `AbstractBlockDiagonalMatrix` and `BlockDiagonalMatrixIterator`.
- Support indexing, shape, matrix-vector multiplication (`linalg.mvmul`), matrix-vector dot product (`linalg.mvdot`), matrix-matrix multiplication (`linalg.matmul`) and inverse operation (`linalg.inv`) for `BlockDiagonalMatrix`.
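The benefit of a dedicated block diagonal type is that these operations can work block by block instead of on the full dense matrix. As a rough illustration, matrix-vector multiplication can slice the vector per block and concatenate the partial results. The function names and the list-of-blocks representation below are assumptions for the sketch, not the signature of `linalg.mvmul`.

```python
def matvec(block, vec):
    """Dense matrix-vector product for one block (a list of rows)."""
    return [sum(a * x for a, x in zip(row, vec)) for row in block]


def block_diag_mvmul(blocks, vec):
    """Multiply a block diagonal matrix by a vector, one block at a time."""
    out, start = [], 0
    for block in blocks:
        ncols = len(block[0])
        # Each block only sees its own slice of the vector.
        out.extend(matvec(block, vec[start:start + ncols]))
        start += ncols
    if start != len(vec):
        raise ValueError("vector length does not match total block columns")
    return out


blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0]]]
vec = [1.0, 1.0, 2.0]
result = block_diag_mvmul(blocks, vec)
print(result)
```

The cost is the sum of the per-block costs, so the zero entries outside the diagonal blocks are never stored or multiplied.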
- Add `BlockedLinearRegression`.
- `LinearRegression.predict` supports sparse `X`.
- `LinearRegression` supports the `include_bias` parameter.
- `SumUp` supports sparse arrays.
- Add `GWASData` and `GWASDataIterator`.
- Add `blocked_unnorm_autocovariance` and `blocked_unnorm_covariance`.
- Refactor 4 interfaces for PCA algorithms.
- Migrate array functions from RAFAEL and add a series of array types and `expand_to_2dim`.
- Add `Intersect`.
- Refactor QC.
- Add `qc_stats`, `calculate_allele_freq_hwe`, `calculate_homo_het_count`, `write_snp_list`, `get_histogram` and `get_obs_count`.
- Deprecate `cal_qc_client`.
- Refactor loader.
- Add `include_bias` for `BatchedLinearRegression`.
- Correct `dof` for `BatchedLinearRegression`.
- Fix `fit` construction for `BatchedLinearRegression`.
- Add `SumUp` aggregation.
- Fix batch axis for `batched_unnorm_autocovariance`.
- Add `gwasprs.regression.add_bias`.
- Match the requirement of the logistic use case in stepfl.
- Add batched logistic regression.
- Set batch axis to 0 for all batch operations.
- Extract 4 interfaces for PCA algorithms.
- Add comparisons between plink, numpy and fed-svd.
- Match the requirement of the PCA use case in stepfl.
- Main update of all federated SVD process.
- Completely fix `plink2` not being found and support plink2 on different platforms.
- Fix `plink2` not found.
- Swap in loader, remove self before BIM.
- Add `--rm-dup 'force-first'` in QC stat calculation to deal with the duplication problem.
- Chromosome name is set to `1..22` instead of `1..21`.
- Some minor text adjustments.
- (Batched)InverseSolver solves the linear system without forming the inverse matrix.
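Solving `A x = b` directly is generally cheaper and more accurate than computing the inverse of `A` and multiplying by it. The changelog does not say which factorization the solver uses, so the sketch below picks Gaussian elimination with partial pivoting purely as an illustration; the function name `solve` is hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off growth.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = solve(a, b)
print(x)
```

The inverse matrix is never materialized here; only the solution vector for the given right-hand side is computed.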
- Switch nose to pytest.
- Add `setup_plink2` and collect to setup.py.
- `batched_unnorm_autocovariance` and `BatchedLinearRegression` support `pmap`.
- Fix `BatchedCholeskySolver`.
- Add `batched_diagonal` to linalg.
- Fix `BatchedLinearRegression.t_stats`.
- Fix `BatchedLinearRegression.predict`.
- Add `batched_matmul`, `batched_mvdot`, `batched_mvmul`, `batched_inv`, `batched_cholesky`, `batched_solve_triangular`, `BatchedInverseSolver`, `BatchedCholeskySolver` to linalg.
- Add `batched_unnorm_autocovariance`, `batched_unnorm_covariance` to stats.
- Add `BatchedLinearRegression` to regression.
- Add `isnonnan` to mask.
- Add test data for `test_qc` and `test_loader`.
- Add `get_snp_table` in `GwasDataLoader` to deal with duplicated SNPs.
- QC: non-autosome SNPs cause an abnormal `.hardy` file (haploid will be removed, X saved as `.hardy.x`); implement filtering out of non-autosome SNPs in the loader.
- Fix SNP rename max allele size to 23.