GitHub - EitanHemed/agas: Agas is a small Python library for pairing data series based on aggregate measures

Agas

Agas is a small Python library for pairing similar (or dissimilar) data series.

Often, when you have data from multiple units (e.g., participants, sensors), you need to find similar pairs of units, such as two units with similar variance relative to the rest of the sample, or units that are similar on one criterion and different on another (e.g., maximize similarity on the mean and minimize similarity on the sum of values).

Agas lets you flexibly score the match of every possible pair.

The name Agas is an abbreviation of aggregated-series. Also, 'Agas' is Hebrew for 'Pear'.

Setup and requirements

Install with pip install agas. A Conda package is coming soon!

The only requirements are NumPy and pandas. The examples in the tutorial require additional packages.

Usage

agas 0.0.1 exposes the functions agas.pair_from_array and agas.pair_from_wide_df. For more details please refer to the API reference.

import numpy as np
import seaborn as sns
import pandas as pd

pd.set_option('display.precision', 2)
pd.set_option("display.max_columns", 5)

np.set_printoptions(precision=3)

sns.set_context('notebook')

import agas

Given the 2D array a, find the pair of rows with the most similar standard deviations and the most different sums.

a = np.vstack([[0, 0.5], [0.5, 0.5], [5, 5], [4, 10]])

np.stack([a.std(axis=1), a.sum(axis=1)], axis=1)

array([[ 0.25,  0.5 ],
       [ 0.  ,  1.  ],
       [ 0.  , 10.  ],
       [ 3.  , 14.  ]])

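It is easy to see that the optimal pair of rows in this case is the 2nd and 3rd rows: their standard deviations are identical (both 0), while their sums differ substantially (1 vs. 10).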
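To make the trade-off explicit, the following sketch lists, for every pair of rows, the absolute difference in standard deviation and in sum. It uses only NumPy and the standard library and does not call Agas; the name pair_diffs is introduced here for illustration.

```python
from itertools import combinations

import numpy as np

a = np.vstack([[0, 0.5], [0.5, 0.5], [5, 5], [4, 10]])

# Aggregate each row once.
stds = a.std(axis=1)
sums = a.sum(axis=1)

# For every unordered pair of rows, store (|std difference|, |sum difference|).
pair_diffs = {
    (i, j): (abs(stds[i] - stds[j]), abs(sums[i] - sums[j]))
    for i, j in combinations(range(len(a)), 2)
}

for pair, (std_diff, sum_diff) in pair_diffs.items():
    print(pair, f"std diff = {std_diff:.2f}", f"sum diff = {sum_diff:.2f}")
```

Rows 1 and 2 (zero-based) are the only pair with a standard deviation difference of exactly 0, and their sums still differ by 9. The pair (0, 3) has a larger difference in sums (13.5), but its standard deviations differ by 2.75.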

By default Agas returns only the optimal pair (see below for more options). indices holds the indices of the optimal pair of rows, and scores holds its optimality score (0).

indices, scores = agas.pair_from_array(a, similarity_function=np.std, divergence_function=np.sum)
print(indices)
print(scores)

[1 2]
[0.]

If we care more about divergence in sum of each row, we can decrease the weight given to the similarity function, here np.std. This is done by using the similarity_weight argument (defaults to 0.5).

indices, _ = agas.pair_from_array(a, similarity_function=np.std, divergence_function=np.sum,
                                  similarity_weight=0.3)
print(indices)

[0 2]
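The effect of the weight can be illustrated with a small stand-alone sketch. It assumes one plausible scoring scheme: min-max normalize the pairwise differences, combine them as a weighted sum (penalizing dissimilarity on the first function and similarity on the second), and rescale so that the best pair scores 0. This is an assumption made for illustration, not a description of how Agas computes its scores; best_pair and min_max are hypothetical helpers that are not part of the agas API.

```python
from itertools import combinations

import numpy as np


def min_max(values):
    """Rescale values to the range [0, 1]; all zeros if they are constant."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)


def best_pair(arr, similarity_function, divergence_function, similarity_weight=0.5):
    """Hypothetical scoring sketch (assumed scheme, not agas internals)."""
    sim = similarity_function(arr, axis=1)
    div = divergence_function(arr, axis=1)
    pairs = list(combinations(range(len(arr)), 2))
    # Normalized absolute differences between the aggregated values of each pair.
    sim_diff = min_max([abs(sim[i] - sim[j]) for i, j in pairs])
    div_diff = min_max([abs(div[i] - div[j]) for i, j in pairs])
    # Low cost: small difference on similarity, large difference on divergence.
    cost = similarity_weight * sim_diff + (1 - similarity_weight) * (1 - div_diff)
    scores = min_max(cost)  # the best pair gets a score of 0
    return pairs[int(np.argmin(scores))], scores


a = np.vstack([[0, 0.5], [0.5, 0.5], [5, 5], [4, 10]])

pair_equal_weights, _ = best_pair(a, np.std, np.sum, similarity_weight=0.5)
pair_low_similarity, _ = best_pair(a, np.std, np.sum, similarity_weight=0.3)
print(pair_equal_weights, pair_low_similarity)  # (1, 2) (0, 2)
```

Under this assumed scheme, lowering the weight from 0.5 to 0.3 moves the best pair from rows (1, 2) to rows (0, 2), because the slightly larger difference in sums (9.5 vs. 9) now outweighs the small difference in standard deviation (0.25 vs. 0).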

You can view the optimality scores assigned to each of the pairs using the return_matrix argument. The pairing of the 1st and 3rd rows [0, 2] receives the score 0, which is optimal.

The diagonal is empty as the matching of a row with itself is not calculated by Agas.

g = sns.heatmap(
    agas.pair_from_array(
        a, similarity_function=np.std,
        divergence_function=np.sum,
        similarity_weight=0.3, return_matrix=True),
    annot=True)
g.get_figure().set_facecolor('white')

agas.pair_from_wide_df can be used to find the optimal pair of rows given a dataframe.

wide_df = pd.DataFrame(np.hstack([a, a ** 2]),
                  columns=['A', 'B', 'C', 'D'],
                  index=['Y1', 'Y2', 'Y3', 'Y4']).T
print(wide_df)

     Y1    Y2    Y3     Y4
A  0.00  0.50   5.0    4.0
B  0.50  0.50   5.0   10.0
C  0.00  0.25  25.0   16.0
D  0.25  0.25  25.0  100.0

In both pair_from_wide_df and pair_from_array, the return_filter argument can be used to receive all pairs with scores within a set range. By default only the optimal pair is returned; here we ask for all pairs with scores lower than .7.

indices, scores = agas.pair_from_wide_df(wide_df, np.mean, np.max,
                                         return_filter=0.7)
print(f'Indices of rows with optimality scores below .7 - \n{indices}')
print(f'Matching scores  - {scores}')
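Conceptually, filtering by a score threshold amounts to a boolean mask over the pair scores. The sketch below shows that idea in plain NumPy. filter_pairs is a hypothetical helper and the pair and score values are placeholders chosen for illustration; neither comes from Agas, and the exact semantics of return_filter are described in the API reference.

```python
import numpy as np


def filter_pairs(pairs, scores, threshold):
    """Keep pairs whose score is below threshold, ordered from best to worst."""
    pairs = np.asarray(pairs)
    scores = np.asarray(scores, dtype=float)
    keep = scores < threshold
    order = np.argsort(scores[keep])
    return pairs[keep][order], scores[keep][order]


# Placeholder values for illustration only; not output produced by agas.
example_pairs = [(0, 1), (0, 2), (1, 2)]
example_scores = [0.9, 0.0, 0.4]

kept_pairs, kept_scores = filter_pairs(example_pairs, example_scores, threshold=0.7)
print(kept_pairs.tolist())   # [[0, 2], [1, 2]]
print(kept_scores.tolist())  # [0.0, 0.4]
```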

Selecting the optimal pair of rows - similar means, different maximal values:

print(
    "Aggregated: ",
    wide_df.agg([np.mean, np.max], axis=1).to_markdown(),
    "Optimal pair (raw): ",
    wide_df.iloc[indices[0], :].to_markdown(),
    sep='\n\n')

For more examples, see the tutorial.

Documentation

See Here.

Bug reports

Please open an issue on GitHub.
