Skip to content

Agas is a small Python library for pairing data series based on aggregate measures

License

Notifications You must be signed in to change notification settings

EitanHemed/agas

Repository files navigation

fury.io shields.io pepy.tech codecov.io

Agas

Agas is a small Python library for pairing similar (or dissimilar) data series.

Often when you have data from multiple units (e.g., participants, sensors) you need to find similar pairs of units such as two units which have similar variance relative to the rest of the sample, or perhaps units which are similar on one criteria and different on another (maximize on mean and minimize similarity on the sum of values).

Agas allows you to test the matching of all possible pairs flexibly.

The name Agas is abbreviation for aggregated-series. Also, 'Agas' is Hebrew for 'Pear'.

Setup and requirements

pip install agas, Conda package coming soon!

The requirements are just NumPy and Pandas. The examples on the tutorial require additional pacakges.

Usage

agas 0.0.1 exposes the functions agas.pair_from_array and agas.pair_from_wide_df. For more details please refer to the API reference.

import numpy as np
import seaborn as sns
import pandas as pd

pd.set_option('display.precision', 2)
pd.set_option("display.max_columns", 5)

np.set_printoptions(precision=3)

sns.set_context('notebook')

import agas

Given the 2D array a, find rows which have the most similar standard deviation values, and the most different total sums.

a = np.vstack([[0, 0.5], [0.5, 0.5], [5, 5], [4, 10]])
np.stack([a.std(axis=1), a.sum(axis=1)], axis=1)
array([[ 0.25,  0.5 ],
       [ 0.  ,  1.  ],
       [ 0.  , 10.  ],
       [ 3.  , 14.  ]])

It is easy to see that the optimal pair of rows in this case is the 2nd and 3rd rows.

By deafult Agas returns only the optimal pair (see below for more options). indices represents the indices of the pair of optimal rows, scores represents the optimal score (0).

indices, scores = agas.pair_from_array(a, similarity_function=np.std, divergence_function=np.sum)
print(indices)
print(scores)
[1 2]
[0.]

If we care more about divergence in sum of each row, we can decrease the weight given to the similarity function, here np.std. This is done by using the similarity_weight argument (defaults to 0.5).

indices, _ = agas.pair_from_array(a, similarity_function=np.std, divergence_function=np.sum,
                                  similarity_weight=0.3)
print(indices)
[0 2]

You can view the optimality scores assigned to each of the pairs, using the return_matrix argument. The pairing of the 1st and 3rd rows [0, 2] receives the score 0, which is most optimal.

The diagonal is empty as the matching of a row with itself is not calculated by Agas.

g = sns.heatmap(
    agas.pair_from_array(
        a, similarity_function=np.std,
          divergence_function=np.sum,
          similarity_weight=0.3, return_matrix=True),
    annot=True)
g.get_figure().set_facecolor('white')

png

agas.pair_from_wide_df can be used to find the optimal pair of rows given a dataframe.

wide_df = pd.DataFrame(np.hstack([a, a ** 2]),
                  columns=['A', 'B', 'C', 'D'],
                  index=['Y1', 'Y2', 'Y3', 'Y4']).T
print(wide_df)
     Y1    Y2    Y3     Y4
A  0.00  0.50   5.0    4.0
B  0.50  0.50   5.0   10.0
C  0.00  0.25  25.0   16.0
D  0.25  0.25  25.0  100.0

On both pair_from_wide_df and pair_from_array we can use the return_filter argument to receive pairs with scores within a set range. The default is to only return the first value, here we ask only for scores lower than .7.

indices, scores = agas.pair_from_wide_df(wide_df, np.mean, np.max,
                                         return_filter=0.7)
print(f'Indices of of rows with optimality scores below .7 - \n{indices}')
print(f'Matching scores  - {scores}')

Selecting the optimal pair of rows - similar means, different maximal values:

print(
    "Aggregated: ",
    wide_df.agg([np.mean, np.max], axis=1).to_markdown(),
    "Optimal pair (raw): ",
      wide_df.iloc[indices[0], :].to_markdown(),
    sep='\n\n')

for more examples see the tutorial.

Documentation

See Here.

Bug reports

Please open an issue on GitHub.

About

Agas is a small Python library for pairing data series based on aggregate measures

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages