447 coda transformations add selection of columns/attributes of the input data #459
Draft
msorvoja wants to merge 32 commits into master from 447-coda-transformations-add-selection-of-columnsattributes-of-the-input-data
Commits
12fe14e Refactor(CLR): Add option to select columns (msorvoja)
e2651d5 Edit docstrings (msorvoja)
64d7bad refactor(ALR): Add option to select columns (msorvoja)
467d201 refactor(PLR): Add option to select columns (msorvoja)
869ba24 Run pre-commit (msorvoja)
1e61ce9 Merge master into branch (msorvoja)
b0c5659 Fix(clr_transform_cli): make columns optional parameter (msorvoja)
f546cfd feat(plr_transform_cli): add columns parameters (msorvoja)
8ce165f fix(ALR): Perform check_in_simplex_sample_space after selecting columns (msorvoja)
da02d54 fix(CLR): Perform check_in_simplex_sample_space after selecting columns (msorvoja)
72deedb fix(PLR): Perform check_in_simplex_space after selecting columns (msorvoja)
cdf056e feat(closure): add function for performing closure on DataFrame (msorvoja)
dacd830 feat(PLR): add parameter for performing closure on the input DataFrame (msorvoja)
5f47ff2 fix(closure): improve docstring (msorvoja)
33f4d78 feat(ALR): add closure_target parameter (msorvoja)
3bcfedd feat(CLR): add closure_target parameter (msorvoja)
8f1679d feat(ILR): add closure_target parameter (msorvoja)
85064f2 Fix parameter type for closure_target (msorvoja)
011a7e3 Select subcomposition columns if closure is performed (msorvoja)
f47b897 Fix notebook (msorvoja)
0a18030 Remove duplicate closure function (msorvoja)
94991cc Improve documentation (msorvoja)
d117ce4 Merge branch 'master' into 447-coda-transformations-add-selection-of-… (msorvoja)
ffa84e6 Fix notebook (msorvoja)
662e73e Clean code (msorvoja)
6f4469a Clean code (msorvoja)
a513418 Select columns before performing closure (msorvoja)
0000746 Fix test (msorvoja)
8e09947 Add checks, fix logic, add tests (msorvoja)
3d56378 Add scale parameter to CoDa CLI functions (msorvoja)
1bf85a9 Fix notebook (msorvoja)
0026856 Fix missing denominator column in ALR (msorvoja)
@@ -1,88 +1,125 @@
-from numbers import Number
-
-import numpy as np
-import pandas as pd
-from beartype import beartype
-from beartype.typing import Optional, Sequence
-
-from eis_toolkit.exceptions import InvalidColumnException, NumericValueSignException
-from eis_toolkit.utilities.aitchison_geometry import _closure
-from eis_toolkit.utilities.checks.compositional import check_in_simplex_sample_space
-from eis_toolkit.utilities.miscellaneous import rename_columns_by_pattern
-
-
-@beartype
-def _alr_transform(df: pd.DataFrame, columns: Sequence[str], denominator_column: str) -> pd.DataFrame:
-
-    ratios = df[columns].div(df[denominator_column], axis=0)
-    return np.log(ratios)
-
-
-@beartype
-def alr_transform(
-    df: pd.DataFrame, column: Optional[str] = None, keep_denominator_column: bool = False
-) -> pd.DataFrame:
-    """
-    Perform an additive logratio transformation on the data.
-
-    Args:
-        df: A dataframe of compositional data.
-        column: The name of the column to be used as the denominator column.
-        keep_denominator_column: Whether to include the denominator column in the result. If True, the returned
-            dataframe retains its original shape.
-
-    Returns:
-        A new dataframe containing the ALR transformed data.
-
-    Raises:
-        InvalidColumnException: The input column isn't found in the dataframe.
-        InvalidCompositionException: Data is not normalized to the expected value.
-        NumericValueSignException: Data contains zeros or negative values.
-    """
-    check_in_simplex_sample_space(df)
-
-    if column is not None and column not in df.columns:
-        raise InvalidColumnException(f"The column {column} was not found in the dataframe.")
-
-    column = column if column is not None else df.columns[-1]
-
-    columns = [col for col in df.columns]
-
-    if not keep_denominator_column and column in columns:
-        columns.remove(column)
-
-    return rename_columns_by_pattern(_alr_transform(df, columns, column))
-
-
-@beartype
-def _inverse_alr(df: pd.DataFrame, denominator_column: str, scale: Number = 1.0) -> pd.DataFrame:
-    dfc = df.copy()
-
-    if denominator_column not in dfc.columns.values:
-        # Add the denominator column
-        dfc[denominator_column] = 0.0
-
-    return _closure(np.exp(dfc), scale)
-
-
-@beartype
-def inverse_alr(df: pd.DataFrame, denominator_column: str, scale: Number = 1.0) -> pd.DataFrame:
-    """
-    Perform the inverse transformation for a set of ALR transformed data.
-
-    Args:
-        df: A dataframe of ALR transformed compositional data.
-        denominator_column: The name of the denominator column.
-        scale: The value to which each composition should be normalized. Eg., if the composition is expressed
-            as percentages, scale=100.
-
-    Returns:
-        A dataframe containing the inverse transformed data.
-
-    Raises:
-        NumericValueSignException: The input scale value is zero or less.
-    """
-    if scale <= 0:
-        raise NumericValueSignException("The scale value should be positive.")
-
-    return _inverse_alr(df, denominator_column, scale)
+from numbers import Number
+
+import numpy as np
+import pandas as pd
+from beartype import beartype
+from beartype.typing import Optional, Sequence
+
+from eis_toolkit.exceptions import InvalidColumnException, NumericValueSignException
+from eis_toolkit.utilities.aitchison_geometry import _closure
+from eis_toolkit.utilities.checks.compositional import check_in_simplex_sample_space
+from eis_toolkit.utilities.miscellaneous import rename_columns_by_pattern
+
+
+@beartype
+def _alr_transform(df: pd.DataFrame, columns: Sequence[str], denominator_column: str) -> pd.DataFrame:
+
+    ratios = df[columns].div(df[denominator_column], axis=0)
+    return np.log(ratios)
+
+
+@beartype
+def alr_transform(
+    df: pd.DataFrame,
+    columns: Optional[Sequence[str]] = None,
+    denominator_column: Optional[str] = None,
+    keep_denominator_column: bool = False,
+    scale: Optional[Number] = None,
+) -> pd.DataFrame:
+    """
+    Perform an additive logratio transformation on the data.
+
+    Args:
+        df: A dataframe of compositional data.
+        columns: The names of the columns to be transformed.
+        denominator_column: The name of the column to be used as the denominator column.
+        keep_denominator_column: Whether to include the denominator column in the result. If True, the returned
+            dataframe retains its original shape.
+        scale: The value to which each composition should be normalized. Eg., if the composition is expressed
+            as percentages, scale=100. Closure is not performed by default.
+
+    Returns:
+        A new dataframe containing the ALR transformed data.
+
+    Raises:
+        InvalidColumnException: The input column isn't found in the dataframe.
+        InvalidCompositionException: Data is not normalized to the expected value.
+        NumericValueSignException: Data contains zeros or negative values.
+    """
+
+    if denominator_column is not None and denominator_column not in df.columns:
+        raise InvalidColumnException(f"The column {denominator_column} was not found in the dataframe.")
+
+    if denominator_column is not None and keep_denominator_column and columns and denominator_column not in columns:
+        raise InvalidColumnException(
+            f"Denominator column '{denominator_column}' must be in selected columns if keep_denominator_column is True."
+        )
+
+    denominator_column = denominator_column if denominator_column is not None else df.columns[-1]
+
+    if columns:
+        invalid_columns = [col for col in columns if col not in df.columns]
+        if invalid_columns:
+            raise InvalidColumnException(f"The following columns were not found in the dataframe: {invalid_columns}.")
+        columns_to_transform = columns
+
+        if denominator_column not in columns_to_transform:
+            df = df[columns_to_transform + [denominator_column]]
+        else:
+            df = df[columns_to_transform]
+
+    else:
+        columns_to_transform = df.columns.to_list()
+
+    if scale is not None:
+        df = _closure(df, scale)
+
+    check_in_simplex_sample_space(df)
+
+    if not keep_denominator_column and denominator_column in columns_to_transform:
+        columns_to_transform.remove(denominator_column)
+
+    return rename_columns_by_pattern(_alr_transform(df, columns_to_transform, denominator_column))
+
+
+@beartype
+def _inverse_alr(df: pd.DataFrame, denominator_column: str, scale: Number = 1.0) -> pd.DataFrame:
+    dfc = df.copy()
+    if denominator_column not in dfc.columns.values:
+        # Add the denominator column
+        dfc[denominator_column] = 0.0
+
+    return _closure(np.exp(dfc), scale)
+
+
+@beartype
+def inverse_alr(
+    df: pd.DataFrame, denominator_column: str, columns: Optional[Sequence[str]] = None, scale: Number = 1.0
+) -> pd.DataFrame:
+    """
+    Perform the inverse transformation for a set of ALR transformed data.
+
+    Args:
+        df: A dataframe of ALR transformed compositional data.
+        denominator_column: The name of the denominator column.
+        columns: The names of the columns to be transformed.
+        scale: The value to which each composition should be normalized. Eg., if the composition is expressed
+            as percentages, scale=100.
+
+    Returns:
+        A dataframe containing the inverse transformed data.
+
+    Raises:
+        InvalidColumnException: The input column(s) not found in the dataframe.
+        NumericValueSignException: The input scale value is zero or less.
+    """
+    if scale <= 0:
+        raise NumericValueSignException("The scale value should be positive.")
+
+    if columns:
+        invalid_columns = [col for col in columns if col not in df.columns]
+        if invalid_columns:
+            raise InvalidColumnException(f"The following columns were not found in the dataframe: {invalid_columns}.")
+        df = df[columns]
+
+    return _inverse_alr(df, denominator_column, scale)
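To make the flow of the new `alr_transform` easier to follow, here is a minimal sketch of the same steps (select a subcomposition, optionally close it, take log-ratios against the denominator) using only pandas and NumPy. The names `closure`, `alr_sketch` and `inverse_alr_sketch` are hypothetical stand-ins written for this illustration, not the toolkit's functions. The sketch leaves out the validation done by `check_in_simplex_sample_space` and the renaming done by `rename_columns_by_pattern`, and it copies the column selection so the caller's sequence is not modified.

```python
import numpy as np
import pandas as pd


def closure(df: pd.DataFrame, scale: float = 1.0) -> pd.DataFrame:
    """Rescale each row so its parts sum to `scale` (assumed stand-in for `_closure`)."""
    return df.div(df.sum(axis=1), axis=0) * scale


def alr_sketch(df, columns, denominator_column, scale=None):
    """Hypothetical, simplified ALR on a selected subcomposition."""
    selected = list(columns)  # copy, so the caller's sequence is never mutated
    if denominator_column not in selected:
        selected.append(denominator_column)
    sub = df[selected]
    if scale is not None:
        sub = closure(sub, scale)
    numerators = [col for col in selected if col != denominator_column]
    return np.log(sub[numerators].div(sub[denominator_column], axis=0))


def inverse_alr_sketch(alr_df, denominator_column, scale=1.0):
    """Hypothetical inverse: the denominator's own log-ratio is log(1) = 0."""
    dfc = alr_df.copy()
    dfc[denominator_column] = 0.0
    return closure(np.exp(dfc), scale)


data = pd.DataFrame({"a": [10.0, 20.0], "b": [30.0, 20.0], "c": [60.0, 60.0], "id": [1, 2]})
result = alr_sketch(data, columns=["a", "b"], denominator_column="c", scale=100)
roundtrip = inverse_alr_sketch(result, "c", scale=100)
```

Closure multiplies every part of a row by the same factor, so it cancels in each ratio: the log-ratios are the same with or without `scale`. In this sketch, `scale` only matters for the round trip, where `roundtrip` recovers the closed subcomposition of `a`, `b` and `c`.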
(This applies to the other transforms as well.)
From the perspective of the QGIS plugin, would it make sense that, when a subcomposition is given (i.e., only certain columns are used), the resulting df is combined back with the other columns in the data? Or is that something that can easily be implemented on the plugin side?
I don't know much about the actual use cases, so I'm not sure whether the user will typically want to keep working with the CoDa data separately, but I would assume it's more convenient alongside the rest of the data.
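As a rough illustration of the recombination suggested in this comment, the sketch below drops the compositional columns that were used from the original data and joins the transformed columns back on the shared row index. `recombine` is a hypothetical helper, not part of the toolkit or the plugin, and the output column names `V1` and `V2` are assumed for the example.

```python
import numpy as np
import pandas as pd


def recombine(original: pd.DataFrame, transformed: pd.DataFrame, used_columns) -> pd.DataFrame:
    """Hypothetical helper: swap the used compositional columns for the transformed ones."""
    rest = original.drop(columns=list(used_columns))
    # Both frames keep the original row index, so join aligns them row by row.
    return rest.join(transformed)


data = pd.DataFrame({"id": [1, 2], "a": [10.0, 20.0], "b": [30.0, 20.0], "c": [60.0, 60.0]})
used = ["a", "b", "c"]

# Stand-in for a transform result; the names V1 and V2 are assumed for illustration.
transformed = np.log(data[["a", "b"]].div(data["c"], axis=0))
transformed.columns = ["V1", "V2"]

combined = recombine(data, transformed, used)
```

This only works as written if the transform preserves the input's row index. Whether the original compositional columns should be dropped or kept alongside the transformed ones is a design choice this sketch does not settle.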