
447 coda transformations add selection of columns/attributes of the input data #459

Draft · wants to merge 32 commits into base: master
Conversation

msorvoja
Collaborator

Fixes #447

I did not modify Single ILR, as subcomposition parameters are already required there.

As for PLR: I edited it so that if the user selects some columns for the transformation, the numerator column is placed first and the selected columns to its right, so that the algorithm works. I'm not certain whether this affects the results. Could @jtlait or @em-t verify whether this is okay?
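For reference, a minimal sketch of the reordering described above; the function name is hypothetical and this is not the actual plr.py implementation:

```python
import pandas as pd

def reorder_for_plr(df: pd.DataFrame, numerator: str, columns: list[str]) -> pd.DataFrame:
    """Place the numerator column first and the remaining selected columns
    after it, so the pivot log-ratio is computed over the intended parts
    (illustrative sketch only)."""
    rest = [col for col in columns if col != numerator]
    return df[[numerator] + rest]
```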

@msorvoja changed the title from "447 coda transformations add selection of columnsattributes of the input data" to "447 coda transformations add selection of columns/attributes of the input data" on Nov 18, 2024
@jtlait
Collaborator

jtlait commented Nov 18, 2024

I am not too familiar with PLR, so maybe @em-t or @chudasama-bijal can comment on this. One more thing regarding column selection: check_in_simplex_sample_space cannot be run first when columns are selected. If the input dataframe contains additional variables besides the geochemical variables and only the geochemical variables are chosen, then check_in_simplex_sample_space should be run only on the selected geochemical data.
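A minimal sketch of that idea, using a hypothetical stand-in for check_in_simplex_sample_space rather than the toolkit's own checker, applied only to the selected columns:

```python
import numpy as np
import pandas as pd

def selected_columns_in_simplex(
    df: pd.DataFrame, columns: list[str], tolerance: float = 1e-6
) -> bool:
    """Return True if the selected columns alone form a composition:
    strictly positive values and a constant row sum across all rows."""
    data = df[columns].to_numpy(dtype=float)
    if (data <= 0).any():
        return False
    row_sums = data.sum(axis=1)
    return bool(np.allclose(row_sums, row_sums[0], atol=tolerance))
```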

@em-t
Collaborator

em-t commented Nov 18, 2024

I started to look at this today, but didn't quite have time for a proper review. I'll return to this tomorrow!

@em-t left a comment (Collaborator)

I tested the functions, and they look good to me 👍

I noticed that the demo notebook (testing_logratio_transformations.ipynb) is out of date and that running some parts causes errors (e.g. inverse_alr is called with the wrong params). Not all of the issues originate from this PR, but maybe they could be fixed here (or the notebook could be deleted).

Another thing unrelated to these changes - I was wondering, if we're limiting the composition total to 1 or 100, should the scale numbers in the inverse functions be limited to those values as well?

eis_toolkit/transformations/coda/plr.py (outdated review thread, resolved)
@@ -3048,7 +3049,9 @@ def alr_transform_cli(
     df = pd.DataFrame(gdf.drop(columns="geometry"))
     typer.echo("Progress: 25%")

-    out_df = alr_transform(df=df, column=column, keep_denominator_column=keep_denominator_column)
+    out_df = alr_transform(
A collaborator commented on the diff:

(This applies to the other transforms as well.)

From the perspective of the QGIS plugin, would it make sense that, when given a subcomposition (i.e. certain columns to use), the resulting df is combined back with the other columns in the data? Or is that something that can easily be implemented on the plugin side?

I don't know much about the actual use cases, so I'm not sure whether the user will typically want to keep working with the CoDa data separately, but I would assume it's more convenient alongside the rest of the data.
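As a rough sketch of what that combining could look like on either side; the function name is hypothetical and it assumes the transform preserves the row index:

```python
import pandas as pd

def combine_with_other_columns(
    df: pd.DataFrame, transformed: pd.DataFrame, selected: list[str]
) -> pd.DataFrame:
    """Return the non-selected original columns side by side with the
    transformed subcomposition columns (illustrative sketch only)."""
    untouched = df.drop(columns=selected)
    return pd.concat([untouched, transformed], axis=1)
```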

@chudasama-bijal
Collaborator

I will take a look at the coda related issues next week.

@nmaarnio
Collaborator

Did you want to check this @chudasama-bijal ?

@msorvoja
Collaborator Author

@em-t @jtlait I tried to fix the notebook (testing_logratio_transformations.ipynb), but I believe we need compositional example data. The IOCG_CLB_Till_Geochem_reg_511p.shp dataset used in the "Testing with example data" section of the notebook is not compositional (at least the rows don't sum to 1 or 100). I'm not sure whether we have suitable data for this: do you have an example dataset that could be used in the notebook, and could you fix that section?

@jtlait
Collaborator

jtlait commented Nov 29, 2024

I couldn't find the .shp version of the file, but I did look at IOCG_CLB_Till_Geochem_reg_511p.gpkg.
In the .gpkg file the analyzed elements (columns with ppm) did not sum to 1, 100 or any other constant shared between the rows. Perhaps this data is then a subcomposition, i.e., it is missing something. In that case a better dataset might be needed for the notebook.

However, this dataset looks like a real-world example, as it is not curated to perfection, and so it would be good to think about how to deal with such data. Do we want users to be able to run the log transforms on these non-closed datasets, or do we require them to fix their data into the expected format beforehand?

If the constant-sum check is one of the requirements, and we want users to be able to work with data like IOCG_CLB_Till_Geochem_reg_511p.gpkg without preprocessing it, then I think additional functionality is needed.
There could be, for example, an argument data_is_closed=True for each of the log transformations; if the user sets it to False, the closure operation _closure from aitchison_geometry.py would be run on the selected columns at the beginning of the function. What do you think @chudasama-bijal @em-t ?
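A minimal sketch of that closure step, re-implemented inline since the actual _closure interface in aitchison_geometry.py may differ; the function name here is hypothetical:

```python
import pandas as pd

def close_selected_columns(
    df: pd.DataFrame, columns: list[str], scale: float = 1.0
) -> pd.DataFrame:
    """Apply the closure operation to the selected columns: rescale each row
    so the selected parts sum to `scale`, leaving other columns untouched."""
    out = df.copy()
    row_sums = out[columns].sum(axis=1)
    out[columns] = out[columns].div(row_sums, axis=0) * scale
    return out
```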

@chudasama-bijal
Collaborator

I think we did discuss this; in practice the usual (geochem) data will be concentration data, like the IOCG_CLB_Till_Geochem_reg_511p file, and it will often not satisfy the constant-sum condition.

So, giving the user the option of whether to perform closure on their data seems the way to go. This could be demonstrated in the notebook itself.

Regarding the PLR transformation, the ordering of the columns is important. Hence, at least in the plugin, I would recommend emphasizing that the user should select the columns in the order in which they want the PLR to be performed. Will this require any changes to the toolkit function itself in terms of input parameters, or can it be handled directly in the plugin? Please consider this while modifying these functions.

Finally, if there are any queries regarding the math of the transformations that affect the coding aspect, this doesn't seem like the most effective channel to delve into them.

@msorvoja
Collaborator Author

Hi @em-t, @jtlait,
I've added a scale parameter that lets users perform closure on the selected columns in the CoDa functions, and I fixed the notebook (sorry for the commit spam, btw). The tools and the corresponding CLI functions should work just fine, but let me know if you spot anything!

As for @em-t's earlier comments about also limiting the scale parameter in the inverse transformation functions and about keeping the original columns of the input dataframe alongside the transformed columns, I don't have an answer, as I'm not familiar with the actual use cases either.

Previously, if the denominator column was not among the selected columns, it was removed before the transformation.

Successfully merging this pull request may close these issues.

Coda transformations - Add selection of columns/attributes of the input data