
447 coda transformations add selection of columns/attributes of the input data #459

Draft · wants to merge 32 commits into base: master
Conversation

msorvoja
Collaborator

Fixes #447

I did not modify Single ILR, as subcomposition parameters are already required there.

As for PLR: I edited it so that if the user selects some columns for the transformation, the numerator column is placed first and the selected columns to its right, so that the algorithm works. I'm not certain whether this affects the results. Could @jtlait or @em-t verify whether this is okay?
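For reference, a minimal sketch of the reordering described above; the function name is hypothetical and this is not the actual plr.py implementation:

```python
import pandas as pd

def reorder_for_plr(df: pd.DataFrame, numerator: str, columns: list[str]) -> pd.DataFrame:
    """Place the numerator column first and the remaining selected columns
    after it, so the pivot log-ratio is computed over the intended parts
    (illustrative sketch only)."""
    rest = [col for col in columns if col != numerator]
    return df[[numerator] + rest]
```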

@msorvoja changed the title from "447 coda transformations add selection of columnsattributes of the input data" to "447 coda transformations add selection of columns/attributes of the input data" on Nov 18, 2024
@jtlait
Collaborator

jtlait commented Nov 18, 2024

I am not too familiar with PLR, so maybe @em-t or @chudasama-bijal can comment on this. One more thing regarding column selection: check_in_simplex_sample_space cannot be run first when columns are selected. If the input dataframe contains additional variables besides the geochemical variables and only the geochemical variables are chosen, then check_in_simplex_sample_space should be run only on the selected geochemical data.
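A minimal sketch of that idea, using a hypothetical stand-in for check_in_simplex_sample_space rather than the toolkit's own checker, applied only to the selected columns:

```python
import numpy as np
import pandas as pd

def selected_columns_in_simplex(
    df: pd.DataFrame, columns: list[str], tolerance: float = 1e-6
) -> bool:
    """Return True if the selected columns alone form a composition:
    strictly positive values and a constant row sum across all rows."""
    data = df[columns].to_numpy(dtype=float)
    if (data <= 0).any():
        return False
    row_sums = data.sum(axis=1)
    return bool(np.allclose(row_sums, row_sums[0], atol=tolerance))
```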

@em-t
Collaborator

em-t commented Nov 18, 2024

I started to look at this today, but didn't quite have time for a proper review. I'll return to this tomorrow!

@em-t left a comment (Collaborator)

I tested the functions, and they look good to me 👍

I noticed that the demo notebook (testing_logratio_transformations.ipynb) is out of date and that running some parts causes errors (e.g. inverse_alr is called with the wrong params). Not all of the issues originate from this PR, but maybe they could be fixed here (or the notebook could be deleted).

Another thing unrelated to these changes - I was wondering, if we're limiting the composition total to 1 or 100, should the scale numbers in the inverse functions be limited to those values as well?

eis_toolkit/transformations/coda/plr.py (outdated review thread, resolved)
@@ -3048,7 +3049,9 @@ def alr_transform_cli(
     df = pd.DataFrame(gdf.drop(columns="geometry"))
     typer.echo("Progress: 25%")

-    out_df = alr_transform(df=df, column=column, keep_denominator_column=keep_denominator_column)
+    out_df = alr_transform(
A collaborator commented on the diff:

(This applies to the other transforms as well.)

From the perspective of the QGIS plugin, would it make sense that, when given a subcomposition (i.e. certain columns to use), the resulting df is combined back with the other columns in the data? Or is that something that can easily be implemented on the plugin side?

I don't know much about the actual use cases, so I'm not sure whether the user will typically want to keep working with the CoDa data separately, but I would assume it's more convenient alongside the rest of the data.
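As a rough sketch of what that combining could look like on either side; the function name is hypothetical and it assumes the transform preserves the row index:

```python
import pandas as pd

def combine_with_other_columns(
    df: pd.DataFrame, transformed: pd.DataFrame, selected: list[str]
) -> pd.DataFrame:
    """Return the non-selected original columns side by side with the
    transformed subcomposition columns (illustrative sketch only)."""
    untouched = df.drop(columns=selected)
    return pd.concat([untouched, transformed], axis=1)
```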

@chudasama-bijal
Collaborator

I will take a look at the coda related issues next week.

@nmaarnio
Collaborator

Did you want to check this @chudasama-bijal ?

@msorvoja
Collaborator Author

@em-t @jtlait I tried to fix the notebook (testing_logratio_transformations.ipynb), but I believe we need compositional example data. The IOCG_CLB_Till_Geochem_reg_511p.shp dataset used in the "Testing with example data" section of the notebook is not compositional (at least the rows don't sum to 1 or 100). I'm not sure whether we have suitable data for this: do you have an example dataset that could be used in the notebook, and could you fix that section?

@jtlait
Collaborator

jtlait commented Nov 29, 2024

I couldn't find the .shp version of the file, but I did look at IOCG_CLB_Till_Geochem_reg_511p.gpkg.
In the .gpkg file the analyzed elements (columns with ppm) did not sum to 1, 100 or any other constant shared between the rows. Perhaps this data is then a subcomposition, i.e., it is missing something. In that case a better dataset might be needed for the notebook.

However, this dataset looks like a real-world example, as it is not curated to perfection, and so it would be good to think about how to deal with such data. Do we want users to be able to run the log transforms on these non-closed datasets, or do we require them to fix their data into the expected format beforehand?

If the constant-sum check is one of the requirements, and we want users to be able to work with data like IOCG_CLB_Till_Geochem_reg_511p.gpkg without preprocessing it, then I think additional functionality is needed.
There could be, for example, an argument data_is_closed=True for each of the log transformations; if the user sets it to False, the closure operation _closure from aitchison_geometry.py would be run on the selected columns at the beginning of the function. What do you think @chudasama-bijal @em-t ?
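A minimal sketch of that closure step, re-implemented inline since the actual _closure interface in aitchison_geometry.py may differ; the function name here is hypothetical:

```python
import pandas as pd

def close_selected_columns(
    df: pd.DataFrame, columns: list[str], scale: float = 1.0
) -> pd.DataFrame:
    """Apply the closure operation to the selected columns: rescale each row
    so the selected parts sum to `scale`, leaving other columns untouched."""
    out = df.copy()
    row_sums = out[columns].sum(axis=1)
    out[columns] = out[columns].div(row_sums, axis=0) * scale
    return out
```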

@chudasama-bijal
Collaborator

I think we did discuss this; in practice the usual (geochem) data will be concentration data, like the IOCG_CLB_Till_Geochem_reg_511p file, and it will often not satisfy the constant-sum condition.

So, giving the user the option of whether to perform closure on their data seems the way to go. This could be demonstrated in the notebook itself.

Regarding the PLR transformation, the ordering of the columns is important. Hence, at least in the plugin, I would recommend emphasizing that the user should select the columns in the order in which they want the PLR to be performed. Will this require any changes to the toolkit function itself in terms of input parameters, or can it be handled directly in the plugin? Please consider this while modifying these functions.

Finally, if there are any queries regarding the math of the transformations that affect the coding aspect, this doesn't seem like the most effective channel to delve into them.

@msorvoja
Collaborator Author

Hi @em-t, @jtlait,
I've added a scale parameter that lets users perform closure on the selected columns in the CoDa functions, and I fixed the notebook (sorry for the commit spam, btw). The tools and the corresponding CLI functions should work just fine, but let me know if you spot anything!

As for @em-t's earlier comments about also limiting the scale parameter in the inverse transformation functions and about keeping the original columns of the input dataframe alongside the transformed columns, I don't have an answer, as I'm not familiar with the actual use cases either.

Previously, if the denominator column was not among the selected columns, it was removed before the transformation.

Successfully merging this pull request may close these issues.

Coda transformations - Add selection of columns/attributes of the input data