-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
447 coda transformations add selection of columns/attributes of the input data #459
base: master
Are you sure you want to change the base?
447 coda transformations add selection of columns/attributes of the input data #459
Conversation
I am not too familiar with PLR, so maybe @em-t or @chudasama-bijal can comment on this. About another thing regarding column selection, |
I started to look at this today, but didn't quite have time for a proper review. I'll return to this tomorrow! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested through the functions, and looks good to me 👍
I noticed that the demo notebook (testing_logratio_transformations.ipynb) is out of date and running some parts cause errors. (Eg. inverse_alr is called with the wrong params.) All of the issues don't originate from this PR, but maybe they could be fixed here (or the notebook can be deleted).
Another thing unrelated to these changes - I was wondering, if we're limiting the composition total to 1 or 100, should the scale numbers in the inverse functions be limited to those values as well?
@@ -3048,7 +3049,9 @@ def alr_transform_cli( | |||
df = pd.DataFrame(gdf.drop(columns="geometry")) | |||
typer.echo("Progress: 25%") | |||
|
|||
out_df = alr_transform(df=df, column=column, keep_denominator_column=keep_denominator_column) | |||
out_df = alr_transform( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This applies to the other transforms as well.)
From the perspective of the QGIS plugin, would it make sense that when given a subcomposition (ie. certain columns to use), the resulting df would be combined back with the other columns in the data? Or is that something that can be implemented on the plugin side easily?
I don't know much about the actual use cases, so I'm not sure whether the user will typically want to keep working with the CoDa data separately, but I would assume it's more convenient alongside the rest of the data.
I will take a look at the coda related issues next week. |
Did you want to check this @chudasama-bijal ? |
@em-t @jtlait I tried to fix the notebook (testing_logratio_transformations.ipynb) but I believe we need compositional example data. The |
I couldn't find the However, this data set looks like real world example as it is not curated to perfection, and thus, it would be good to think how to deal with these. Do we want the users be able to use log transforms with these non closed datasets or do we require them to fix their data to expected format beforehand? If the constant sum check is one of the requirements, and we want user be able to deal also with data like |
I think we did discuss this, in practice the usual (geochem) data will be concentration data like the IOCG_CLB_Till_Geochem_reg_511p file, and it will often not satisfy the sum constant condition. So, providing an option to the user whether to perform closure on their data seems the way to go about it. Could be demonstrated in the notebook itself. Regarding the plr transformation, the ordering of the columns is important. Hence at least in the plugin, I would recommend it is emphasized that the column selections are made by the user in the order that they desire the plr to be performed upon. Will this require any changes in the toolkit function itself in terms of input parameters, or can this be handled in the plugin directly? Do consider it while modifying these functions. Finally, if there are any queries regarding the math of the transformations that affect the coding aspect, then to me this doesn't seem the most effective channel to delve into it. |
…columnsattributes-of-the-input-data
Hi @em-t, @jtlait, As for @em-t's earlier comments about setting limitations to scale parameter in the inverse transformation functions as well and keeping the original columns of the input dataframes alongside the transformed columns, I don't have an answer as I'm not familiar with the actual use cases either. |
Previously, if denominator column was not in selected columns, it was removed before the transformation.
Fixes #447
I did not modify Single ILR as subcomposition parameters are already required.
And about PLR: I edited it so that if user selects some columns for the transformation, the numerator is placed on the first column and the selected columns on its right side so that the algorithm works. I'm not certain if it causes any errors in the results. Could @jtlait or @em-t verify whether this is okay or not?