Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧱 Helper to extract final spectra as matrix #36

Open
philipp-baumann opened this issue Sep 14, 2022 · 4 comments
Open

🧱 Helper to extract final spectra as matrix #36

philipp-baumann opened this issue Sep 14, 2022 · 4 comments

Comments

@philipp-baumann
Copy link
Member

as discussed #27 , now in a separate issue.

@pierreroudier
Copy link

Thanks for the dedicated issue.

In terms of user interface, I think this option should essentially be a switch between a list (with full metadata, which is currently returned to the user) and a matrix.

The rationale is:

  • a chemometrician wanting to pull spectra to either calibrate a model or generate predictions would call read_opus with data_only = TRUE so a matrix is returned and can be pre-processed directly.
  • a more advanced user might turn data_only to FALSE for more advanced checks on the data, and a more thorough look at the metadata -- and could of course use a lapply call on that list to combine whatever piece of the data is interesting.

Finally, I'd suggest to change the argument name, and use a simpler and more common matrix = TRUE|FALSE.

Example call:

# Select all OPUS files from a range of folders
opus_fns <- list.files("some/project/folder/", pattern = glob2rx("my_project-*.0"), full.names = TRUE)

# Directly read and assemble MIR matrix from the selected files
mir_mat <- read_opus(
  opus_fns,
  matrix = TRUE,
  progress = TRUE
)

# Quick plot
matplot(t(mir_mat))

@philipp-baumann
Copy link
Member Author

philipp-baumann commented Sep 15, 2022

Thanks for the dedicated issue.

In terms of user interface, I think this option should essentially be a switch between a list (with full metadata, which is currently returned to the user) and a matrix.

The rationale is:

* a chemometrician wanting to pull spectra to either calibrate a model or generate predictions would call `read_opus` with `data_only = TRUE` so a matrix is returned and can be pre-processed directly.

* a **more advanced user** might turn `data_only` to `FALSE` for more advanced checks on the data, and a more thorough look at the metadata -- and could of course use a `lapply` call on that list to combine whatever piece of the data is interesting.

Finally, I'd suggest to change the argument name, and use a simpler and more common matrix = TRUE|FALSE.

Example call:

# Select all OPUS files from a range of folders
opus_fns <- list.files("some/project/folder/", pattern = glob2rx("my_project-*.0"), full.names = TRUE)

# Directly read and assemble MIR matrix from the selected files
mir_mat <- read_opus(
  opus_fns,
  matrix = TRUE,
  progress = TRUE
)

# Quick plot
matplot(t(mir_mat))

Thanks for the nice summaries of use cases. I would not rename to matrix, because it hides the intent what the function does. Yes, it does return a matrix, but it is not clear that this switch is for all parameters and data vs. final spectra only. @pierreroudier @ThomasKnecht I would suggest to either make matrix_spectra or matrix_spec in case we moved away from data_only

@philipp-baumann
Copy link
Member Author

philipp-baumann commented Sep 15, 2022

Because it is quite an important argument for controlling read_opus() behavior and user experience, these two steps seem important:

  1. make it very clear in the argument name what the output is ( @pierreroudier above), and that only (final) spectra are returned vs. all data and parameters.
  2. document this argument really concisely.

@ThomasKnecht
Copy link
Collaborator

currently it is possible to only parse the data. the combination to a matrix should in my opinion be made in an extra function.

@philipp-baumann philipp-baumann changed the title Support data_only and return simplified matrix 🧱 Support data_only to return only spectra Dec 7, 2022
@philipp-baumann philipp-baumann changed the title 🧱 Support data_only to return only spectra 🧱 Helper to extract final spectra as matrix Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants