Introduce top-level read_opus()
#27
I think this is a great idea. Ultimately, this is a "driver" package, which means, in my view, that most users would really only interact with one function. If I think of a prototype for that function, this is what it would look like:

```r
res <- read_opus(
  # Pass a vector of DSN; these can be files or raw sources, and more than one
  dsn = my_dsn,
  # Option to "simplify" the output when more than one DSN is passed to `dsn`.
  # If used, an "analysis-ready" matrix is returned, similar to the `simplify`
  # option in {opusreader}
  matrix = FALSE,
  # An option to limit the data returned to a specific type, e.g. when working
  # with a lot of files and you only really want ScSm
  type = NULL,
  # Use parallel read when more than one DSN is passed to `dsn`
  parallel = TRUE,
  # Something lower-level to optionally tweak chunk size etc.
  # For power users only
  parallel.options = list(chunk.size = 10, ...),
  # Print a progress bar if more than one DSN is passed to `dsn`
  progress = TRUE
)
```

Something that is needed at the moment is to provide clarity about the different elements of data returned by the driver, e.g. most users would likely be after the data contained in
You are right, the output is kind of an issue. I think @philipp-baumann thought about a filter option. But I see what you mean: we could add a `param = T/F` option to get either the full output or just the data...
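A `TRUE`/`FALSE` switch between the full output and a data-only view could look like the sketch below, in base R. All element names (`data`, `block_type`, `sample_metadata`) are hypothetical placeholders, not the package's actual return structure.

```r
# Sketch of a `data_only` filter: each parsed file is assumed to be a named
# list where spectral blocks carry a `data` element and the rest is metadata.
extract_data <- function(parsed, data_only = FALSE) {
  if (!data_only) {
    return(parsed)  # full output: data blocks plus metadata
  }
  # keep only the elements that contain a `data` component
  Filter(function(x) is.list(x) && !is.null(x$data), parsed)
}

# mock parsed result: two spectral blocks and one metadata element
parsed <- list(
  ab              = list(data = c(0.1, 0.2, 0.3), block_type = "ab"),
  sc_sample       = list(data = c(10, 20, 30), block_type = "sc_sample"),
  sample_metadata = list(instrument = "Bruker Alpha")
)

names(extract_data(parsed, data_only = TRUE))
```

With `data_only = FALSE` (the default here) the list comes back untouched, so existing code that expects the full structure keeps working.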
Yes, I imagine a short iteration of the above: Because
To grab specific elements, like

What we also have to consider is a

Regarding chunking: I would go with the registered cores. Parallel resource management should be done by the user. We can further simplify the function interface by adding one main `parallel` argument, like
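The "let the user manage parallel resources" idea can be sketched with the base `parallel` package: the user creates and registers the workers, and the reader only dispatches over them when asked. `read_many` and `read_one` are hypothetical names standing in for the dispatcher and the per-file reader.

```r
library(parallel)

# Honour whatever workers the user has set up; fall back to a plain
# sequential lapply() when parallel = FALSE or no cluster is supplied.
read_many <- function(dsn, read_one, parallel = FALSE, cl = NULL) {
  if (parallel && !is.null(cl)) {
    parLapply(cl, dsn, read_one)
  } else {
    lapply(dsn, read_one)
  }
}

# dummy per-file reader, just to make the sketch runnable
read_one <- function(path) list(dsn = path, data = nchar(path))

cl <- makeCluster(2L)          # user-managed resources
res <- read_many(c("a.0", "bb.0"), read_one, parallel = TRUE, cl = cl)
stopCluster(cl)
```

Chunk size could then live in a lower-level `parallel.options` list as in the prototype, without the main interface having to know about it.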
So, a short update of the proposal, although we probably need to update it again before we get to stable:
and with some more special features:
Let's put this
Another iteration, based on the discussion with @ThomasKnecht. I think the previous draft was a bit too complicated.
Instead of `output_datatype` -> `data_only = F`
+1 for this Mr. Propper.
It's unclear to me what
@pierreroudier do you have a better name suggestion than
I implemented the `data_only` as well as the parallel parts. @philipp-baumann @pierreroudier, what do you think of that? The `data_only` I also implemented in `read_opus_impl` -> I think this is a nice-to-have in the base functionality anyways... I am not so happy about my implementation of `data_only`, though; it seems rather complicated... maybe you have a nicer suggestion?
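One compact way to express a `data_only` filter over an arbitrarily nested result is a small recursive prune, sketched below in base R. This is an illustration only, not the actual `read_opus_impl` code, and the element names (`data`, `meta`, `blocks`) are hypothetical.

```r
# Recursively keep only sub-lists that contain a `data` element;
# everything else (metadata, empty branches) is dropped.
keep_data <- function(x) {
  if (!is.list(x)) return(NULL)
  if (!is.null(x$data)) return(x)  # a data block: keep it whole
  out <- Filter(Negate(is.null), lapply(x, keep_data))
  if (length(out)) out else NULL
}

nested <- list(
  meta = list(instrument = "Alpha"),
  blocks = list(
    ab   = list(data = 1:3),
    refl = list(data = 4:6)
  )
)

str(keep_data(nested))
```

Because the recursion bottoms out at any element carrying `data`, the same helper works regardless of how deeply the blocks are nested.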
I also fixed a bug in the `prep_spectra` function ;)
Awesome, thanks for this! Short comments:
Sorry, I am still a bit confused by

I'd be tempted to leave such a functionality outside

For
Opinionated comment on the arg names -- I like to keep them short, e.g.
Maybe it was a bit too many things in one issue, or respectively in the merge that followed ;-) Yes indeed,

For
Ok, got it. I think it would be best to develop the feature as a dedicated function, then once it's properly designed, consider its integration in
My argument for this one is that users will either want to have the full metadata (in which case a list shall be returned), or want to quickly assemble data for modelling, in which case a matrix shall be returned. For the latter, there is code pretty much ready to go in the old
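The "quickly assemble data for modelling" path amounts to stacking one spectrum per file into a matrix with wavenumbers as column names. A minimal base R sketch, with illustrative names (`data`, `wavenumbers`) rather than the package's real structure:

```r
# Assemble an "analysis-ready" matrix: one row per file, one column per
# wavenumber, row names taken from the input list's names.
to_matrix <- function(results) {
  spectra     <- lapply(results, function(x) x$data)
  wavenumbers <- results[[1]]$wavenumbers  # assumes a shared axis
  m <- do.call(rbind, spectra)
  dimnames(m) <- list(names(results), wavenumbers)
  m
}

results <- list(
  file1 = list(data = c(0.11, 0.12), wavenumbers = c(4000, 3998)),
  file2 = list(data = c(0.21, 0.22), wavenumbers = c(4000, 3998))
)
m <- to_matrix(results)
```

A real implementation would also have to check that all files share the same wavenumber axis before binding, which is one reason to keep this behind an explicit `matrix`/`simplify`-style switch.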
I generally like the matrix support. To keep track of features and the development, I will move on with opening separate issue tickets. This way it will be simpler to keep track and to have a clean git workflow without too much wizardry.
We removed the `output_path`. We want to make a separate function for writing the data into JSON files, or whatever other format is possible.
@pierreroudier, as @ThomasKnecht indicated above, the design decision we are making is to keep the interface of
see #47 for the solution |
#25