Some remarks on the full package PR #128

Degoot-AM · 2024-04-04T23:31:17Z

In the Readme file, remove the first occurrence of standardize_col_names. It has bee defined twice.
Standardize or standardise, be consistency. I prefer the first option.
For readability, add an s to the remove_constant() to become remove_constants() to be parallel with remove_duplicates().

Check the implementation of check_subject_ids() with range only. for now it gives an error, see the below example

sim_ebola_data <- cleanepi::check_subject_ids(sim_ebola_data, target_columns = "case_id", range = c(1, 100000000))
Found 1957 duplicated rows. Please consult the report for more details.
Error in parse_vector(x, col_number(), na = na, locale = locale, trim_ws = trim_ws) : 
is.character(x) is not TRUE

On applying standardize_dates()function into our simulated Ebola dataset, if the parameter target_columns set toNULL`, the gender column (which has mixture of characters and integers) will also consider as a date column.
In the reference page of the vignette, some functions appear more than one time.
What if you include a function calculating age category from a given age column. I think this relevant to `{cleanepi}'s module that perform some transformations.
Add a function to define/create new dictionary, so that end-user can define a new dictionary and utilize the add_to_dictionary() function to populate.
Add function to convert age column at different time units, e.g., if the age is given in years but you want to do analysis in months. Such function can be part of the transformation module in package structure.
Include span and check_date_sequence operations into clean_data function.

The text was updated successfully, but these errors were encountered:

Karim-Mane · 2024-04-08T16:41:10Z

In the Readme file, remove the first occurrence of standardize_col_names. It has bee defined twice.
Standardize or standardise, be consistency. I prefer the first option.

For readability, add an s to the remove_constant() to become remove_constants() to be parallel with remove_duplicates().
Check the implementation of check_subject_ids() with range only. for now it gives an error, see the below example
sim_ebola_data <- cleanepi::check_subject_ids(sim_ebola_data, target_columns = "case_id", range = c(1, 100000000))
Found 1957 duplicated rows. Please consult the report for more details.
Error in parse_vector(x, col_number(), na = na, locale = locale, trim_ws = trim_ws) : 
is.character(x) is not TRUE

I have noticed this. The subject ids in that column are all numeric. I have introduced a coercion into character in the function. This will change once the review is done

5. On applying standardize_dates()`function into our simulated Ebola dataset, if the parameter target_columns set to`NULL`, the gender column (which has mixture of characters and integers) will also consider as a date column.

I have noticed this also in the case_id column. I will add a warring about the use of this function for now and encourage users to specify the target columns.

6. In the reference page of the vignette, some functions appear more than one time.

Yes - the first section of the references list all the function, including both exported and internal functions. The second section is where the functions are described.
Do you suggest to remove the first section?

7. What if you include  a function calculating age category from a given age column. I think this relevant to `{cleanepi}'s  module that perform some transformations.

The age calculation is now handle by the span() function. Do you mean to add an example, in the vignette, that specifically demonstrate how to calculate age.

8. Add a function to define/create new dictionary, so that end-user can define a new dictionary and utilize the `add_to_dictionary()` function to populate.

The dictionary file is just a data frame. By providing the template dictionary, users will be able to reproduce this with the data.frame() function for example. I think we can omit this function for now. If we requests about this in the future, then we can make the function.

9. Add function to convert age column at different time units, e.g., if the age is given in years but you want to do analysis in months. Such function can be part of the transformation module in package structure.

I don't think that we need this function. When the age is in years, you can multiply by 12 to get the age in months or by 365.25 to get the age in days. This is usually intuitive and users are generally able to figure this out with “no pain”

10. Include `span` and `check_date_sequence` operations into `clean_data` function.

Will add the check_date_sequence in clean_data(). I think that we don't need to add span given that the columns generated by span() needs to be named by the user to provide more context.

Karim-Mane self-assigned this Apr 9, 2024

Karim-Mane added the enhancement New feature or request label Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some remarks on the full package PR #128

Some remarks on the full package PR #128

Degoot-AM commented Apr 4, 2024 •

edited

Loading

Karim-Mane commented Apr 8, 2024 •

edited

Loading

Some remarks on the full package PR #128

Some remarks on the full package PR #128

Comments

Degoot-AM commented Apr 4, 2024 • edited Loading

Karim-Mane commented Apr 8, 2024 • edited Loading

Degoot-AM commented Apr 4, 2024 •

edited

Loading

Karim-Mane commented Apr 8, 2024 •

edited

Loading