Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some remarks on the full package PR #128

Open
Degoot-AM opened this issue Apr 4, 2024 · 1 comment
Open

Some remarks on the full package PR #128

Degoot-AM opened this issue Apr 4, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@Degoot-AM
Copy link
Contributor

Degoot-AM commented Apr 4, 2024

  1. In the Readme file, remove the first occurrence of standardize_col_names. It has bee defined twice.

  2. Standardize or standardise, be consistency. I prefer the first option.

  3. For readability, add an s to the remove_constant() to become remove_constants() to be parallel with remove_duplicates().

  4. Check the implementation of check_subject_ids() with range only. for now it gives an error, see the below example

    sim_ebola_data <- cleanepi::check_subject_ids(sim_ebola_data, target_columns = "case_id", range = c(1, 100000000))
    Found 1957 duplicated rows. Please consult the report for more details.
    Error in parse_vector(x, col_number(), na = na, locale = locale, trim_ws = trim_ws) : 
    is.character(x) is not TRUE
    
  5. On applying standardize_dates()function into our simulated Ebola dataset, if the parameter target_columns set toNULL`, the gender column (which has mixture of characters and integers) will also consider as a date column.

  6. In the reference page of the vignette, some functions appear more than one time.

  7. What if you include a function calculating age category from a given age column. I think this relevant to `{cleanepi}'s module that perform some transformations.

  8. Add a function to define/create new dictionary, so that end-user can define a new dictionary and utilize the add_to_dictionary() function to populate.

  9. Add function to convert age column at different time units, e.g., if the age is given in years but you want to do analysis in months. Such function can be part of the transformation module in package structure.

  10. Include span and check_date_sequence operations into clean_data function.

@Karim-Mane
Copy link
Member

Karim-Mane commented Apr 8, 2024

  1. In the Readme file, remove the first occurrence of standardize_col_names. It has bee defined twice.

    1. Standardize or standardise, be consistency. I prefer the first option.

    2. For readability, add an s to the remove_constant() to become remove_constants() to be parallel with remove_duplicates().

    3. Check the implementation of check_subject_ids() with range only. for now it gives an error, see the below example

      sim_ebola_data <- cleanepi::check_subject_ids(sim_ebola_data, target_columns = "case_id", range = c(1, 100000000))
      Found 1957 duplicated rows. Please consult the report for more details.
      Error in parse_vector(x, col_number(), na = na, locale = locale, trim_ws = trim_ws) : 
      is.character(x) is not TRUE
      

I have noticed this. The subject ids in that column are all numeric. I have introduced a coercion into character in the function. This will change once the review is done

5. On applying standardize_dates()`function into our simulated Ebola dataset, if the parameter target_columns set to`NULL`, the gender column (which has mixture of characters and integers) will also consider as a date column.

I have noticed this also in the case_id column. I will add a warring about the use of this function for now and encourage users to specify the target columns.

6. In the reference page of the vignette, some functions appear more than one time.

Yes - the first section of the references list all the function, including both exported and internal functions. The second section is where the functions are described.
Do you suggest to remove the first section?

7. What if you include  a function calculating age category from a given age column. I think this relevant to `{cleanepi}'s  module that perform some transformations.

The age calculation is now handle by the span() function. Do you mean to add an example, in the vignette, that specifically demonstrate how to calculate age.

8. Add a function to define/create new dictionary, so that end-user can define a new dictionary and utilize the `add_to_dictionary()` function to populate.

The dictionary file is just a data frame. By providing the template dictionary, users will be able to reproduce this with the data.frame() function for example. I think we can omit this function for now. If we requests about this in the future, then we can make the function.

9. Add function to convert age column at different time units, e.g., if the age is given in years but you want to do analysis in months. Such function can be part of the transformation module in package structure.

I don't think that we need this function. When the age is in years, you can multiply by 12 to get the age in months or by 365.25 to get the age in days. This is usually intuitive and users are generally able to figure this out with “no pain”

10. Include `span` and `check_date_sequence` operations into `clean_data` function.

Will add the check_date_sequence in clean_data(). I think that we don't need to add span given that the columns generated by span() needs to be named by the user to provide more context.

@Karim-Mane Karim-Mane self-assigned this Apr 9, 2024
@Karim-Mane Karim-Mane added the enhancement New feature or request label Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: In Progress
Development

No branches or pull requests

2 participants