Polishing of CDI Utilities 1 #19

Open
3 tasks
adriansteffan opened this issue Dec 23, 2024 · 1 comment

Comments

@adriansteffan
Contributor

I have added a first version of utilities for dealing with CDI data, but there are some areas where they could use polishing:

A) Validation of CDI data

  • There is now a function, cleanup_cdi_data, that is meant to catch anomalies that make analysis difficult. For now, it only catches duplicate CDI values for the same administration (we need a validator for this anyway; see "Find the participant with duplicate cdi data", peekbank-data-import#146). We should brainstorm other anomalies that we would like to catch here and implement checks for them.
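
To make the duplicate check concrete, here is a minimal Python sketch of the idea. It is not the code behind cleanup_cdi_data; the function name and the column names (`subject_id`, `instrument_type`, `measure`, `age`) are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical column names; the real CDI table may use different ones.
ADMINISTRATION_KEY = ("subject_id", "instrument_type", "measure", "age")


def find_duplicate_cdi_rows(rows):
    """Group rows by administration and return only groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in ADMINISTRATION_KEY)
        groups[key].append(row)
    return {key: grp for key, grp in groups.items() if len(grp) > 1}


# Made-up example rows: the first two describe the same administration.
example_rows = [
    {"subject_id": 1, "instrument_type": "ws", "measure": "prod", "age": 24, "rawscore": 310},
    {"subject_id": 1, "instrument_type": "ws", "measure": "prod", "age": 24, "rawscore": 295},
    {"subject_id": 2, "instrument_type": "ws", "measure": "prod", "age": 24, "rawscore": 120},
]
duplicates = find_duplicate_cdi_rows(example_rows)
```

Returning the offending groups rather than silently dropping rows leaves it to the caller to decide which value to keep. Other anomaly checks could follow the same pattern.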

B) Relative CDI scores

  • I took some code from the peekbank method repo that calculates the percentage of points reached out of the maximum for a given CDI type. However, I could use the eyes of someone experienced with CDIs to determine the correct comparison values for the various languages and CDI types.
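
As a rough illustration of the relative-score idea, the Python sketch below divides a rawscore by the maximum for its language and CDI type. The function name, the lookup-table layout, and the maximum values are all placeholders, since the correct comparison values are exactly what still needs expert input.

```python
# Placeholder maxima keyed by (language, CDI type). These are NOT real values.
EXAMPLE_MAX_SCORES = {
    ("lang_a", "form_x"): 100,
    ("lang_a", "form_y"): 400,
}


def relative_cdi_score(rawscore, language, cdi_type, max_scores):
    """Return rawscore as a fraction of the maximum for this language and CDI type."""
    key = (language, cdi_type)
    if key not in max_scores:
        # Fail loudly instead of guessing a maximum for an unknown form.
        raise KeyError(f"no maximum score known for {key}")
    max_score = max_scores[key]
    if not 0 <= rawscore <= max_score:
        raise ValueError(f"rawscore {rawscore} outside 0..{max_score} for {key}")
    return rawscore / max_score


example_fraction = relative_cdi_score(50, "lang_a", "form_x", EXAMPLE_MAX_SCORES)
```

Raising an error for unknown language and type combinations, or for scores above the maximum, would also surface wrong comparison values early.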

C) Percentile calculation for CDI rawscores (benchmarking)

  • I have added a function that looks up the CDI percentile for each rawscore, loosely based on work done by George and the others: https://github.com/kachergis/cdi-percentiles/tree/main. At the moment, it is hardcoded for American English and the 2022 norm, so we might want to add more languages and norms and make the function more flexible as a result. Additionally, this function could use some testing and a look by someone more experienced with CDIs. Open point: what should we do with kids who are too young or too old for the norms? Right now, the function defaults to the nearest available age.
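
The nearest-age fallback can be sketched as follows. This is an illustrative Python version, not the actual function: the table layout (age in months mapped to percentile and minimum-rawscore pairs), the function names, and all numbers are made up, and thresholds are assumed to increase with the percentile.

```python
# Made-up norm table: age in months -> list of (percentile, minimum rawscore).
EXAMPLE_NORMS = {
    16: [(10, 5), (50, 40), (90, 150)],
    30: [(10, 200), (50, 500), (90, 650)],
}


def nearest_available_age(age, available_ages):
    """Pick the normed age closest to `age`; ties go to the younger age."""
    return min(available_ages, key=lambda a: (abs(a - age), a))


def cdi_percentile(rawscore, age, norms):
    """Return (percentile, norm_age) for a rawscore, using the nearest normed age."""
    norm_age = nearest_available_age(age, norms.keys())
    percentile = 0  # below the lowest tabulated threshold
    for pct, threshold in sorted(norms[norm_age]):
        if rawscore >= threshold:
            percentile = pct
    return percentile, norm_age


too_young = cdi_percentile(45, 12, EXAMPLE_NORMS)   # age 12 falls back to 16
too_old = cdi_percentile(100, 40, EXAMPLE_NORMS)    # age 40 falls back to 30
```

Returning the age that was actually used alongside the percentile makes it visible when a child fell outside the normed range, so such cases can be flagged or excluded later instead of being silently clamped.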
@mzettersten
Contributor

@mcfrank Adrian has been working on CDI score handling, including computing CDI percentiles. Looping you in here!
