Improve charge normalization #90

Open
nicolassaw opened this issue Jul 8, 2024 · 0 comments
Labels
cleaner cleaning and formatting up parsed data


@nicolassaw
Contributor

Background

Charge normalization means converting long, specific charge descriptions into simplified, general terms for the charge.

For example, converting "DRIVING WHILE INTOXICATED BAC >=0.15" to "Driving Under the Influence of Alcohol".

Currently, this happens in the cleaner module. The process is simple: the scraped charge text is cross-referenced against a lookup table that maps it to its simplified version. This table is a .json file like the one below.

[{"charge_name": "DRIVING WHILE INTOXICATED BAC >=0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >=.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC>=0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >=0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >= 0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC>=0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >=0.15-A", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC>=0.15 -A", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >=0.15- A (THP)", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC>= 0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >=0.15-A", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >= 0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC > =0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >=.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC >= 0.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC>.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC>=.15", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}
,{"charge_name": "DRIVING WHILE INTOXICATED BAC>=0.15-A", "uccs_code": "4020", "charge_desc": "Driving Under the Influence of Alcohol", "offense_category_desc": "Driving Under the Influence", "offense_type_desc": "DUI Offense"}]
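For reference, the exact-match lookup described above amounts to a dictionary keyed on the verbatim `charge_name`. The sketch below assumes that is how the table is indexed; `build_exact_lookup` and `exact_match` are hypothetical names, not the cleaner module's actual API. It shows how a single extra space is enough to miss.

```python
import json
from typing import Optional

# Two rows copied from the table above; the real table would be read
# from the .json file (e.g. with json.load) instead of an inline string.
SAMPLE_TABLE = json.loads("""[
  {"charge_name": "DRIVING WHILE INTOXICATED BAC >=0.15", "uccs_code": "4020",
   "charge_desc": "Driving Under the Influence of Alcohol",
   "offense_category_desc": "Driving Under the Influence",
   "offense_type_desc": "DUI Offense"},
  {"charge_name": "DRIVING WHILE INTOXICATED BAC>=.15", "uccs_code": "4020",
   "charge_desc": "Driving Under the Influence of Alcohol",
   "offense_category_desc": "Driving Under the Influence",
   "offense_type_desc": "DUI Offense"}
]""")


def build_exact_lookup(records: list[dict]) -> dict[str, dict]:
    """Index the table by the charge_name string exactly as it was typed."""
    return {record["charge_name"]: record for record in records}


def exact_match(lookup: dict[str, dict], raw_charge: str) -> Optional[str]:
    """Return charge_desc only if raw_charge appears verbatim in the table."""
    record = lookup.get(raw_charge)
    return record["charge_desc"] if record else None


lookup = build_exact_lookup(SAMPLE_TABLE)
print(exact_match(lookup, "DRIVING WHILE INTOXICATED BAC >=0.15"))
# Driving Under the Influence of Alcohol
print(exact_match(lookup, "DRIVING WHILE INTOXICATED BAC >= .15"))
# None: this spacing variant is not a key, so the lookup fails
```

Every new spacing or suffix variant therefore needs its own row, which is why the table above already lists the same charge many times.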

Problem
However, because court clerks do not type a charge the same way from one day to the next, new variants keep coming through, and the cleaner often cannot find the generalized form of the charge. A charge can also carry additional text, such as "(COUNT ONE) ARSON", that prevents it from matching any of the entries related to Arson.
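Much of the variation in the table is formatting noise: spacing around `>=`, a missing leading zero, and suffixes such as `-A` or `(THP)`. A minimal sketch, assuming rules inferred only from the rows shown in this issue, canonicalizes a charge string before lookup. The regular expressions and the `normalize_charge` name are hypothetical; the `-A` and `(THP)` rules in particular may discard meaningful information for other charges and would need checking against more court data.

```python
import re

# Hypothetical cleanup rules, inferred only from the DUI variants above.
COUNT_PREFIX = re.compile(r"^\(COUNT\s+[A-Z0-9]+\)\s*")  # "(COUNT ONE) ARSON"
AGENCY_SUFFIX = re.compile(r"\s*\(THP\)$")               # "... (THP)"
DASH_A_SUFFIX = re.compile(r"\s*-\s*A$")                 # "...-A", "... -A", "...- A"
OPERATOR_SPACING = re.compile(r"\s*([<>=])\s*")          # "> =", ">= ", " >="
BARE_DECIMAL = re.compile(r"([<>=])\.(\d)")              # ">=.15" -> ">=0.15"


def normalize_charge(raw: str) -> str:
    """Reduce formatting noise so that variants of one charge share a single key."""
    text = " ".join(raw.upper().split())  # uppercase and collapse runs of whitespace
    text = COUNT_PREFIX.sub("", text)
    text = AGENCY_SUFFIX.sub("", text)
    text = DASH_A_SUFFIX.sub("", text)
    text = OPERATOR_SPACING.sub(r"\1", text)     # drop spaces around < > =
    text = BARE_DECIMAL.sub(r"\g<1>0.\2", text)  # add the leading zero after an operator
    return text


variants = [
    "DRIVING WHILE INTOXICATED BAC > =0.15",
    "DRIVING WHILE INTOXICATED BAC >=0.15- A (THP)",
    "driving while intoxicated bac>=.15",
]
print({normalize_charge(v) for v in variants})
# {'DRIVING WHILE INTOXICATED BAC>=0.15'}
print(normalize_charge("(COUNT ONE) ARSON"))
# ARSON
```

If the table is re-keyed on `normalize_charge(record["charge_name"])` and incoming charges are normalized the same way, the 18 rows above collapse to two keys (`...BAC>=0.15` and `...BAC>0.15`), and new spacing variants match without adding rows.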

Goal
Rewrite the cleaner module so that it is better at normalizing charges. This may involve something as simple as removing certain kinds of formatting or as complex as using machine learning or AI.
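Between formatting rules and machine learning sits approximate string matching. As one illustration (a swapped-in technique, not something this issue prescribes), Python's standard-library `difflib.get_close_matches` can fall back to the closest known `charge_name` when an exact lookup fails. The `fuzzy_lookup` name, the 0.85 cutoff, and the toy table, including its invented ARSON row, are assumptions for this sketch.

```python
import difflib
from typing import Optional

# Hypothetical toy table: the DUI rows mirror the JSON above; the ARSON row
# is invented for illustration and is not taken from the project's data.
TABLE = {
    "DRIVING WHILE INTOXICATED BAC >=0.15": "Driving Under the Influence of Alcohol",
    "DRIVING WHILE INTOXICATED BAC>=.15": "Driving Under the Influence of Alcohol",
    "DRIVING WHILE INTOXICATED BAC >= 0.15": "Driving Under the Influence of Alcohol",
    "ARSON": "Arson",
}


def fuzzy_lookup(raw_charge: str, table: dict[str, str],
                 cutoff: float = 0.85) -> Optional[str]:
    """Try an exact match first, then fall back to the closest known charge_name.

    difflib scores candidates with SequenceMatcher.ratio(); anything scoring
    below `cutoff` is reported as no match instead of being guessed.
    """
    key = " ".join(raw_charge.upper().split())  # uppercase, collapse whitespace
    if key in table:
        return table[key]
    matches = difflib.get_close_matches(key, list(table), n=1, cutoff=cutoff)
    return table[matches[0]] if matches else None


print(fuzzy_lookup("DRIVING WHILE INTOXICATED BAC >= .15", TABLE))
# Driving Under the Influence of Alcohol
print(fuzzy_lookup("(COUNT ONE) ARSON", TABLE))
# None
```

The fuzzy fallback alone still misses `(COUNT ONE) ARSON`: the prefix drags its similarity to `ARSON` down to about 0.45, well below the cutoff. That suggests stripping known noise first and using fuzzy matching only as a last resort; a cutoff that is too loose could also map a charge onto a similar-looking but legally different one.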

nicolassaw added the cleaner (cleaning and formatting up parsed data) label Jul 8, 2024
nicolassaw moved this from 🔖 To-do to 🆕 Good First Ticket in Product - Indigent Defense Stats Jul 28, 2024
newswim moved this from 🆕 Good First Ticket to 🔖 To-do in Product - Indigent Defense Stats Sep 22, 2024
Projects
Status: 🔖 To-do