Run the notebooks in the exact order below; the output files from each notebook are the inputs for the next.
- Place an `input.csv` file in the root folder. It must have two required columns: `prompt` and `response`.
- Run `spacy_classifiers.ipynb`. This splits the responses into clauses and writes them to `voice_classified.csv`.
- Run `abstraction_scores.ipynb`. This scores the clauses for abstraction and writes `abstraction_scored.csv`.
- Run `readability_scorer.ipynb`. This scores the clauses for readability and writes `readability_scored.csv`.
- Finally, run `final_output_nb.ipynb`. This produces two files, `output.csv` and `debug.csv`. `output.csv` is the minimal output: the split clauses, the final score, and the final voice chosen by the maximum internal score. `debug.csv` contains more granular details, including the score of each internal term.
Steps to set up the internal tool:
- Clone this project and move into the project folder.
- Run `pip install -r requirements.txt`.
- Run `python -c "import nltk; nltk.download('wordnet'); nltk.download('stopwords'); nltk.download('punkt')"`.
- Run `python -m spacy download <model-type>`. The model type can be `en_core_web_lg`, `en_core_web_md`, or `en_core_web_sm`.
- Run `python voice_identifier.py --help` to get started.
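Since any of the three model types works, a script can fall back gracefully to whichever one is installed. The helper below is a sketch, not part of this repo; it assumes models were installed with `python -m spacy download <model-type>` as above:

```python
import importlib.util

# Largest (most accurate) model first, smallest last.
PREFERRED = ["en_core_web_lg", "en_core_web_md", "en_core_web_sm"]

def pick_installed_model(candidates=PREFERRED):
    """Return the first candidate that is importable, or None if none are."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

The result could then be passed to `spacy.load(...)` so the pipeline uses the best model available on the machine.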