This project implements a candidate matching system that recommends suitable candidates for job openings. It leverages machine learning techniques to analyze historical hiring data and improve matching accuracy.
- Text Vectorization: Uses TF-IDF vectorization to represent job descriptions and candidate profiles as numerical features.
- Similarity Calculation: Employs cosine similarity between the vector representations to measure how well candidate qualifications match job requirements. (The initial approach compared only titles and skills; it was later expanded to include profile descriptions as well.)
- Exploratory Data Analysis: Traditional EDA was first performed on both datasets; duplicates and non-English words were removed using NLTK.
- Preprocessing: Stopword removal and stemming (Porter stemmer) improve the accuracy of similarity comparisons.
- Similarity Threshold: An adjustable threshold in the `create_similarity_dictionary` function controls how strictly job titles are treated as similar (you may refer to `preprocessing/cleaning_recommendations.ipynb`).
- Cleaned Data: The cleaned and merged dataset can be found in `data/cleaned_recommendation`.
- Inference: Complete inference has been run on the dataset; the results can be found in the `inference.ipynb` file.
You may individually run the `End-to-End/Candidate_Matching.ipynb` notebook, or run the Streamlit application:
- First, clone the repository:

  ```shell
  git clone https://github.com/farvath/Candidate-Matching.git
  ```

- Make sure the pre-processed data stored in pickle files (e.g., `df.pkl` and `similarity.pkl`) is present in the same directory as the application (`app.py`), and update the paths if needed.
- In your terminal, navigate to the project directory containing `app.py` and run:

  ```shell
  streamlit run app.py
  ```