## Reason for Choice of Models:

### `sshleifer/distilbart-cnn-12-6` (for CV Summarization)
#### Pros:
1. **Efficiency**: `sshleifer/distilbart-cnn-12-6` is a distilled version of BART, which makes it smaller and faster than the original BART model (`facebook/bart-large-cnn`). This can lead to quicker inference times, especially beneficial when processing multiple CVs.

2. **Good Performance**: Despite being a distilled model, `distilbart-cnn-12-6` retains much of the performance of its larger counterpart. It can generate coherent and relevant summaries, which is crucial for summarizing CVs effectively.

3. **Reduced Resource Consumption**: The model requires less memory and computational power, making it easier to deploy in environments with limited resources or when scaling the application to handle many users simultaneously.

4. **Compatibility with Hugging Face's Pipeline**: The model integrates directly with Hugging Face's `pipeline` API, allowing for straightforward implementation and fine-tuning if necessary (a minimal usage sketch follows this list).
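
A minimal usage sketch, assuming the `transformers` package is installed (the sample CV text and generation parameters are illustrative, not values from this project):

```python
from transformers import pipeline

# Load the distilled BART summarizer; weights are fetched from the Hugging Face Hub.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

cv_text = "Experienced data engineer with five years of ETL pipeline work ..."  # illustrative input

# max_length/min_length bound the generated summary length in tokens.
summary = summarizer(cv_text, max_length=60, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```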

#### Cons:
1. **Limited Capacity**: As a smaller model, `distilbart-cnn-12-6` may not capture as much context or nuance as the full-sized BART model. This might lead to less detailed or accurate summaries, especially with complex or long CVs.

2. **Potential for Truncation**: The model's input token limit (1024 tokens) might still be an issue for longer CVs, leading to truncation and potentially missing important information in the summaries. This is a trade-off between efficiency and coverage; a chunking mitigation is sketched after this list.

3. **Reduced Flexibility in Summarization**: While the model is generally good at summarization, it might struggle with highly technical or domain-specific language present in CVs, which could result in less relevant summaries compared to a larger model fine-tuned specifically for this task.
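
One common mitigation for the 1024-token cap is to summarize a long CV chunk by chunk and join the partial summaries; a rough sketch (the chunk size is an illustrative choice, not a value from this project):

```python
from transformers import AutoTokenizer, pipeline

MODEL = "sshleifer/distilbart-cnn-12-6"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
summarizer = pipeline("summarization", model=MODEL, tokenizer=tokenizer)

def summarize_long_cv(text: str, chunk_tokens: int = 900) -> str:
    """Summarize text longer than the model's 1024-token window by chunking."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [ids[i:i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]
    partials = [
        summarizer(tokenizer.decode(chunk), max_length=120, min_length=20)[0]["summary_text"]
        for chunk in chunks
    ]
    # Naive join; as noted above, chunk-local summaries can lose cross-chunk context.
    return " ".join(partials)
```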

### `sentence-transformers/msmarco-distilbert-base-v4` (for CV Ranking)
#### Pros:
1. **Efficiency**: DistilBERT-based models like `msmarco-distilbert-base-v4` are lightweight and faster compared to larger models, making them suitable for applications where speed is critical.
2. **Pre-trained on Relevant Data**: The `msmarco-distilbert-base-v4` model is fine-tuned on the MS MARCO dataset, which is designed for question-answering and information retrieval tasks. This makes it effective at ranking CVs by similarity to a job description (a ranking sketch follows this list).
3. **Good Accuracy**: Despite being lightweight, this model provides a good balance between accuracy and computational efficiency for ranking tasks.
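
A minimal ranking sketch with the `sentence-transformers` package (the job description and CV strings are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v4")

job_description = "Backend engineer with Python and PostgreSQL experience."  # illustrative
cvs = [
    "Python developer, four years of Django and Postgres work.",  # illustrative
    "Graphic designer skilled in Figma and branding.",            # illustrative
]

# Embed once, then rank CVs by cosine similarity to the job description.
job_emb = model.encode(job_description, convert_to_tensor=True)
cv_embs = model.encode(cvs, convert_to_tensor=True)
scores = util.cos_sim(job_emb, cv_embs)[0]

for score, cv in sorted(zip(scores.tolist(), cvs), reverse=True):
    print(f"{score:.3f}  {cv}")
```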

#### Cons:
1. **Limited Expressiveness**: As a distilled model, it may lack some of the nuanced understanding that larger, more complex models can provide, especially in cases requiring deep semantic comprehension.
2. **Potential for Missing Context**: While good for short queries, the model might struggle with capturing the full context of longer job descriptions or CVs.

## Other Options Considered and Why They Were Not Chosen:
### **1. Ranking Alternative: `sentence-transformers/all-mpnet-base-v2`**
#### **Cons:**
- It is a larger model than the `distilbert`-based alternatives, which means higher memory usage and slightly longer processing times.

### **2. Summarization Alternative: `facebook/bart-large-cnn`**
#### **Cons:**
- `facebook/bart-large-cnn` has a maximum token limit (usually 1024 tokens), which can be a significant limitation when dealing with long CVs. This requires chunking the text, which might lead to loss of context and less accurate summaries.
- BART is a large model, which makes it computationally expensive to run, especially if processing many CVs simultaneously. This can lead to longer processing times and higher resource usage.
- The model might produce incoherent or incomplete summaries, especially when summarizing text chunks independently without considering the full document context.

### **3. Ranking and Summarization Alternative: `longformer` (for Long Documents)**
#### **Cons:**
- `longformer` is more complex to fine-tune and may require more extensive training or adaptation to specific tasks compared to more standard models.
- Despite its efficiency with long texts, it still requires significant computational resources, especially when dealing with very long documents.
- There might be fewer pre-trained versions fine-tuned specifically for tasks like summarization or ranking compared to models like BART, requiring more customization.


## Reason for Selecting These Evaluation Metrics

### **1. Mean Reciprocal Rank (MRR)**
#### **Pros:**
- **Intuitive Interpretation**: MRR is simple to understand and calculate, providing a clear indication of how early in the ranked list the relevant items appear.
- **Useful for Single Relevant Item**: MRR is particularly effective when there is only one relevant item per query, as it emphasizes the position of the first relevant item.
- **Efficient Calculation**: MRR is computationally efficient, making it suitable for quick evaluations in applications where speed is essential.

#### **Cons:**
- **Single Relevant Item Focus**: MRR does not account for multiple relevant items within the ranked list; it only considers the rank of the first relevant item.
- **Position Sensitivity**: MRR is highly sensitive to the position of the first relevant item but ignores the ranks of subsequent relevant items, potentially leading to a skewed evaluation if multiple relevant items exist.
- **Binary Relevance Assumption**: MRR assumes a binary relevance (i.e., relevant or not), which might not fully capture the nuances of relevance in some contexts, such as ranking CVs by how well they match a job description.
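
For reference, MRR averages the reciprocal rank of the first relevant item across queries; a minimal sketch (the relevance lists are illustrative):

```python
def mean_reciprocal_rank(ranked_relevance: list[list[bool]]) -> float:
    """MRR = (1/|Q|) * sum over queries of 1 / rank of the first relevant item (0 if none)."""
    total = 0.0
    for relevance in ranked_relevance:
        for rank, is_relevant in enumerate(relevance, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

# Two queries: first relevant CV at rank 2, then at rank 1 -> (1/2 + 1) / 2 = 0.75
print(mean_reciprocal_rank([[False, True, False], [True, False, True]]))
```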

### **2. Normalized Discounted Cumulative Gain (NDCG) [Preferred]**
#### **Pros:**
- **Rank Position Sensitivity**: NDCG takes into account the position of relevant items in the ranked list, giving higher importance to items that appear earlier.
- **Handles Multiple Relevant Items**: NDCG effectively handles cases where multiple relevant items are present, making it more versatile than MRR.
- **Normalization**: The normalization aspect of NDCG allows for comparison across different queries or datasets, making it more robust in varied scenarios.

#### **Cons:**
- **Complexity**: NDCG is more complex to calculate compared to MRR, especially in cases with large datasets, which might require additional computational resources.
- **Interpretability**: While NDCG is powerful, its interpretation is less intuitive than MRR, especially for non-experts, as it involves logarithmic discounting and normalization.
- **Dependent on Relevance Scores**: NDCG relies on relevance scores for each item in the list, which means the effectiveness of NDCG can be impacted by the accuracy and reliability of these scores.
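
A minimal NDCG@k sketch built from graded relevance scores (the scores are illustrative):

```python
import math

def dcg(relevances: list[float], k: int) -> float:
    # Logarithmic discount: relevant items at lower ranks contribute less.
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))

def ndcg(relevances: list[float], k: int) -> float:
    """DCG of the predicted order divided by DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of CVs in predicted rank order (3 = strong match, 0 = no match).
print(ndcg([2, 3, 0, 1], k=4))  # ~0.91
```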

### **3. BERTScore**
#### **Pros:**
- **Contextual Similarity**: BERTScore uses contextual embeddings from BERT, making it capable of capturing semantic similarity at a deeper level than traditional metrics like ROUGE or BLEU.
- **Precision, Recall, F1**: BERTScore provides a detailed evaluation by offering precision, recall, and F1 scores, which gives a more comprehensive view of the generated text's quality.
- **Robustness**: It is robust to variations in wording and synonyms, which makes it suitable for evaluating the quality of generated summaries, especially in contexts where exact word matches are not necessary.

#### **Cons:**
- **Computationally Intensive**: BERTScore is computationally heavy, as it requires running BERT or a similar model for each pair of sentences, making it slower and more resource-intensive than traditional metrics.
- **Dependency on Pre-trained Models**: The quality of BERTScore depends heavily on the pre-trained model used, and its performance may vary depending on the model’s training data and domain.
- **Less Interpretability**: While BERTScore is powerful, its results are less interpretable for non-experts, as they are based on complex embeddings rather than straightforward word overlaps.
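
A minimal sketch using the `bert-score` package (the candidate and reference summaries are illustrative):

```python
from bert_score import score

candidates = ["Data engineer with five years of ETL experience."]            # generated summary (illustrative)
references = ["Seasoned data engineer; five years building ETL pipelines."]  # reference summary (illustrative)

# Returns per-sentence precision, recall, and F1 as tensors.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"P={P.mean().item():.3f}  R={R.mean().item():.3f}  F1={F1.mean().item():.3f}")
```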

## Future Improvements

- Implement additional evaluation metrics.
- Improve the robustness of the text extraction and cleaning processes.
- Implement a proper relational database schema with PostgreSQL.
- Implement a vector database with Milvus or Weaviate.
