Add simple_classify Method to DecisionLayer Class #1

Merged
merged 20 commits into from
Nov 9, 2023

Conversation

Siraj-Aizlewood
Contributor

Description:

This pull request introduces a new method, simple_classify, to the DecisionLayer class. The simple_classify method takes a dictionary of query results and an optional boolean parameter apply_tan (defaulting to True), and returns the category with the highest total score, along with the scores of all categories.

The method works as follows:

  1. It scores each result returned by the _query method. If apply_tan is True, each result's score s is transformed to tan(s × π/2). If apply_tan is False, the raw score is used.

  2. The scores are grouped by category. If a category appears multiple times in the results, the scores for that category are added together.

  3. The categories are sorted by total score in descending order.

  4. The method returns the category with the highest total score. If there are no results from _query, the method returns None.
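The four steps above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: it assumes each query result is a dict with hypothetical `category` and `score` keys, and it assumes the empty case returns `None` paired with an empty score dict.

```python
import math
from collections import defaultdict


def simple_classify(query_results, apply_tan=True):
    """Sketch: sum (optionally tan-transformed) scores per category.

    Assumes each result looks like {"category": str, "score": float};
    these key names are illustrative, not taken from the PR.
    """
    scores_by_category = defaultdict(float)
    for result in query_results:
        raw = result["score"]
        # Step 1: optionally apply tan(score * pi / 2).
        score = math.tan(raw * math.pi / 2) if apply_tan else raw
        # Step 2: accumulate scores for repeated categories.
        scores_by_category[result["category"]] += score

    # No results from the query: nothing to classify (assumed return shape).
    if not scores_by_category:
        return None, {}

    # Step 3: sort categories by total score, highest first.
    ranked = sorted(scores_by_category.items(), key=lambda kv: kv[1], reverse=True)
    # Step 4: return the top category plus all totals for debugging.
    return ranked[0][0], dict(ranked)


results = [
    {"category": "politics", "score": 0.4},
    {"category": "politics", "score": 0.4},
    {"category": "chitchat", "score": 0.75},
]
top_with_tan, _ = simple_classify(results, apply_tan=True)
top_raw, _ = simple_classify(results, apply_tan=False)
```

With these made-up inputs the two modes disagree: the raw sums favour the category with two moderate matches, while the tangent transform favours the category with one close match.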

This new method lets us categorize a query based on the scores from _query, with the option to apply a tangent transformation to the scores. The transformation is useful when we want scores to grow steeply and non-linearly as the cosine similarity approaches 1, so that the closest matches dominate the per-category totals.
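To make the effect of the transformation concrete, here is a small sketch that maps a few similarity values through tan(s × π/2). The helper name `tan_boost` is hypothetical and only used for illustration.

```python
import math


def tan_boost(similarity: float) -> float:
    """Map a similarity in [0, 1) to tan(similarity * pi / 2)."""
    return math.tan(similarity * math.pi / 2)


# Print the transformed value for a handful of similarities.
for s in (0.25, 0.5, 0.75, 0.9):
    print(f"{s:.2f} -> {tan_boost(s):.2f}")
```

A similarity of 0.5 maps to roughly 1.0, while 0.9 maps to roughly 6.3, so values below 0.5 are shrunk and values above it are amplified. The transform is unbounded as the similarity approaches 1, which is worth keeping in mind if exact duplicates can appear in the index.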

Siraj-Aizlewood and others added 16 commits November 6, 2023 13:49
Added simple_categorise which uses sum of Cosine Similarity Scores to determine Category. Option to use tan function to boost scores for closest points, and reduce scores for further away points.
It now takes a query result as an argument and outputs scores_by_category too, for debugging purposes.
Test cases aren't just the original Decision utterances now, but semantically similar utterances.

Also added new max_score_in_top_class method, which chooses the top score of the top scoring vector in the top class to compare to the threshold value.
Threshold checks done outside of _semantic_classify.
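One possible reading of this commit, sketched under assumptions: the best individual score within the winning category is compared against a threshold, and that comparison happens in the caller rather than inside the classifier. The result shape (`category` and `score` keys) and the `passes_threshold` helper are hypothetical.

```python
def max_score_in_top_class(query_results, top_class):
    """Return the highest single score among results in top_class, or None."""
    scores = [r["score"] for r in query_results if r["category"] == top_class]
    return max(scores) if scores else None


def passes_threshold(query_results, top_class, threshold):
    """Threshold check kept outside the classification step (assumed design)."""
    best = max_score_in_top_class(query_results, top_class)
    return best is not None and best >= threshold


results = [
    {"category": "politics", "score": 0.62},
    {"category": "politics", "score": 0.81},
    {"category": "chitchat", "score": 0.70},
]
accepted = passes_threshold(results, "politics", threshold=0.8)
```

Keeping the check separate means the same query results can be evaluated against several thresholds without re-running the query.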

Testing is more efficient because dl._query() is no longer called across every threshold.
These now match the number of non-other types.
Member

@simjak simjak left a comment

added some small fixes

@jamescalam jamescalam merged commit 79834ad into main Nov 9, 2023
2 checks passed
@jamescalam jamescalam deleted the simple_classification branch November 9, 2023 11:24
jamescalam pushed a commit that referenced this pull request Jun 2, 2024
3 participants