[AI] Identify similar BertTopics #29

Open
sajz opened this issue Nov 7, 2024 · 0 comments

sajz commented Nov 7, 2024

To avoid creating duplicate topics when a new topic from a message closely matches one already tracked in the channel, we can use a topic similarity threshold to decide whether the new topic should merge with an existing topic or be created as a new one. Here’s a proposed approach:

Proposed Steps

  1. Compute Topic Similarity:

    • When BERT identifies a new topic in a message, compare its semantic_vector with the semantic_vector of each existing topic linked to the channel through an ASSOCIATED_WITH relationship.
    • Use a similarity metric, such as cosine similarity, between the new topic’s vector and each existing topic’s vector.
  2. Set a Similarity Threshold:

    • Define a similarity threshold, e.g., 0.8, above which the new topic is considered “similar enough” to an existing topic. This threshold can be adjusted based on testing.
  3. Merge or Create Logic:

    • If Similarity is Above Threshold:
      • Merge the new topic with the existing topic that has the highest similarity score.
      • Update the existing topic’s overall_score using the amplify_score function based on the relevance of the new topic in the message.
    • If Similarity is Below Threshold for All Existing Topics:
      • Treat the new topic as distinct, create a new Topic node, and establish the ASSOCIATED_WITH relationship for tracking in this channel.
  4. Optional: Store Relatedness Data:

    • For transparency and future adjustments, record similarity data in the RELATED_TO relationship between topics. This way, if similar topics keep emerging, you can track these relationships for potential reorganization or clustering later.
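
A minimal sketch of steps 1 to 3, assuming each semantic_vector is available as a plain list of floats and the channel's existing topics are held in a dict keyed by topic name. The function names `cosine_similarity` and `find_best_match` are hypothetical and only illustrate the idea; they are not part of the existing codebase.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_best_match(new_vector, existing_topics, threshold=0.8):
    """Return (topic_name, score) for the most similar existing topic
    if its score is above the threshold, otherwise None.

    existing_topics is assumed to map topic name -> semantic_vector.
    """
    best_name, best_score = None, -1.0
    for name, vector in existing_topics.items():
        score = cosine_similarity(new_vector, vector)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score > threshold:
        return best_name, best_score
    return None


# Toy 3-dimensional vectors; real semantic vectors would be much longer.
existing = {"python": [1.0, 0.0, 0.0], "cooking": [0.0, 1.0, 0.0]}
match = find_best_match([0.9, 0.1, 0.0], existing)     # close to "python"
no_match = find_best_match([1.0, 1.0, 0.0], existing)  # about 0.71 to both, so None
```

Returning None when nothing clears the threshold keeps the merge-or-create decision with the caller, which can then create the new Topic node.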

Example Flow:

  1. Analyze New Message:

    • A new topic appears in the message with a semantic_vector.
  2. Similarity Comparison:

    • Compute cosine similarity between the new topic’s semantic_vector and the semantic_vector of each existing topic in the channel.
  3. Apply Threshold Decision:

    • Above Threshold (e.g., 0.8): Update the most similar existing topic’s score using amplify_score.
    • Below Threshold: Create a new topic entry and start tracking it as a distinct topic.
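
The flow above could look roughly like this against an in-memory stand-in for the graph. Everything here is an assumption made for illustration: the dict layout, the `process_topic` name, the choice to log the closest existing topic as a RELATED_TO record when a new topic is created, and especially the body of `amplify_score`, which is a placeholder because the real function is not shown in this issue.

```python
import math

SIMILARITY_THRESHOLD = 0.8  # example value from the proposal; tune on real data


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def amplify_score(current, relevance):
    # Placeholder only: moves the score toward 1.0 in proportion to relevance.
    # The project's actual amplify_score may behave differently.
    return current + relevance * (1.0 - current)


def process_topic(channel_topics, related_to, name, vector, relevance):
    """Merge the new topic into its closest match or create it as a new entry.

    channel_topics: topic name -> {"semantic_vector": [...], "overall_score": float}
    related_to: list collecting (new_topic, existing_topic, similarity) records
    """
    scores = {
        existing: cosine(vector, topic["semantic_vector"])
        for existing, topic in channel_topics.items()
    }
    best = max(scores, key=scores.get, default=None)

    if best is not None and scores[best] > SIMILARITY_THRESHOLD:
        # Above threshold: boost the existing topic instead of duplicating it.
        topic = channel_topics[best]
        topic["overall_score"] = amplify_score(topic["overall_score"], relevance)
        return "merged", best

    # Below threshold for every existing topic: track it as a distinct topic.
    channel_topics[name] = {"semantic_vector": list(vector), "overall_score": relevance}
    if best is not None:
        related_to.append((name, best, scores[best]))
    return "created", name


channel = {"python": {"semantic_vector": [1.0, 0.0, 0.0], "overall_score": 0.5}}
related = []
first = process_topic(channel, related, "python programming", [0.9, 0.1, 0.0], 0.4)
second = process_topic(channel, related, "cooking", [0.0, 1.0, 0.0], 0.3)
```

In this run the first call merges into "python" and the second creates "cooking" as a new entry. In the real system the dict updates would be replaced by writes to the Topic nodes and their ASSOCIATED_WITH and RELATED_TO relationships.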
@sajz sajz added this to Concord Nov 7, 2024
@sajz sajz converted this from a draft issue Nov 7, 2024
@sajz sajz changed the title [AI] Identify similar BertTopic [AI] Identify similar BertTopics Nov 7, 2024
@sajz sajz self-assigned this Nov 8, 2024
Projects
Status: Ready