feat: Added github query engine! #50

Merged
merged 4 commits into from
Apr 23, 2024

Conversation

amindadgar (Member) commented Apr 23, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a CustomVectorStoreRetriever for improved handling of legacy and new versions in vector store queries.
    • Added a GitHubQueryEngine to manage GitHub-specific data queries alongside other sources.
  • Bug Fixes

    • Updated node retrieval and scoring process in the custom retriever to enhance accuracy and performance.
  • Tests

    • Added tests for the new BaseEngine and GitHubQueryEngine classes to ensure their correct setup and functionality.
  • Refactor

    • Enhanced LevelBasedPlatformQueryEngine to inherit from new base classes, streamlining its setup and functionality.
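
The refactor summarized above can be pictured as a small class hierarchy. This is only a sketch under assumptions: the class names `BaseEngine` and `GitHubQueryEngine` and the `prepare` method appear in this PR, but the attribute names, the `_setup_vector_store_index` helper, and the dictionary return values are hypothetical stand-ins, not the merged code.

```python
class BaseEngine:
    """Hypothetical base class holding settings shared by platform engines."""

    def __init__(self, platform_name: str, community_id: str) -> None:
        self.platform_name = platform_name
        self.community_id = community_id

    def _setup_vector_store_index(self) -> dict:
        # Stand-in for loading a real vector store index; a plain dict
        # keeps the sketch runnable without external services.
        return {"platform": self.platform_name, "community": self.community_id}

    def prepare(self) -> dict:
        # Stand-in for building a query engine on top of the index.
        return {"index": self._setup_vector_store_index()}


class GitHubQueryEngine(BaseEngine):
    """Hypothetical subclass that fixes the platform name to "github"."""

    def __init__(self, community_id: str) -> None:
        super().__init__(platform_name="github", community_id=community_id)


# Mirrors the call shape used in subquery.py in this PR.
engine = GitHubQueryEngine(community_id="example_community").prepare()
```

Keeping the shared setup in the base class means a new platform engine would only need to supply its platform name.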

coderabbitai bot (Contributor) commented Apr 23, 2024

Walkthrough

The changes involve enhancing the system's query capabilities by introducing a custom retriever for handling vector store queries efficiently and integrating a new GitHubQueryEngine to support GitHub queries. Test cases have been added to validate the new engine functionalities, ensuring robustness across different sources.

Changes

| Files | Changes |
|---|---|
| .../custom_retriever.py, .../github.py | Introduced classes for efficient data retrieval from vector stores and GitHub, with error handling and custom retrievers. |
| subquery.py, .../__init__.py | Integrated GitHubQueryEngine into the system and updated imports to include new query engine functionalities. |
| .../test_base_engine.py, .../test_github_query_engine.py | Added test cases to validate the functionality of the new base and GitHub query engines. |
| .../base_engine.py, .../level_based_platform_query_engine.py | Developed a base engine class for setting up vector stores and modified existing engines to inherit new functionalities. |
| celery_app/tasks.py | Added the github=True parameter in the ask_question_auto_search function call. |

Poem

🐰✨
In the realm of code and query,
A rabbit hops, agile and merry.
Vectors stored, GitHub embraced,
Bugs squashed, no challenge misplaced.
Cheers to changes, swift and bright,
CodeRabbit leaps with delight!
🌟📜


Recent Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 261c8af and 25242db.
Files selected for processing (2)
  • celery_app/tasks.py (1 hunks)
  • subquery.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
  • subquery.py
Additional comments not posted (1)
celery_app/tasks.py (1)

78-78: Ensure that the query_multiple_source function is updated to handle the new github parameter effectively.

This change implies that the function now supports querying GitHub data. Please verify that the query_multiple_source function has been appropriately modified to handle this new parameter and that it integrates well with the existing query mechanisms.
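
To illustrate what handling the new parameter could look like, here is a minimal sketch. It assumes a simplified signature for `query_multiple_source` and uses a hypothetical `_fake_engine` helper in place of real query engines; the actual function in this repository may take different arguments and return a different type.

```python
from typing import Callable, List


def _fake_engine(source: str, community_id: str) -> Callable[[str], str]:
    # Hypothetical stand-in for a real query engine: it only echoes
    # the source, community, and query it was given.
    return lambda query: f"[{source}:{community_id}] {query}"


def query_multiple_source(
    query: str,
    community_id: str,
    discord: bool = False,
    github: bool = False,
) -> List[str]:
    """Assumed shape: query every source whose flag is enabled."""
    enabled = {"discord": discord, "github": github}
    return [
        _fake_engine(source, community_id)(query)
        for source, is_enabled in enabled.items()
        if is_enabled
    ]


# Only the GitHub source is queried when github=True is the sole flag set.
answers = query_multiple_source("What changed?", community_id="c1", github=True)
```

Defaulting each flag to False keeps existing callers unaffected when a new source is added.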



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Out of diff range and nitpick comments (5)
tests/unit/test_github_query_engine.py (2)

14-14: Consider removing the print statement from the test method to avoid cluttering the test output.


15-15: Ensure that the test checks more properties of GitHubQueryEngine to verify its correct setup and functionality.

tests/unit/test_base_engine.py (1)

8-22: Consider enhancing the test by verifying that the index is loaded correctly and that no exceptions are thrown during the setup.

utils/query_engine/github.py (1)

11-30: The implementation of GitHubQueryEngine is robust and aligns with the design requirements. Consider adding more detailed documentation for the prepare method to explain its process and components.

bot/retrievers/custom_retriever.py (1)

16-59: The _build_node_list_from_query_result method is correctly implemented with robust error handling. Consider adding more detailed comments to explain the logic, especially for handling different node types.
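
The nitpicks on the two test files above (drop the print statement, check more properties, and confirm that setup raises no exception) could be addressed along these lines. This is a sketch only: `FakeGitHubQueryEngine` and its attributes are hypothetical stand-ins so the test runs in isolation, and the real tests would import the project's own engine class instead.

```python
import unittest


class FakeGitHubQueryEngine:
    """Hypothetical stand-in for the real GitHubQueryEngine."""

    def __init__(self, community_id: str) -> None:
        self.platform_name = "github"
        self.community_id = community_id
        self.index_loaded = False

    def prepare(self) -> "FakeGitHubQueryEngine":
        # Stand-in for loading the vector store index.
        self.index_loaded = True
        return self


class TestGitHubQueryEngineSetup(unittest.TestCase):
    def test_setup_properties(self) -> None:
        engine = FakeGitHubQueryEngine(community_id="sample_community")
        # Assert on several properties rather than printing them.
        self.assertEqual(engine.platform_name, "github")
        self.assertEqual(engine.community_id, "sample_community")

    def test_prepare_loads_index_without_raising(self) -> None:
        engine = FakeGitHubQueryEngine(community_id="sample_community")
        try:
            prepared = engine.prepare()
        except Exception as exc:
            self.fail(f"prepare() raised unexpectedly: {exc!r}")
        self.assertTrue(prepared.index_loaded)


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGitHubQueryEngineSetup)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```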

@amindadgar amindadgar requested a review from TjitsevdM April 23, 2024 07:16
amindadgar (Member, Author) commented:

The issue reported by Code Climate only concerns code copied from the llama-index library, so we will skip it.

subquery.py Outdated
github_query_engine = GitHubQueryEngine(community_id=community_id).prepare()
tool_metadata = ToolMetadata(
name="GitHub",
description="Hosts code repositories and project materials from the GitHub platform.",


We only access the conversations about the code (PRs, issues) and not the actual code. Maybe that should be reflected in the description?

"Hosts conversations from Github issues and pull requests from the selected repositories"

Member Author


As we're processing commits too, maybe updating that to this could be better?

"Hosts commits and conversations from Github issues and pull requests from the selected repositories"

TjitsevdM left a comment

Most of this looks good. I only had a small comment about the platform description. I'll approve already so I don't hold you up.
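
For reference, the description wording proposed in the review thread above could be applied to the tool metadata roughly as follows. This is a sketch: the `ToolMetadata` dataclass here is a minimal stand-in reduced to the two fields shown in the reviewed snippet, not the llama-index class itself, and whether the merged commit used exactly this wording is not confirmed by the conversation.

```python
from dataclasses import dataclass


@dataclass
class ToolMetadata:
    # Minimal stand-in with only the two fields used in the reviewed
    # snippet; the real class comes from llama-index.
    name: str
    description: str


tool_metadata = ToolMetadata(
    name="GitHub",
    # Wording taken from the author's reply: it covers commits as well as
    # issue and pull request conversations, but not the code itself.
    description=(
        "Hosts commits and conversations from Github issues and "
        "pull requests from the selected repositories"
    ),
)
```

The description matters because it is the kind of text a sub-question query engine can use to decide which tool should receive a question, so it should match the data that is actually indexed.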

@amindadgar amindadgar merged commit 21deca4 into main Apr 23, 2024
16 checks passed