
Lower amount of DB connections #547

Merged: 10 commits into main on Nov 11, 2024

Conversation

kongzii (Contributor) commented Nov 7, 2024

deploy please

coderabbitai (bot) commented Nov 7, 2024

Walkthrough

The pull request introduces changes to three files: config.py, db_cache.py, and relevant_news_cache.py. In config.py, the default value of the ENABLE_CACHE attribute in the APIKeys class is modified from True to False, disabling caching by default. In db_cache.py, a new parameter pool_size is added to the create_engine function call, and the cache saving logic is simplified. Additionally, engine.dispose() is called to manage database connections. Similar changes are made in relevant_news_cache.py, where the pool_size parameter is also added to the create_engine call.

Changes

| File | Change Summary |
|------|----------------|
| prediction_market_agent_tooling/config.py | Updated `ENABLE_CACHE` from `True` to `False` in the `APIKeys` class. |
| prediction_market_agent_tooling/tools/caches/db_cache.py | Added `pool_size=1` to the `create_engine` call; simplified the cache entry saving logic; added `engine.dispose()` to manage database connections. |
| prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_cache.py | Added `pool_size=1` to the `create_engine` call in the `RelevantNewsResponseCache` class. |
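The change summary centers on `pool_size=1` plus `engine.dispose()`. What a bounded pool buys you can be illustrated with a minimal stdlib sketch; the `FakeConnection` and `BoundedPool` classes below are illustrative stand-ins, not SQLAlchemy's actual pool implementation:

```python
import queue

class FakeConnection:
    """Stand-in for a real DB connection."""
    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id
        self.closed = False

class BoundedPool:
    """Hands out at most `pool_size` connections, reusing returned ones."""
    def __init__(self, pool_size: int) -> None:
        self._idle: queue.Queue[FakeConnection] = queue.Queue()
        self._created = 0
        self._pool_size = pool_size

    def checkout(self) -> FakeConnection:
        try:
            return self._idle.get_nowait()  # reuse an idle connection if any
        except queue.Empty:
            if self._created >= self._pool_size:
                raise RuntimeError("pool exhausted")
            self._created += 1
            return FakeConnection(self._created)

    def checkin(self, conn: FakeConnection) -> None:
        self._idle.put(conn)

    def dispose(self) -> None:
        """Close all idle connections, analogous to engine.dispose()."""
        while not self._idle.empty():
            self._idle.get_nowait().closed = True

pool = BoundedPool(pool_size=1)
c1 = pool.checkout()
pool.checkin(c1)
c2 = pool.checkout()  # the same connection is reused, not a new one
assert c1 is c2
```

Note that in SQLAlchemy the pool can temporarily exceed `pool_size` via `max_overflow` (default 10); `pool_size` bounds the connections kept open persistently, which is what this PR is reducing.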

Possibly related PRs

  • Initialize db_cache engine in wrapper #546: That PR also touches the ENABLE_CACHE attribute of the APIKeys class, checking whether caching is enabled before proceeding with caching operations in the db_cache decorator.

Suggested reviewers

  • evangriffiths

coderabbitai (bot) left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
prediction_market_agent_tooling/config.py (1)

56-57: Document the caching behavior change.

The change in default caching behavior should be documented for users of this library. Consider:

  1. Adding a note to the changelog
  2. Updating relevant documentation about caching configuration
  3. Adding migration instructions for users who might need to explicitly enable caching
prediction_market_agent_tooling/tools/caches/db_cache.py (1)

43-44: Consider thread-safety implications of global engine.

The global DB_CACHE_ENGINE introduces shared mutable state which could lead to race conditions in concurrent environments. While SQLAlchemy engines are thread-safe for connection pooling, the initialization check-then-set pattern could still race.

Consider:

  1. Adding documentation explaining the purpose and thread-safety guarantees
  2. Using a thread-safe initialization pattern like:
from threading import Lock

from sqlalchemy import create_engine
from sqlalchemy.engine import Engine

_engine_lock = Lock()
DB_CACHE_ENGINE: Engine | None = None

def get_engine() -> Engine:
    global DB_CACHE_ENGINE
    if DB_CACHE_ENGINE is None:
        with _engine_lock:
            # Double-checked locking: re-test after acquiring the lock.
            if DB_CACHE_ENGINE is None:
                DB_CACHE_ENGINE = create_engine(...)  # pass the DB URL here
    return DB_CACHE_ENGINE
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between dfa90ca and 730be04.

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !**/*.toml
📒 Files selected for processing (2)
  • prediction_market_agent_tooling/config.py (1 hunks)
  • prediction_market_agent_tooling/tools/caches/db_cache.py (6 hunks)
🔇 Additional comments (2)
prediction_market_agent_tooling/config.py (1)

56-56: Verify the impact of disabling cache by default.

Changing ENABLE_CACHE default from True to False could impact performance in code paths that expect caching to be enabled. Please ensure this change is intentional and aligns with the introduction of the shared engine instance.

Let's verify the usage of this configuration:

Consider adding a comment explaining why caching is disabled by default, especially in relation to the shared engine instance changes. This helps future maintainers understand the design decision:

-    ENABLE_CACHE: bool = False
+    # Disabled by default as we're using a single shared database engine instance
+    # See db_cache.py for the caching implementation
+    ENABLE_CACHE: bool = False
✅ Verification successful

Disabling cache by default is safe and aligned with test expectations

The codebase already handles both cached and non-cached scenarios properly:

  • The db_cache decorator explicitly checks api_keys.ENABLE_CACHE and falls back to non-cached execution
  • Tests explicitly verify the behavior with cache disabled (see test_db_cache.py)
  • The caching system is properly abstracted, making it easy to enable/disable without breaking functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find code that relies on ENABLE_CACHE configuration
# Expected results: Identify potential impact points where caching is assumed

# Search for direct references to ENABLE_CACHE
echo "Direct references to ENABLE_CACHE:"
rg "ENABLE_CACHE" -B 2 -A 2

# Search for cache-related function calls that might be affected
echo -e "\nCache-related function calls:"
rg "\.cache|cache\." -B 2 -A 2

Length of output: 9715

prediction_market_agent_tooling/tools/caches/db_cache.py (1)

Line range hint 43-211: Verify caching usage and performance impact.

The switch to a single shared engine instance could impact application performance and resource usage.

Let's analyze the caching usage:

✅ Verification successful

Let me gather more information about the caching implementation and its usage patterns.


Global DB engine instance is safe and follows best practices

The shared DB_CACHE_ENGINE implementation is correct and safe because:

  1. Each decorated function creates independent database sessions using with Session(DB_CACHE_ENGINE), ensuring thread-safety
  2. The codebase shows no concurrent access patterns that could cause issues with the shared engine
  3. The implementation is similar to other caching patterns in the codebase (e.g., RelevantNewsResponseCache)
  4. Test coverage demonstrates proper functionality across various scenarios including:
    • Multiple data types and Pydantic models
    • Cache invalidation
    • Concurrent access patterns
    • Error handling

The switch to a single shared engine instance is an improvement over creating multiple engines, as it:

  • Reduces resource overhead
  • Follows SQLAlchemy's connection pooling best practices
  • Maintains thread-safety through proper session management
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze cache usage patterns and potential impact

# Check for cache decorator usage
echo "Checking cache decorator usage patterns..."
rg -l "@db_cache" 

# Look for direct engine/session creation that might need consolidation
echo "Checking for other database engine instances..."
rg "create_engine|Session\(" --type py

# Check for potential concurrent access patterns
echo "Checking for concurrent execution patterns..."
rg -l "ThreadPoolExecutor|ProcessPoolExecutor|asyncio|concurrent" --type py

Length of output: 1731


Script:

#!/bin/bash
# Check the implementation details of files using db_cache

# Check test file for usage patterns
echo "=== Test file implementation ==="
cat tests/tools/test_db_cache.py

# Check actual usage in application code
echo -e "\n=== Google implementation ==="
cat prediction_market_agent_tooling/tools/google.py

echo -e "\n=== Tavily implementation ==="
cat prediction_market_agent_tooling/tools/tavily/tavily_search.py

# Check relevant_news_cache implementation for comparison
echo -e "\n=== Relevant news cache implementation ==="
cat prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_cache.py

Length of output: 20191

kongzii marked this pull request as draft on November 7, 2024, 12:05
@@ -53,7 +53,7 @@ class APIKeys(BaseSettings):

     SQLALCHEMY_DB_URL: t.Optional[SecretStr] = None

-    ENABLE_CACHE: bool = True
+    ENABLE_CACHE: bool = False
kongzii (Contributor, Author) commented:

Set to False by default so that cached functions don't error out when Postgres access isn't configured.
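Since `APIKeys` is a pydantic `BaseSettings` subclass, users who do have Postgres configured can opt back in through the environment (e.g. `ENABLE_CACHE=true`). A stdlib sketch of the boolean parsing involved — pydantic does the real parsing, and its accepted values may differ slightly:

```python
import os

TRUTHY = {"1", "true", "yes", "on"}

def enable_cache_from_env(default: bool = False) -> bool:
    """Mimic how a BaseSettings bool field reads ENABLE_CACHE from the env."""
    raw = os.environ.get("ENABLE_CACHE")
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY

os.environ["ENABLE_CACHE"] = "true"
assert enable_cache_from_env() is True

del os.environ["ENABLE_CACHE"]
assert enable_cache_from_env() is False  # falls back to the new default
```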

kongzii changed the title from "Single shared engine instance" to "Fix multiprocessing", and then to "Lower amount of DB connections", on Nov 10, 2024.
coderabbitai (bot) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_cache.py (1)

28-33: LGTM! Consider additional connection pool configurations.

The addition of pool_size=1 aligns with the PR objective to reduce DB connections. However, consider adding these configurations for better connection management:

         self.engine = create_engine(
             (
                 sqlalchemy_db_url
                 if sqlalchemy_db_url
                 else APIKeys().sqlalchemy_db_url.get_secret_value()
             ),
-            pool_size=1,
+            pool_size=1,
+            pool_timeout=30,  # Timeout waiting for connections from pool
+            pool_recycle=3600,  # Recycle connections after 1 hour
+            pool_pre_ping=True,  # Verify connection validity before checkout
         )
+        # Add engine cleanup
+        import atexit
+        atexit.register(self.engine.dispose)

This ensures:

  1. Connections don't hang indefinitely waiting for the pool
  2. Stale connections are recycled
  3. Dead connections are detected before use
  4. Engine resources are properly cleaned up on application exit
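What `pool_recycle` and `pool_pre_ping` guard against can be sketched in plain Python; the `Conn` class and `checkout` helper below are illustrative, not SQLAlchemy internals:

```python
import time

class Conn:
    def __init__(self) -> None:
        self.created_at = time.monotonic()
        self.alive = True

def checkout(conn: Conn, pool_recycle: float, pre_ping) -> Conn:
    """Return `conn` if still usable, otherwise replace it with a fresh one
    (a sketch of the checks pool_recycle and pool_pre_ping perform)."""
    age = time.monotonic() - conn.created_at
    if age > pool_recycle:   # recycle: connection too old, open a fresh one
        return Conn()
    if not pre_ping(conn):   # pre-ping: the server silently dropped it
        return Conn()
    return conn

conn = Conn()
same = checkout(conn, pool_recycle=3600.0, pre_ping=lambda c: c.alive)
assert same is conn  # fresh, healthy connection is reused

conn.alive = False   # simulate the server closing the connection
fresh = checkout(conn, pool_recycle=3600.0, pre_ping=lambda c: c.alive)
assert fresh is not conn
```

In real SQLAlchemy, `pool_pre_ping=True` issues a lightweight test query on checkout and transparently replaces dead connections, at the cost of one extra round-trip per checkout.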
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 730be04 and 15477b2.

📒 Files selected for processing (2)
  • prediction_market_agent_tooling/tools/caches/db_cache.py (3 hunks)
  • prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_cache.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • prediction_market_agent_tooling/tools/caches/db_cache.py

gabrielfior merged commit 3c7e09a into main on Nov 11, 2024; 16 checks passed.
gabrielfior deleted the peter/fix-db-cache-2 branch on November 11, 2024, 12:25.
gabrielfior added a commit that referenced this pull request Nov 19, 2024
#552)

* Lower amount of DB connections (#547)

* Single shared engine instance

* fixes

* revert accidental bump

* fix pickle in github ci

* fiiiix

* fix loguru

* revert

* ah

* ah2

* revert

* Added gnosis_rpc_config, google_credentials_filename and chain_id to BaseSettings

* Fixed black

* Fixed black (2) | changed rpc

* Raise OutOfFunds  in withdraw_wxdai_to_xdai_to_keep_balance (#553)

* Implemented PR comments

* Fixed CI

* Merged

---------

Co-authored-by: Peter Jung <[email protected]>
gabrielfior added a commit that referenced this pull request Nov 21, 2024
* Added block_time fetching for more accurate outcome tokens received.

* Financial metrics being published

* Reverted caching to diskcache

* Fixing CI

* lock file

* Updated network foundry

* Updated RPC to gateway

* Moved local_chain test to proper folder

* Fixed black

* Applied PR comments

* Small fixes after PR review

* Small refactoring to address PR comments

* Added gnosis_rpc_config, google_credentials_filename and chain_id to … (#552)

* Lower amount of DB connections (#547)

* Single shared engine instance

* fixes

* revert accidental bump

* fix pickle in github ci

* fiiiix

* fix loguru

* revert

* ah

* ah2

* revert

* Added gnosis_rpc_config, google_credentials_filename and chain_id to BaseSettings

* Fixed black

* Fixed black (2) | changed rpc

* Raise OutOfFunds  in withdraw_wxdai_to_xdai_to_keep_balance (#553)

* Implemented PR comments

* Fixed CI

* Merged

---------

Co-authored-by: Peter Jung <[email protected]>

---------

Co-authored-by: Peter Jung <[email protected]>