
Unify variable name and use gpt-4o for cheaper runs #556

Merged: 2 commits merged into main from peter/cchaper on Nov 21, 2024
Conversation

kongzii (Contributor) commented Nov 16, 2024

No description provided.


coderabbitai bot commented Nov 16, 2024

Walkthrough

The pull request introduces several modifications primarily to the DeployablePredictionAgent and DeployableTraderAgent classes in the agent.py file, including renaming the store_prediction parameter to store_predictions. Additionally, the initialize_langfuse method is updated to automatically observe several methods. Changes are also made to the image_gen, is_invalid, and is_predictable modules, mainly updating model versions and adjusting parameters. Test cases in test_is_predictable.py are modified to reflect these changes, particularly in expected outputs.

Changes

  • prediction_market_agent_tooling/deploy/agent.py: Renamed store_prediction to store_predictions in the constructors and internal assignments of DeployablePredictionAgent and DeployableTraderAgent, updated the after_process_market method to match the new name, and updated initialize_langfuse to automatically observe several methods.
  • prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py: Changed the ChatOpenAI model parameter from "gpt-4-turbo" to "gpt-4o-2024-08-06" in rewrite_question_into_image_generation_prompt.
  • prediction_market_agent_tooling/tools/is_invalid.py: Updated the engine parameter's default value from "gpt-4o" to "gpt-4o-2024-08-06" in the is_invalid function.
  • prediction_market_agent_tooling/tools/is_predictable.py: Updated the engine parameter's default value from "gpt-4-1106-preview" to "gpt-4o-2024-08-06" in is_predictable_binary and is_predictable_without_description, and introduced a seed parameter in the ChatOpenAI instantiation.
  • tests/tools/test_is_predictable.py: Updated the expected output of the test_is_predictable_without_description case about AI and the Math Olympiad.
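
For orientation, a minimal hypothetical sketch of the renamed constructor parameter; the real DeployablePredictionAgent in prediction_market_agent_tooling/deploy/agent.py inherits from a base class and takes more parameters than shown here:

# Sketch only: the default for enable_langfuse and the surrounding class
# shape are simplifying assumptions, not the repository's actual code.
class DeployablePredictionAgent:
    def __init__(
        self,
        enable_langfuse: bool = True,
        store_predictions: bool = True,  # renamed from `store_prediction` in this PR
    ) -> None:
        self.enable_langfuse = enable_langfuse
        # Internal reads and writes now use the plural name as well.
        self.store_predictions = store_predictions

Callers that previously passed store_prediction=False must switch to store_predictions=False, since the old keyword no longer exists after this change.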

Suggested reviewers

  • evangriffiths


coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py (1)

Line range hint 26-29: Consider enhancing the prompt engineering

The current prompt could be more specific about the desired image characteristics to get better results from DALL-E-3.

Consider this enhancement:

-            f"Rewrite this prediction market question '{question}' into a form that will generate nice thumbnail with DALL-E-3."
-            "The thumbnail should be catchy and visually appealing. With a large object in the center of the image.",
+            f"Rewrite this prediction market question '{question}' into a DALL-E-3 image prompt that will generate a "
+            "professional thumbnail. The prompt should specify: 1) A clear focal point in the center, "
+            "2) Clean, minimalist composition, 3) Neutral background, 4) Professional lighting. "
+            "The image should be instantly recognizable and suitable for a prediction market interface.",
prediction_market_agent_tooling/tools/is_predictable.py (1)

104-104: Good addition of deterministic seeding

Adding the seed parameter is a good practice for:

  • Ensuring reproducible outputs
  • Reducing costs by enabling result caching
  • Facilitating testing and debugging

The implementation is consistent across both functions.

Consider documenting the seeding behavior in the function docstrings, especially noting any limitations or expectations around deterministic outputs with the new model version.

Also applies to: 144-144
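
As a rough sketch of what the seeded instantiation amounts to (assuming the langchain_openai package; LLM_SEED below is a placeholder value, the repository imports its own constant):

from langchain_openai import ChatOpenAI

LLM_SEED = 42  # placeholder; the repo's actual constant is imported in is_predictable.py

llm = ChatOpenAI(
    model="gpt-4o-2024-08-06",
    temperature=0.0,  # low temperature further narrows sampling variance
    seed=LLM_SEED,    # OpenAI documents seed as best-effort determinism
)

Note that OpenAI treats seed as best-effort: identical requests with the same seed usually return identical outputs, but not always, so any caching keyed on (prompt, model, seed) should tolerate occasional misses.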

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f30807a and f37dfd4.

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !**/*.toml
📒 Files selected for processing (5)
  • prediction_market_agent_tooling/deploy/agent.py (3 hunks)
  • prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py (1 hunks)
  • prediction_market_agent_tooling/tools/is_invalid.py (1 hunks)
  • prediction_market_agent_tooling/tools/is_predictable.py (5 hunks)
  • tests/tools/test_is_predictable.py (1 hunks)
🔇 Additional comments (5)
tests/tools/test_is_predictable.py (1)

87-87: LGTM! The test case correction improves accuracy.

The change from True to False is correct since the question's predictability depends on external validation (Eliezer or Paul's acknowledgment) specified in the description, making it unpredictable without this context.

Let's verify the test coverage for similar cases:

✅ Verification successful

The test case change is correct and consistent with other test cases

The verification confirms that the change from True to False is appropriate. Looking at the test cases:

  • The US Presidential Election case is marked as predictable (True) because its resolution criteria are objective and based on Electoral College votes or Congressional procedure
  • The AI IMO case is correctly marked as unpredictable (False) because it depends on subjective acknowledgment from specific individuals (Eliezer or Paul)

This pattern is consistent across the test suite, where questions with objective resolution criteria are marked as predictable, while those requiring subjective validation are marked as unpredictable.
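
A hypothetical parametrized test mirroring that pattern (the actual questions and wording in tests/tools/test_is_predictable.py differ):

import pytest

from prediction_market_agent_tooling.tools.is_predictable import (
    is_predictable_without_description,
)

@pytest.mark.parametrize(
    ("question", "expected"),
    [
        # Objective resolution criteria: predictable from the question alone.
        ("Will the Republican nominee win the 2024 US Presidential Election?", True),
        # Resolution hinges on acknowledgment that only the market description
        # explains, so the bare question is not predictable.
        ("Will an AI win a gold medal on the IMO by 2025?", False),
    ],
)
def test_is_predictable_without_description(question: str, expected: bool) -> None:
    assert is_predictable_without_description(question) == expected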

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other test cases that might need similar adjustments
# where predictability depends on external validation or acknowledgment

# Search for similar patterns in test files
rg -i "acknowledge|validate|confirm" "tests/" -A 5 -B 5

# Search for the is_predictable implementation to understand the logic
ast-grep --pattern 'def is_predictable_without_description($$$)' 

Length of output: 4219


Script:

#!/bin/bash
# Let's check the implementation of is_predictable_without_description and related functions
ast-grep --pattern 'def is_predictable($$$)'

# Also search for test cases with similar patterns of external validation
rg "Resolves to" tests/ -A 2 -B 2

Length of output: 1123

prediction_market_agent_tooling/tools/is_invalid.py (1)

63-63: Verify the specified model version and document the rationale.

The model version has been updated to a specific date-stamped version. While this aligns with the PR objective for cheaper runs, we should ensure:

  1. This specific model version exists and is properly tested
  2. The rationale for this specific version is documented

Consider adding a comment explaining why this specific model version was chosen, for example:

def is_invalid(
    question: str,
-    engine: str = "gpt-4o-2024-08-06",
+    # Using gpt-4o-2024-08-06 for optimal cost-performance ratio
+    engine: str = "gpt-4o-2024-08-06",
    temperature: float = LLM_SUPER_LOW_TEMPERATURE,
prediction_market_agent_tooling/tools/is_predictable.py (1)

10-13: LGTM: Import changes are well-organized

The addition of LLM_SEED import is properly grouped with related LLM configuration constants.

prediction_market_agent_tooling/deploy/agent.py (2)

446-452: LGTM! Logging statement updated correctly.

The logging statement has been properly updated to use the renamed variable.


511-516: LGTM! Constructor changes are consistent.

The parameter rename has been properly propagated to the child class constructor and super call.

@@ -17,7 +17,7 @@ def rewrite_question_into_image_generation_prompt(question: str) -> str:
             "openai not installed, please install extras `langchain` to use this function."
         )
     llm = ChatOpenAI(
-        model="gpt-4-turbo",
+        model="gpt-4o-2024-08-06",

💡 Codebase verification

⚠️ Potential issue

Non-standard GPT model names require standardization across multiple files

The codebase contains several instances of non-standard OpenAI model names that need to be updated:

  • gpt-4o-2024-08-06 in market_thumbnail_gen.py
  • gpt-4o in relevant_news_analysis.py
  • Similar patterns likely exist in is_predictable.py and is_invalid.py (based on the regex search)

These non-standard model names will cause runtime errors as they don't match OpenAI's official model naming convention. Update all instances to use standard OpenAI model names like gpt-4-1106-preview or gpt-4-0613.

🔗 Analysis chain

Invalid OpenAI model name detected

The model name "gpt-4o-2024-08-06" appears to be non-standard and will likely cause runtime errors. OpenAI's official model names don't follow this format.

Apply this fix:

-        model="gpt-4o-2024-08-06",
+        model="gpt-4-1106-preview",  # Latest GPT-4 Turbo model

Let's verify if this non-standard model name is used elsewhere:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for similar non-standard model names across the codebase
# Expected: No other occurrences of this pattern should exist

echo "Searching for non-standard gpt-4o model names:"
rg -l "gpt-4o-[0-9]{4}-[0-9]{2}-[0-9]{2}"

echo -e "\nSearching for all GPT-4 model references for comparison:"
rg -l "gpt-4"

Length of output: 834


Script:

#!/bin/bash
# Let's check the actual model names used in these files
echo "Checking model names in is_predictable.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_predictable.py -A 1

echo -e "\nChecking model names in is_invalid.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_invalid.py -A 1

echo -e "\nChecking model names in relevant_news_analysis.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_analysis.py -A 1

echo -e "\nChecking model names in utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/utils.py -A 1

echo -e "\nChecking model names in benchmark/utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/benchmark/utils.py -A 1

Length of output: 1048


Script:

#!/bin/bash
# Let's check for any model variable assignments or configurations
echo "Checking all model configurations in is_predictable.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_predictable.py

echo -e "\nChecking all model configurations in is_invalid.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_invalid.py

echo -e "\nChecking for any model configurations in utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/utils.py

echo -e "\nChecking for any model configurations in benchmark/utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/benchmark/utils.py

# Let's also check for any configuration files that might define these models
echo -e "\nChecking for potential config files:"
fd -e yaml -e json -e toml -e ini

Length of output: 2114


Script:

#!/bin/bash
# Let's check the content of pyproject.toml for any model configurations
echo "Checking pyproject.toml for model configurations:"
cat pyproject.toml

# Let's also check if there's an 'engine' variable defined that might be setting these models
echo -e "\nChecking for engine variable definitions:"
rg "engine\s*=.*gpt" -A 2

Length of output: 2302

@@ -81,7 +84,7 @@
 @db_cache
 def is_predictable_binary(
     question: str,
-    engine: str = "gpt-4-1106-preview",
+    engine: str = "gpt-4o-2024-08-06",

💡 Codebase verification

Invalid model identifier detected

The model identifier "gpt-4o-2024-08-06" appears to be non-standard and likely incorrect:

  1. The format deviates from OpenAI's standard naming convention (no 'o' suffix is used in official models)
  2. The date format used (2024-08-06) is in the future
  3. There's no model validation or configuration that defines this as a valid model
  4. The model is not documented in any configuration files

The previous model "gpt-4-1106-preview" should be retained or updated to a currently available OpenAI model.

🔗 Analysis chain

Verify the availability and compatibility of the new model version

The model has been changed from "gpt-4-1106-preview" to "gpt-4o-2024-08-06". Let's verify:

  1. If this is a valid model identifier
  2. If it maintains compatibility with the existing prompts and token limits

Also applies to: 125-125


Let me gather more information about model validation and configuration in the codebase.


Let me try one more search to find any model-related configurations or imports.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for model version references and verify consistency

# Check for any documentation or configuration files that might list valid models
fd -t f -e md -e yaml -e json . | xargs rg -l "gpt-4o-2024-08-06"

# Check if there are any model-specific configurations or tests
rg -A 5 "gpt-4o-2024-08-06"

# Look for any model validation logic
ast-grep --pattern 'if $model == "gpt-4o-2024-08-06" { $$$ }'

Length of output: 2379


Script:

#!/bin/bash
# Check for model validation or configuration logic
rg -l "SUPPORTED_MODELS|ALLOWED_MODELS|MODEL_CONFIG" .

# Look for any model-related constants or configurations
ast-grep --pattern 'MODEL = $value'
ast-grep --pattern 'MODELS = $value'

# Check for any OpenAI client initialization or model validation
rg -A 5 "openai\..*Client"

Length of output: 162


Script:

#!/bin/bash
# Search for OpenAI imports and configurations
rg -l "from openai|import openai"

# Look for any model-related constants
rg "DEFAULT_MODEL|GPT_|MODEL_NAME"

# Check configuration files
fd -t f "config|settings" -e py -e yaml -e json | xargs cat

Length of output: 8456

coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f37dfd4 and 5c051c2.

📒 Files selected for processing (1)
  • prediction_market_agent_tooling/deploy/agent.py (3 hunks)
🔇 Additional comments (1)
prediction_market_agent_tooling/deploy/agent.py (1)

533-540: LGTM!

The parameter rename is consistently applied in both the constructor declaration and the super() call.

Comment on lines +458 to +464

         if self.store_predictions:
             market.store_prediction(
                 processed_market=processed_market, keys=keys, agent_name=self.agent_name
             )
         else:
             logger.info(
-                f"Prediction {processed_market} not stored because {self.store_prediction=}."
+                f"Prediction {processed_market} not stored because {self.store_predictions=}."

⚠️ Potential issue

Fix inconsistent method name

While the parameter has been renamed to store_predictions, the method call still uses the old name store_prediction. This should be updated for consistency.

-            market.store_prediction(
+            market.store_predictions(
                 processed_market=processed_market, keys=keys, agent_name=self.agent_name
             )

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +303 to +306

         store_predictions: bool = True,
     ) -> None:
         super().__init__(enable_langfuse=enable_langfuse)
-        self.store_prediction = store_prediction
+        self.store_predictions = store_predictions

💡 Codebase verification

Update required: Rename store_prediction method across market classes

The parameter rename from store_prediction to store_predictions in DeployablePredictionAgent is incomplete. Additional changes are needed:

  • The store_prediction method needs to be renamed to store_predictions in:
    • AgentMarket base class (agent_market.py)
    • MetaculusAgentMarket class (metaculus.py)
    • OmenAgentMarket class (omen.py)
  • The script scripts/store_prediction.py should be renamed to store_predictions.py

This ensures consistent naming throughout the codebase, as the functionality deals with storing multiple predictions.

🔗 Analysis chain

Verify parameter rename impact across the codebase

The rename from store_prediction to store_predictions improves naming consistency. However, based on past reviews, there are several places that still use store_prediction (singular) that need to be updated:

  • Method calls in agent.py
  • Method names in market classes (agent_market.py, omen.py, metaculus.py)
  • Script name and references in scripts/store_prediction.py

Let me check the specific occurrences of store_prediction to verify if they are related to the parameter rename.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of old parameter name
# and verify all constructor calls are updated

# Check for any remaining "store_prediction" instances
rg "store_prediction" -l

# Find all instantiations of DeployablePredictionAgent
ast-grep --pattern 'DeployablePredictionAgent($$$)'

Length of output: 326


Script:

#!/bin/bash
# Check the context of store_prediction usage in each file
echo "=== agent.py ==="
rg "store_prediction" prediction_market_agent_tooling/deploy/agent.py -B 2 -A 2

echo -e "\n=== agent_market.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/agent_market.py -B 2 -A 2

echo -e "\n=== metaculus.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/metaculus/metaculus.py -B 2 -A 2

echo -e "\n=== omen.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/omen/omen.py -B 2 -A 2

echo -e "\n=== store_prediction.py ==="
rg "store_prediction" scripts/store_prediction.py -B 2 -A 2

Length of output: 2355

kongzii merged commit 418e1dc into main on Nov 21, 2024 (16 checks passed).
kongzii deleted the peter/cchaper branch on November 21, 2024 at 07:56.