
Unify variable name and use gpt-4o for cheaper runs #556

Merged: 2 commits merged into main from peter/cchaper on Nov 21, 2024
Conversation

kongzii (Contributor) commented Nov 16, 2024

No description provided.


coderabbitai bot commented Nov 16, 2024

Walkthrough

The pull request introduces several modifications primarily to the DeployablePredictionAgent and DeployableTraderAgent classes in the agent.py file, including renaming the store_prediction parameter to store_predictions. Additionally, the initialize_langfuse method is updated to automatically observe several methods. Changes are also made to the image_gen, is_invalid, and is_predictable modules, mainly updating model versions and adjusting parameters. Test cases in test_is_predictable.py are modified to reflect these changes, particularly in expected outputs.

Changes

  • prediction_market_agent_tooling/deploy/agent.py: Renamed store_prediction to store_predictions in the constructors and internal assignments of DeployablePredictionAgent and DeployableTraderAgent, updated the after_process_market method to match the new name, and updated initialize_langfuse to automatically observe several methods.
  • prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py: Changed the ChatOpenAI model parameter from "gpt-4-turbo" to "gpt-4o-2024-08-06" in rewrite_question_into_image_generation_prompt.
  • prediction_market_agent_tooling/tools/is_invalid.py: Updated the engine parameter's default value from "gpt-4o" to "gpt-4o-2024-08-06" in the is_invalid function.
  • prediction_market_agent_tooling/tools/is_predictable.py: Updated the engine parameter's default value from "gpt-4-1106-preview" to "gpt-4o-2024-08-06" in is_predictable_binary and is_predictable_without_description, and introduced a seed parameter in the ChatOpenAI instantiation.
  • tests/tools/test_is_predictable.py: Updated the expected output of the test_is_predictable_without_description case about AI and the Math Olympiad.
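
For orientation, a minimal hypothetical sketch of the renamed constructor parameter; the real DeployablePredictionAgent in prediction_market_agent_tooling/deploy/agent.py inherits from a base class and takes more parameters than shown here:

# Sketch only: the default for enable_langfuse and the surrounding class
# shape are simplifying assumptions, not the repository's actual code.
class DeployablePredictionAgent:
    def __init__(
        self,
        enable_langfuse: bool = True,
        store_predictions: bool = True,  # renamed from `store_prediction` in this PR
    ) -> None:
        self.enable_langfuse = enable_langfuse
        # Internal reads and writes now use the plural name as well.
        self.store_predictions = store_predictions

Callers that previously passed store_prediction=False must switch to store_predictions=False, since the old keyword no longer exists after this change.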

Suggested reviewers

  • evangriffiths


coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py (1)

Line range hint 26-29: Consider enhancing the prompt engineering

The current prompt could be more specific about the desired image characteristics to get better results from DALL-E-3.

Consider this enhancement:

-            f"Rewrite this prediction market question '{question}' into a form that will generate nice thumbnail with DALL-E-3."
-            "The thumbnail should be catchy and visually appealing. With a large object in the center of the image.",
+            f"Rewrite this prediction market question '{question}' into a DALL-E-3 image prompt that will generate a "
+            "professional thumbnail. The prompt should specify: 1) A clear focal point in the center, "
+            "2) Clean, minimalist composition, 3) Neutral background, 4) Professional lighting. "
+            "The image should be instantly recognizable and suitable for a prediction market interface.",
prediction_market_agent_tooling/tools/is_predictable.py (1)

104-104: Good addition of deterministic seeding

Adding the seed parameter is a good practice for:

  • Ensuring reproducible outputs
  • Reducing costs by enabling result caching
  • Facilitating testing and debugging

The implementation is consistent across both functions.

Consider documenting the seeding behavior in the function docstrings, especially noting any limitations or expectations around deterministic outputs with the new model version.

Also applies to: 144-144
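
As a rough sketch of what the seeded instantiation amounts to (assuming the langchain_openai package; LLM_SEED below is a placeholder value, the repository imports its own constant):

from langchain_openai import ChatOpenAI

LLM_SEED = 42  # placeholder; the repo's actual constant is imported in is_predictable.py

llm = ChatOpenAI(
    model="gpt-4o-2024-08-06",
    temperature=0.0,  # low temperature further narrows sampling variance
    seed=LLM_SEED,    # OpenAI documents seed as best-effort determinism
)

Note that OpenAI treats seed as best-effort: identical requests with the same seed usually return identical outputs, but not always, so any caching keyed on (prompt, model, seed) should tolerate occasional misses.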

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f30807a and f37dfd4.

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !**/*.toml
📒 Files selected for processing (5)
  • prediction_market_agent_tooling/deploy/agent.py (3 hunks)
  • prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py (1 hunks)
  • prediction_market_agent_tooling/tools/is_invalid.py (1 hunks)
  • prediction_market_agent_tooling/tools/is_predictable.py (5 hunks)
  • tests/tools/test_is_predictable.py (1 hunks)
🔇 Additional comments (5)
tests/tools/test_is_predictable.py (1)

87-87: LGTM! The test case correction improves accuracy.

The change from True to False is correct since the question's predictability depends on external validation (Eliezer or Paul's acknowledgment) specified in the description, making it unpredictable without this context.

Let's verify the test coverage for similar cases:

✅ Verification successful

The test case change is correct and consistent with other test cases

The verification confirms that the change from True to False is appropriate. Looking at the test cases:

  • The US Presidential Election case is marked as predictable (True) because its resolution criteria are objective and based on Electoral College votes or Congressional procedure
  • The AI IMO case is correctly marked as unpredictable (False) because it depends on subjective acknowledgment from specific individuals (Eliezer or Paul)

This pattern is consistent across the test suite, where questions with objective resolution criteria are marked as predictable, while those requiring subjective validation are marked as unpredictable.
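
A hypothetical parametrized test mirroring that pattern (the actual questions and wording in tests/tools/test_is_predictable.py differ):

import pytest

from prediction_market_agent_tooling.tools.is_predictable import (
    is_predictable_without_description,
)

@pytest.mark.parametrize(
    ("question", "expected"),
    [
        # Objective resolution criteria: predictable from the question alone.
        ("Will the Republican nominee win the 2024 US Presidential Election?", True),
        # Resolution hinges on acknowledgment that only the market description
        # explains, so the bare question is not predictable.
        ("Will an AI win a gold medal on the IMO by 2025?", False),
    ],
)
def test_is_predictable_without_description(question: str, expected: bool) -> None:
    assert is_predictable_without_description(question) == expected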

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other test cases that might need similar adjustments
# where predictability depends on external validation or acknowledgment

# Search for similar patterns in test files
rg -i "acknowledge|validate|confirm" "tests/" -A 5 -B 5

# Search for the is_predictable implementation to understand the logic
ast-grep --pattern 'def is_predictable_without_description($$$)' 

Length of output: 4219


Script:

#!/bin/bash
# Let's check the implementation of is_predictable_without_description and related functions
ast-grep --pattern 'def is_predictable($$$)'

# Also search for test cases with similar patterns of external validation
rg "Resolves to" tests/ -A 2 -B 2

Length of output: 1123

prediction_market_agent_tooling/tools/is_invalid.py (1)

63-63: Verify the specified model version and document the rationale.

The model version has been updated to a specific date-stamped version. While this aligns with the PR objective for cheaper runs, we should ensure:

  1. This specific model version exists and is properly tested
  2. The rationale for this specific version is documented

Consider adding a comment explaining why this specific model version was chosen, for example:

def is_invalid(
    question: str,
-    engine: str = "gpt-4o-2024-08-06",
+    # Using gpt-4o-2024-08-06 for optimal cost-performance ratio
+    engine: str = "gpt-4o-2024-08-06",
    temperature: float = LLM_SUPER_LOW_TEMPERATURE,
prediction_market_agent_tooling/tools/is_predictable.py (1)

10-13: LGTM: Import changes are well-organized

The addition of LLM_SEED import is properly grouped with related LLM configuration constants.

prediction_market_agent_tooling/deploy/agent.py (2)

446-452: LGTM! Logging statement updated correctly.

The logging statement has been properly updated to use the renamed variable.


511-516: LGTM! Constructor changes are consistent.

The parameter rename has been properly propagated to the child class constructor and super call.

@@ -17,7 +17,7 @@ def rewrite_question_into_image_generation_prompt(question: str) -> str:
             "openai not installed, please install extras `langchain` to use this function."
         )
     llm = ChatOpenAI(
-        model="gpt-4-turbo",
+        model="gpt-4o-2024-08-06",

💡 Codebase verification

⚠️ Potential issue

Non-standard GPT model names require standardization across multiple files

The codebase contains several instances of non-standard OpenAI model names that need to be updated:

  • gpt-4o-2024-08-06 in market_thumbnail_gen.py
  • gpt-4o in relevant_news_analysis.py
  • Similar patterns likely exist in is_predictable.py and is_invalid.py (based on the regex search)

These non-standard model names will cause runtime errors as they don't match OpenAI's official model naming convention. Update all instances to use standard OpenAI model names like gpt-4-1106-preview or gpt-4-0613.

🔗 Analysis chain

Invalid OpenAI model name detected

The model name "gpt-4o-2024-08-06" appears to be non-standard and will likely cause runtime errors. OpenAI's official model names don't follow this format.

Apply this fix:

-        model="gpt-4o-2024-08-06",
+        model="gpt-4-1106-preview",  # Latest GPT-4 Turbo model

Let's verify if this non-standard model name is used elsewhere:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for similar non-standard model names across the codebase
# Expected: No other occurrences of this pattern should exist

echo "Searching for non-standard gpt-4o model names:"
rg -l "gpt-4o-[0-9]{4}-[0-9]{2}-[0-9]{2}"

echo -e "\nSearching for all GPT-4 model references for comparison:"
rg -l "gpt-4"

Length of output: 834


Script:

#!/bin/bash
# Let's check the actual model names used in these files
echo "Checking model names in is_predictable.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_predictable.py -A 1

echo -e "\nChecking model names in is_invalid.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_invalid.py -A 1

echo -e "\nChecking model names in relevant_news_analysis.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_analysis.py -A 1

echo -e "\nChecking model names in utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/utils.py -A 1

echo -e "\nChecking model names in benchmark/utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/benchmark/utils.py -A 1

Length of output: 1048


Script:

#!/bin/bash
# Let's check for any model variable assignments or configurations
echo "Checking all model configurations in is_predictable.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_predictable.py

echo -e "\nChecking all model configurations in is_invalid.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_invalid.py

echo -e "\nChecking for any model configurations in utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/utils.py

echo -e "\nChecking for any model configurations in benchmark/utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/benchmark/utils.py

# Let's also check for any configuration files that might define these models
echo -e "\nChecking for potential config files:"
fd -e yaml -e json -e toml -e ini

Length of output: 2114


Script:

#!/bin/bash
# Let's check the content of pyproject.toml for any model configurations
echo "Checking pyproject.toml for model configurations:"
cat pyproject.toml

# Let's also check if there's an 'engine' variable defined that might be setting these models
echo -e "\nChecking for engine variable definitions:"
rg "engine\s*=.*gpt" -A 2

Length of output: 2302

@@ -81,7 +84,7 @@
 @db_cache
 def is_predictable_binary(
     question: str,
-    engine: str = "gpt-4-1106-preview",
+    engine: str = "gpt-4o-2024-08-06",

💡 Codebase verification

Invalid model identifier detected

The model identifier "gpt-4o-2024-08-06" appears to be non-standard and likely incorrect:

  1. The format deviates from OpenAI's standard naming convention (no 'o' suffix is used in official models)
  2. The date format used (2024-08-06) is in the future
  3. There's no model validation or configuration that defines this as a valid model
  4. The model is not documented in any configuration files

The previous model "gpt-4-1106-preview" should be retained or updated to a currently available OpenAI model.

🔗 Analysis chain

Verify the availability and compatibility of the new model version

The model has been changed from "gpt-4-1106-preview" to "gpt-4o-2024-08-06". Let's verify:

  1. If this is a valid model identifier
  2. If it maintains compatibility with the existing prompts and token limits

Also applies to: 125-125


Let me gather more information about model validation and configuration in the codebase.


Let me try one more search to find any model-related configurations or imports.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for model version references and verify consistency

# Check for any documentation or configuration files that might list valid models
fd -t f -e md -e yaml -e json . | xargs rg -l "gpt-4o-2024-08-06"

# Check if there are any model-specific configurations or tests
rg -A 5 "gpt-4o-2024-08-06"

# Look for any model validation logic
ast-grep --pattern 'if $model == "gpt-4o-2024-08-06" { $$$ }'

Length of output: 2379


Script:

#!/bin/bash
# Check for model validation or configuration logic
rg -l "SUPPORTED_MODELS|ALLOWED_MODELS|MODEL_CONFIG" .

# Look for any model-related constants or configurations
ast-grep --pattern 'MODEL = $value'
ast-grep --pattern 'MODELS = $value'

# Check for any OpenAI client initialization or model validation
rg -A 5 "openai\..*Client"

Length of output: 162


Script:

#!/bin/bash
# Search for OpenAI imports and configurations
rg -l "from openai|import openai"

# Look for any model-related constants
rg "DEFAULT_MODEL|GPT_|MODEL_NAME"

# Check configuration files
fd -t f "config|settings" -e py -e yaml -e json | xargs cat

Length of output: 8456

coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f37dfd4 and 5c051c2.

📒 Files selected for processing (1)
  • prediction_market_agent_tooling/deploy/agent.py (3 hunks)
🔇 Additional comments (1)
prediction_market_agent_tooling/deploy/agent.py (1)

533-540: LGTM!

The parameter rename is consistently applied in both the constructor declaration and the super() call.

Comment on lines +458 to +464

         if self.store_predictions:
             market.store_prediction(
                 processed_market=processed_market, keys=keys, agent_name=self.agent_name
             )
         else:
             logger.info(
-                f"Prediction {processed_market} not stored because {self.store_prediction=}."
+                f"Prediction {processed_market} not stored because {self.store_predictions=}."

⚠️ Potential issue

Fix inconsistent method name

While the parameter has been renamed to store_predictions, the method call still uses the old name store_prediction. This should be updated for consistency.

-            market.store_prediction(
+            market.store_predictions(
                 processed_market=processed_market, keys=keys, agent_name=self.agent_name
             )

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +303 to +306

         store_predictions: bool = True,
     ) -> None:
         super().__init__(enable_langfuse=enable_langfuse)
-        self.store_prediction = store_prediction
+        self.store_predictions = store_predictions

💡 Codebase verification

Update required: Rename store_prediction method across market classes

The parameter rename from store_prediction to store_predictions in DeployablePredictionAgent is incomplete. Additional changes are needed:

  • The store_prediction method needs to be renamed to store_predictions in:
    • AgentMarket base class (agent_market.py)
    • MetaculusAgentMarket class (metaculus.py)
    • OmenAgentMarket class (omen.py)
  • The script scripts/store_prediction.py should be renamed to store_predictions.py

This ensures consistent naming throughout the codebase, as the functionality deals with storing multiple predictions.

🔗 Analysis chain

Verify parameter rename impact across the codebase

The rename from store_prediction to store_predictions improves naming consistency. However, based on past reviews, there are several places that still use store_prediction (singular) that need to be updated:

  • Method calls in agent.py
  • Method names in market classes (agent_market.py, omen.py, metaculus.py)
  • Script name and references in scripts/store_prediction.py

Let me check the specific occurrences of store_prediction to verify if they are related to the parameter rename.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of old parameter name
# and verify all constructor calls are updated

# Check for any remaining "store_prediction" instances
rg "store_prediction" -l

# Find all instantiations of DeployablePredictionAgent
ast-grep --pattern 'DeployablePredictionAgent($$$)'

Length of output: 326


Script:

#!/bin/bash
# Check the context of store_prediction usage in each file
echo "=== agent.py ==="
rg "store_prediction" prediction_market_agent_tooling/deploy/agent.py -B 2 -A 2

echo -e "\n=== agent_market.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/agent_market.py -B 2 -A 2

echo -e "\n=== metaculus.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/metaculus/metaculus.py -B 2 -A 2

echo -e "\n=== omen.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/omen/omen.py -B 2 -A 2

echo -e "\n=== store_prediction.py ==="
rg "store_prediction" scripts/store_prediction.py -B 2 -A 2

Length of output: 2355

kongzii merged commit 418e1dc into main on Nov 21, 2024 (16 checks passed).
kongzii deleted the peter/cchaper branch on November 21, 2024 at 07:56.