Unify variable name and use gpt-4o for cheaper runs #556
Conversation
Walkthrough: The pull request unifies the `store_prediction`/`store_predictions` variable naming, switches the default LLM engine in several tools to `gpt-4o-2024-08-06` for cheaper runs, and adds a `seed` parameter to the `is_predictable` LLM calls for deterministic outputs.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py (1)
Line range hint 26-29: Consider enhancing the prompt engineering.

The current prompt could be more specific about the desired image characteristics to get better results from DALL-E-3.

Consider this enhancement:

```diff
-        f"Rewrite this prediction market question '{question}' into a form that will generate nice thumbnail with DALL-E-3."
-        "The thumbnail should be catchy and visually appealing. With a large object in the center of the image.",
+        f"Rewrite this prediction market question '{question}' into a DALL-E-3 image prompt that will generate a "
+        "professional thumbnail. The prompt should specify: 1) A clear focal point in the center, "
+        "2) Clean, minimalist composition, 3) Neutral background, 4) Professional lighting. "
+        "The image should be instantly recognizable and suitable for a prediction market interface.",
```

prediction_market_agent_tooling/tools/is_predictable.py (1)
Line 104: Good addition of deterministic seeding.

Adding the `seed` parameter is a good practice for:

- Ensuring reproducible outputs
- Reducing costs by enabling result caching
- Facilitating testing and debugging

The implementation is consistent across both functions.

Consider documenting the seeding behavior in the function docstrings, especially noting any limitations or expectations around deterministic outputs with the new model version; see the sketch below.

Also applies to: 144-144
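As a sketch of what such a docstring could say (the wording, the stubbed body, and the inlined `LLM_SEED` value below are illustrative assumptions, not code from the PR):

```python
LLM_SEED = 42  # assumed placeholder; in the codebase this is an imported constant


def is_predictable_binary(question: str, engine: str = "gpt-4o-2024-08-06") -> bool:
    """Classify whether `question` can be meaningfully predicted.

    The underlying LLM call passes `seed=LLM_SEED` so the provider can
    return deterministic samples. Determinism is best-effort only: OpenAI
    documents that the same seed, model snapshot, and parameters yield
    *mostly* consistent outputs, so results cached from this function may
    still occasionally differ when recomputed.
    """
    raise NotImplementedError  # body elided; the docstring is the point here
```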
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)

- `pyproject.toml` is excluded by `!**/*.toml`
📒 Files selected for processing (5)

- `prediction_market_agent_tooling/deploy/agent.py` (3 hunks)
- `prediction_market_agent_tooling/tools/image_gen/market_thumbnail_gen.py` (1 hunks)
- `prediction_market_agent_tooling/tools/is_invalid.py` (1 hunks)
- `prediction_market_agent_tooling/tools/is_predictable.py` (5 hunks)
- `tests/tools/test_is_predictable.py` (1 hunks)
🔇 Additional comments (5)
tests/tools/test_is_predictable.py (1)
Line 87: LGTM! The test case correction improves accuracy.

The change from `True` to `False` is correct, since the question's predictability depends on external validation (Eliezer or Paul's acknowledgment) specified in the description, making it unpredictable without that context.
Let's verify the test coverage for similar cases:
✅ Verification successful
The test case change is correct and consistent with other test cases.

The verification confirms that the change from `True` to `False` is appropriate. Looking at the test cases:

- The US Presidential Election case is marked as predictable (`True`) because its resolution criteria are objective, based on Electoral College votes or Congressional procedure.
- The AI IMO case is correctly marked as unpredictable (`False`) because it depends on subjective acknowledgment from specific individuals (Eliezer or Paul).

This pattern is consistent across the test suite: questions with objective resolution criteria are marked as predictable, while those requiring subjective validation are marked as unpredictable.
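To illustrate the pattern in code, here is a hypothetical reconstruction of such parametrized cases (the question wording and test structure are assumptions; only the expected values come from the review above):

```python
import pytest

from prediction_market_agent_tooling.tools.is_predictable import is_predictable_binary

CASES = [
    # Objective resolution criteria (Electoral College votes / Congressional
    # procedure) -> predictable.
    ("Will the Democratic nominee win the 2024 US Presidential Election?", True),
    # Resolution hinges on subjective acknowledgment by named individuals
    # (Eliezer or Paul) -> not predictable without that context.
    ("Will an AI get gold on any International Math Olympiad by 2025?", False),
]


@pytest.mark.parametrize("question, expected", CASES)
def test_is_predictable_binary(question: str, expected: bool) -> None:
    assert is_predictable_binary(question) == expected
```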
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Description: Check for other test cases that might need similar adjustments
# where predictability depends on external validation or acknowledgment

# Search for similar patterns in test files
rg -i "acknowledge|validate|confirm" "tests/" -A 5 -B 5

# Search for the is_predictable implementation to understand the logic
ast-grep --pattern 'def is_predictable_without_description($$$)'
```

Length of output: 4219
Script:

```bash
#!/bin/bash
# Check the implementation of is_predictable_without_description and related functions
ast-grep --pattern 'def is_predictable($$$)'

# Also search for test cases with similar patterns of external validation
rg "Resolves to" tests/ -A 2 -B 2
```

Length of output: 1123
prediction_market_agent_tooling/tools/is_invalid.py (1)
Line 63: Verify the specified model version and document the rationale.

The model version has been updated to a specific date-stamped version. While this aligns with the PR objective of cheaper runs, we should ensure that:

- This specific model version exists and is properly tested
- The rationale for this specific version is documented
Consider adding a comment explaining why this specific model version was chosen, for example:

```diff
 def is_invalid(
     question: str,
-    engine: str = "gpt-4o-2024-08-06",
+    # Using gpt-4o-2024-08-06 for optimal cost-performance ratio
+    engine: str = "gpt-4o-2024-08-06",
     temperature: float = LLM_SUPER_LOW_TEMPERATURE,
```
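Alternatively, the snapshot could be pinned in one named constant so the rationale lives in a single place. A minimal sketch with a hypothetical constant name (the codebase currently inlines the string at each call site):

```python
# Hypothetical constant; neither the name nor this placement exists in the
# codebase today. Snapshot chosen for cheaper runs per PR #556.
CHEAP_GPT4O_ENGINE = "gpt-4o-2024-08-06"


def is_invalid(question: str, engine: str = CHEAP_GPT4O_ENGINE) -> bool:
    """Stub illustrating the constant's use; the real signature has more
    parameters (temperature, seed, ...)."""
    raise NotImplementedError
```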
prediction_market_agent_tooling/tools/is_predictable.py (1)

Lines 10-13: LGTM: Import changes are well-organized.

The addition of the `LLM_SEED` import is properly grouped with related LLM configuration constants.
prediction_market_agent_tooling/deploy/agent.py (2)

Lines 446-452: LGTM! Logging statement updated correctly.

The logging statement has been properly updated to use the renamed variable.

Lines 511-516: LGTM! Constructor changes are consistent.

The parameter rename has been properly propagated to the child class constructor and super call.
```diff
@@ -17,7 +17,7 @@ def rewrite_question_into_image_generation_prompt(question: str) -> str:
             "openai not installed, please install extras `langchain` to use this function."
         )
     llm = ChatOpenAI(
-        model="gpt-4-turbo",
+        model="gpt-4o-2024-08-06",
```
💡 Codebase verification

Non-standard GPT model names require standardization across multiple files.

The codebase contains several instances of non-standard OpenAI model names that need to be updated:

- `gpt-4o-2024-08-06` in market_thumbnail_gen.py
- `gpt-4o` in relevant_news_analysis.py
- Similar patterns likely exist in is_predictable.py and is_invalid.py (based on the regex search)

These non-standard model names will cause runtime errors, as they don't match OpenAI's official model naming convention. Update all instances to use standard OpenAI model names like `gpt-4-1106-preview` or `gpt-4-0613`.
🔗 Analysis chain
Invalid OpenAI model name detected
The model name "gpt-4o-2024-08-06" appears to be non-standard and will likely cause runtime errors. OpenAI's official model names don't follow this format.
Apply this fix:

```diff
-        model="gpt-4o-2024-08-06",
+        model="gpt-4-1106-preview",  # Latest GPT-4 Turbo model
```
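If the team wants to fail fast on such names in the future, a small allowlist guard is one option. A sketch under the assumption that the project maintains its own list of accepted engines (the constant, the function, and the list contents are all hypothetical):

```python
# Hypothetical guard; neither KNOWN_ENGINES nor check_engine exists in the
# codebase today, and the allowlist contents are illustrative only.
KNOWN_ENGINES = frozenset({"gpt-4-1106-preview", "gpt-4-0613"})


def check_engine(engine: str) -> str:
    """Raise early on an unrecognized engine name instead of failing at the
    first API call."""
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"Unknown engine {engine!r}; add it to KNOWN_ENGINES.")
    return engine
```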
Let's verify if this non-standard model name is used elsewhere:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Description: Search for similar non-standard model names across the codebase
# Expected: No other occurrences of this pattern should exist

echo "Searching for non-standard gpt-4o model names:"
rg -l "gpt-4o-[0-9]{4}-[0-9]{2}-[0-9]{2}"

echo -e "\nSearching for all GPT-4 model references for comparison:"
rg -l "gpt-4"
```

Length of output: 834
Script:

```bash
#!/bin/bash
# Check the actual model names used in these files
echo "Checking model names in is_predictable.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_predictable.py -A 1

echo -e "\nChecking model names in is_invalid.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_invalid.py -A 1

echo -e "\nChecking model names in relevant_news_analysis.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_analysis.py -A 1

echo -e "\nChecking model names in utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/utils.py -A 1

echo -e "\nChecking model names in benchmark/utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/benchmark/utils.py -A 1
```

Length of output: 1048
Script:

```bash
#!/bin/bash
# Check for any model variable assignments or configurations
echo "Checking all model configurations in is_predictable.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_predictable.py

echo -e "\nChecking all model configurations in is_invalid.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_invalid.py

echo -e "\nChecking for any model configurations in utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/utils.py

echo -e "\nChecking for any model configurations in benchmark/utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/benchmark/utils.py

# Also check for any configuration files that might define these models
echo -e "\nChecking for potential config files:"
fd -e yaml -e json -e toml -e ini
```

Length of output: 2114
Script:

```bash
#!/bin/bash
# Check the content of pyproject.toml for any model configurations
echo "Checking pyproject.toml for model configurations:"
cat pyproject.toml

# Also check if there's an 'engine' variable defined that might be setting these models
echo -e "\nChecking for engine variable definitions:"
rg "engine\s*=.*gpt" -A 2
```

Length of output: 2302
```diff
@@ -81,7 +84,7 @@
 @db_cache
 def is_predictable_binary(
     question: str,
-    engine: str = "gpt-4-1106-preview",
+    engine: str = "gpt-4o-2024-08-06",
```
💡 Codebase verification

Invalid model identifier detected.

The model identifier "gpt-4o-2024-08-06" appears to be non-standard and likely incorrect:

- The format deviates from OpenAI's standard naming convention (no 'o' suffix is used in official models)
- The date used (2024-08-06) is in the future
- There is no model validation or configuration that defines this as a valid model
- The model is not documented in any configuration files

The previous model "gpt-4-1106-preview" should be retained, or updated to a currently available OpenAI model.
🔗 Analysis chain
Verify the availability and compatibility of the new model version.

The model has been changed from "gpt-4-1106-preview" to "gpt-4o-2024-08-06". Let's verify:

- that this is a valid model identifier
- that it maintains compatibility with the existing prompts and token limits

Also applies to: 125-125
Let me gather more information about model validation and configuration in the codebase.
Let me try one more search to find any model-related configurations or imports.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Description: Search for model version references and verify consistency

# Check for any documentation or configuration files that might list valid models
fd -t f -e md -e yaml -e json . | xargs rg -l "gpt-4o-2024-08-06"

# Check if there are any model-specific configurations or tests
rg -A 5 "gpt-4o-2024-08-06"

# Look for any model validation logic
ast-grep --pattern 'if $model == "gpt-4o-2024-08-06" { $$$ }'
```

Length of output: 2379
Script:

```bash
#!/bin/bash
# Check for model validation or configuration logic
rg -l "SUPPORTED_MODELS|ALLOWED_MODELS|MODEL_CONFIG" .

# Look for any model-related constants or configurations
ast-grep --pattern 'MODEL = $value'
ast-grep --pattern 'MODELS = $value'

# Check for any OpenAI client initialization or model validation
rg -A 5 "openai\..*Client"
```

Length of output: 162
Script:

```bash
#!/bin/bash
# Search for OpenAI imports and configurations
rg -l "from openai|import openai"

# Look for any model-related constants
rg "DEFAULT_MODEL|GPT_|MODEL_NAME"

# Check configuration files
fd -t f "config|settings" -e py -e yaml -e json | xargs cat
```

Length of output: 8456
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)

- `prediction_market_agent_tooling/deploy/agent.py` (3 hunks)
🔇 Additional comments (1)
prediction_market_agent_tooling/deploy/agent.py (1)

Lines 533-540: LGTM!

The parameter rename is consistently applied in both the constructor declaration and the super() call.
```diff
         if self.store_predictions:
             market.store_prediction(
                 processed_market=processed_market, keys=keys, agent_name=self.agent_name
             )
         else:
             logger.info(
-                f"Prediction {processed_market} not stored because {self.store_prediction=}."
+                f"Prediction {processed_market} not stored because {self.store_predictions=}."
```
Fix inconsistent method name.

While the parameter has been renamed to `store_predictions`, the method call still uses the old name `store_prediction`. This should be updated for consistency.

```diff
-            market.store_prediction(
+            market.store_predictions(
                 processed_market=processed_market, keys=keys, agent_name=self.agent_name
             )
```

Committable suggestion skipped: line range outside the PR's diff.
```diff
         store_predictions: bool = True,
     ) -> None:
         super().__init__(enable_langfuse=enable_langfuse)
-        self.store_prediction = store_prediction
+        self.store_predictions = store_predictions
```
💡 Codebase verification

Update required: rename the `store_prediction` method across market classes.

The parameter rename from `store_prediction` to `store_predictions` in `DeployablePredictionAgent` is incomplete. Additional changes are needed:

- The `store_prediction` method needs to be renamed to `store_predictions` in:
  - the `AgentMarket` base class (agent_market.py)
  - the `MetaculusAgentMarket` class (metaculus.py)
  - the `OmenAgentMarket` class (omen.py)
- The script scripts/store_prediction.py should be renamed to store_predictions.py

This ensures consistent naming throughout the codebase, as the functionality deals with storing multiple predictions; a sketch of the resulting base class follows.
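A minimal sketch of the post-rename shape of the base class, assuming the parameter names from the agent.py call site quoted earlier (the types here are placeholders, not the real ones):

```python
class AgentMarket:
    # Hypothetical post-rename signature; the real parameter types differ.
    def store_predictions(
        self, processed_market: object, keys: object, agent_name: str
    ) -> None:
        """Renamed from `store_prediction`. Subclasses such as OmenAgentMarket
        and MetaculusAgentMarket would override it under the same new name, so
        call sites like `market.store_predictions(...)` in agent.py stay
        consistent."""
        raise NotImplementedError
```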
🔗 Analysis chain
Verify parameter rename impact across the codebase.

The rename from `store_prediction` to `store_predictions` improves naming consistency. However, based on past reviews, there are several places that still use `store_prediction` (singular) that need to be updated:

- method calls in agent.py
- method names in the market classes (agent_market.py, omen.py, metaculus.py)
- the script name and references in scripts/store_prediction.py

Let me check the specific occurrences of `store_prediction` to verify whether they are related to the parameter rename.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Description: Check for any remaining instances of old parameter name
# and verify all constructor calls are updated

# Check for any remaining "store_prediction" instances
rg "store_prediction" -l

# Find all instantiations of DeployablePredictionAgent
ast-grep --pattern 'DeployablePredictionAgent($$$)'
```

Length of output: 326
Script:

```bash
#!/bin/bash
# Check the context of store_prediction usage in each file
echo "=== agent.py ==="
rg "store_prediction" prediction_market_agent_tooling/deploy/agent.py -B 2 -A 2

echo -e "\n=== agent_market.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/agent_market.py -B 2 -A 2

echo -e "\n=== metaculus.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/metaculus/metaculus.py -B 2 -A 2

echo -e "\n=== omen.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/omen/omen.py -B 2 -A 2

echo -e "\n=== store_prediction.py ==="
rg "store_prediction" scripts/store_prediction.py -B 2 -A 2
```

Length of output: 2355
No description provided.