I ran `eval_n_turn.py` to reproduce the single-turn handicap SQL results:
```sh
python -m experiments.eval_n_turn \
    --data_path ./data/sql/spider/ic_spider_dev.json \
    --dialogue_limit 5 \
    --env sql \
    --image_name docker-env-sql \
    --log_dir logs/experiments \
    --max_turns 1 \
    --policy chat \
    --template game_sql \
    --model gpt-3.5-turbo \
    --handicap \
    --verbose
```
I use this script to compute the success rate:
```python
import json

result_file_path = './logs/experiments/ic_sql_multiturn_gpt-3.5-turbo_1_turns.json'

# One success/total counter per Spider hardness level, plus an overall bucket.
result = {key: {'success': 0, 'total': 0}
          for key in ['easy', 'medium', 'hard', 'extra', 'all']}

with open(result_file_path, 'r') as f:
    data = json.load(f)

for index in data.keys():
    # A task counts as solved if it reached the maximum reward.
    if data[index]['summary']['max_reward'] == 1.0:
        result[data[index]['hardness']]['success'] += 1
        result['all']['success'] += 1
    result[data[index]['hardness']]['total'] += 1
    result['all']['total'] += 1

for key in result.keys():
    success = result[key]['success']
    total = result[key]['total']
    print(f"{key} Success rate: {success}/{total} ({success/total:.2%})")
```
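For reference, the script only relies on two fields per record, so each entry in the results JSON is assumed to look roughly like this (a trimmed-down sketch showing only the fields read above; `eval_n_turn.py` writes more):

```python
# Hypothetical, trimmed-down record from the results file: only the two
# fields the script above reads are shown; everything else is omitted.
example_entry = {
    "hardness": "easy",              # Spider difficulty bucket
    "summary": {"max_reward": 1.0},  # 1.0 means the task fully succeeded
}
```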
and I get this result:
```
easy Success rate: 202/248 (81.45%)
medium Success rate: 281/446 (63.00%)
hard Success rate: 75/174 (43.10%)
extra Success rate: 37/166 (22.29%)
all Success rate: 595/1034 (57.54%)
```
This is lower than the result reported in the paper. Did I do something wrong?
I also ran `eval_n_turn.py` to reproduce the single-turn SQL results without the handicap:
```sh
python -m experiments.eval_n_turn \
    --data_path ./data/sql/spider/ic_spider_dev.json \
    --dialogue_limit 5 \
    --env sql \
    --image_name docker-env-sql \
    --log_dir logs/experiments \
    --max_turns 1 \
    --policy chat \
    --template game_sql \
    --model gpt-3.5-turbo
```
The result is:
```
easy Success rate: 41/248 (16.53%)
medium Success rate: 28/446 (6.28%)
hard Success rate: 3/174 (1.72%)
extra Success rate: 2/166 (1.20%)
all Success rate: 74/1034 (7.16%)
```
Did I do something wrong?