# prompts.py

SA_SYSTEM_PROMPT = '''In this task, you will evaluate the quality of the visual storytelling generated by the response. The evaluation will focus on seven key criteria, with a special emphasis on coherence between text and visual elements. Note that the emotional aspect will not contribute to the score but can result in a 1-point deduction if the response contains harmful intentions or negative emotional impacts. 
Here are the detailed criteria for each aspect: 
1) Text Quality: The narrative should be clear, grammatically correct, and engaging. The text must flow logically, be free from errors, and draw the reader into the story. Dialogue and descriptions should feel natural and enhance the overall storytelling experience. 
2) Image Quality: Evaluate the vividness and clarity of the image descriptions. Although there are no real images, the descriptions should provide detailed and imaginative visuals, allowing the reader to clearly picture the scenes. The descriptions should be straightforward but sufficiently detailed to enrich the storytelling. 
3) Text-Image Coherence: The text and image descriptions should complement each other seamlessly. Every visual description should be tightly integrated with the narrative, adding depth and immersion to the story. The two elements should work together to provide a cohesive and engaging storytelling experience. 
4) Context Consistency: The story should maintain a consistent and logical flow throughout its setting, characters, and plot. There should be no contradictions, ensuring temporal and spatial continuity. The overall context should evolve smoothly, with clear transitions that support believability and coherence. 
5) Innovation: Evaluate the creativity of the narrative and its visual components. Look for fresh storytelling approaches, imaginative plot developments, and unique visual descriptions. The story should demonstrate originality, avoiding overused tropes and predictable arcs. 
6) Stylistic Consistency and Correspondence (strict): The text and image descriptions must maintain stylistic uniformity throughout the story. All image descriptions should be consistent in tone, style, and visual representation, aligning closely with the text. Even minor deviations in visual or textual style (e.g., tone shifts, compositional differences, or variations in description format) should lead to a lower score. Only provide stylistic examples when the model’s output achieves perfect alignment and consistency, and such examples should be rare.
7) Emotional Impact (penalty only): This criterion is used only to deduct points. If the response conveys harmful or negative emotions, deduct 1 point. Otherwise, no score change will be made.
I will provide a question and answer pair. You should begin your feedback with '### Feedback\n', followed by an analysis of the quality of the answer. After that, use the criteria above to give a final score, beginning with '### Score\n'. The final score should be an integer between 0 and 6.'''

PBL_SYSTEM_PROMPT = '''In this task, you will evaluate the quality of the methods generated by the response. Focus on each method’s clarity, practical utility, and effectiveness in achieving the intended goal. This evaluation is based on seven key criteria. Note: The emotional aspect will not contribute to the score, but any harmful intentions or negative impacts should result in a 1-point deduction.
Here are the detailed criteria for each aspect:
1) Text Quality: The method should be written in clear, understandable language, free from grammatical errors, and structured logically, making it easy to follow and implement. 
2) Image Quality: The image descriptions should be vivid and relevant, enhancing the comprehension of the method by allowing the reader to visualize the steps. The text should still be self-sufficient without the images. 
3) Text-Image Coherence: The images and text should work together seamlessly, with each image description corresponding logically to specific steps in the text, enhancing the overall understanding of the method. 
4) Method Quality and Practical Utility: The method should be actionable, clearly explaining each step in sufficient detail to ensure the user can follow and apply it successfully in real-world scenarios. Depth and thoroughness of the explanation are more important than variety. 
5) Creativity and Engagement: The method should offer a unique or creative approach, making the instructional process engaging while ensuring the method remains clear and effective. 
6) Stylistic Consistency and Correspondence (strict): The images and text must be stylistically uniform throughout. All images must maintain consistency in their visual style (color scheme, composition, and artistic technique) relative to each other and the accompanying text. Even slight deviations in these elements (e.g., different tones, varying image formats) should result in a lower score. The text must reflect the same tone, formatting, and structure as the original content, without shifts in narrative style or atmosphere. Only provide stylistic examples when the model's output achieves perfect coherence and consistency, and such examples should be rare.
7) Emotional Impact (penalty only): This criterion will only be used to deduct points. If the response contains harmful, negative, or inappropriate emotions (such as violence or aggressive language), deduct 1 point. Otherwise, no score change will be made. 
I will give you the question and answer pair. You should give your feedback about the quality of the answer, beginning with '### Feedback\n'. After giving the feedback, use the above criteria to give the final score, beginning with '### Score\n'. The final score should be an integer between 0 and 6.'''

MSR_PROMPT = '''In this task, you will evaluate the quality of the model's response to a math question. The evaluation will focus on six key aspects: 
1) **Question Text Understanding**: Assess whether the model correctly understands and interprets the textual information given in the question, identifying key mathematical elements, relationships, or instructions from the text. 
2) **Question Image Understanding**: Evaluate the model’s understanding of the visual information (if applicable) in the question, including any diagrams, charts, or figures. The model should correctly interpret the visual elements and integrate them into the solution. 
3) **Reasoning Clarity**: The model should provide a clear, step-by-step explanation of its reasoning process, logically connecting the problem's details to the steps leading toward a solution. This should be easy to follow and free from unnecessary complexity. 
4) **Partial Correctness in Reasoning**: Even if the final answer is incorrect, evaluate whether the model shows correct intermediate steps, partial reasoning, or progress toward the right solution. This includes identifying whether the model has applied appropriate mathematical principles or formulas in parts of the response. 
5) **Final Answer Accuracy**: Determine whether the model arrives at the correct final answer, based on both the problem statement and the reasoning provided. An accurate answer, supported by correct reasoning, should receive the highest score.
6) **Excellence**: The answer is correct and all the intermediate steps are correct and easy to understand.
I will give you the question and answer pair. You should give your feedback about the quality of the answer, beginning with '### Feedback\n'. After giving the feedback, use the above criteria to give the final score, beginning with '### Score\n'. The final score should be an integer between 0 and 6.'''

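def get_prompt(data_id):
    """Return the evaluation prompt template for the task named in data_id."""
    # The {question}/{answer} placeholders (and {gt_answer} for MSR) are left
    # for the caller to fill, e.g. with str.format. The leading "\n" keeps the
    # "### Question" header on its own line after the system prompt.
    if "SA" in data_id:
        return SA_SYSTEM_PROMPT + "\n### Question\n{question}\n### Answer\n{answer}\n### Feedback\n"
    elif "PBL" in data_id:
        return PBL_SYSTEM_PROMPT + "\n### Question\n{question}\n### Answer\n{answer}\n### Feedback\n"
    elif "MSR" in data_id:
        return MSR_PROMPT + "\n### Question\n{question}\n### Answer\n{answer}\n### Correct Answer\nThe correct answer to this question should be: {gt_answer}\n### Feedback\n"
    else:
        # No template matches; callers receive None and should handle it.
        print(f"Unrecognized data_id: {data_id}")
        return None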
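

# --- Illustrative helper (not part of the original file) ---
# The prompts above ask the judge to write its feedback and then a final
# integer score after a '### Score' header. The sketch below shows one way
# such a reply might be split into feedback and score. The name
# `parse_judge_reply` and the choice to return None for a missing or
# out-of-range score are assumptions made for illustration, not the
# authors' evaluation pipeline.
import re


def parse_judge_reply(reply):
    """Split a judge reply into (feedback, score); score is None if unusable."""
    # Everything before the first '### Score' header is treated as feedback.
    feedback, header, tail = reply.partition("### Score")
    # The reply may or may not repeat the '### Feedback' header; drop it if present.
    feedback = feedback.replace("### Feedback", "", 1).strip()
    if not header:
        return feedback, None
    # Take the first integer that appears after the header.
    match = re.search(r"-?\d+", tail)
    if match is None:
        return feedback, None
    score = int(match.group())
    # The prompts specify an integer between 0 and 6; anything else is rejected.
    if not 0 <= score <= 6:
        score = None
    return feedback, score


# Hypothetical usage, assuming `reply` holds the judge model's raw text output:
#     feedback, score = parse_judge_reply(reply)
#     if score is None:
#         ...  # e.g. retry the request or skip the sample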