Replies: 24 comments 33 replies
-
Ok!
-
Sure!
-
Two observations:
-
A few pieces of intuition from #464:
-
In task 204, based on the paper, genres refer to the source from which the premises were collected. I don't think classifying the sentences by genre is practical. For example, two-sided telephone conversations that took place in 1990 or 1991 (TELEPHONE) and two-sided, in-person conversations that took place in the early 2000s (FACE-TO-FACE) are hard to distinguish.
-
Structured Text Generation tasks based on logic2text (e.g.,
-
In the task
-
task383_matres_classification was difficult for annotators to understand. I didn't understand it either!
-
@yeganehkordi Task 456 was difficult. I think I didn't understand it either.
-
@pulkitverma25 wants to help address the crowdworker evaluation feedback. Which task numbers can he work on so that they don't conflict with the ones you are working on, @Palipoor @yeganehkordi?
-
task667_mmmlu_answer_generation_business_ethics seems to be difficult for people.
-
Tasks 268, 274, and 276 were difficult for the annotators. I think the definitions are sufficient.
-
@Palipoor In task522, here is the question of one of the crowdworkers:
-
@aarunku5 @yeganehkordi @pulkitverma25 I will pick up tasks 900 - 1100.
-
task569_recipe_nlg_text_generation and task573_air_dialogue_classification were difficult for the annotators, although the definitions are good.
-
task1148_maximum_ascii_value
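The comment above is truncated, but judging by the task name, task1148 presumably asks for the character with the maximum ASCII value in a string. A minimal sketch of that operation (the function name is my own, not taken from the task definition):

```python
def max_ascii_char(s: str) -> str:
    """Return the character of s with the largest ASCII/Unicode code point."""
    if not s:
        raise ValueError("input string must be non-empty")
    # ord() maps each character to its code point; max() picks the largest.
    return max(s, key=ord)

print(max_ascii_char("abcXYZ"))  # 'c' (ord 99) beats 'Z' (ord 90)
```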
-
Tasks 664-667, 685-737: For now, I created PR#607 without addressing this.
-
People seem to be bothered by tasks that may require a Google search, like task1321_country_continent.
-
I will take up the changes in tasks 1500-1600.
-
I will pick up tasks 1600 - 1700.
-
task1625_disfl_qa_asnwer_generation:
-
Picking 1701+
-
task743_eurlex_summarization seems difficult for annotators (and me). I agree with the feedback. This is not something you can Google or answer using your knowledge or inference unless you have legal expertise.
-
@Palipoor @yeganehkordi While you're addressing the human eval feedback in #276, let's use this thread to keep track of the tasks that are difficult for humans to understand (**) and that we don't have a good way of improving. We can collectively discuss ways to improve them or drop them. For completeness, here is the result of the human evaluation with the crowdworkers' feedback.
(**) Not to be confused with tasks that are easy to understand but on which humans score low in terms of automatic eval.