Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' into feat/model_evaluation
Showing 8 changed files with 58 additions and 42 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,7 +1,7 @@ |
| 1 | 1 | {"test_scenario": "Assistant smalltalk, salutation", "query": "Hi", "chat_history": "[]", "context": "['Hello! How can I assist you today?']"} |
| 2 | 2 | {"test_scenario": "Text answer from memory", "query": "What is the total population of Mali", "chat_history": "[]", "context": "['The answer is: 17,907,114.0', '\u2705 A human approved this data recipe; Source; Raw data; 2018-01-01 to 2018-12-31']"} |
| 3 | | -{"test_scenario": "Image answer from memory", "query": "plot a line chart of fatalities by month for Chad using HDX data as an image", "chat_history": "[]", "context": "['*AN IMAGE WAS OUTPUT, HERE IS ITS LLM-GENERATED DESCRIPTION* ... The image is a line graph titled \"Fatalities by Month for Chad,\" depicting the total number of fatalities per month from January 2008 to January 2024. The x-axis represents the months, while the y-axis represents the total fatalities. \\n\\nKey observations:\\n- There are several peaks indicating months with high fatalities.\\n- Notable spikes occur around mid-2008, early 2009, mid-2015, early 2021, and mid-2021.\\n- The highest peak appears to be in early 2021, with fatalities exceeding 500.\\n- There are periods of relatively low fatalities, particularly between 2010 and 2014.\\n- Overall, there is an increasing trend in fatalities over the years, with more frequent spikes in recent years.', '\u2705 A human approved this data recipe; Source; Raw data']"} |
| 4 | | -{"test_scenario": "Image answer from recipe", "query": "Plot population pyramids for Nigeria", "chat_history": "[]", "context": "['*AN IMAGE WAS OUTPUT, HERE IS ITS LLM-GENERATED DESCRIPTION* ... The image is a population pyramid for Nigeria (NGA). It displays the population distribution by age and gender. The horizontal axis represents the population in millions, with males on the left (in blue) and females on the right (in pink). The vertical axis represents different age ranges, from 0-4 at the bottom to 80+ at the top.\\n\\nKey observations:\\n- The pyramid has a broad base, indicating a large number of young people (0-14 years).\\n- The population decreases steadily with age, showing a typical pyramid shape.\\n- There are more males than females in the younger age groups, but the difference is not very pronounced.\\n- The population significantly tapers off in the older age groups (60+ years).', '\u2705 A human approved this data recipe; Source; Raw data; 2020-01-01 to 2020-12-31']"} |
| | 3 | +{"test_scenario": "Image answer from memory", "query": "plot a line chart of fatalities by month for Chad using HDX data as an image", "chat_history": "[]", "context": "['*AN IMAGE WAS OUTPUT, HERE IS ITS LLM-GENERATED DESCRIPTION* ... The image is a line graph titled \"Fatalities by Month for Chad,\" depicting the total number of fatalities per month from January 2008 to January 2024. The x-axis represents the months, while the y-axis represents the total fatalities.\\n\\nKey observations:\\n- There are several peaks indicating months with high fatalities.\\n- Notable spikes occur around mid-2008, early 2009, mid-2015, early 2021, and mid-2021.\\n- The highest peak appears to be in early 2021, with fatalities exceeding 500.\\n- There are periods of relatively low fatalities, particularly between 2010 and 2014.\\n- The overall trend shows fluctuations with occasional sharp increases in fatalities.', '\u2705 A human approved this data recipe; Source; Raw data']"} |
| | 4 | +{"test_scenario": "Image answer from recipe", "query": "Plot population pyramids for Nigeria", "chat_history": "[]", "context": "['*AN IMAGE WAS OUTPUT, HERE IS ITS LLM-GENERATED DESCRIPTION* ... The image is a population pyramid for Nigeria (NGA). It displays the population distribution by age and gender. The horizontal axis represents the population in millions, with males on the left (in blue) and females on the right (in pink). The vertical axis represents different age ranges, starting from 0-4 at the bottom to 80+ at the top.\\n\\nKey observations:\\n- The pyramid has a broad base, indicating a large number of young people (0-14 age range).\\n- The population decreases steadily with age, showing a typical pyramid shape.\\n- There are more males than females in the younger age groups, but the difference is not very pronounced.\\n- The population in older age groups (60 and above) is significantly smaller compared to younger age groups.', '\u2705 A human approved this data recipe; Source; Raw data; 2020-01-01 to 2020-12-31']"} |
| 5 | 5 | {"test_scenario": "Assistant on-the-fly SQL, text answer", "query": "How many rows does the population table have for Nigeria", "chat_history": "[]", "context": "['The population table has 43,794 rows for Nigeria.']"} |
| 6 | | -{"test_scenario": "Assistant created image (simple)", "query": "Plot f{x}=10", "chat_history": "[]", "context": "['*AN IMAGE WAS OUTPUT, HERE IS ITS LLM-GENERATED DESCRIPTION* ... The image is a plot of the function \\\\( f(x) = 10 \\\\). The graph shows a horizontal line at \\\\( f(x) = 10 \\\\) across the range of \\\\( x \\\\) values from -10 to 10. The x-axis is labeled \"x\" and the y-axis is labeled \"f(x)\". There is a legend on the right side of the plot indicating that the orange line represents \\\\( f(x) = 10 \\\\). The plot title is \"Plot of f(x) = 10\".']"} |
| 7 | | -{"test_scenario": "Assistant answers using vector store", "query": "Is your data updated in real time?", "chat_history": "[]", "context": "['The data is updated in real-time for data sources configured as API data sources. The system calls these APIs on-demand to pull in the latest data from the remote system. However, for data sources where data is ingested, such as HAPI, the update frequency depends on how often the ingestion process is run, which is controlled by the user of the humanitarian AI assistant[0].', '\ud83e\udd16 Caution: LLM Analysis; Sources: [0] HDIP FAQs (External) .pdf']"} |
| | 6 | +{"test_scenario": "Assistant created image (simple)", "query": "Plot f{x}=10", "chat_history": "[]", "context": "['*AN IMAGE WAS OUTPUT, HERE IS ITS LLM-GENERATED DESCRIPTION* ... The image is a plot of the function \\\\( f(x) = 10 \\\\). The graph is a horizontal line at \\\\( y = 10 \\\\) that extends from \\\\( x = -10 \\\\) to \\\\( x = 10 \\\\). The x-axis ranges from -10 to 10, and the y-axis ranges from 0 to 10. The plot includes a legend indicating that the line represents \\\\( f(x) = 10 \\\\).']"} |
| | 7 | +{"test_scenario": "Assistant answers using vector store", "query": "Is your data updated in real time?", "chat_history": "[]", "context": "['The data is not updated in real-time. For data sources configured as API data sources, the system will call them on-demand to pull in the latest data from the remote system. However, for data sources where data is ingested, like HAPI, the frequency of updates depends on how often the ingestion process is run, which is controlled by the user of the humanitarian AI assistant[0].', '\ud83e\udd16 Caution: LLM Analysis; Sources: [0] HDIP FAQs (External) .pdf']"} |
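
Each record in this file is one JSON object per line, and the `chat_history` and `context` fields appear to hold Python-style list literals stored as strings rather than native JSON arrays. The sketch below shows one way such a record might be decoded for inspection. The helper name `parse_record` is hypothetical and not taken from this repository, and using `ast.literal_eval` for the nested fields is an assumption based on the single-quoted list syntax visible in the rows above.

```python
import ast
import json


def parse_record(line: str) -> dict:
    """Decode one JSONL test record (hypothetical helper, not from the repo).

    Assumes `chat_history` and `context` are strings containing Python
    list literals, as they appear to be in the rows shown above.
    """
    record = json.loads(line)
    for key in ("chat_history", "context"):
        # literal_eval only evaluates literals, so it is safer than eval()
        record[key] = ast.literal_eval(record[key])
    return record


# First record of the file, copied from the diff above.
sample = '''{"test_scenario": "Assistant smalltalk, salutation", "query": "Hi", "chat_history": "[]", "context": "['Hello! How can I assist you today?']"}'''

record = parse_record(sample)
print(record["test_scenario"], record["context"])
```

Decoding the nested fields this way makes it easier to compare the removed and added `context` values for a given `test_scenario`, since the changes in this hunk are confined to the wording of the LLM-generated descriptions.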