diff --git a/.gitignore b/.gitignore
index c825b24..d22d76c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -50,3 +50,4 @@ rsconnect/
docs/
python/mall/src/
+python/assets/style.css
diff --git a/_freeze/reference/MallFrame/execute-results/html.json b/_freeze/reference/MallFrame/execute-results/html.json
index f7bb025..8f027ec 100644
--- a/_freeze/reference/MallFrame/execute-results/html.json
+++ b/_freeze/reference/MallFrame/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "ab2b83a620205221658b2e724e51e73e",
+ "hash": "b719238e79aa68d0ccd5c863f83a82ef",
"result": {
"engine": "jupyter",
-  "markdown": "(previous frozen render of the reference/MallFrame page: one long escaped Quarto-markdown string with embedded HTML preview tables, elided)",
+  "markdown": "(regenerated frozen render of the reference/MallFrame page: cell IDs are refreshed and the sentiment DICT example now passes {\"positive\": 1, \"negative\": 0}, so its output column shows the integers 1/0 instead of the strings \"1\"/\"0\"; full escaped markdown elided)",
"supporting": [
"MallFrame_files"
],
diff --git a/python/.coverage b/python/.coverage
new file mode 100644
index 0000000..5d98eab
Binary files /dev/null and b/python/.coverage differ
diff --git a/python/mall/llm.py b/python/mall/llm.py
index 4ed8929..2191ac8 100644
--- a/python/mall/llm.py
+++ b/python/mall/llm.py
@@ -41,8 +41,11 @@ def map_call(df, col, msg, pred_name, use, valid_resps="", convert=None):
def llm_call(x, msg, use, preview=False, valid_resps="", convert=None, data_type=None):
+ backend = use.get("backend")
+    model = use.get("model")
call = dict(
- model=use.get("model"),
+ backend=backend,
+ model=model,
messages=build_msg(x, msg),
options=use.get("options"),
)
@@ -52,16 +55,24 @@ def llm_call(x, msg, use, preview=False, valid_resps="", convert=None, data_type
cache = ""
if use.get("_cache") != "":
+
hash_call = build_hash(call)
cache = cache_check(hash_call, use)
if cache == "":
- resp = ollama.chat(
- model=use.get("model"),
- messages=build_msg(x, msg),
- options=use.get("options"),
- )
- out = resp["message"]["content"]
+ if backend == "ollama":
+ resp = ollama.chat(
+ model=use.get("model"),
+ messages=build_msg(x, msg),
+ options=use.get("options"),
+ )
+ out = resp["message"]["content"]
+        if backend == "test":
+            if model == "echo":
+                out = x
+            if model == "content":
+                # Return the rendered prompt as-is; prompt assertions must see
+                # it before the valid_resps filtering below would null it out.
+                out = msg[0]["content"]
+                return out
else:
out = cache
@@ -74,10 +85,11 @@ def llm_call(x, msg, use, preview=False, valid_resps="", convert=None, data_type
if out == label:
out = convert.get(label)
- # out = data_type(out)
+ if data_type == int:
+ out = data_type(out)
- # if out not in valid_resps:
- # out = None
+ if out not in valid_resps and len(valid_resps) > 0:
+ out = None
return out
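For context on the change above: the new `test` backend lets `llm_call()` resolve without a live Ollama server. With model `echo` the row's text comes back unchanged, so the restored `valid_resps` filtering can be exercised; with model `content` the rendered prompt itself is returned, short-circuiting validation. A minimal sketch of how the new tests drive it, assuming `mall` is installed and its Polars accessor is registered:

```python
import mall
import polars as pl

df = pl.DataFrame(dict(x=["positive", "negative", "not-real"]))

# "echo" hands each row back unchanged; "not-real" is nulled out by the
# valid_resps check, all without contacting Ollama.
df.llm.use("test", "echo", _cache="")
print(df.llm.sentiment("x"))

# "content" returns the prompt itself, which is what the prompt-string
# assertions in the new test files compare against.
df.llm.use("test", "content", _cache="")
print(df.llm.sentiment("x")["sentiment"][0])
```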
diff --git a/python/mall/polars.py b/python/mall/polars.py
index 4a7cfd9..d31c849 100644
--- a/python/mall/polars.py
+++ b/python/mall/polars.py
@@ -137,7 +137,7 @@ def sentiment(
```{python}
# Use a DICT object to specify values to return per sentiment
- reviews.llm.sentiment("review", {"positive" : "1", "negative" : "0"})
+ reviews.llm.sentiment("review", {"positive" : 1, "negative" : 0})
```
"""
diff --git a/python/tests/__init__.py b/python/tests/__init__.py
new file mode 100644
index 0000000..570e4df
--- /dev/null
+++ b/python/tests/__init__.py
@@ -0,0 +1 @@
+"Unit tests for mall"
\ No newline at end of file
diff --git a/python/tests/test_classify.py b/python/tests/test_classify.py
new file mode 100644
index 0000000..01a41be
--- /dev/null
+++ b/python/tests/test_classify.py
@@ -0,0 +1,29 @@
+import pytest
+import mall
+import polars as pl
+import pyarrow
+import shutil
+import os
+
+if os.path.exists("_test_cache"):
+ shutil.rmtree("_test_cache", ignore_errors=True)
+
+
+def test_classify():
+ df = pl.DataFrame(dict(x=["one", "two", "three"]))
+ df.llm.use("test", "echo", _cache="_test_cache")
+ x = df.llm.classify("x", ["one", "two"])
+ assert (
+ x.select("classify").to_pandas().to_string()
+ == " classify\n0 one\n1 two\n2 None"
+ )
+
+
+def test_classify_dict():
+ df = pl.DataFrame(dict(x=[1, 2, 3]))
+ df.llm.use("test", "echo", _cache="_test_cache")
+ x = df.llm.classify("x", {"one": 1, "two": 2})
+ assert (
+ x.select("classify").to_pandas().to_string()
+ == " classify\n0 1.0\n1 2.0\n2 NaN"
+ )
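A note on the expected strings above: the `1.0`/`NaN` values in `test_classify_dict` come from the pandas conversion, not from mall itself. Once the `valid_resps` check nulls one prediction, the integer column upcasts to float64 on `.to_pandas()`; the same upcast explains the float expectations in `test_verify.py` further down. A small sketch of that behavior, assuming `polars` and `pyarrow` are installed:

```python
import polars as pl

# An Int64 column holding a null becomes float64 with NaN in pandas.
df = pl.DataFrame({"classify": [1, 2, None]})
print(df.to_pandas().to_string())
#    classify
# 0       1.0
# 1       2.0
# 2       NaN
```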
diff --git a/python/tests/test_extract.py b/python/tests/test_extract.py
new file mode 100644
index 0000000..a320896
--- /dev/null
+++ b/python/tests/test_extract.py
@@ -0,0 +1,38 @@
+import pytest
+import mall
+import polars as pl
+import pyarrow
+
+import shutil
+import os
+if os.path.exists("_test_cache"):
+ shutil.rmtree("_test_cache", ignore_errors=True)
+
+def test_extract_list():
+ df = pl.DataFrame(dict(x="x"))
+ df.llm.use("test", "content", _cache = "_test_cache")
+ x = df.llm.extract("x", ["a", "b"])
+ assert (
+ x["extract"][0]
+ == "You are a helpful text extraction engine. Extract the a, b being referred to on the text. I expect 2 items exactly. No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers. The answer is based on the following text:\n{}"
+ )
+
+
+def test_extract_dict():
+ df = pl.DataFrame(dict(x="x"))
+ df.llm.use("test", "content", _cache = "_test_cache")
+ x = df.llm.extract("x", dict(a="one", b="two"))
+ assert (
+ x["extract"][0]
+ == "You are a helpful text extraction engine. Extract the one, two being referred to on the text. I expect 2 items exactly. No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers. The answer is based on the following text:\n{}"
+ )
+
+
+def test_extract_one():
+ df = pl.DataFrame(dict(x="x"))
+ df.llm.use("test", "content", _cache = "_test_cache")
+ x = df.llm.extract("x", labels="a")
+ assert (
+ x["extract"][0]
+ == "You are a helpful text extraction engine. Extract the a being referred to on the text. I expect 1 item exactly. No capitalization. No explanations. The answer is based on the following text:\n{}"
+ )
diff --git a/python/tests/test_sentiment.py b/python/tests/test_sentiment.py
new file mode 100644
index 0000000..2e3f711
--- /dev/null
+++ b/python/tests/test_sentiment.py
@@ -0,0 +1,55 @@
+import pytest
+import mall
+import polars as pl
+import pyarrow
+
+import shutil
+import os
+
+if os.path.exists("_test_cache"):
+ shutil.rmtree("_test_cache", ignore_errors=True)
+
+
+def test_sentiment_simple():
+ data = mall.MallData
+ reviews = data.reviews
+ reviews.llm.use("test", "echo", _cache="_test_cache")
+ x = reviews.llm.sentiment("review")
+ assert (
+ x.select("sentiment").to_pandas().to_string()
+ == " sentiment\n0 None\n1 None\n2 None"
+ )
+
+
+def sim_sentiment():
+ df = pl.DataFrame(dict(x=["positive", "negative", "neutral", "not-real"]))
+ df.llm.use("test", "echo", _cache="_test_cache")
+ return df
+
+
+def test_sentiment_valid():
+ x = sim_sentiment()
+ x = x.llm.sentiment("x")
+ assert (
+ x.select("sentiment").to_pandas().to_string()
+ == " sentiment\n0 positive\n1 negative\n2 neutral\n3 None"
+ )
+
+
+def test_sentiment_valid2():
+ x = sim_sentiment()
+ x = x.llm.sentiment("x", ["positive", "negative"])
+ assert (
+ x.select("sentiment").to_pandas().to_string()
+ == " sentiment\n0 positive\n1 negative\n2 None\n3 None"
+ )
+
+
+def test_sentiment_prompt():
+ df = pl.DataFrame(dict(x="x"))
+ df.llm.use("test", "content", _cache="_test_cache")
+ x = df.llm.sentiment("x")
+ assert (
+ x["sentiment"][0]
+ == "You are a helpful sentiment engine. Return only one of the following answers: positive, negative, neutral . No capitalization. No explanations. The answer is based on the following text:\n{}"
+ )
diff --git a/python/tests/test_summarize.py b/python/tests/test_summarize.py
new file mode 100644
index 0000000..e2182d4
--- /dev/null
+++ b/python/tests/test_summarize.py
@@ -0,0 +1,29 @@
+import pytest
+import mall
+import polars as pl
+import pyarrow
+import shutil
+import os
+
+if os.path.exists("_test_cache"):
+ shutil.rmtree("_test_cache", ignore_errors=True)
+
+
+def test_summarize_prompt():
+ df = pl.DataFrame(dict(x="x"))
+ df.llm.use("test", "content", _cache="_test_cache")
+ x = df.llm.summarize("x")
+ assert (
+ x["summary"][0]
+ == "You are a helpful summarization engine. Your answer will contain no no capitalization and no explanations. Return no more than 10 words. The answer is the summary of the following text:\n{}"
+ )
+
+
+def test_summarize_max():
+ df = pl.DataFrame(dict(x="x"))
+ df.llm.use("test", "content", _cache="_test_cache")
+ x = df.llm.summarize("x", max_words=5)
+ assert (
+ x["summary"][0]
+ == "You are a helpful summarization engine. Your answer will contain no no capitalization and no explanations. Return no more than 5 words. The answer is the summary of the following text:\n{}"
+ )
diff --git a/python/tests/test_translate.py b/python/tests/test_translate.py
new file mode 100644
index 0000000..5118d88
--- /dev/null
+++ b/python/tests/test_translate.py
@@ -0,0 +1,20 @@
+import pytest
+import mall
+import polars as pl
+import pyarrow
+
+import shutil
+import os
+
+if os.path.exists("_test_cache"):
+ shutil.rmtree("_test_cache", ignore_errors=True)
+
+
+def test_translate_prompt():
+ df = pl.DataFrame(dict(x="x"))
+ df.llm.use("test", "content", _cache="_test_cache")
+ x = df.llm.translate("x", language="spanish")
+ assert (
+ x["translation"][0]
+ == "You are a helpful translation engine. You will return only the translation text, no explanations. The target language to translate to is: spanish. The answer is the translation of the following text:\n{}"
+ )
diff --git a/python/tests/test_use.py b/python/tests/test_use.py
new file mode 100644
index 0000000..90795c1
--- /dev/null
+++ b/python/tests/test_use.py
@@ -0,0 +1,28 @@
+import pytest
+import mall
+import polars
+
+
+def test_use_init():
+ data = mall.MallData
+ reviews = data.reviews
+ x = reviews.llm.use()
+    assert x == dict(backend="ollama", model="llama3.2", _cache="_mall_cache")
+
+
+def test_use_mod1():
+ data = mall.MallData
+ reviews = data.reviews
+ x = reviews.llm.use(options=dict(seed=100))
+    assert x == dict(
+        backend="ollama", model="llama3.2", _cache="_mall_cache", options=dict(seed=100)
+    )
+
+
+def test_use_mod2():
+ data = mall.MallData
+ reviews = data.reviews
+ x = reviews.llm.use(options=dict(seed=99))
+    assert x == dict(
+        backend="ollama", model="llama3.2", _cache="_mall_cache", options=dict(seed=99)
+    )
diff --git a/python/tests/test_verify.py b/python/tests/test_verify.py
new file mode 100644
index 0000000..58421e7
--- /dev/null
+++ b/python/tests/test_verify.py
@@ -0,0 +1,29 @@
+import pytest
+import mall
+import polars as pl
+import pyarrow
+import shutil
+import os
+
+if os.path.exists("_test_cache"):
+ shutil.rmtree("_test_cache", ignore_errors=True)
+
+
+def test_verify():
+ df = pl.DataFrame(dict(x=[1, 1, 0, 2]))
+ df.llm.use("test", "echo", _cache="_test_cache")
+ x = df.llm.verify("x", "this is my question")
+ assert (
+ x.select("verify").to_pandas().to_string()
+ == " verify\n0 1.0\n1 1.0\n2 0.0\n3 NaN"
+ )
+
+
+def test_verify_yn():
+ df = pl.DataFrame(dict(x=["y", "n", "y", "x"]))
+ df.llm.use("test", "echo", _cache="_test_cache")
+ x = df.llm.verify("x", "this is my question", ["y", "n"])
+ assert (
+ x.select("verify").to_pandas().to_string()
+ == " verify\n0 y\n1 n\n2 y\n3 None"
+ )
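With the `test` backend wired in, the whole suite runs offline. One way to invoke it from the repository root, assuming `pytest` and `pyarrow` are available in the environment:

```python
import pytest

# Runs the new offline test suite; pytest.main returns an exit code.
exit_code = pytest.main(["python/tests", "-v"])
```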
diff --git a/reference/MallFrame.qmd b/reference/MallFrame.qmd
index e11b8b3..2da1411 100644
--- a/reference/MallFrame.qmd
+++ b/reference/MallFrame.qmd
@@ -177,7 +177,7 @@ reviews.llm.sentiment("review", ["positive", "negative"])
```{python}
# Use a DICT object to specify values to return per sentiment
-reviews.llm.sentiment("review", {"positive" : "1", "negative" : "0"})
+reviews.llm.sentiment("review", {"positive" : 1, "negative" : 0})
```
### summarize { #mall.MallFrame.summarize }