diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json index fa563c6..4271fa3 100644 --- a/_freeze/index/execute-results/html.json +++ b/_freeze/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "47342fb39f04134910196a040cf3c8a1", + "hash": "c2e945cd291c71bc149c2accd9f776b2", "result": { "engine": "knitr", - "markdown": "---\nformat:\n html:\n toc: true\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[![R package check](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml)\n[![R package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)\n[![Lifecycle:\nexperimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n\n\n\nRun multiple LLM predictions against a data frame. The predictions are processed \nrow-wise over a specified column. It works using a pre-determined one-shot prompt,\nalong with the current row's content. `mall` has been implemented for both R\nand Python. The prompt that is use will depend of the type of analysis needed. \n\nCurrently, the included prompts perform the following: \n\n- [Sentiment analysis](#sentiment)\n- [Text summarizing](#summarize)\n- [Classify text](#classify)\n- [Extract one, or several](#extract), specific pieces information from the text\n- [Translate text](#translate)\n- [Custom prompt](#custom-prompt)\n\nThis package is inspired by the SQL AI functions now offered by vendors such as\n[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html) \nand Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact with LLMs \ninstalled locally. \n\n\n\nFor **R**, that interaction takes place via the \n[`ollamar`](https://hauselin.github.io/ollama-r/) package. 
The functions are \ndesigned to easily work with piped commands, such as `dplyr`. \n\n```r\nreviews |>\n llm_sentiment(review)\n```\n\n\n\nFor **Python**, `mall` is a library extension to [Polars](https://pola.rs/). To\ninteract with Ollama, it uses the official\n[Python library](https://github.com/ollama/ollama-python).\n\n```python\nreviews.llm.sentiment(\"review\")\n```\n\n## Motivation\n\nWe want to new find ways to help data scientists use LLMs in their daily work. \nUnlike the familiar interfaces, such as chatting and code completion, this interface\nruns your text data directly against the LLM. \n\nThe LLM's flexibility, allows for it to adapt to the subject of your data, and \nprovide surprisingly accurate predictions. This saves the data scientist the\nneed to write and tune an NLP model. \n\nIn recent times, the capabilities of LLMs that can run locally in your computer \nhave increased dramatically. This means that these sort of analysis can run\nin your machine with good accuracy. Additionally, it makes it possible to take\nadvantage of LLM's at your institution, since the data will not leave the\ncorporate network. \n\n## Get started\n\n- Install `mall` from Github\n\n \n::: {.panel-tabset group=\"language\"}\n## R\n```r\npak::pak(\"mlverse/mall/r\")\n```\n\n## Python\n```python\npip install \"mall @ git+https://git@github.com/mlverse/mall.git#subdirectory=python\"\n```\n:::\n\n- [Download Ollama from the official website](https://ollama.com/download)\n\n- Install and start Ollama in your computer\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n- Install Ollama in your machine. The `ollamar` package's website provides this\n[Installation guide](https://hauselin.github.io/ollama-r/#installation)\n\n- Download an LLM model. For example, I have been developing this package using\nLlama 3.2 to test. 
To get that model you can run: \n ```r\n ollamar::pull(\"llama3.2\")\n ```\n \n## Python\n\n- Install the official Ollama library\n ```python\n pip install ollama\n ```\n\n- Download an LLM model. For example, I have been developing this package using\nLlama 3.2 to test. To get that model you can run: \n ```python\n import ollama\n ollama.pull('llama3.2')\n ```\n:::\n \n#### With Databricks (R only)\n\nIf you pass a table connected to **Databricks** via `odbc`, `mall` will \nautomatically use Databricks' LLM instead of Ollama. *You won't need Ollama \ninstalled if you are using Databricks only.*\n\n`mall` will call the appropriate SQL AI function. For more information see our \n[Databricks article.](https://mlverse.github.io/mall/articles/databricks.html) \n\n## LLM functions\n\nWe will start with loading a very small data set contained in `mall`. It has\n3 product reviews that we will use as the source of our examples.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\ndata(\"reviews\")\n\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n\n## Python\n\n\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall \ndata = mall.MallData\nreviews = data.reviews\n\nreviews \n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
review
"This has been the best TV I've ever used. Great screen, and sound."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"
\n```\n\n:::\n:::\n\n\n:::\n\n\n\n\n\n\n\n### Sentiment\n\nAutomatically returns \"positive\", \"negative\", or \"neutral\" based on the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_sentiment(review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_sentiment.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.sentiment) \n\n:::\n\n### Summarize\n\nThere may be a need to reduce the number of words in a given text. Typically to \nmake it easier to understand its intent. The function has an argument to \ncontrol the maximum number of words to output \n(`max_words`):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_summarize(review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_summarize.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.summarize(\"review\", 5)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.summarize) \n\n:::\n\n### Classify\n\nUse the LLM to categorize the text into one of the options you provide: \n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_classify(review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_classify.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.classify(\"review\", [\"computer\", \"appliance\"])\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""appliance"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify) \n\n:::\n\n### Extract \n\nOne of the most interesting use cases Using natural language, we can tell the \nLLM to return a specific part of the text. In the following example, we request\nthat the LLM return the product being referred to. We do this by simply saying \n\"product\". The LLM understands what we *mean* by that word, and looks for that\nin the text.\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_extract(review, \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_extract.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.extract(\"review\", \"product\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.extract) \n\n:::\n\n\n### Translate\n\nAs the title implies, this function will translate the text into a specified \nlanguage. What is really nice, it is that you don't need to specify the language\nof the source text. Only the target language needs to be defined. The translation\naccuracy will depend on the LLM\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_translate(review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Me arrepiento de comprar este p…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo me sien…\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_translate.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.translate) \n\n:::\n\n### Custom prompt\n\nIt is possible to pass your own prompt to the LLM, and have `mall` run it \nagainst each text entry:\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_custom.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.custom) \n\n:::\n\n## Model selection and settings\n\nYou can set the model and its options to use when calling the LLM. In this case,\nwe refer to options as model specific things that can be set, such as seed or\ntemperature. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\nInvoking an `llm` function will automatically initialize a model selection\nif you don't have one selected yet. If there is only one option, it will \npre-select it for you. If there are more than one available models, then `mall`\nwill present you as menu selection so you can select which model you wish to \nuse.\n\nCalling `llm_use()` directly will let you specify the model and backend to use.\nYou can also setup additional arguments that will be passed down to the \nfunction that actually runs the prediction. In the case of Ollama, that function\nis [`chat()`](https://hauselin.github.io/ollama-r/reference/chat.html). \n\nThe model to use, and other options can be set for the current R session\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(\"ollama\", \"llama3.2\", seed = 100, temperature = 0)\n```\n:::\n\n\n\n\n## Python \n\nThe model and options to be used will be defined at the Polars data frame \nobject level. If not passed, the default model will be **llama3.2**.\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(\"ollama\", \"llama3.2\", options = dict(seed = 100))\n```\n:::\n\n\n\n:::\n\n#### Results caching \n\nBy default `mall` caches the requests and corresponding results from a given\nLLM run. Each response is saved as individual JSON files. By default, the folder\nname is `_mall_cache`. The folder name can be customized, if needed. 
Also, the\ncaching can be turned off by setting the argument to empty (`\"\"`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"_my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"\")\n```\n:::\n\n\n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"\")\n```\n:::\n\n\n\n:::\n\nFor more information see the [Caching Results](articles/caching.qmd) article. \n\n## Key considerations\n\nThe main consideration is **cost**. Either, time cost, or money cost.\n\nIf using this method with an LLM locally available, the cost will be a long \nrunning time. Unless using a very specialized LLM, a given LLM is a general model. \nIt was fitted using a vast amount of data. So determining a response for each \nrow, takes longer than if using a manually created NLP model. The default model\nused in Ollama is [Llama 3.2](https://ollama.com/library/llama3.2), \nwhich was fitted using 3B parameters. \n\nIf using an external LLM service, the consideration will need to be for the \nbilling costs of using such service. Keep in mind that you will be sending a lot\nof data to be evaluated. \n\nAnother consideration is the novelty of this approach. Early tests are \nproviding encouraging results. But you, as an user, will still need to keep\nin mind that the predictions will not be infallible, so always check the output.\nAt this time, I think the best use for this method, is for a quick analysis.\n\n\n## Vector functions (R only)\n\n`mall` includes functions that expect a vector, instead of a table, to run the\npredictions. This should make it easier to test things, such as custom prompts\nor results of specific text. 
Each `llm_` function has a corresponding `llm_vec_`\nfunction:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_sentiment(\"I am happy\")\n#> [1] \"positive\"\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_translate(\"Este es el mejor dia!\", \"english\")\n#> [1] \"It's the best day!\"\n```\n:::\n", +    "markdown": "---\nformat:\n html:\n toc: true\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n\n[![R package check](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml)\n[![R package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)\n[![Lifecycle:\nexperimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n\n\n\nRun multiple LLM predictions against a data frame. The predictions are processed \nrow-wise over a specified column. It works using a pre-determined one-shot prompt,\nalong with the current row's content. `mall` has been implemented for both R\nand Python. The prompt that is used will depend on the type of analysis needed. \n\nCurrently, the included prompts perform the following: \n\n- [Sentiment analysis](#sentiment)\n- [Text summarizing](#summarize)\n- [Classify text](#classify)\n- [Extract one, or several](#extract), specific pieces of information from the text\n- [Translate text](#translate)\n- [Verify that something is true](#verify) about the text (binary)\n- [Custom prompt](#custom-prompt)\n\nThis package is inspired by the SQL AI functions now offered by vendors such as\n[Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html) \nand Snowflake. `mall` uses [Ollama](https://ollama.com/) to interact with LLMs \ninstalled locally. \n\n\n\nFor **R**, that interaction takes place via the \n[`ollamar`](https://hauselin.github.io/ollama-r/) package. 
The functions are \ndesigned to work easily with piped commands, such as those in `dplyr`. \n\n```r\nreviews |>\n llm_sentiment(review)\n```\n\n\n\nFor **Python**, `mall` is a library extension to [Polars](https://pola.rs/). To\ninteract with Ollama, it uses the official\n[Python library](https://github.com/ollama/ollama-python).\n\n```python\nreviews.llm.sentiment(\"review\")\n```\n\n## Motivation\n\nWe want to find new ways to help data scientists use LLMs in their daily work. \nUnlike the familiar interfaces, such as chatting and code completion, this interface\nruns your text data directly against the LLM. \n\nThe LLM's flexibility allows it to adapt to the subject of your data and \nprovide surprisingly accurate predictions. This saves the data scientist the\nneed to write and tune an NLP model. \n\nIn recent times, the capabilities of LLMs that can run locally on your computer \nhave increased dramatically. This means that these sorts of analyses can run\non your machine with good accuracy. Additionally, it makes it possible to take\nadvantage of LLMs at your institution, since the data will not leave the\ncorporate network. \n\n## Get started\n\n- Install `mall` from GitHub\n\n \n::: {.panel-tabset group=\"language\"}\n## R\n```r\npak::pak(\"mlverse/mall/r\")\n```\n\n## Python\n```python\npip install \"mall @ git+https://git@github.com/mlverse/mall.git#subdirectory=python\"\n```\n:::\n\n- [Download Ollama from the official website](https://ollama.com/download)\n\n- Install and start Ollama on your computer\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n- Install Ollama on your machine. The `ollamar` package's website provides this\n[Installation guide](https://hauselin.github.io/ollama-r/#installation)\n\n- Download an LLM model. For example, I have been using\nLlama 3.2 to develop and test this package. 
To get that model you can run: \n ```r\n ollamar::pull(\"llama3.2\")\n ```\n \n## Python\n\n- Install the official Ollama library\n ```python\n pip install ollama\n ```\n\n- Download an LLM model. For example, I have been using\nLlama 3.2 to develop and test this package. To get that model you can run: \n ```python\n import ollama\n ollama.pull('llama3.2')\n ```\n:::\n \n#### With Databricks (R only)\n\nIf you pass a table connected to **Databricks** via `odbc`, `mall` will \nautomatically use Databricks' LLM instead of Ollama. *You won't need Ollama \ninstalled if you are using Databricks only.*\n\n`mall` will call the appropriate SQL AI function. For more information see our \n[Databricks article](https://mlverse.github.io/mall/articles/databricks.html). \n\n## LLM functions\n\nWe will start by loading a very small data set contained in `mall`. It has\n3 product reviews that we will use as the source of our examples.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\ndata(\"reviews\")\n\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n\n## Python\n\n\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall \ndata = mall.MallData\nreviews = data.reviews\n\nreviews \n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
review
"This has been the best TV I've ever used. Great screen, and sound."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"
\n```\n\n:::\n:::\n\n\n:::\n\n\n\n\n\n\n\n### Sentiment\n\nAutomatically returns \"positive\", \"negative\", or \"neutral\" based on the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_sentiment(review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_sentiment.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.sentiment) \n\n:::\n\n### Summarize\n\nThere may be a need to reduce the number of words in a given text, typically to \nmake it easier to understand its intent. The function has an argument to \ncontrol the maximum number of words to output \n(`max_words`):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_summarize(review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… it's a great tv \n#> 2 I regret buying this laptop. It is too slow … laptop purchase was a mistake \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_summarize.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.summarize(\"review\", 5)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.summarize) \n\n:::\n\n### Classify\n\nUse the LLM to categorize the text into one of the options you provide: \n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_classify(review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_classify.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.classify(\"review\", [\"computer\", \"appliance\"])\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""appliance"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify) \n\n:::\n\n### Extract \n\nOne of the most interesting use cases is extraction. Using natural language, we can tell the \nLLM to return a specific part of the text. In the following example, we request\nthat the LLM return the product being referred to. We do this by simply saying \n\"product\". The LLM understands what we *mean* by that word, and looks for that\nin the text.\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_extract(review, \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_extract.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.extract(\"review\", \"product\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.extract) \n\n:::\n\n### Verify \n\nThis function allows you to check whether a statement is true, based\non the provided text. By default, it will return a 1 for \"yes\", and 0 for\n\"no\". This can be customized.\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_verify(review, \"is the customer happy with the purchase\")\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1 \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 0 \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 0\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_verify.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.verify(\"review\", \"is the customer happy with the purchase\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound."1
"I regret buying this laptop. It is too slow and the keyboard is too noisy"0
"Not sure how to feel about my new washing machine. Great color, but hard to figure"0
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.verify) \n\n:::\n\n\n\n### Translate\n\nAs the title implies, this function will translate the text into a specified \nlanguage. What is really nice is that you don't need to specify the language\nof the source text. Only the target language needs to be defined. The translation\naccuracy will depend on the LLM.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_translate(review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Me arrepiento de comprar este p…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo me sien…\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_translate.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.translate) \n\n:::\n\n### Custom prompt\n\nIt is possible to pass your own prompt to the LLM, and have `mall` run it \nagainst each text entry:\n\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n\n\n\nFor more information and examples visit this function's \n[R reference page](reference/llm_custom.qmd) \n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n\n:::\n:::\n\n\n\nFor more information and examples visit this function's \n[Python reference page](reference/MallFrame.qmd#mall.MallFrame.custom) \n\n:::\n\n## Model selection and settings\n\nYou can set the model and its options to use when calling the LLM. In this case,\nwe refer to options as model-specific settings, such as the seed or the\ntemperature. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\nInvoking an `llm` function will automatically initialize a model selection\nif you don't have one selected yet. If there is only one option, it will \npre-select it for you. If more than one model is available, then `mall`\nwill present you with a menu selection so you can select which model you wish to \nuse.\n\nCalling `llm_use()` directly will let you specify the model and backend to use.\nYou can also set up additional arguments that will be passed down to the \nfunction that actually runs the prediction. In the case of Ollama, that function\nis [`chat()`](https://hauselin.github.io/ollama-r/reference/chat.html). \n\nThe model to use, and other options, can be set for the current R session:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(\"ollama\", \"llama3.2\", seed = 100, temperature = 0)\n```\n:::\n\n\n\n\n## Python \n\nThe model and options to be used are defined at the Polars data frame \nobject level. If not passed, the default model will be **llama3.2**.\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(\"ollama\", \"llama3.2\", options = dict(seed = 100))\n```\n:::\n\n\n\n:::\n\n#### Results caching \n\nBy default `mall` caches the requests and corresponding results from a given\nLLM run. Each response is saved as an individual JSON file. By default, the folder\nname is `_mall_cache`. The folder name can be customized, if needed. 
The caching can also be\nturned off by setting the argument to an empty string (`\"\"`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"_my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"\")\n```\n:::\n\n\n\n## Python \n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"my_cache\")\n```\n:::\n\n\n\nTo turn off:\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"\")\n```\n:::\n\n\n\n:::\n\nFor more information see the [Caching Results](articles/caching.qmd) article. \n\n## Key considerations\n\nThe main consideration is **cost**, either in time or in money.\n\nIf using this method with a locally available LLM, the cost will be a long \nrunning time. Unless you use a very specialized LLM, a given LLM is a \ngeneral-purpose model fitted on a vast amount of data, so determining a response for each \nrow takes longer than using a manually created NLP model. The default model\nused in Ollama is [Llama 3.2](https://ollama.com/library/llama3.2), \nwhich has 3B parameters. \n\nIf using an external LLM service, the consideration will be the \nbilling costs of that service. Keep in mind that you will be sending a lot\nof data to be evaluated. \n\nAnother consideration is the novelty of this approach. Early tests are \nproviding encouraging results. But you, as a user, will still need to keep\nin mind that the predictions will not be infallible, so always check the output.\nAt this time, I think the best use for this method is for quick analysis.\n\n\n## Vector functions (R only)\n\n`mall` includes functions that expect a vector, instead of a table, to run the\npredictions. This should make it easier to test things such as custom prompts,\nor results for specific text. 
Each `llm_` function has a corresponding `llm_vec_`\nfunction:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_sentiment(\"I am happy\")\n#> [1] \"positive\"\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_translate(\"Este es el mejor dia!\", \"english\")\n#> [1] \"It's the best day!\"\n```\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/MallFrame/execute-results/html.json b/_freeze/reference/MallFrame/execute-results/html.json index d03eb13..f7bb025 100644 --- a/_freeze/reference/MallFrame/execute-results/html.json +++ b/_freeze/reference/MallFrame/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "4593a510e46f23a242198ebad071e1ae", + "hash": "ab2b83a620205221658b2e724e51e73e", "result": { "engine": "jupyter", - "markdown": "---\ntitle: MallFrame\n---\n\n\n\n`MallFrame(self, df)`\n\nExtension to Polars that add ability to use\nan LLM to run batch predictions over a data frame\n\nWe will start by loading the needed libraries, and \nset up the data frame that will be used in the \nexamples:\n\n\n::: {#7da4c2de .cell execution_count=1}\n``` {.python .cell-code}\nimport mall\nimport polars as pl\npl.Config(fmt_str_lengths=100)\npl.Config.set_tbl_hide_dataframe_shape(True)\npl.Config.set_tbl_hide_column_data_types(True)\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(options = dict(seed = 100))\n```\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [classify](#mall.MallFrame.classify) | Classify text into specific categories. |\n| [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. |\n| [extract](#mall.MallFrame.extract) | Pull a specific label from the text. |\n| [sentiment](#mall.MallFrame.sentiment) | Use an LLM to run a sentiment analysis |\n| [summarize](#mall.MallFrame.summarize) | Summarize the text down to a specific number of words. |\n| [translate](#mall.MallFrame.translate) | Translate text into another language. 
|\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to |\n\n### classify { #mall.MallFrame.classify }\n\n`MallFrame.classify(col, labels='', additional='', pred_name='classify')`\n\nClassify text into specific categories.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#50356a12 .cell execution_count=2}\n``` {.python .cell-code}\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#929299a7 .cell execution_count=3}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], pred_name=\"prod_type\")\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n
reviewprod_type
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#715eb674 .cell execution_count=4}\n``` {.python .cell-code}\n#Pass a DICT to set custom values for each classification\nreviews.llm.classify(\"review\", {\"appliance\" : \"1\", \"computer\" : \"2\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""2"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""1"
\n```\n:::\n:::\n\n\n### custom { #mall.MallFrame.custom }\n\n`MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|-------------|--------|----------------------------------------------------------------------------------------|------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `prompt` | str | The prompt to send to the LLM along with the `col` | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n#### Examples\n\n::: {#75e71ed7 .cell execution_count=5}\n``` {.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n:::\n:::\n\n\n### extract { #mall.MallFrame.extract }\n\n`MallFrame.extract(col, labels='', expand_cols=False, additional='', pred_name='extract')`\n\nPull a specific label from the text.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines tells the LLM what to look for and return | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#ae6b4d4c .cell execution_count=6}\n``` {.python .cell-code}\n# Use 'labels' to let the function know what to extract\nreviews.llm.extract(\"review\", labels = \"product\")\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#1602b38d .cell execution_count=7}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.extract(\"review\", \"product\", pred_name = \"prod\")\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
\n
reviewprod
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#f257bda7 .cell execution_count=8}\n``` {.python .cell-code}\n# Pass a vector to request multiple things, the results will be pipe delimeted\n# in a single column\nreviews.llm.extract(\"review\", [\"product\", \"feelings\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv | great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop|frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine | confusion"
\n```\n:::\n:::\n\n\n::: {#d90b2af3 .cell execution_count=9}\n``` {.python .cell-code}\n# Set 'expand_cols' to True to split multiple lables\n# into individual columns\nreviews.llm.extract(\n col=\"review\",\n labels=[\"product\", \"feelings\"],\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
\n
reviewproductfeelings
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n::: {#370ba370 .cell execution_count=10}\n``` {.python .cell-code}\n# Set custom names to the resulting columns\nreviews.llm.extract(\n col=\"review\",\n labels={\"prod\": \"product\", \"feels\": \"feelings\"},\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n
reviewprodfeels
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n### sentiment { #mall.MallFrame.sentiment }\n\n`MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')`\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `options` | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#10220325 .cell execution_count=11}\n``` {.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#d6195d1c .cell execution_count=12}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.sentiment(\"review\", pred_name=\"review_sentiment\")\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
\n
reviewreview_sentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#8d3b25b4 .cell execution_count=13}\n``` {.python .cell-code}\n# Pass custom sentiment options\nreviews.llm.sentiment(\"review\", [\"positive\", \"negative\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""negative"
\n```\n:::\n:::\n\n\n::: {#2b51b9ef .cell execution_count=14}\n``` {.python .cell-code}\n# Use a DICT object to specify values to return per sentiment\nreviews.llm.sentiment(\"review\", {\"positive\" : \"1\", \"negative\" : \"0\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""0"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""0"
\n```\n:::\n:::\n\n\n### summarize { #mall.MallFrame.summarize }\n\n`MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')`\n\nSummarize the text down to a specific number of words.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `max_words` | int | Maximum number of words to use for the summary | `10` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#97fb9a3a .cell execution_count=15}\n``` {.python .cell-code}\n# Use max_words to set the maximum number of words to use for the summary\nreviews.llm.summarize(\"review\", max_words = 5)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n::: {#f89461d7 .cell execution_count=16}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.summarize(\"review\", 5, pred_name = \"review_summary\")\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
\n
reviewreview_summary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n### translate { #mall.MallFrame.translate }\n\n`MallFrame.translate(col, language='', additional='', pred_name='translation')`\n\nTranslate text into another language.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-----------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `language` | str | The target language to translate to. For example 'French'. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#bf769707 .cell execution_count=17}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n:::\n:::\n\n\n::: {#24cae7a4 .cell execution_count=18}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"french\")\n```\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Ceci était la meilleure télévision que j'ai jamais utilisée. Écran et son excellent."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Je me regrette d'avoir acheté ce portable. Il est trop lent et le clavier fait trop de bruit."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""Je ne sais pas comment réagir à mon nouveau lave-linge. Couleur superbe, mais difficile à comprendre…
\n```\n:::\n:::\n\n\n### use { #mall.MallFrame.use }\n\n`MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)`\n\nDefine the model, backend, and other options to use to\ninteract with the LLM.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|\n| `backend` | str | The name of the backend to use. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| `model` | str | The name of the model tha the backend should use. At the beginning of the session it defaults to \"llama3.2\". If passing `\"\"`, it will remain unchanged | `''` |\n| `_cache` | str | The path of where to save the cached results. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| `**kwargs` | | Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama` | `{}` |\n\n#### Examples\n\n::: {#d1773d8b .cell execution_count=19}\n``` {.python .cell-code}\n# Additional arguments will be passed 'as-is' to the\n# downstream R function in this example, to ollama::chat()\nreviews.llm.use(\"ollama\", \"llama3.2\", seed = 100, temp = 0.1)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.1}\n```\n:::\n:::\n\n\n::: {#0a90e5ca .cell execution_count=20}\n``` {.python .cell-code}\n# During the Python session, you can change any argument\n# individually and it will retain all of previous\n# arguments used\nreviews.llm.use(temp = 0.3)\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: 
{#7009a892 .cell execution_count=21}\n``` {.python .cell-code}\n# Use _cache to modify the target folder for caching\nreviews.llm.use(_cache = \"_my_cache\")\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_my_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: {#87460892 .cell execution_count=22}\n``` {.python .cell-code}\n# Leave _cache empty to turn off this functionality\nreviews.llm.use(_cache = \"\")\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: MallFrame\n---\n\n\n\n`MallFrame(self, df)`\n\nExtension to Polars that add ability to use\nan LLM to run batch predictions over a data frame\n\nWe will start by loading the needed libraries, and\nset up the data frame that will be used in the\nexamples:\n\n\n::: {#e0baad23 .cell execution_count=1}\n``` {.python .cell-code}\nimport mall\nimport polars as pl\npl.Config(fmt_str_lengths=100)\npl.Config.set_tbl_hide_dataframe_shape(True)\npl.Config.set_tbl_hide_column_data_types(True)\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(options = dict(seed = 100))\n```\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [classify](#mall.MallFrame.classify) | Classify text into specific categories. |\n| [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. |\n| [extract](#mall.MallFrame.extract) | Pull a specific label from the text. |\n| [sentiment](#mall.MallFrame.sentiment) | Use an LLM to run a sentiment analysis |\n| [summarize](#mall.MallFrame.summarize) | Summarize the text down to a specific number of words. |\n| [translate](#mall.MallFrame.translate) | Translate text into another language. 
|\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to |\n| [verify](#mall.MallFrame.verify) | Check to see if something is true about the text. |\n\n### classify { #mall.MallFrame.classify }\n\n`MallFrame.classify(col, labels='', additional='', pred_name='classify')`\n\nClassify text into specific categories.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#c433ce08 .cell execution_count=2}\n``` {.python .cell-code}\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#cda91b85 .cell execution_count=3}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], pred_name=\"prod_type\")\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n
reviewprod_type
"This has been the best TV I've ever used. Great screen, and sound.""computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n:::\n:::\n\n\n::: {#f6d7e2c6 .cell execution_count=4}\n``` {.python .cell-code}\n#Pass a DICT to set custom values for each classification\nreviews.llm.classify(\"review\", {\"appliance\" : \"1\", \"computer\" : \"2\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""2"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""1"
\n```\n:::\n:::\n\n\n### custom { #mall.MallFrame.custom }\n\n`MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|-------------|--------|----------------------------------------------------------------------------------------|------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `prompt` | str | The prompt to send to the LLM along with the `col` | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n#### Examples\n\n::: {#2c633a89 .cell execution_count=5}\n``` {.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n:::\n:::\n\n\n### extract { #mall.MallFrame.extract }\n\n`MallFrame.extract(col, labels='', expand_cols=False, additional='', pred_name='extract')`\n\nPull a specific label from the text.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that tells the LLM what to look for and return | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#11a96b13 .cell execution_count=6}\n``` {.python .cell-code}\n# Use 'labels' to let the function know what to extract\nreviews.llm.extract(\"review\", labels = \"product\")\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#33a564f6 .cell execution_count=7}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.extract(\"review\", \"product\", pred_name = \"prod\")\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
\n
reviewprod
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n:::\n:::\n\n\n::: {#29bc70bf .cell execution_count=8}\n``` {.python .cell-code}\n# Pass a list to request multiple things; the results will be pipe-delimited\n# in a single column\nreviews.llm.extract(\"review\", [\"product\", \"feelings\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv | great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop|frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine | confusion"
\n```\n:::\n:::\n\n\n::: {#35587a7e .cell execution_count=9}\n``` {.python .cell-code}\n# Set 'expand_cols' to True to split multiple labels\n# into individual columns\nreviews.llm.extract(\n    col=\"review\",\n    labels=[\"product\", \"feelings\"],\n    expand_cols=True\n    )\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
\n
reviewproductfeelings
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n::: {#bc1572b9 .cell execution_count=10}\n``` {.python .cell-code}\n# Set custom names for the resulting columns\nreviews.llm.extract(\n    col=\"review\",\n    labels={\"prod\": \"product\", \"feels\": \"feelings\"},\n    expand_cols=True\n    )\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n
reviewprodfeels
"This has been the best TV I've ever used. Great screen, and sound.""tv "" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop""frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine "" confusion"
\n```\n:::\n:::\n\n\n### sentiment { #mall.MallFrame.sentiment }\n\n`MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')`\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `options` | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#16b56226 .cell execution_count=11}\n``` {.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#082d1ef7 .cell execution_count=12}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.sentiment(\"review\", pred_name=\"review_sentiment\")\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
\n
reviewreview_sentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n:::\n:::\n\n\n::: {#0f2f7a13 .cell execution_count=13}\n``` {.python .cell-code}\n# Pass custom sentiment options\nreviews.llm.sentiment(\"review\", [\"positive\", \"negative\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""negative"
\n```\n:::\n:::\n\n\n::: {#7bb697be .cell execution_count=14}\n``` {.python .cell-code}\n# Use a DICT object to specify values to return per sentiment\nreviews.llm.sentiment(\"review\", {\"positive\" : \"1\", \"negative\" : \"0\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""0"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""0"
\n```\n:::\n:::\n\n\n### summarize { #mall.MallFrame.summarize }\n\n`MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')`\n\nSummarize the text down to a specific number of words.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `max_words` | int | Maximum number of words to use for the summary | `10` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#2690ac20 .cell execution_count=15}\n``` {.python .cell-code}\n# Use max_words to set the maximum number of words to use for the summary\nreviews.llm.summarize(\"review\", max_words = 5)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n::: {#62f13bf2 .cell execution_count=16}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.summarize(\"review\", 5, pred_name = \"review_summary\")\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
\n
reviewreview_summary
"This has been the best TV I've ever used. Great screen, and sound.""great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""feeling uncertain about new purchase"
\n```\n:::\n:::\n\n\n### translate { #mall.MallFrame.translate }\n\n`MallFrame.translate(col, language='', additional='', pred_name='translation')`\n\nTranslate text into another language.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-----------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `language` | str | The target language to translate to. For example 'French'. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#a4d7ae95 .cell execution_count=17}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
\n```\n:::\n:::\n\n\n::: {#df4fb9ee .cell execution_count=18}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"french\")\n```\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Ceci était la meilleure télévision que j'ai jamais utilisée. Écran et son excellent."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Je me regrette d'avoir acheté ce portable. Il est trop lent et le clavier fait trop de bruit."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""Je ne sais pas comment réagir à mon nouveau lave-linge. Couleur superbe, mais difficile à comprendre…
\n```\n:::\n:::\n\n\n### use { #mall.MallFrame.use }\n\n`MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)`\n\nDefine the model, backend, and other options to use to\ninteract with the LLM.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|\n| `backend` | str | The name of the backend to use. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| `model` | str | The name of the model that the backend should use. At the beginning of the session it defaults to \"llama3.2\". If passing `\"\"`, it will remain unchanged | `''` |\n| `_cache` | str | The path where the cached results will be saved. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| `**kwargs` | | Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama` | `{}` |\n\n#### Examples\n\n::: {#8ac89991 .cell execution_count=19}\n``` {.python .cell-code}\n# Additional arguments will be passed 'as-is' to the\n# downstream Python function, in this example to ollama.chat()\nreviews.llm.use(\"ollama\", \"llama3.2\", seed = 100, temp = 0.1)\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.1}\n```\n:::\n:::\n\n\n::: {#ee435769 .cell execution_count=20}\n``` {.python .cell-code}\n# During the Python session, you can change any argument\n# individually and it will retain all of the previous\n# arguments used\nreviews.llm.use(temp = 0.3)\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: 
{#266e2cb3 .cell execution_count=21}\n``` {.python .cell-code}\n# Use _cache to modify the target folder for caching\nreviews.llm.use(_cache = \"_my_cache\")\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_my_cache',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n::: {#ab07df94 .cell execution_count=22}\n``` {.python .cell-code}\n# Leave _cache empty to turn off this functionality\nreviews.llm.use(_cache = \"\")\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '',\n 'options': {'seed': 100},\n 'seed': 100,\n 'temp': 0.3}\n```\n:::\n:::\n\n\n### verify { #mall.MallFrame.verify }\n\n`MallFrame.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify')`\n\nCheck to see if something is true about the text.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `what` | str | The statement or question that needs to be verified against the provided text | `''` |\n| `yes_no` | list | A positional list of size 2, which contains the values to return if true and false. 
The first position will be used as the 'true' value, and the second as the 'false' value | `[1, 0]` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'verify'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n::: {#3f2cbfdf .cell execution_count=23}\n``` {.python .cell-code}\nreviews.llm.verify(\"review\", \"is the customer happy\")\n```\n\n::: {.cell-output .cell-output-display execution_count=23}\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound."1
"I regret buying this laptop. It is too slow and the keyboard is too noisy"0
"Not sure how to feel about my new washing machine. Great color, but hard to figure"0
\n```\n:::\n:::\n\n\n::: {#4899b7b6 .cell execution_count=24}\n``` {.python .cell-code}\n# Use 'yes_no' to modify the 'true' and 'false' values to return\nreviews.llm.verify(\"review\", \"is the customer happy\", [\"y\", \"n\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound.""y"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""n"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""n"
\n```\n:::\n:::\n\n\n", "supporting": [ "MallFrame_files" ], diff --git a/_freeze/reference/llm_verify/execute-results/html.json b/_freeze/reference/llm_verify/execute-results/html.json new file mode 100644 index 0000000..be93f60 --- /dev/null +++ b/_freeze/reference/llm_verify/execute-results/html.json @@ -0,0 +1,15 @@ +{ + "hash": "07283324dfba84486e7dfa2870efbcb4", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: \"Verify if a statement about the text is true or not\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n[R/llm-verify.R](https://github.com/mlverse/mall/blob/main/R/llm-verify.R)\n\n## llm_verify\n\n## Description\n Use a Large Language Model (LLM) to see if something is true or not based on the provided text \n\n\n## Usage\n```r\n \nllm_verify( \n .data, \n col, \n what, \n yes_no = factor(c(1, 0)), \n pred_name = \".verify\", \n additional_prompt = \"\" \n) \n \nllm_vec_verify( \n x, \n what, \n yes_no = factor(c(1, 0)), \n additional_prompt = \"\", \n preview = FALSE \n) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| what | The statement or question that needs to be verified against the provided text |\n| yes_no | A size 2 vector that specifies the expected output. It is positional. The first item is expected to be the value to return if the statement about the provided text is true, and the second if it is not. Defaults to: `factor(c(1, 0))` |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE`. Applies to the vector function only. 
|\n\n\n\n## Value\n `llm_verify` returns a `data.frame` or `tbl` object. `llm_vec_verify` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \n# By default it will return 1 for 'true', and 0 for 'false', \n# the new column will be a factor type \nllm_verify(reviews, review, \"is the customer happy\") \n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1 \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 0 \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 0\n \n# The yes_no argument can be modified to return a different response \n# than 1 or 0. First position will be 'true' and second, 'false' \nllm_verify(reviews, review, \"is the customer happy\", c(\"y\", \"n\")) \n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. y \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… n \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… n\n \n# Number can also be used, this would be in the case that you wish to match \n# the output values of existing predictions \nllm_verify(reviews, review, \"is the customer happy\", c(2, 1)) \n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 2\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 1\n#> 3 Not sure how to feel about my new washing machine. 
Great color, but h… 1\n```\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/index.qmd b/index.qmd index 65ce486..57bf834 100644 --- a/index.qmd +++ b/index.qmd @@ -44,6 +44,7 @@ Currently, the included prompts perform the following: - [Classify text](#classify) - [Extract one, or several](#extract), specific pieces information from the text - [Translate text](#translate) +- [Verify that something is true](#verify) about the text (binary) - [Custom prompt](#custom-prompt) This package is inspired by the SQL AI functions now offered by vendors such as @@ -298,6 +299,63 @@ For more information and examples visit this function's ::: +### Classify +
+Use the LLM to categorize the text into one of the options you provide: + + +::: {.panel-tabset group="language"} +## R + +```{r} +reviews |> + llm_classify(review, c("appliance", "computer")) +``` + +For more information and examples visit this function's +[R reference page](reference/llm_classify.qmd) + +## Python + +```{python} +reviews.llm.classify("review", ["computer", "appliance"]) +``` + +For more information and examples visit this function's +[Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify) + +::: + +### Verify + +This function allows you to check whether a statement is true, based +on the provided text. By default, it will return a 1 for "yes", and 0 for +"no". This can be customized. 
+ + +::: {.panel-tabset group="language"} +## R + +```{r} +reviews |> + llm_verify(review, "is the customer happy with the purchase") +``` + +For more information and examples visit this function's +[R reference page](reference/llm_verify.qmd) + +## Python + +```{python} +reviews.llm.verify("review", "is the customer happy with the purchase") +``` + +For more information and examples visit this function's +[Python reference page](reference/MallFrame.qmd#mall.MallFrame.verify) + +::: + + ### Translate diff --git a/objects.json b/objects.json index ab961e6..00ec9f7 100644 --- a/objects.json +++ b/objects.json @@ -1 +1 @@ -{"project": "mall", "version": "0.0.9999", "count": 16, "items": [{"name": "mall.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "-"}, {"name": "mall.polars.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "mall.MallFrame.classify"}, {"name": "mall.MallFrame.custom", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "-"}, {"name": "mall.polars.MallFrame.custom", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "mall.MallFrame.custom"}, {"name": "mall.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "-"}, {"name": "mall.polars.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "mall.MallFrame.extract"}, {"name": "mall.MallFrame.sentiment", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "-"}, {"name": "mall.polars.MallFrame.sentiment", "domain": "py", "role": 
"function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "mall.MallFrame.sentiment"}, {"name": "mall.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "-"}, {"name": "mall.polars.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "mall.MallFrame.summarize"}, {"name": "mall.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.translate", "dispname": "-"}, {"name": "mall.polars.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.translate", "dispname": "mall.MallFrame.translate"}, {"name": "mall.MallFrame.use", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "-"}, {"name": "mall.polars.MallFrame.use", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "mall.MallFrame.use"}, {"name": "mall.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "-"}, {"name": "mall.polars.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "mall.MallFrame"}]} \ No newline at end of file +{"project": "mall", "version": "0.0.9999", "count": 18, "items": [{"name": "mall.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "-"}, {"name": "mall.polars.MallFrame.classify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.classify", "dispname": "mall.MallFrame.classify"}, {"name": "mall.MallFrame.custom", 
"domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "-"}, {"name": "mall.polars.MallFrame.custom", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.custom", "dispname": "mall.MallFrame.custom"}, {"name": "mall.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "-"}, {"name": "mall.polars.MallFrame.extract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.extract", "dispname": "mall.MallFrame.extract"}, {"name": "mall.MallFrame.sentiment", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "-"}, {"name": "mall.polars.MallFrame.sentiment", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.sentiment", "dispname": "mall.MallFrame.sentiment"}, {"name": "mall.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "-"}, {"name": "mall.polars.MallFrame.summarize", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.summarize", "dispname": "mall.MallFrame.summarize"}, {"name": "mall.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.translate", "dispname": "-"}, {"name": "mall.polars.MallFrame.translate", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.translate", "dispname": "mall.MallFrame.translate"}, {"name": "mall.MallFrame.use", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "-"}, {"name": "mall.polars.MallFrame.use", "domain": "py", "role": 
"function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.use", "dispname": "mall.MallFrame.use"}, {"name": "mall.MallFrame.verify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.verify", "dispname": "-"}, {"name": "mall.polars.MallFrame.verify", "domain": "py", "role": "function", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame.verify", "dispname": "mall.MallFrame.verify"}, {"name": "mall.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "-"}, {"name": "mall.polars.MallFrame", "domain": "py", "role": "class", "priority": "1", "uri": "reference/MallFrame.html#mall.MallFrame", "dispname": "mall.MallFrame"}]} \ No newline at end of file diff --git a/python/README.md b/python/README.md new file mode 100644 index 0000000..18194a4 --- /dev/null +++ b/python/README.md @@ -0,0 +1,100 @@ +# mall + +## Intro + +Run multiple LLM predictions against a data frame. The predictions are +processed row-wise over a specified column. It works using a +pre-determined one-shot prompt, along with the current row’s content. + +## Install + +To install from Github, use: + +``` python +pip install "mall @ git+https://git@github.com/edgararuiz/mall.git@python#subdirectory=python" +``` + +## Examples + +``` python +import mall +import polars as pl + +reviews = pl.DataFrame( + data=[ + "This has been the best TV I've ever used. Great screen, and sound.", + "I regret buying this laptop. It is too slow and the keyboard is too noisy", + "Not sure how to feel about my new washing machine. Great color, but hard to figure" + ], + schema=[("review", pl.String)], +) +``` + +## Sentiment + + +``` python +reviews.llm.sentiment("review") +``` + +shape: (3, 2) + +| review | sentiment | +|----------------------------------|------------| +| str | str | +| "This has been the best TV I've… | "positive" | +| "I regret buying this laptop. 
I… | "negative" | +| "Not sure how to feel about my … | "neutral" | + +## Summarize + +``` python +reviews.llm.summarize("review", 5) +``` + +shape: (3, 2) + +| review | summary | +|----------------------------------|----------------------------------| +| str | str | +| "This has been the best TV I've… | "it's a great tv" | +| "I regret buying this laptop. I… | "laptop not worth the money" | +| "Not sure how to feel about my … | "feeling uncertain about new pu… | + +## Translate (as in ‘English to French’) + +``` python +reviews.llm.translate("review", "spanish") +``` + +shape: (3, 2) + +| review | translation | +|----------------------------------|----------------------------------| +| str | str | +| "This has been the best TV I've… | "Esta ha sido la mejor TV que h… | +| "I regret buying this laptop. I… | "Lo lamento comprar este portát… | +| "Not sure how to feel about my … | "No estoy seguro de cómo sentir… | + +## Classify + +``` python +reviews.llm.classify("review", ["computer", "appliance"]) +``` + +shape: (3, 2) + +| review | classify | +|----------------------------------|-------------| +| str | str | +| "This has been the best TV I've… | "appliance" | +| "I regret buying this laptop. I… | "appliance" | +| "Not sure how to feel about my … | "appliance" | + +## LLM session setup + +``` python +reviews.llm.use(options = dict(seed = 100)) +``` + + {'backend': 'ollama', 'model': 'llama3.2', 'options': {'seed': 100}} diff --git a/python/README.qmd b/python/README.qmd new file mode 100644 index 0000000..862f56a --- /dev/null +++ b/python/README.qmd @@ -0,0 +1,71 @@ +--- +format: gfm +--- + +# mall + +## Intro + +Run multiple LLM predictions against a data frame. The predictions are processed row-wise over a specified column. It works using a pre-determined one-shot prompt, along with the current row’s content. 
+ +## Install + +To install from Github, use: + +```python +pip install "mall @ git+https://git@github.com/edgararuiz/mall.git@python#subdirectory=python" +``` + +## Examples + +```{python} +#| include: false +import polars as pl +from polars.dataframe._html import HTMLFormatter +html_formatter = get_ipython().display_formatter.formatters['text/html'] +html_formatter.for_type(pl.DataFrame, lambda df: "\n".join(HTMLFormatter(df).render())) +``` + + +```{python} +import mall +import polars as pl +data = mall.MallData +reviews = data.reviews +``` + +```{python} +#| include: false +reviews.llm.use(options = dict(seed = 100)) +``` + + +## Sentiment + +```{python} +reviews.llm.sentiment("review") +``` + +## Summarize + +```{python} +reviews.llm.summarize("review", 5) +``` + +## Translate (as in 'English to French') + +```{python} +reviews.llm.translate("review", "spanish") +``` + +## Classify + +```{python} +reviews.llm.classify("review", ["computer", "appliance"]) +``` + +## LLM session setup + +```{python} +reviews.llm.use(options = dict(seed = 100)) +``` diff --git a/python/mall/llm.py b/python/mall/llm.py index dc89ee3..4ed8929 100644 --- a/python/mall/llm.py +++ b/python/mall/llm.py @@ -1,17 +1,45 @@ +import polars as pl import ollama import json import hashlib import os -def build_msg(x, msg): - out = [] - for msgs in msg: - out.append({"role": msgs["role"], "content": msgs["content"].format(x)}) - return out +def map_call(df, col, msg, pred_name, use, valid_resps="", convert=None): + if valid_resps == "": + valid_resps = [] + valid_resps = valid_output(valid_resps) + ints = 0 + for resp in valid_resps: + ints = ints + isinstance(resp, int) + + pl_type = pl.String + data_type = str + + if len(valid_resps) == ints & ints != 0: + pl_type = pl.Int8 + data_type = int + + df = df.with_columns( + pl.col(col) + .map_elements( + lambda x: llm_call( + x=x, + msg=msg, + use=use, + preview=False, + valid_resps=valid_resps, + convert=convert, + data_type=data_type, + ), + 
return_dtype=pl_type, + ) + .alias(pred_name) + ) + return df -def llm_call(x, msg, use, preview=False, valid_resps=""): +def llm_call(x, msg, use, preview=False, valid_resps="", convert=None, data_type=None): call = dict( model=use.get("model"), @@ -41,9 +69,33 @@ def llm_call(x, msg, use, preview=False, valid_resps=""): if cache == "": cache_record(hash_call, use, call, out) - if isinstance(valid_resps, list): - if out not in valid_resps: - out = None + if isinstance(convert, dict): + for label in convert: + if out == label: + out = convert.get(label) + + # out = data_type(out) + + # if out not in valid_resps: + # out = None + + return out + + +def valid_output(x): + out = [] + if isinstance(x, list): + out = x + if isinstance(x, dict): + for i in x: + out.append(x.get(i)) + return out + + +def build_msg(x, msg): + out = [] + for msgs in msg: + out.append({"role": msgs["role"], "content": msgs["content"].format(x)}) return out diff --git a/python/mall/polars.py b/python/mall/polars.py index 20e035c..4a7cfd9 100644 --- a/python/mall/polars.py +++ b/python/mall/polars.py @@ -1,6 +1,15 @@ import polars as pl -from mall.prompt import sentiment, summarize, translate, classify, extract, custom -from mall.llm import llm_call + +from mall.prompt import ( + sentiment, + summarize, + translate, + classify, + extract, + custom, + verify, +) +from mall.llm import map_call @pl.api.register_dataframe_namespace("llm") @@ -8,8 +17,8 @@ class MallFrame: """Extension to Polars that add ability to use an LLM to run batch predictions over a data frame - We will start by loading the needed libraries, and - set up the data frame that will be used in the + We will start by loading the needed libraries, and + set up the data frame that will be used in the examples: ```{python} @@ -423,14 +432,56 @@ def custom( ) return df + def verify( + self, + col, + what="", + yes_no=[1, 0], + additional="", + pred_name="verify", + ) -> list[pl.DataFrame]: + """Check to see if something is true about 
the text. + + Parameters + ------ + col : str + The name of the text field to process + + what : str + The statement or question that needs to be verified against the + provided text + + yes_no : list + A positional list of size 2, which contains the values to return + if true and false. The first position will be used as the 'true' + value, and the second as the 'false' value + + pred_name : str + A character vector with the name of the new column where the + prediction will be placed + + additional : str + Inserts this text into the prompt sent to the LLM + + Examples + ------ -def map_call(df, col, msg, pred_name, use, valid_resps=""): - df = df.with_columns( - pl.col(col) - .map_elements( - lambda x: llm_call(x, msg, use, False, valid_resps), - return_dtype=pl.String, + ```{python} + reviews.llm.verify("review", "is the customer happy") + ``` + + ```{python} + # Use 'yes_no' to modify the 'true' and 'false' values to return + reviews.llm.verify("review", "is the customer happy", ["y", "n"]) + ``` + """ + df = map_call( + df=self._df, + col=col, + msg=verify(what, additional=additional), + pred_name=pred_name, + use=self._use, + valid_resps=yes_no, + convert=dict(yes=yes_no[0], no=yes_no[1]), ) - .alias(pred_name) - ) - return df + return df diff --git a/python/mall/prompt.py b/python/mall/prompt.py index c73317f..d477813 100644 --- a/python/mall/prompt.py +++ b/python/mall/prompt.py @@ -71,7 +71,9 @@ def extract(labels, additional=""): if isinstance(labels, list): no_labels = len(labels) plural = "s" - text_multi = "Return the response exclusively in a pipe separated list, and no headers. " + text_multi = ( + "Return the response exclusively in a pipe separated list, and no headers. " + ) for label in labels: col_labels += label + " " col_labels = col_labels.rstrip() @@ -97,6 +99,21 @@ def extract(labels, additional=""): return msg +def verify(what, additional=""): + msg = [ + { + "role": "user", + "content": "You are a helpful text analysis engine. 
" + + "Determine this is true " + + f"'{what}'." + + "No capitalization. No explanations. " + + f"{additional} " + + "The answer is based on the following text:\n{}", + } + ] + return msg + + def custom(prompt): msg = [{"role": "user", "content": f"{prompt}" + ": \n{}"}] return msg @@ -109,12 +126,12 @@ def process_labels(x, if_list="", if_dict=""): out += " " + i out = out.strip() out = out.replace(" ", ", ") - out = if_list.replace("{values}", out) + out = if_list.replace("{values}", str(out)) if isinstance(x, dict): out = "" for i in x: new = if_dict new = new.replace("{key}", i) - new = new.replace("{value}", x.get(i)) + new = new.replace("{value}", str(x.get(i))) out += " " + new return out diff --git a/r/DESCRIPTION b/r/DESCRIPTION index 717d5f5..afba4e6 100644 --- a/r/DESCRIPTION +++ b/r/DESCRIPTION @@ -1,7 +1,7 @@ Package: mall Title: Run multiple 'Large Language Model' predictions against a table, or vectors -Version: 0.0.0.9006 +Version: 0.0.0.9007 Authors@R: person("Edgar", "Ruiz", , "first.last@example.com", role = c("aut", "cre")) Description: Run multiple 'Large Language Model' predictions against a table. 
The diff --git a/r/NAMESPACE index 93c2220..36a884d 100644 --- a/r/NAMESPACE +++ b/r/NAMESPACE @@ -9,6 +9,7 @@ S3method(llm_sentiment,data.frame) S3method(llm_summarize,"tbl_Spark SQL") S3method(llm_summarize,data.frame) S3method(llm_translate,data.frame) +S3method(llm_verify,data.frame) S3method(m_backend_prompt,mall_llama3.2) S3method(m_backend_prompt,mall_session) S3method(m_backend_submit,mall_ollama) @@ -27,6 +28,8 @@ export(llm_vec_extract) export(llm_vec_sentiment) export(llm_vec_summarize) export(llm_vec_translate) +export(llm_vec_verify) +export(llm_verify) export(m_backend_prompt) export(m_backend_submit) import(cli) diff --git a/r/R/llm-verify.R new file mode 100644 index 0000000..9d66f88 --- /dev/null +++ b/r/R/llm-verify.R @@ -0,0 +1,82 @@ +#' Verify if a statement about the text is true or not +#' @description +#' Use a Large Language Model (LLM) to see if something is true or not +#' based on the provided text +#' +#' @inheritParams llm_classify +#' @param what The statement or question that needs to be verified against the +#' provided text +#' @param yes_no A size 2 vector that specifies the expected output. It is +#' positional. The first item is the value to return if the +#' statement about the provided text is true, and the second if it is not. Defaults +#' to: `factor(c(1, 0))` +#' @returns `llm_verify` returns a `data.frame` or `tbl` object. +#' `llm_vec_verify` returns a vector that is the same length as `x`. +#' +#' @examples +#' \dontrun{ +#' library(mall) +#' +#' data("reviews") +#' +#' llm_use("ollama", "llama3.2", seed = 100, .silent = TRUE) +#' +#' # By default it will return 1 for 'true', and 0 for 'false', +#' # the new column will be a factor type +#' llm_verify(reviews, review, "is the customer happy") +#' +#' # The yes_no argument can be modified to return a different response +#' # than 1 or 0. 
First position will be 'true' and second, 'false' +#' llm_verify(reviews, review, "is the customer happy", c("y", "n")) +#' +#' # Numbers can also be used; this would be the case when you wish to match +#' # the output values of existing predictions +#' llm_verify(reviews, review, "is the customer happy", c(2, 1)) +#' } +#' +#' @export +llm_verify <- function(.data, + col, + what, + yes_no = factor(c(1, 0)), + pred_name = ".verify", + additional_prompt = "") { + UseMethod("llm_verify") +} + +#' @export +llm_verify.data.frame <- function(.data, + col, + what, + yes_no = factor(c(1, 0)), + pred_name = ".verify", + additional_prompt = "") { + mutate( + .data = .data, + !!pred_name := llm_vec_verify( + x = {{ col }}, + what = what, + yes_no = yes_no, + additional_prompt = additional_prompt + ) + ) +} + +#' @rdname llm_verify +#' @export +llm_vec_verify <- function(x, + what, + yes_no = factor(c(1, 0)), + additional_prompt = "", + preview = FALSE) { + m_vec_prompt( + x = x, + prompt_label = "verify", + what = what, + labels = yes_no, + valid_resps = yes_no, + convert = c("yes" = yes_no[1], "no" = yes_no[2]), + additional_prompt = additional_prompt, + preview = preview + ) +} diff --git a/r/R/m-backend-prompt.R index 915e7db..b238525 100644 --- a/r/R/m-backend-prompt.R +++ b/r/R/m-backend-prompt.R @@ -145,6 +145,21 @@ m_backend_prompt.mall_session <- function(backend, additional = "") { )) ) ) + }, + verify = function(what, labels) { + list( + list( + role = "user", + content = glue(paste( + "You are a helpful text analysis engine.", + "Determine this is true ", + "'{what}'.", + "No capitalization. 
No explanations.", + "{additional}", + "The answer is based on the following text:\n{{x}}" + )) + ) + ) } ) } diff --git a/r/R/m-vec-prompt.R b/r/R/m-vec-prompt.R index 696971e..a96b4ca 100644 --- a/r/R/m-vec-prompt.R +++ b/r/R/m-vec-prompt.R @@ -3,6 +3,7 @@ m_vec_prompt <- function(x, additional_prompt = "", valid_resps = NULL, prompt = NULL, + convert = NULL, preview = FALSE, ...) { # Initializes session LLM @@ -40,10 +41,18 @@ m_vec_prompt <- function(x, if (preview) { return(resp[[1]]) } + # Checks for invalid output and marks them as NA if (all_formula(valid_resps)) { valid_resps <- list_c(map(valid_resps, f_rhs)) } + + if (!is.null(convert)) { + for (i in seq_along(convert)) { + resp[resp == names(convert[i])] <- as.character(convert[[i]]) + } + } + if (!is.null(valid_resps)) { errors <- !resp %in% valid_resps resp[errors] <- NA @@ -56,8 +65,12 @@ m_vec_prompt <- function(x, ) } } + if (is.numeric(valid_resps)) { resp <- as.numeric(resp) } + if (is.factor(valid_resps)) { + resp <- as.factor(resp) + } resp } diff --git a/r/man/llm_verify.Rd b/r/man/llm_verify.Rd new file mode 100644 index 0000000..985c484 --- /dev/null +++ b/r/man/llm_verify.Rd @@ -0,0 +1,79 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/llm-verify.R +\name{llm_verify} +\alias{llm_verify} +\alias{llm_vec_verify} +\title{Verify if a statement about the text is true or not} +\usage{ +llm_verify( + .data, + col, + what, + yes_no = factor(c(1, 0)), + pred_name = ".verify", + additional_prompt = "" +) + +llm_vec_verify( + x, + what, + yes_no = factor(c(1, 0)), + additional_prompt = "", + preview = FALSE +) +} +\arguments{ +\item{.data}{A \code{data.frame} or \code{tbl} object that contains the text to be +analyzed} + +\item{col}{The name of the field to analyze, supports \code{tidy-eval}} + +\item{what}{The statement or question that needs to be verified against the +provided text} + +\item{yes_no}{A size 2 vector that specifies the expected output. 
It is +positional. The first item is the value to return if the +statement about the provided text is true, and the second if it is not. Defaults +to: \code{factor(c(1, 0))}} + +\item{pred_name}{A character vector with the name of the new column where the +prediction will be placed} + +\item{additional_prompt}{Inserts this text into the prompt sent to the LLM} + +\item{x}{A vector that contains the text to be analyzed} + +\item{preview}{It returns the R call that would have been used to run the +prediction. It only returns the first record in \code{x}. Defaults to \code{FALSE}. +Applies to vector function only.} +} +\value{ +\code{llm_verify} returns a \code{data.frame} or \code{tbl} object. +\code{llm_vec_verify} returns a vector that is the same length as \code{x}. +} +\description{ +Use a Large Language Model (LLM) to see if something is true or not +based on the provided text +} +\examples{ +\dontrun{ +library(mall) + +data("reviews") + +llm_use("ollama", "llama3.2", seed = 100, .silent = TRUE) + +# By default it will return 1 for 'true', and 0 for 'false', +# the new column will be a factor type +llm_verify(reviews, review, "is the customer happy") + +# The yes_no argument can be modified to return a different response +# than 1 or 0. First position will be 'true' and second, 'false' +llm_verify(reviews, review, "is the customer happy", c("y", "n")) + +# Numbers can also be used; this would be the case when you wish to match +# the output values of existing predictions +llm_verify(reviews, review, "is the customer happy", c(2, 1)) +} + +} diff --git a/r/tests/testthat/_snaps/llm-verify.md new file mode 100644 index 0000000..b308dce --- /dev/null +++ b/r/tests/testthat/_snaps/llm-verify.md @@ -0,0 +1,36 @@ +# Preview works + + Code + llm_vec_verify("this is a test", "a test", preview = TRUE) + Output + ollamar::chat(messages = list(list(role = "user", content = "You are a helpful text analysis engine. 
Determine this is true 'a test'. No capitalization. No explanations. The answer is based on the following text:\nthis is a test")), + output = "text", model = "llama3.2", seed = 100) + +# Verify on Ollama works + + Code + llm_verify(reviews, review, "is the customer happy") + Output + review + 1 This has been the best TV I've ever used. Great screen, and sound. + 2 I regret buying this laptop. It is too slow and the keyboard is too noisy + 3 Not sure how to feel about my new washing machine. Great color, but hard to figure + .verify + 1 1 + 2 0 + 3 0 + +--- + + Code + llm_verify(reviews, review, "is the customer happy", yes_no = c("y", "n")) + Output + review + 1 This has been the best TV I've ever used. Great screen, and sound. + 2 I regret buying this laptop. It is too slow and the keyboard is too noisy + 3 Not sure how to feel about my new washing machine. Great color, but hard to figure + .verify + 1 y + 2 n + 3 n + diff --git a/r/tests/testthat/_snaps/zzz-cache.md b/r/tests/testthat/_snaps/zzz-cache.md index c5111c7..88b7d02 100644 --- a/r/tests/testthat/_snaps/zzz-cache.md +++ b/r/tests/testthat/_snaps/zzz-cache.md @@ -29,4 +29,8 @@ _mall_cache/b0/b02d0fab954e183a98787fa897b47d59.json _mall_cache/b7 _mall_cache/b7/b7c613386c94b2500b2b733632fedd1a.json + _mall_cache/b9 + _mall_cache/b9/b9f544fe374b34f5b2320ac3a8c2847f.json + _mall_cache/dd + _mall_cache/dd/dd074f573d5fe67c1a5f27f63fa06267.json diff --git a/r/tests/testthat/test-llm-verify.R b/r/tests/testthat/test-llm-verify.R new file mode 100644 index 0000000..bdec03c --- /dev/null +++ b/r/tests/testthat/test-llm-verify.R @@ -0,0 +1,60 @@ +test_that("Verify works", { + test_text <- "this is a test" + llm_use("simulate_llm", "echo", .silent = TRUE, .force = TRUE) + expect_equal( + llm_vec_verify(test_text, "test", yes_no = test_text), + test_text + ) + expect_equal( + llm_vec_verify(0, "question", factor(0, 0)), + as.factor(0) + ) + expect_message( + x <- llm_vec_verify(test_text, "test", yes_no = "different 
test") + ) + expect_equal(x, as.character(NA)) + + expect_equal( + llm_verify(data.frame(x = test_text), x, "test", yes_no = test_text), + data.frame(x = test_text, .verify = test_text) + ) + + expect_equal( + llm_verify( + data.frame(x = test_text), + x, + what = "test", + yes_no = test_text, + pred_name = "new" + ), + data.frame(x = test_text, new = test_text) + ) +}) + +test_that("Preview works", { + llm_use("ollama", "llama3.2", seed = 100, .silent = TRUE) + expect_snapshot( + llm_vec_verify("this is a test", "a test", preview = TRUE) + ) +}) + +test_that("Verify on Ollama works", { + skip_if_no_ollama() + reviews <- reviews_table() + expect_snapshot( + llm_verify( + reviews, + review, + "is the customer happy" + ) + ) + reviews <- reviews_table() + expect_snapshot( + llm_verify( + reviews, + review, + "is the customer happy", + yes_no = c("y", "n") + ) + ) +}) diff --git a/reference/MallFrame.qmd b/reference/MallFrame.qmd index 0cf9d8c..e11b8b3 100644 --- a/reference/MallFrame.qmd +++ b/reference/MallFrame.qmd @@ -5,8 +5,8 @@ Extension to Polars that add ability to use an LLM to run batch predictions over a data frame -We will start by loading the needed libraries, and -set up the data frame that will be used in the +We will start by loading the needed libraries, and +set up the data frame that will be used in the examples: ```{python} @@ -32,6 +32,7 @@ reviews.llm.use(options = dict(seed = 100)) | [summarize](#mall.MallFrame.summarize) | Summarize the text down to a specific number of words. | | [translate](#mall.MallFrame.translate) | Translate text into another language. | | [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to | +| [verify](#mall.MallFrame.verify) | Check to see if something is true about the text. 
| ### classify { #mall.MallFrame.classify } @@ -270,4 +271,31 @@ reviews.llm.use(_cache = "_my_cache") ```{python} # Leave _cache empty to turn off this functionality reviews.llm.use(_cache = "") +``` + +### verify { #mall.MallFrame.verify } + +`MallFrame.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify')` + +Check to see if something is true about the text. + +#### Parameters + +| Name | Type | Description | Default | +|--------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| `col` | str | The name of the text field to process | _required_ | +| `what` | str | The statement or question that needs to be verified against the provided text | `''` | +| `yes_no` | list | A positional list of size 2, which contains the values to return if true and false. The first position will be used as the 'true' value, and the second as the 'false' value | `[1, 0]` | +| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'verify'` | +| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` | + +#### Examples + +```{python} +reviews.llm.verify("review", "is the customer happy") +``` + +```{python} +# Use 'yes_no' to modify the 'true' and 'false' values to return +reviews.llm.verify("review", "is the customer happy", ["y", "n"]) ``` \ No newline at end of file diff --git a/reference/index.qmd b/reference/index.qmd index 1f1f689..f366929 100644 --- a/reference/index.qmd +++ b/reference/index.qmd @@ -20,6 +20,7 @@ an LLM to run batch predictions over a data frame | [summarize](MallFrame.qmd#mall.MallFrame.summarize) | Summarize the text down to a specific number of words. | | [translate](MallFrame.qmd#mall.MallFrame.translate) | Translate text into another language. 
| | [use](MallFrame.qmd#mall.MallFrame.use) | Define the model, backend, and other options to use to | +| [verify](MallFrame.qmd#mall.MallFrame.verify) | Check to see if something is true about the text. | ::: diff --git a/reference/llm_verify.qmd new file mode 100644 index 0000000..3a5424e --- /dev/null +++ b/reference/llm_verify.qmd @@ -0,0 +1,84 @@ +--- +title: "Verify if a statement about the text is true or not" +execute: + eval: true + freeze: true +--- + +```{r} +#| include: false +source("../site/knitr-print.R") +``` + +[R/llm-verify.R](https://github.com/mlverse/mall/blob/main/R/llm-verify.R) + +## llm_verify + +## Description + Use a Large Language Model (LLM) to see if something is true or not based on the provided text + + +## Usage +```r + +llm_verify( + .data, + col, + what, + yes_no = factor(c(1, 0)), + pred_name = ".verify", + additional_prompt = "" +) + +llm_vec_verify( + x, + what, + yes_no = factor(c(1, 0)), + additional_prompt = "", + preview = FALSE +) +``` + +## Arguments +|Arguments|Description| +|---|---| +| .data | A `data.frame` or `tbl` object that contains the text to be analyzed | +| col | The name of the field to analyze, supports `tidy-eval` | +| what | The statement or question that needs to be verified against the provided text | +| yes_no | A size 2 vector that specifies the expected output. It is positional. The first item is the value to return if the statement about the provided text is true, and the second if it is not. Defaults to: `factor(c(1, 0))` | +| pred_name | A character vector with the name of the new column where the prediction will be placed | +| additional_prompt | Inserts this text into the prompt sent to the LLM | +| x | A vector that contains the text to be analyzed | +| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE`. Applies to vector function only. 
| + + +## Value + `llm_verify` returns a `data.frame` or `tbl` object. `llm_vec_verify` returns a vector that is the same length as `x`. + + +## Examples +```{r} + +library(mall) + +data("reviews") + +llm_use("ollama", "llama3.2", seed = 100, .silent = TRUE) + +# By default it will return 1 for 'true', and 0 for 'false', +# the new column will be a factor type +llm_verify(reviews, review, "is the customer happy") + +# The yes_no argument can be modified to return a different response +# than 1 or 0. First position will be 'true' and second, 'false' +llm_verify(reviews, review, "is the customer happy", c("y", "n")) + +# Numbers can also be used; this would be the case when you wish to match +# the output values of existing predictions +llm_verify(reviews, review, "is the customer happy", c(2, 1)) + + +``` + + diff --git a/reference/r_index.qmd index 40287d9..3d44b3e 100644 --- a/reference/r_index.qmd +++ b/reference/r_index.qmd @@ -38,6 +38,11 @@ toc: false       Specify the model to use +[llm_verify()](llm_verify.html) [llm_vec_verify()](llm_verify.html) + + +      Verify if a statement about the text is true or not + [reviews](reviews.html)
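The key behavior this changeset introduces is the `convert` step: the model's raw "yes"/"no" replies are first mapped onto the caller's `yes_no` values, and only then checked against the valid set, with anything else marked as missing. A minimal standalone sketch of that ordering (the `post_process` helper name is hypothetical; in the diff the logic lives inside `m_vec_prompt()` on the R side and `llm_call()` on the Python side):

```python
def post_process(resp, convert=None, valid=None):
    # Hypothetical condensation of mall's new post-processing steps.
    # Step 1: map raw model labels (e.g. "yes"/"no") onto the
    # user-supplied yes_no values, mirroring the `convert` argument.
    if convert:
        resp = [convert.get(r, r) for r in resp]
    # Step 2: null out anything outside the valid set, mirroring the
    # existing `valid_resps` check.
    if valid is not None:
        resp = [r if r in valid else None for r in resp]
    return resp


raw = ["yes", "no", "maybe"]
print(post_process(raw, convert={"yes": "y", "no": "n"}, valid=["y", "n"]))
# → ['y', 'n', None]
```

The ordering is the design point: because conversion runs before validation, `valid_resps = yes_no` can be checked against the *converted* values, so an off-script reply like "maybe" still becomes `NA`/`None` while "yes"/"no" survive under whatever labels the caller chose.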
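On the Python side, the refactor routes every verb through `map_call`, which depends on the new `build_msg` helper to interpolate each row's text into the prompt templates before the Ollama call. A self-contained sketch of that interpolation, copied in spirit from `mall/llm.py` (the `template` example here is simplified, not one of mall's shipped prompts):

```python
def build_msg(x, msg):
    # Each template's content carries an empty "{}" placeholder;
    # str.format substitutes the row's text into it while the
    # message role is passed through unchanged.
    return [{"role": m["role"], "content": m["content"].format(x)} for m in msg]


template = [{"role": "user", "content": "Classify the following text:\n{}"}]
built = build_msg("great product", template)
print(built[0]["content"])  # ends with "great product"
```

Keeping the row substitution in one helper means `verify()` and the other prompt builders in `mall/prompt.py` only have to emit templates with a trailing `{}`, and `map_call` can apply the same formatting per row for every verb.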