Commit

Merge pull request #20 from edgararuiz/updates
Updates
edgararuiz authored Oct 9, 2024
2 parents 606ded8 + 94c381b commit bb79109
Showing 22 changed files with 378 additions and 88 deletions.
4 changes: 2 additions & 2 deletions _freeze/index/execute-results/html.json

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions _freeze/reference/MallFrame/execute-results/html.json
@@ -0,0 +1,16 @@
{
"hash": "5899b6a791e9901601e683a4446adf9a",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: MallFrame\n---\n\n\n\n`MallFrame(self, df)`\n\nExtension to Polars that add ability to use\nan LLM to run batch predictions over a data frame\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [classify](#mall.MallFrame.classify) | Classify text into specific categories. |\n| [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. |\n| [extract](#mall.MallFrame.extract) | Pull a specific label from the text. |\n| [sentiment](#mall.MallFrame.sentiment) | Use an LLM to run a sentiment analysis |\n| [summarize](#mall.MallFrame.summarize) | Summarise the text down to a specific number of words. |\n| [translate](#mall.MallFrame.translate) | Translate text into another language. |\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to |\n\n### classify { #mall.MallFrame.classify }\n\n`MallFrame.classify(col, labels='', additional='', pred_name='classify')`\n\nClassify text into specific categories.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### custom { #mall.MallFrame.custom }\n\n`MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|-------------|--------|----------------------------------------------------------------------------------------|------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `prompt` | str | The prompt to send to the LLM along with the `col` | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n### extract { #mall.MallFrame.extract }\n\n`MallFrame.extract(col, labels='', additional='', pred_name='extract')`\n\nPull a specific label from the text.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines tells the LLM what to look for and return | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### sentiment { #mall.MallFrame.sentiment }\n\n`MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')`\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `options` | list or dict | 
A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n\n::: {#d67b08f2 .cell execution_count=1}\n``` {.python .cell-code}\nimport mall\nimport polars as pl\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(options = dict(seed = 100), _cache = \"_readme_cache\")\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n<div><style>\n.dataframe > thead > tr,\n.dataframe > tbody > tr {\n text-align: right;\n white-space: pre-wrap;\n}\n</style>\n<small>shape: (3, 2)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>review</th><th>sentiment</th></tr><tr><td>str</td><td>str</td></tr></thead><tbody><tr><td>&quot;This has been the best TV I&#x27;ve…</td><td>&quot;positive&quot;</td></tr><tr><td>&quot;I regret buying this laptop. I…</td><td>&quot;negative&quot;</td></tr><tr><td>&quot;Not sure how to feel about my …</td><td>&quot;neutral&quot;</td></tr></tbody></table></div>\n```\n:::\n:::\n\n\n### summarize { #mall.MallFrame.summarize }\n\n`MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')`\n\nSummarise the text down to a specific number of words.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `max_words` | int | Maximum number of words to use for the summary | `10` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### translate { #mall.MallFrame.translate }\n\n`MallFrame.translate(col, language='', additional='', pred_name='translation')`\n\nTranslate text into another language.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-----------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `language` | str | The target language to translate to. For example 'French'. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### use { #mall.MallFrame.use }\n\n`MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)`\n\nDefine the model, backend, and other options to use to\ninteract with the LLM.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|\n| `backend` | str | The name of the backend to use. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| `model` | str | The name of the model tha the backend should use. At the beginning of the session it defaults to \"llama3.2\". 
If passing `\"\"`, it will remain unchanged | `''` |\n| `_cache` | str | The path of where to save the cached results. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| `**kwargs` | | Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama` | `{}` |\n\n",
"supporting": [
"MallFrame_files"
],
"filters": [],
"includes": {
"include-in-header": [
"<script src=\"https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js\" integrity=\"sha512-c3Nl8+7g4LMSTdrm621y7kf9v3SDPnhxLNhcjFJbKECVnmZHTdo+IRO05sNLTH/D3vA6u1X32ehoLC7WFVdheg==\" crossorigin=\"anonymous\"></script>\n<script src=\"https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js\" integrity=\"sha512-bLT0Qm9VnAYZDflyKcBaQ2gg0hSYNQrJ8RilYldYQ1FxQYoCLtUjuuRuZo+fjqhx/qtq/1itJ0C2ejDxltZVFg==\" crossorigin=\"anonymous\" data-relocate-top=\"true\"></script>\n<script type=\"application/javascript\">define('jquery', [],function() {return window.jQuery;})</script>\n"
]
}
}
}
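The `MallFrame` reference added in this file documents a `df.llm` namespace for Polars data frames. As a quick orientation, here is a minimal usage sketch based only on the method signatures shown above; it assumes the `mall` package and a local Ollama `llama3.2` model are available, and the cache path, labels, and column name are illustrative.

```python
import mall

# Example data frame shipped with mall (the same one used in the
# embedded sentiment example above)
reviews = mall.MallData.reviews

# Pick the backend, model, and cache folder (MallFrame.use)
reviews.llm.use("ollama", "llama3.2", _cache="_readme_cache")

# Sentiment analysis over the "review" column (MallFrame.sentiment)
print(reviews.llm.sentiment("review"))

# Classification with custom labels and a custom prediction column
# name (MallFrame.classify)
print(reviews.llm.classify("review", ["appliance", "computer"], pred_name="prod_type"))
```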
2 changes: 1 addition & 1 deletion _freeze/reference/llm_classify/execute-results/html.json
@@ -2,7 +2,7 @@
"hash": "2654553ad72a6ca1b62748f913913568",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n[R/llm-classify.R](https://github.com/edgararuiz/mall/blob/main/R/llm-classify.R)\n\n## llm_classify\n\n## Description\n Use a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument. \n\n\n## Usage\n```r\n \nllm_classify( \n .data, \n col, \n labels, \n pred_name = \".classify\", \n additional_prompt = \"\" \n) \n \nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_classify` returns a `data.frame` or `tbl` object. `llm_vec_classify` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_classify(reviews, review, c(\"appliance\", \"computer\")) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Use 'pred_name' to customize the new column's name \nllm_classify( \n reviews, \n review, \n c(\"appliance\", \"computer\"), \n pred_name = \"prod_type\" \n) \n#> # A tibble: 3 × 2\n#> review prod_type\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Pass custom values for each classification \nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2)) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <dbl>\n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\") \n) \n#> [1] \"urgent\" \"urgent\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. The answer is based on the following text:\\nthis is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n",
"markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n[R/llm-classify.R](https://github.com/edgararuiz/mall/blob/main/R/llm-classify.R)\n\n## llm_classify\n\n## Description\n Use a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument. \n\n\n## Usage\n```r\n \nllm_classify( \n .data, \n col, \n labels, \n pred_name = \".classify\", \n additional_prompt = \"\" \n) \n \nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_classify` returns a `data.frame` or `tbl` object. `llm_vec_classify` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_classify(reviews, review, c(\"appliance\", \"computer\")) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Use 'pred_name' to customize the new column's name \nllm_classify( \n reviews, \n review, \n c(\"appliance\", \"computer\"), \n pred_name = \"prod_type\" \n) \n#> # A tibble: 3 × 2\n#> review prod_type\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Pass custom values for each classification \nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2)) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <dbl>\n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\") \n) \n#> [1] \"urgent\" \"urgent\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. The answer is based on the following text:\\nthis is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"

