Commit

Merge pull request #20 from edgararuiz/updates
Updates
edgararuiz authored Oct 9, 2024
2 parents 606ded8 + 94c381b commit bb79109
Showing 22 changed files with 378 additions and 88 deletions.
4 changes: 2 additions & 2 deletions _freeze/index/execute-results/html.json

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions _freeze/reference/MallFrame/execute-results/html.json
@@ -0,0 +1,16 @@
{
"hash": "5899b6a791e9901601e683a4446adf9a",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: MallFrame\n---\n\n\n\n`MallFrame(self, df)`\n\nExtension to Polars that add ability to use\nan LLM to run batch predictions over a data frame\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [classify](#mall.MallFrame.classify) | Classify text into specific categories. |\n| [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. |\n| [extract](#mall.MallFrame.extract) | Pull a specific label from the text. |\n| [sentiment](#mall.MallFrame.sentiment) | Use an LLM to run a sentiment analysis |\n| [summarize](#mall.MallFrame.summarize) | Summarise the text down to a specific number of words. |\n| [translate](#mall.MallFrame.translate) | Translate text into another language. |\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to |\n\n### classify { #mall.MallFrame.classify }\n\n`MallFrame.classify(col, labels='', additional='', pred_name='classify')`\n\nClassify text into specific categories.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### custom { #mall.MallFrame.custom }\n\n`MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|-------------|--------|----------------------------------------------------------------------------------------|------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `prompt` | str | The prompt to send to the LLM along with the `col` | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n### extract { #mall.MallFrame.extract }\n\n`MallFrame.extract(col, labels='', additional='', pred_name='extract')`\n\nPull a specific label from the text.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `labels` | list | A list or a DICT object that defines tells the LLM what to look for and return | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### sentiment { #mall.MallFrame.sentiment }\n\n`MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')`\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `options` | list or dict | 
A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples\n\n\n::: {#d67b08f2 .cell execution_count=1}\n``` {.python .cell-code}\nimport mall\nimport polars as pl\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(options = dict(seed = 100), _cache = \"_readme_cache\")\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n<div><style>\n.dataframe > thead > tr,\n.dataframe > tbody > tr {\n text-align: right;\n white-space: pre-wrap;\n}\n</style>\n<small>shape: (3, 2)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>review</th><th>sentiment</th></tr><tr><td>str</td><td>str</td></tr></thead><tbody><tr><td>&quot;This has been the best TV I&#x27;ve…</td><td>&quot;positive&quot;</td></tr><tr><td>&quot;I regret buying this laptop. I…</td><td>&quot;negative&quot;</td></tr><tr><td>&quot;Not sure how to feel about my …</td><td>&quot;neutral&quot;</td></tr></tbody></table></div>\n```\n:::\n:::\n\n\n### summarize { #mall.MallFrame.summarize }\n\n`MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')`\n\nSummarise the text down to a specific number of words.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `max_words` | int | Maximum number of words to use for the summary | `10` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### translate { #mall.MallFrame.translate }\n\n`MallFrame.translate(col, language='', additional='', pred_name='translation')`\n\nTranslate text into another language.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|--------------|--------|----------------------------------------------------------------------------------------|-----------------|\n| `col` | str | The name of the text field to process | _required_ |\n| `language` | str | The target language to translate to. For example 'French'. | `''` |\n| `pred_name` | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| `additional` | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n### use { #mall.MallFrame.use }\n\n`MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)`\n\nDefine the model, backend, and other options to use to\ninteract with the LLM.\n\n#### Parameters\n\n| Name | Type | Description | Default |\n|------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|\n| `backend` | str | The name of the backend to use. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| `model` | str | The name of the model tha the backend should use. At the beginning of the session it defaults to \"llama3.2\". 
If passing `\"\"`, it will remain unchanged | `''` |\n| `_cache` | str | The path of where to save the cached results. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| `**kwargs` | | Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama` | `{}` |\n\n",
"supporting": [
"MallFrame_files"
],
"filters": [],
"includes": {
"include-in-header": [
"<script src=\"https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js\" integrity=\"sha512-c3Nl8+7g4LMSTdrm621y7kf9v3SDPnhxLNhcjFJbKECVnmZHTdo+IRO05sNLTH/D3vA6u1X32ehoLC7WFVdheg==\" crossorigin=\"anonymous\"></script>\n<script src=\"https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js\" integrity=\"sha512-bLT0Qm9VnAYZDflyKcBaQ2gg0hSYNQrJ8RilYldYQ1FxQYoCLtUjuuRuZo+fjqhx/qtq/1itJ0C2ejDxltZVFg==\" crossorigin=\"anonymous\" data-relocate-top=\"true\"></script>\n<script type=\"application/javascript\">define('jquery', [],function() {return window.jQuery;})</script>\n"
]
}
}
}
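The `MallFrame` reference added in this file documents a `df.llm` namespace for Polars data frames. As a quick orientation, here is a minimal usage sketch based only on the method signatures shown above; it assumes the `mall` package and a local Ollama `llama3.2` model are available, and the cache path, labels, and column name are illustrative.

```python
import mall

# Example data frame shipped with mall (the same one used in the
# embedded sentiment example above)
reviews = mall.MallData.reviews

# Pick the backend, model, and cache folder (MallFrame.use)
reviews.llm.use("ollama", "llama3.2", _cache="_readme_cache")

# Sentiment analysis over the "review" column (MallFrame.sentiment)
print(reviews.llm.sentiment("review"))

# Classification with custom labels and a custom prediction column
# name (MallFrame.classify)
print(reviews.llm.classify("review", ["appliance", "computer"], pred_name="prod_type"))
```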
2 changes: 1 addition & 1 deletion _freeze/reference/llm_classify/execute-results/html.json
@@ -2,7 +2,7 @@
"hash": "2654553ad72a6ca1b62748f913913568",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n[R/llm-classify.R](https://github.com/edgararuiz/mall/blob/main/R/llm-classify.R)\n\n## llm_classify\n\n## Description\n Use a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument. \n\n\n## Usage\n```r\n \nllm_classify( \n .data, \n col, \n labels, \n pred_name = \".classify\", \n additional_prompt = \"\" \n) \n \nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_classify` returns a `data.frame` or `tbl` object. `llm_vec_classify` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_classify(reviews, review, c(\"appliance\", \"computer\")) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Use 'pred_name' to customize the new column's name \nllm_classify( \n reviews, \n review, \n c(\"appliance\", \"computer\"), \n pred_name = \"prod_type\" \n) \n#> # A tibble: 3 × 2\n#> review prod_type\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Pass custom values for each classification \nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2)) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <dbl>\n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\") \n) \n#> [1] \"urgent\" \"urgent\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. The answer is based on the following text:\\nthis is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n",
"markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n[R/llm-classify.R](https://github.com/edgararuiz/mall/blob/main/R/llm-classify.R)\n\n## llm_classify\n\n## Description\n Use a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument. \n\n\n## Usage\n```r\n \nllm_classify( \n .data, \n col, \n labels, \n pred_name = \".classify\", \n additional_prompt = \"\" \n) \n \nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE) \n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n `llm_classify` returns a `data.frame` or `tbl` object. `llm_vec_classify` returns a vector that is the same length as `x`. \n\n\n## Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\n \nlibrary(mall) \n \ndata(\"reviews\") \n \nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE) \n \nllm_classify(reviews, review, c(\"appliance\", \"computer\")) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Use 'pred_name' to customize the new column's name \nllm_classify( \n reviews, \n review, \n c(\"appliance\", \"computer\"), \n pred_name = \"prod_type\" \n) \n#> # A tibble: 3 × 2\n#> review prod_type\n#> <chr> <chr> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n \n# Pass custom values for each classification \nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2)) \n#> # A tibble: 3 × 2\n#> review .classify\n#> <chr> <dbl>\n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n \n# For character vectors, instead of a data frame, use this function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\") \n) \n#> [1] \"urgent\" \"urgent\"\n \n# To preview the first call that will be made to the downstream R function \nllm_vec_classify( \n c(\"this is important!\", \"just whenever\"), \n c(\"urgent\", \"not urgent\"), \n preview = TRUE \n) \n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. The answer is based on the following text:\\nthis is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"

