
Adds badges and code coverage
edgararuiz committed Sep 12, 2024
1 parent 493525b commit f6aa8ac
Showing 4 changed files with 26 additions and 100 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -7,3 +7,4 @@ utils/
^docs$
^pkgdown$
^\.github$
^codecov\.yml$
5 changes: 4 additions & 1 deletion README.Rmd
@@ -10,7 +10,7 @@ knitr::opts_chunk$set(
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
eval = TRUE
eval = FALSE
)
library(dplyr)
library(dbplyr)
@@ -23,6 +23,9 @@ mall::llm_use("ollama", "llama3.1", seed = 100)
# mall

<!-- badges: start -->
[![Codecov test coverage](https://codecov.io/gh/edgararuiz/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/edgararuiz/mall?branch=main)
[![R-CMD-check](https://github.com/edgararuiz/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/edgararuiz/mall/actions/workflows/R-CMD-check.yaml)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->

```{r, eval = FALSE, echo = FALSE}
106 changes: 7 additions & 99 deletions README.md
@@ -4,7 +4,14 @@
# mall

<!-- badges: start -->

[![Codecov test
coverage](https://codecov.io/gh/edgararuiz/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/edgararuiz/mall?branch=main)
[![R-CMD-check](https://github.com/edgararuiz/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/edgararuiz/mall/actions/workflows/R-CMD-check.yaml)
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->

<!-- toc: start -->

- [Motivation](#motivation)
@@ -84,25 +91,13 @@ library(mall)

reviews |>
llm_sentiment(review)
#> # A tibble: 3 × 2
#> review .sentiment
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… positive
#> 2 I regret buying this laptop. It is too … negative
#> 3 Not sure how to feel about my new washi… neutral
```

The function lets us modify the options to choose from:

``` r
reviews |>
llm_sentiment(review, options = c("positive", "negative"))
#> # A tibble: 3 × 2
#> review .sentiment
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… positive
#> 2 I regret buying this laptop. It is too … negative
#> 3 Not sure how to feel about my new washi… negative
```

As mentioned before, because these functions are pipe friendly, the results of the LLM
@@ -112,11 +107,6 @@ prediction can be used in further transformations:
reviews |>
llm_sentiment(review, options = c("positive", "negative")) |>
filter(.sentiment == "negative")
#> # A tibble: 2 × 2
#> review .sentiment
#> <chr> <chr>
#> 1 I regret buying this laptop. It is too … negative
#> 2 Not sure how to feel about my new washi… negative
```

### Summarize
@@ -129,12 +119,6 @@ number of words to output (`max_words`):
``` r
reviews |>
llm_summarize(review, max_words = 5)
#> # A tibble: 3 × 2
#> review .summary
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… very good tv experience overall
#> 2 I regret buying this laptop. It is too … slow and noisy laptop purchase
#> 3 Not sure how to feel about my new washi… mixed feelings about new washer
```

To control the name of the prediction field, use the `pred_name`
@@ -143,12 +127,6 @@ argument. This works with the other `llm_` functions as well.
``` r
reviews |>
llm_summarize(review, max_words = 5, pred_name = "review_summary")
#> # A tibble: 3 × 2
#> review review_summary
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… very good tv experience overall
#> 2 I regret buying this laptop. It is too … slow and noisy laptop purchase
#> 3 Not sure how to feel about my new washi… mixed feelings about new washer
```

### Classify
Expand All @@ -158,12 +136,6 @@ Use the LLM to categorize the text into one of the options you provide:
``` r
reviews |>
llm_classify(review, c("appliance", "computer"))
#> # A tibble: 3 × 2
#> review .classify
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… appliance
#> 2 I regret buying this laptop. It is too … computer
#> 3 Not sure how to feel about my new washi… appliance
```

### Extract
@@ -177,12 +149,6 @@ We do this by simply saying “product”. The LLM understands what we
``` r
reviews |>
llm_extract(review, "product")
#> # A tibble: 3 × 2
#> review .extract
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… tv
#> 2 I regret buying this laptop. It is too … laptop
#> 3 Not sure how to feel about my new washi… washing machine
```

### Translate
@@ -195,12 +161,6 @@ to be defined. The translation accuracy will depend on the LLM
``` r
reviews |>
llm_translate(review, "spanish")
#> # A tibble: 3 × 2
#> review .translation
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… Este ha sido el mejor televisor que …
#> 2 I regret buying this laptop. It is too … Lamento haber comprado esta laptop. …
#> 3 Not sure how to feel about my new washi… No estoy seguro de cómo sentirme sob…
```

### Custom prompt
@@ -219,12 +179,6 @@ my_prompt <- paste(

reviews |>
llm_custom(review, my_prompt)
#> # A tibble: 3 × 2
#> review .pred
#> <chr> <chr>
#> 1 This has been the best TV I've ever use… Yes
#> 2 I regret buying this laptop. It is too … No
#> 3 Not sure how to feel about my new washi… No
```

## Initialize session
@@ -243,8 +197,6 @@ Ollama, that function is

``` r
llm_use("ollama", "llama3.1", seed = 100, temperature = 0.2)
#> Provider: ollama
#> Model: llama3.1
```

## Key considerations
@@ -290,18 +242,13 @@ book_reviews <- data_bookReviews |>
as_tibble()

glimpse(book_reviews)
#> Rows: 100
#> Columns: 2
#> $ review <chr> "i got this as both a book and an audio file. i had waited t…
#> $ sentiment <fct> 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, …
```

As per the docs, `sentiment` is a factor indicating the sentiment of the
review: negative (1) or positive (2).

``` r
length(strsplit(paste(book_reviews, collapse = " "), " ")[[1]])
#> [1] 20571
```

Just to get an idea of how much data we’re processing, I’m using a very,
@@ -317,12 +264,7 @@ reviews_llm <- book_reviews |>
options = c("positive", "negative"),
pred_name = "predicted"
)
#> ! There were 1 predictions with invalid output, they were coerced to NA
```

``` r
toc()
#> 169.546 sec elapsed
```

As far as **time**, on my Apple M3 machine, it took about 3 minutes to
@@ -345,20 +287,6 @@ This is what the new table looks like:

``` r
reviews_llm
#> # A tibble: 100 × 3
#> review sentiment predicted
#> <chr> <fct> <chr>
#> 1 "i got this as both a book and an audio file. i had wait… 1 negative
#> 2 "this book places too much emphasis on spending money in… 1 negative
#> 3 "remember the hollywood blacklist? the hollywood ten? i'… 2 negative
#> 4 "while i appreciate what tipler was attempting to accomp… 1 negative
#> 5 "the others in the series were great, and i really looke… 1 negative
#> 6 "a few good things, but she's lost her edge and i find i… 1 negative
#> 7 "words cannot describe how ripped off and disappointed i… 1 negative
#> 8 "1. the persective of most writers is shaped by their ow… 1 negative
#> 9 "i have been a huge fan of michael crichton for about 25… 1 negative
#> 10 "i saw dr. polk on c-span a month or two ago. he was add… 2 positive
#> # ℹ 90 more rows
```

I used `yardstick` to see how well the model performed. Of course, the
@@ -371,10 +299,6 @@ library(forcats)
reviews_llm |>
mutate(fct_pred = as.factor(ifelse(predicted == "positive", 2, 1))) |>
yardstick::accuracy(sentiment, fct_pred)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy binary 0.939
```
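Beyond overall accuracy, a confusion matrix shows *where* the model errs (false positives vs. false negatives). A sketch, assuming the same `reviews_llm` table and recoding as above (this follow-up is not part of the original commit):

``` r
library(dplyr)
library(yardstick)

# Same recoding as the accuracy example; conf_mat() cross-tabulates
# the true sentiment against the LLM's prediction
reviews_llm |>
  mutate(fct_pred = as.factor(ifelse(predicted == "positive", 2, 1))) |>
  conf_mat(sentiment, fct_pred)
```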

## Vector functions
@@ -386,12 +310,10 @@ corresponding `llm_vec_` function:

``` r
llm_vec_sentiment("I am happy")
#> [1] "positive"
```

``` r
llm_vec_translate("Este es el mejor dia!", "english")
#> [1] "This is the best day!"
```
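Since the `llm_vec_` functions operate on plain character vectors, they should also compose with `dplyr::mutate()`. A hypothetical sketch, not from the original README (assumes the same `reviews` data frame and an initialized session):

``` r
library(dplyr)
library(mall)

# llm_vec_sentiment() takes a character vector, so it can be
# called column-wise inside mutate() as well
reviews |>
  mutate(sent = llm_vec_sentiment(review))
```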

## Databricks
@@ -416,13 +338,6 @@ vendor’s SQL AI function directly:
``` r
tbl_reviews |>
llm_sentiment(review)
#> # Source: SQL [3 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review .sentiment
#> <chr> <chr>
#> 1 This has been the best TV Ive ever used. Great screen, and sound. positive
#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative
#> 3 Not sure how to feel about my new washing machine. Great color, bu… mixed
```

There are some differences in the arguments and output of the LLMs.
@@ -435,11 +350,4 @@ the same argument in the AI Summarize function:
``` r
tbl_reviews |>
llm_summarize(review, max_words = 5)
#> # Source: SQL [3 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review .summary
#> <chr> <chr>
#> 1 This has been the best TV Ive ever used. Great screen, and sound. Superio…
#> 2 I regret buying this laptop. It is too slow and the keyboard is too … Slow, n…
#> 3 Not sure how to feel about my new washing machine. Great color, but … Initial…
```
14 changes: 14 additions & 0 deletions codecov.yml
@@ -0,0 +1,14 @@
comment: false

coverage:
status:
project:
default:
target: auto
threshold: 1%
informational: true
patch:
default:
target: auto
threshold: 1%
informational: true
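
The coverage numbers behind the new badge are typically produced with the `covr` package. A local sketch (an assumption — the CI workflow itself is not shown in this commit):

``` r
library(covr)

cov <- package_coverage()  # runs the package's tests and measures line coverage
report(cov)                # opens an interactive local coverage report
# codecov(coverage = cov)  # uploads results to Codecov, usually done from CI
```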
