Find R packages matching either descriptions or other R packages

ropensci-review-tools/pkgmatch

pkgmatch

A tool to help find R packages by matching them either to a text description or to another package. Matches can be drawn either from rOpenSci’s suite of packages or from all packages currently on CRAN.

Installation

This package relies on a locally-running instance of ollama. Procedures for setting that up are described in a separate vignette. ollama needs to be installed before this package can be used.
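Before installing, it can help to confirm that ollama is visible from R. The following is only a rough sketch: it assumes the executable is named `ollama` and is on the system `PATH`, which will not hold if ollama runs inside a container. The separate vignette remains the authoritative guide.

```r
# Look for an `ollama` executable on the PATH (an assumption; a
# containerised ollama instance will not be found this way).
ollama_path <- Sys.which ("ollama")
ollama_found <- unname (nzchar (ollama_path))

if (ollama_found) {
    message ("Found ollama at: ", ollama_path)
} else {
    message ("No ollama executable found on the PATH; see the setup vignette.")
}
```

Finding the executable only shows that ollama is installed, not that an instance is currently running.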

Once ollama is running, the easiest way to install this package is via the associated r-universe. As shown there, simply enable the universe with

options (repos = c (
    ropenscireviewtools = "https://ropensci-review-tools.r-universe.dev",
    CRAN = "https://cloud.r-project.org"
))

Then install in the usual way with:

install.packages ("pkgmatch")

Alternatively, the package can be installed by first installing either the remotes or pak packages and running one of the following lines:

remotes::install_github ("ropensci-review-tools/pkgmatch")
pak::pkg_install ("ropensci-review-tools/pkgmatch")

The package can then be loaded for use with

library (pkgmatch)

The package takes input either from a text description or from a local path to an R package, and finds similar packages based on both Large Language Model (LLM) embeddings and more traditional text and code matching algorithms. The LLM embeddings require a locally-running instance of ollama, as described in a separate vignette.
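To give a feel for how embedding-based matching can work in general, here is a toy sketch in base R that ranks candidates by cosine similarity to a query vector. It is not the package's actual implementation: the vectors are made-up numbers rather than real model output, and the names `pkg_a`, `pkg_b` and `pkg_c` are hypothetical.

```r
# Made-up three-dimensional "embeddings" for three hypothetical packages.
embeddings <- rbind (
    pkg_a = c (0.9, 0.1, 0.0),
    pkg_b = c (0.1, 0.8, 0.3),
    pkg_c = c (0.7, 0.2, 0.1)
)
# Made-up embedding of the query text.
query <- c (1.0, 0.0, 0.0)

# Cosine similarity between each row of matrix `m` and vector `v`.
cosine_similarity <- function (m, v) {
    as.vector (m %*% v) / (sqrt (rowSums (m^2)) * sqrt (sum (v^2)))
}

sims <- cosine_similarity (embeddings, query)
names (sims) <- rownames (embeddings)

# Candidates ordered from most to least similar to the query.
ranked <- names (sort (sims, decreasing = TRUE))
print (ranked)
```

Real embeddings have hundreds or thousands of dimensions, but the ranking step follows the same idea.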

Using the pkgmatch package

The package has two main functions:

  • pkgmatch_similar_pkgs() to find similar rOpenSci or CRAN packages based on input given as either a local path to an entire package, or as a single descriptive text string; and
  • pkgmatch_similar_fns() to find similar functions from rOpenSci packages based on descriptive text input. (Not available for functions from CRAN packages.)

The following code demonstrates how these functions work, first matching a general text string against packages from rOpenSci:

input <- "
Packages for analysing evolutionary trees, with a particular focus
on visualising inter-relationships among distinct trees.
"
pkgmatch_similar_pkgs (input)
## [1] "lingtypology"   "treedata.table" "treestartr"     "babette"       
## [5] "canaper"

Corresponding websites can also be automatically opened, either by passing browse = TRUE, or by specifying a return value and passing that to the pkgmatch_browse() function.
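The two browsing options might look like the following sketch. The calls are guarded so that nothing runs unless the session is interactive and pkgmatch is installed. A running ollama instance is also needed, and the variable name `pkgs` is just an illustrative choice.

```r
input <- "Packages for analysing evolutionary trees"

# Only attempt the calls in an interactive session with pkgmatch installed.
if (interactive () && requireNamespace ("pkgmatch", quietly = TRUE)) {
    # Option 1: keep the return value, then open the matching websites.
    pkgs <- pkgmatch::pkgmatch_similar_pkgs (input)
    pkgmatch::pkgmatch_browse (pkgs)

    # Option 2 (alternative to the above): open websites directly.
    # pkgmatch::pkgmatch_similar_pkgs (input, browse = TRUE)
}
```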

Matching entire packages

The input parameter can also be a local path to an entire package. The following code finds the most similar packages to this very package by passing input = ".", again by default matching against all rOpenSci packages:

pkgmatch_similar_pkgs (".")
## $text
## [1] "pkgcheck"       "rdataretriever" "elastic"        "codemetar"     
## [5] "robotstxt"     
## 
## $code
## [1] "autotest"    "pkgcheck"    "roreviewapi" "dynamite"    "cffr"

The most similar packages in terms of text descriptions include several general search and retrieval packages, and only the pkgcheck package from the ropensci-review-tools suite. In contrast, four of the five most similar packages in terms of code structure are from the same ropensci-review-tools suite. Packages from CRAN can be matched by specifying the corpus parameter:

pkgmatch_similar_pkgs (".", corpus = "cran")
## $text
## [1] "librarian" "ore"       "ehelp"     "searcher"  "RWsearch" 
## 
## $code
## [1] "workflowr" "RInno"     "remotes"   "pkgload"   "miniCRAN"
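The text and code matches can be compared with ordinary set operations. The sketch below copies the printed rOpenSci results from above into plain character vectors instead of assuming anything about the structure of the returned object. The variable names are illustrative only.

```r
# Printed rOpenSci results from above, copied by hand.
text_matches <- c (
    "pkgcheck", "rdataretriever", "elastic", "codemetar", "robotstxt"
)
code_matches <- c (
    "autotest", "pkgcheck", "roreviewapi", "dynamite", "cffr"
)

# Packages that appear in both the text and the code matches.
in_both <- intersect (text_matches, code_matches)
print (in_both)
## [1] "pkgcheck"

# Packages matched on code structure only.
code_only <- setdiff (code_matches, text_matches)
print (code_only)
## [1] "autotest"    "roreviewapi" "dynamite"    "cffr"
```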

The input parameter can also be a local path to a compressed .tar.gz archive downloaded directly from CRAN.
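One way to try this is to fetch a source tarball with `utils::download.packages()` and pass the resulting path as input. This is a sketch under several assumptions: network access, an installed pkgmatch, and a running ollama instance. The choice of the `searcher` package is arbitrary.

```r
# Guard: run only in an interactive session with pkgmatch installed.
run_example <- interactive () && requireNamespace ("pkgmatch", quietly = TRUE)

if (run_example) {
    # Download the source tarball of one CRAN package to a temp directory.
    dl <- utils::download.packages (
        "searcher",
        destdir = tempdir (),
        repos = "https://cloud.r-project.org",
        type = "source"
    )
    # `dl` has one row per downloaded package; column 2 is the file path.
    if (nrow (dl) > 0) {
        tarball <- dl [1, 2]
        pkgmatch::pkgmatch_similar_pkgs (tarball, corpus = "cran")
    }
}
```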

Finding functions

The pkgmatch_similar_fns() function finds the functions within packages which best match a text description.

input <- "A function to label a set of geographic coordinates"
pkgmatch_similar_fns (input)
## [1] "GSODR::nearest_stations"          "refsplitr::plot_addresses_points"
## [3] "slopes::elevation_extract"        "quadkeyr::grid_to_polygon"       
## [5] "rnoaa::meteo_nearby_stations"
input <- "Identify genetic sequences matching a given input fragment"
pkgmatch_similar_fns (input)
## [1] "charlatan::SequenceProvider" "beastier::is_alignment"     
## [3] "charlatan::ch_gene_sequence" "beautier::is_phylo"         
## [5] "textreuse::align_local"

Setting browse = TRUE will then open the documentation pages corresponding to those best-matching functions.
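The printed matches use the `package::function` form, so they can be split into their two parts with base R. The sketch below works from the first printed result above, copied by hand into a character vector, and assumes nothing about the returned object beyond what is printed.

```r
# First printed result from above, copied by hand.
fns <- c (
    "GSODR::nearest_stations", "refsplitr::plot_addresses_points",
    "slopes::elevation_extract", "quadkeyr::grid_to_polygon",
    "rnoaa::meteo_nearby_stations"
)

# Split each entry on the literal "::" separator.
parts <- strsplit (fns, "::", fixed = TRUE)
matches <- data.frame (
    package = vapply (parts, `[`, character (1), 1),
    fn = vapply (parts, `[`, character (1), 2)
)
print (matches)
```

A table like this makes it easy to see which packages contribute the matching functions.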

Prior Art
