Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BiocCheck naive function line count penalizes good coding practices of commenting code and readable multiline code #220

Open
omsai opened this issue Jan 7, 2025 · 3 comments

Comments

@omsai
Copy link

omsai commented Jan 7, 2025

BiocCheck currently shows a NOTE for functions longer than 50 lines. However comment-only lines contribute to this limit, as do breaking up arguments of function calls with multiple lines to make them more legible. Instead of counting lines, BiocCheck should inspect code statements. For example, cloc is CLI utility that counts code statements for many languages https://github.com/AlDanial/cloc (there actually is an R packge for cloc https://github.com/hrbrmstr/cloc/ but it's not currently on CRAN). BiocCheck already imports the codetools package that I think should be able to count closures with an appropriate code walker.

library(codetools)

to_check <- quote(
  my_fun <- function(x) {
    ## A comment
    ## spanning
    ## multiple lines
    x |>
      c("a statement") |>
      c("with pipes")
    message(
      "A function call",
      "with arguments",
      "broken up across",
      "multiple lines")
  }
)

showTree(to_check)
##> (<- my_fun (function as.pairlist(alist(x = )) ("{" (c (c x "a statement") "with pipes") (message "A function call" "with arguments" "broken up across" "multiple lines")) structure(c(2L, 13L, 14L, 3L, 13L, 3L, 2L, 14L), srcfile = <environment>, class = "srcref")))
@LiNk-NY
Copy link
Contributor

LiNk-NY commented Jan 7, 2025

Hi Pariksheet, @omsai

I am not sure I understand your argument.
AFAICT, cloc does not count the number of lines per function.
It does so for each file and is able to discern the programming language.
Can you provide a reproducible example where it is an improvement over the current implementation?

One can argue that adding comments within the function are a stylistic choice.
They can also go outside the function. A filter mechanism can be added to getFunctionLengths but I think the whole function needs a re-write.

Best regards,
Marcel

@omsai
Copy link
Author

omsai commented Jan 8, 2025

That's right, cloc does not count statements per function; it counts statements per file. But my reason for point to it is it counts code statements, not code lines.

Can you provide a reproducible example where it is an improvement over the current implementation?

This gets most of the way there where it counts the number of statements but one improvement would be recursively looking inside closures (e.g. if / else/ for / while, etc.)

# getFunctionLengths ------------------------------------------------------
file <- system.file("testpackages", "testpkg0", "R",
    "parseme.R", package="BiocCheck")

## Read the file a list of language expressions.
exprs <- parse(file)
lapply(exprs, function(x) x)

## Walker to organize tokens into a character vector of closures.
walker <-
    codetools::makeCodeWalker(
        call = function(e, w) {
            result <<- c(result, "(")
            codetools::walkCode(e[[1]], w)
            for (subexpr in as.list(e[-1])) {
                if (missing(subexpr)) {
                    result <<- c(result, "<Missing>")
                } else {
                    codetools::walkCode(subexpr, w)
                }
            }
            result <<- c(result, ")")
        },
        leaf = function(e, w) {
            if (typeof(e) == "symbol") {
                if (e == "(") {
                    result <<- c(result, "\\(")
                } else {
                    result <<- c(result, deparse(e))
                }
            }
        })
## Gather tokens into closure.
results <- character(length(exprs))
for (i in seq_along(exprs)) {
    result <- character()
    codetools::walkCode(exprs[[i]], walker)
    results[i] <- paste(result, collapse = " ")
}
results

## Process the extracted character vector of closures.
library(tidyverse)

closure_pop_outer <- function(closure) {
    if (str_count(closure, "[(]") == 0L) {
        return("")
    }
    str_extract(closure, "^[(][^(]+(.*)[)]$", group = 1L) |>
        str_trim()
}

closure_name <- function(closure) {
    if (str_count(closure, "[(]") == 0L) {
        return("")
    }
    str_split(closure, " ", n = 3, simplify = TRUE)[, 2]
}

#' We want to remove closures leading up to the first function, and,
#' optionally, the first parenthesis closure found right after the
#' function.
#'
#' @param closure An atomic character vector.
#' @return An atomic character.
strip_function <- function(closure) {
    while (closure != "" &&
               closure_name(closure) != "function") {
        closure <- closure_pop_outer(closure)
    }
    if (closure != "" &&
            closure_name(closure) == "function") {
        closure <- closure_pop_outer(closure)
        if (closure != "" &&
                closure_name(closure) == "{") {
            closure <- closure_pop_outer(closure)
        }
    }
    closure
}

closure_pop_sequential <- function(closure) {
    if (str_count(closure, "[(]") == 0L) {
        return("")
    }
    str_extract(closure, "^[(][^(]+[)](.*)", group = 1L) |>
        str_trim()
}

closure_count_sequential <- function(closure) {
    count <- 0L
    while (closure != "") {
        count <- count + 1L
        closure <- closure_pop_sequential(closure)
    }
    count
}

## Convert to data.frame for convenience.
df <- tibble(orig = results,
             statements = map_chr(orig, strip_function),
             n_statements = map_int(statements, closure_count_sequential))
df
# A tibble: 9 × 3
  orig                                              statements    n_statements
  <chr>                                             <chr>                <int>
1 ( function ( { ) )                                ""                       0
2 ( = fa ( function ) )                             ""                       0
3 ( <- f2 ( function ) )                            ""                       0
4 ( = f3 ( function ( { ) ) )                       ""                       0
5 ( <- f4 ( function ( { ) ) )                      ""                       0
6 ( lapply LETTERS ( function ( { ( print x ) ) ) ) "( print x )"            1
7 ( <- f5 ( function ( { ) ) )                      ""                       0
8 ( <- f6 ( function ( { ) ) )                      ""                       0
9 ( <- f7 ( function ( { ) ) )                      ""                       0

@hpages
Copy link
Contributor

hpages commented Jan 22, 2025

My unsollicited 2 cents on this: it makes sense to ignore comment-only lines and hopefully that is not too hard to implement. However I would not recommend trying to go to far or spend too much time on an overly complicated counting approach. Maybe counting non comment-only lines is good enough? It's just a NOTE after all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants