-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Counts of external calls may be inaccurate #61
Comments
@pawelru First of all please note you are referring throughout to open-source software maintained willingly by nice people trying to do nice things. Asserting in an issue title that a package or function "is inaccurate*, and then claiming as your opening sentence,
is neither nice nor respectful. I am tempted to close this issue for this reason alone, but fortunately for you am intrigued by what your report, and think it is worth further consideration here. In future, please do try to be more respectful towards voluntary open-source maintainers. Some points at the outset:
On to your comparisons:
|
Thanks @mpadge for a prompt and very detailed response - I really appreciate that. Especially that you actually went beyond into the referenced codebase and explain individual examples. Very much appreciated. Let me begin by saying that I never intended to offend anyone in my previous message. My aim was to simply share some feedback on things that I found concerning. I believe that the constructive feedback is essential, especially in the context of open-sourcing. By this, I mean reports, not criticism. I am sorry if my words were misunderstood. I totally agree with you that this is not a trivial problem. My background is that I am looking for a measures of a dependency of a package. My first idea was to measure the usage of the objects and I am currently experimenting with different tools. Your message makes me realise that this ( |
No worries @pawelru, and thank you for your considered response here. Good to hear that you generally understand what I was trying to set out in the detailed response above. The occasional missing function calls from |
@pawelru I've implemented tree-sitter parsing of external calls in #62, which currently gives these results for your package: path <- "/<local>/<path>/<to>/teal"
has_tabs <- FALSE
pkg_name <- basename (path)
# The current pkgstats approach with 'ctags':
tags_r <- withr::with_dir (path, get_ctags ("R", has_tabs))
ex <- pkgstats:::external_call_network (tags_r, path, pkg_name)
ex <- ex [which (!ex$package == "base"), ]
nrow (ex)
#> [1] 495
# New tree-sitter parsing from code in #62:
tags_treesitter <- get_treesitter_tags (path)
ex <- pkgstats:::external_call_network (tags_treesitter, path, pkg_name)
ex <- ex [which (!ex$package == "base"), ]
nrow (ex)
#> [1] 395
ex <- data.frame (checkglobals::check_pkg (path))
ex <- ex [which (!ex$package == "base"), ]
nrow (ex)
#> [1] 159 Created on 2024-09-13 with reprex v2.1.1 That suggests current approach here is actually okay. The tree-sitter values may be presumed to provide a "true" reference. I'll investigate further over in #62 |
Thank you for follow ups on this and apologies for a delayed answer. I was on a short break and right afterwards had to focus on something different. Now I am back into this and it looking very very promising. Thanks again. Please find below a few examples that I was able to check quickly. I'm using the updated local copy of a referenced repo so the overall numbers might be slightly different to yours but the difference is rather minimal (e.g. I have 511 and you got 495). I'm yet to check it against checkglobals.
These are just top diffs that I checked. Even though the sum is comparable - there are actually some differences - both positive and negative. Now I don't want you to go into the details - that's not the point of my message. What I just wanted to say here is that based on that brief analysis, the treesitter approach looks very very promising! |
Thanks @pawelru for pursuing this further. I've implemented a full treesitter function tagger in currently experiment package, pkgsimil::pkgsimil_tag_fns(path) Just note that things are likely to change quickly in that package, including function names and potentially name of package. But the underlying algorithm works really well, and identifies every single function call. That will eventually find it's way over here, but doing so will require rebuilding our entire CRAN database to update function call data. So that will have to wait a while ... |
I have found the calculation of external calls quite inaccurate. Please find below a few examples.
Reading the source code I am aware that it's probably an imperfection of external dependent
ctags
. It might do well other things (like accounting internal vars) but for counting dependent packages I have found it not detecting occurrences which results in undercounting totals or not including some at all.It's difficult to ask you to fix it and I'm just sharing my findings with you. Maybe you can consider alternative ways of getting what's needed? I'm aware of relatively new
treesitter
but I personally have no experience in it. Maybecheckglobals
show here?Checking a randomly selected
teal
repo that I'm working on:pkgstats
gives me this:(where
x
is defined as per the vignette)checkglobals
(I'm not an author) reports me this:I'm aware of some differences, e.g. you include
R/
dir only whilecheckglobals
look into all dirs (incl. tests, vignettes etc.) but nevertheless the latter seems to be more accurate to me. A few examples using imperfect regexp on my local clone just to show you why I think that the values are incorrect:teal.reporter
dependencycheckglobals
reports 4 uniquepkgstats
reports 1 uniqueR/
pkgstats
reports just one unique occurencerenv
checkglobals
reports 1 uniquepkgstats
reports 0R/
shinyjs
checkglobals
reports 8 uniquepkgstats
reports 2 uniqueR/
The text was updated successfully, but these errors were encountered: