Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run crate-ci/typos in CI #51704

Merged
merged 45 commits into from
Oct 24, 2023
Merged

Run crate-ci/typos in CI #51704

merged 45 commits into from
Oct 24, 2023

Conversation

LilithHafner
Copy link
Member

@LilithHafner LilithHafner commented Oct 14, 2023

https://github.com/crate-ci/typos

TODO:

  • In the actions/checkout action, set persist-credentials: false.
  • Set a short timeout (e.g. 2 minutes or 5 minutes).
  • Fix true spelling errors (Fix typos [nfc] #51709)
    - [ ] Create an ignore list with the false positives?
List of false positives that could be stored in typos.toml, if folks decide to go that route down the line
# Hello and Welcome to typos.toml!

## Policy
# tldr: Don't let CI bully you. Use whatever varaible names you want.
#
# The spell check CI feature exists to help detect accidental typos, not to constrain the
# language we use in this repo. If you want to use a word or a variable name or somesuch
# that the CI check is not happy with, go right ahead, just make sure to add the appropriate
# entry to this file to tell the CI check that you didn't make a mistake. As a reviewier, if
# you think that a variable name is bad, go right ahead and tell the author that, just don't
# cite the CI check as a reason, that's not what the check is for.

## How to
# tldr: Add the whole identifier right below [default.extend-identifiers]
#
# There are two magor sections in this file, [default.extend-identifiers] (near the top) and
# [default.extend-words] (near the bottom). If the false positive is a full identifier like
# `mrs_claus` or `breal`, add it to [default.extend-identifiers] and that specific
# identifier will no longer be flagged as a typo. If, on the other hand, it is a word that
# appeasrs inside multiple a multi-word identifiers, like `ser` which is used in
# `jl_lookup_ser_tag`, `ser_tag`, `ser_version`, etc. then insrted of adding each of those
# identifiers individually, simply add it to the [default.extend-words] section. Identifiers
# are case sensitive and words are not.

[default.extend-identifiers]
mrs_claus = "mrs_claus"
womens = "womens"
breal = "breal"
Breal = "Breal"
Numer = "Numer"
splitted = "splitted"
affinitized = "affinitized"
key_smove = "key_smove"
readed_zero = "readed_zero"
check_inconsistentcy = "check_inconsistentcy"
L_OP_CALLL = "L_OP_CALLL"
OP_CALLL = "OP_CALLL"

# strings in test directories
Facilisi = "Facilisi"
vailable = "vailable"

# typos that are programmatically observable
supress_output = "supress_output"
hashs_seed = "hashs_seed"

[default.extend-words]
ba = "ba"
egal = "egal"
parm = "parm"
modul = "modul"
nd = "nd"
strat = "strat"
WRONLY = "WRONLY"
thisy = "thisy"
vally = "vally"
Ot = "Ot"
clos = "clos"
applys = "applys"
findn = "findn"
mis = "mis"
seh = "seh"
ue = "ue"
Missings = "Missings"
SOM = "SOM"
sais = "sais"
Merly = "Merly"
Filetimes = "Filetimes"
HSA = "HSA"
Strategems = "Strategems"
anumber = "anumber"
uupper = "uupper"
shttp = "shttp"
ser = "ser"
somes = "somes"
sav = "sav"
egals = "egals"
OLT = "OLT"
eyt = "eyt"
matc = "matc"
nam = "nam"
alls = "alls"
noe = "noe"

# strings in tests directories
fo = "fo"
Uest = "Uest"
Dows = "Dows"
aquire = "aquire"

# typos that are programmatically observable
compileable = "compileable"
overlayed = "overlayed"

[default]
extend-ignore-identifiers-re = ["^[a-zA-Z][a-zA-Z]?[a-zA-Z]?[a-zA-Z]?$"]

@DilumAluthge
Copy link
Member

I've added crate-ci/typos@* to the GitHub Actions allowlist for this repo.

@LilithHafner
Copy link
Member Author

I was wondering what was going on :)

I noticed you asked for a short timeout in #48359, do you want that here too?

@DilumAluthge
Copy link
Member

I noticed you asked for a short timeout in #48359, do you want that here too?

Yeah, that would be good. Looks like this action runs pretty quickly, so a few minutes should be sufficient for a timeout.

@DilumAluthge DilumAluthge changed the title Try using typos in CI CI: Try using the crate-ci/typos GitHub Action in CI Oct 14, 2023
@DilumAluthge DilumAluthge added the ci Continuous integration label Oct 14, 2023
Copy link
Member

@DilumAluthge DilumAluthge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this seems like a good plan.

I'll do another thorough line-by-line review later.

We likely want to wait to merge this until the actual typos (the true positives) have been fixed. Lilith has a PR open for that.

@DilumAluthge DilumAluthge changed the title CI: Try using the crate-ci/typos GitHub Action in CI CI: Use the crate-ci/typos GitHub Action in CI Oct 14, 2023
@DilumAluthge DilumAluthge changed the title CI: Use the crate-ci/typos GitHub Action in CI CI: Run the crate-ci/typos GitHub Action on pull requests Oct 14, 2023
@LilithHafner LilithHafner mentioned this pull request Oct 14, 2023
typos.toml Outdated Show resolved Hide resolved
@epage
Copy link

epage commented Oct 17, 2023

If any of the false positives from abbreviations are from the standard library, I'd be interested in getting those baked into a julia-specific dictionary. Similar for highly common package names (e.g. flate2 in rust) or community api names (e.g. ser in rust). That can lead to a better experience for all Julia users.

@LilithHafner
Copy link
Member Author

Here's a list of false positives that seem worth including in a Julia-specific dictionary:

Alls # All is a keyword
alls 
Egal # egal is a term for `===`
egal 
Egals
egals
Missings # Missing is a type (noun that can be pluralized)
missings # missing is a value (noun that can be pluralized)
ND # nd stands for number of dimensions, which is somewhat common
Nd
nd
somes # Some is a type (noun that can be pluralized)
splitted # `split` is a Base function so splitted is its past tense
modul # module and using are keywords, these abbreviations are used in Base and may be used elsewhere. The key detail is nobody will accidentally write modul or usig when they mean module or using because that would be caught much earlier than a typo check.
usig
whos # A function from before 1.0 that has since been removed but folks may still refer to it?
shttp # is an actuall protocall, and markdown blocks can be fenced with it as a label. This is not Julia specific.

Almost all of the remaining are abbreviations that happen to look like typos or deliberate typos used in testing.

@epage
Copy link

epage commented Oct 17, 2023

Alls # All is a keyword
alls
Missings # Missing is a type (noun that can be pluralized)
missings # missing is a value (noun that can be pluralized)
somes # Some is a type (noun that can be pluralized)
splitted # split is a Base function so splitted is its past tense

Rust also has Some but I never see somes used. Is that commonly in code or is it more a thing in documentation, like

// Here are several Somes

In Rust, I'd expect to see that written as

// Here are several `Some`s

which negates the problem.

ND # nd stands for number of dimensions, which is somewhat common
Nd
nd

For Python, we specifically approved NDArray. I was looking in docs, trying to find how this is used in Julia but couldn't. Mind giving me some pointers?

modul # module and using are keywords, these abbreviations are used in Base and may be used elsewhere. The key detail is nobody will accidentally write modul or usig when they mean module or using because that would be caught much earlier than a typo check.
usig

I'm assuming this is the same problem as Rust when we need to refer to a crate (keyword), we fallback to krate and these are the fully idomatic ways of doing so.

@LilithHafner
Copy link
Member Author

LilithHafner commented Oct 17, 2023

somes comes up in exactly one code snippet:

# Test completion for a case when type inference returned `Union` of the same types
union_somes(a, b) = rand() < 0.5 ? Some(a) : Some(b)
let s = "union_somes(1, 1.0)."
    c, r, res = test_complete_context(s)
    @test res
    @test "value" in c
end

nd comes up much more often, and I think this is a representative example:

function _reverse(A::BitArray, d::Int)
    nd = ndims(A)
    1  d  nd || throw(ArgumentError("dimension $d is not 1 ≤ $d$nd"))

I wouldn't call modul and usig as quite as idiomatic as you are describing, but similar concept.

@LilithHafner
Copy link
Member Author

LilithHafner commented Oct 17, 2023

The design now (which I think may be the final design) is...

Look at each file that was edited and neither created nor deleted. Scan these all for typos longer than 4 characters and compute the union. This is our baseline of "false positives". Compile this list into a Python set.

Look at each file that was edited and not deleted. Scan these all for typos longer than 4 characters, skipping any typos that are in the set of false positives. For each remaining typo, provide a github annotation and, if the typo is longer than 6 characters (at least 94% true positivity rate for typos that long), mark the CI job as a failure but continue to report all remaining novel typos.

This achieves:

  • Close to the full sensitivity of the typos crate
  • Almost never erroneously fails CI
  • Few false positives reported (hopefully)
  • When there is a false positive, it is safe to simply ignore it and press merge, even if the typos CI check was failing.

Runtime: 6s total, 1s doing the actual checks, typos goes through only the files that have been changed (and goes through them twice) bash handles lists of length equal to the number of files edited, python runs in O(number of reported typos). 5 minute timeout.

@KristofferC
Copy link
Member

Looking at the suggestions in #51707, spell-checking actual running code seems like a bad idea. Running it on comments and other types of text seems useful.

@ViralBShah ViralBShah marked this pull request as draft October 17, 2023 13:24
@LilithHafner
Copy link
Member Author

If it were trivial to exclude running code, I would have done so. However, I do still think it is acceptable to spell check running code. Specifically, "overlayed" "compileable" and "supress" have all made it into running code. supress_output actually made it into public API. Further, false positive avoidance seems pretty good to me. None of the suggestions you were laughing at in #51707 would be flagged in this PR.

Usage of "overlayed" "compileable" and "supress"
test/namedtuple.jl
390:            @test eff.nonoverlayed
397:            @test eff.nonoverlayed

test/reflection.jl
1030:        @test !Core.Compiler.is_nonoverlayed(effects)

test/compiler/codegen.jl
799:@test Core.Compiler.get_compileable_sig(which(f48085, (Vararg{Any},)), Tuple{typeof(f48085), Vararg{Int}}, Core.svec()) === nothing
800:@test Core.Compiler.get_compileable_sig(which(f48085, (Vararg{Any},)), Tuple{typeof(f48085), Int, Vararg{Int}}, Core.svec()) === Tuple{typeof(f48085), Any, Vararg{Any}}

test/compiler/AbstractInterpreter.jl
23:@MethodTable OverlayedMT
24:CC.method_table(interp::MTOverlayInterp) = CC.OverlayMethodTable(CC.get_world_counter(interp), OverlayedMT)
35:@overlay OverlayedMT strangesin(x::Float64) = iszero(x) ? nothing : cos(x)
37:# inference should use the overlayed method table
45:# effect analysis should figure out that the overlayed method is used
48:end |> !Core.Compiler.is_nonoverlayed
51:end |> !Core.Compiler.is_nonoverlayed
62:        @test Base.infer_effects(callstrange_entry, (Any,); interp) |> !Core.Compiler.is_nonoverlayed
65:        @warn "`nonoverlayed` test for inference bailing out is skipped since the method match sort order is changed."
72:end |> Core.Compiler.is_nonoverlayed
75:end |> Core.Compiler.is_nonoverlayed
87:@overlay OverlayedMT overlay_match(::Int) = missing
109:# disable partial concrete evaluation when tainted by any overlayed call
141:# Should not concrete-eval overlayed methods in semi-concrete interpretation

stdlib/REPL/src/TerminalMenus/config.jl
130: - `supress_output::Bool=false`: Ignored legacy argument, pass `suppress_output` as a keyword argument to `request` instead.
144:                supress_output::Union{Nothing, Bool}=nothing,   # typo was documented, unfortunately
176:    supress_output isa Bool   && (CONFIG[:supress_output] = supress_output)
183:config(charset=:ascii, scroll=:nowrap, supress_output=false, ctrl_c_interrupt=true)

base/loading.jl
2306:        exit(125) # we define status = 125 means PrecompileableError

base/compiler/utilities.jl
157:function get_compileable_sig(method::Method, @nospecialize(atype), sparams::SimpleVector)
162:        mt, atype, sparams, method, #=int return_if_compileable=#1)
170:        mt, atype, sparams, method, #=int return_if_compileable=#0)
173:isa_compileable_sig(@nospecialize(atype), sparams::SimpleVector, method::Method) =
174:    !iszero(ccall(:jl_isa_compileable_sig, Int32, (Any, Any, Any), atype, sparams, method))

base/compiler/typeinfer.jl
349:        cache_the_tree = ci.inferred && (is_inlineable(ci) || isa_compileable_sig(linfo.specTypes, linfo.sparam_vals, def))

base/compiler/abstractinterpretation.jl
165:            csig = get_compileable_sig(method, sig, match.sparams)
454:        # ignore the `:nonoverlayed` property if `interp` doesn't use overlayed method table
456:        if !isoverlayed(method_table(interp))
457:            all_effects = Effects(all_effects; nonoverlayed=false)
848:            if is_nonoverlayed(interp) || is_nonoverlayed(effects)
851:            # disable concrete-evaluation if this function call is tainted by some overlayed
852:            # method since currently there is no easy way to execute overlayed methods
853:            add_remark!(interp, sv, "[constprop] Concrete eval disabled for overlayed methods")

base/compiler/ssair/irinterp.jl
18:        (is_nonoverlayed(interp) || is_nonoverlayed(effects)))

base/compiler/ssair/inlining.jl
812:function compileable_specialization(mi::MethodInstance, effects::Effects,
817:        new_atype = get_compileable_sig(method, atype, sparams)
839:function compileable_specialization(match::MethodMatch, effects::Effects,
842:    return compileable_specialization(mi, effects, et, info; compilesig_invokes)
894:        return compileable_specialization(mi, effects, et, info;
899:    src === nothing && return compileable_specialization(mi, effects, et, info;
1501:        return compileable_specialization(mi, result.effects, et, info;
1535:        return compileable_specialization(result.mi, result.effects, et, info;
1594:    case = compileable_specialization(match, Effects(), InliningEdgeTracker(state), info;

base/compiler/methodtable.jl
66:`overlayed` indicates if any of the matching methods comes from an overlayed method table.
115:        (match::Union{MethodMatch,Nothing}, valid_worlds::WorldRange, overlayed::Bool)
129:`overlayed` indicates if any of the matching methods comes from an overlayed method table.

base/compiler/effects.jl
45:- `nonoverlayed::Bool`: indicates that any methods that may be called within this method
46:  are not defined in an [overlayed method table](@ref OverlayMethodTable).
94:Additionally, if the `nonoverlayed` property is false, a red prime symbol (′) is displayed after the tuple.
104:    nonoverlayed::Bool
113:        nonoverlayed::Bool)
122:            nonoverlayed)
151:const EFFECTS_UNKNOWN  = Effects(ALWAYS_FALSE, ALWAYS_FALSE, false, false, false, ALWAYS_FALSE, ALWAYS_FALSE, true) # unknown mostly, but it's not overlayed at least (e.g. it's not a call)
162:    nonoverlayed::Bool = effects.nonoverlayed)
171:        nonoverlayed)
183:        merge_effectbits(old.nonoverlayed, new.nonoverlayed))
200:is_nonoverlayed(effects::Effects)        = effects.nonoverlayed
241:           ((e.nonoverlayed        % UInt32) << 12)

base/compiler/ssair/show.jl
1021:    e.nonoverlayed || printstyled(io, '′'; color=:red)

base/compiler/inferencestate.jl
302:            ipo_effects = Effects(ipo_effects; nonoverlayed=is_nonoverlayed(def))
323:is_nonoverlayed(m::Method) = !isdefined(m, :external_mt)
324:is_nonoverlayed(interp::AbstractInterpreter) = !isoverlayed(method_table(interp))
325:isoverlayed(::MethodTableView) = error("unsatisfied MethodTableView interface")
326:isoverlayed(::InternalMethodTable) = false
327:isoverlayed(::OverlayMethodTable) = true
328:isoverlayed(mt::CachedMethodTable) = isoverlayed(mt.table)

src/gf.c
802:            // there are insufficient given parameters for jl_isa_compileable_sig now to like this type
1004:JL_DLLEXPORT int jl_isa_compileable_sig(
1279:        // In most cases `!jl_isa_compileable_sig(tt, sparams, definition)`,
1291:    // TODO: maybe assert(jl_isa_compileable_sig(compilationsig, sparams, definition));
2591:                                                        int return_if_compileable)
2599:    int is_compileable = ((jl_datatype_t*)ti)->isdispatchtuple;
2602:        if (!is_compileable) {
2612:    if (!is_compileable)
2613:        is_compileable = jl_isa_compileable_sig(tt, env, m);
2615:    return (!return_if_compileable || is_compileable) ? (jl_value_t*)tt : jl_nothing;
2638:// return a MethodInstance for a compileable method_match
2702:// tries to find a method for which the requested signature is compileable.
2727:        // first, select methods for which `types` is compileable
2731:            if (jl_isa_compileable_sig(types, match1->sparams, match1->method))

src/julia.h
435:    //     uint8_t ipo_nonoverlayed        : 1;
444:    //     uint8_t nonoverlayed        : 1;
1564:JL_DLLEXPORT int jl_isa_compileable_sig(jl_tupletype_t *type, jl_svec_t *sparams, jl_method_t *definition);

src/precompile_utils.c
160:        if (jl_is_datatype(m->sig) && jl_isa_compileable_sig((jl_tupletype_t*)m->sig, jl_emptysvec, m)) {
252:            if (mi != jl_atomic_load_relaxed(&mi->def.method->unspecialized) && !jl_isa_compileable_sig((jl_tupletype_t*)mi->specTypes, mi->sparam_vals, mi->def.method))

src/jl_exported_funcs.inc
288:    XX(jl_isa_compileable_sig) \

It's hard to know how much false positives will be an issue without trying and finding out, but I suspect that the false positivity rate will be very low in practice. It can be bounded above by the reported false positivity rates in #51704 (comment), assuming PR text is more typo-ey than text already merged into master.

@LilithHafner
Copy link
Member Author

See #51739 for an example of this action applied to the most recent PR with significant code additions.

Feel free to recommend any other open PR for me to test this out on. I'm optimistic it will perform well.

@LilithHafner
Copy link
Member Author

I'm going to remove the test typos and get this ready for merge. We can always revert if there are too many false positives.

@LilithHafner LilithHafner removed the DO NOT MERGE Do not merge this PR! label Oct 21, 2023
@LilithHafner LilithHafner marked this pull request as ready for review October 21, 2023 13:40
@DilumAluthge DilumAluthge requested review from DilumAluthge and removed request for DilumAluthge October 22, 2023 04:45
@LilithHafner LilithHafner merged commit 2d8325b into master Oct 24, 2023
2 checks passed
@LilithHafner LilithHafner deleted the lh/spell branch October 24, 2023 13:25
@KristofferC
Copy link
Member

KristofferC commented Oct 24, 2023

However, I do still think it is acceptable to spell check running code. Specifically, "overlayed" "compileable" and "supress" have all made it into running code. supress_output actually made it into public API.

That are indeed good examples why running it on code is in fact useful.

@LilithHafner
Copy link
Member Author

LilithHafner commented Nov 3, 2023

Here's an example of catching a code typo pre-merge

1d90487

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants