Run `crate-ci/typos` in CI #51704

LilithHafner · 2023-10-14T16:02:09Z

TODO:

In the actions/checkout action, set persist-credentials: false.
Set a short timeout (e.g. 2 minutes or 5 minutes).
Fix true spelling errors (Fix typos [nfc] #51709)
~~- [ ] Create an ignore list with the false positives?~~

List of false positives that could be stored in typos.toml, if folks decide to go that route down the line

# Hello and Welcome to typos.toml!

## Policy
# tldr: Don't let CI bully you. Use whatever varaible names you want.
#
# The spell check CI feature exists to help detect accidental typos, not to constrain the
# language we use in this repo. If you want to use a word or a variable name or somesuch
# that the CI check is not happy with, go right ahead, just make sure to add the appropriate
# entry to this file to tell the CI check that you didn't make a mistake. As a reviewier, if
# you think that a variable name is bad, go right ahead and tell the author that, just don't
# cite the CI check as a reason, that's not what the check is for.

## How to
# tldr: Add the whole identifier right below [default.extend-identifiers]
#
# There are two magor sections in this file, [default.extend-identifiers] (near the top) and
# [default.extend-words] (near the bottom). If the false positive is a full identifier like
# `mrs_claus` or `breal`, add it to [default.extend-identifiers] and that specific
# identifier will no longer be flagged as a typo. If, on the other hand, it is a word that
# appeasrs inside multiple a multi-word identifiers, like `ser` which is used in
# `jl_lookup_ser_tag`, `ser_tag`, `ser_version`, etc. then insrted of adding each of those
# identifiers individually, simply add it to the [default.extend-words] section. Identifiers
# are case sensitive and words are not.

[default.extend-identifiers]
mrs_claus = "mrs_claus"
womens = "womens"
breal = "breal"
Breal = "Breal"
Numer = "Numer"
splitted = "splitted"
affinitized = "affinitized"
key_smove = "key_smove"
readed_zero = "readed_zero"
check_inconsistentcy = "check_inconsistentcy"
L_OP_CALLL = "L_OP_CALLL"
OP_CALLL = "OP_CALLL"

# strings in test directories
Facilisi = "Facilisi"
vailable = "vailable"

# typos that are programmatically observable
supress_output = "supress_output"
hashs_seed = "hashs_seed"

[default.extend-words]
ba = "ba"
egal = "egal"
parm = "parm"
modul = "modul"
nd = "nd"
strat = "strat"
WRONLY = "WRONLY"
thisy = "thisy"
vally = "vally"
Ot = "Ot"
clos = "clos"
applys = "applys"
findn = "findn"
mis = "mis"
seh = "seh"
ue = "ue"
Missings = "Missings"
SOM = "SOM"
sais = "sais"
Merly = "Merly"
Filetimes = "Filetimes"
HSA = "HSA"
Strategems = "Strategems"
anumber = "anumber"
uupper = "uupper"
shttp = "shttp"
ser = "ser"
somes = "somes"
sav = "sav"
egals = "egals"
OLT = "OLT"
eyt = "eyt"
matc = "matc"
nam = "nam"
alls = "alls"
noe = "noe"

# strings in tests directories
fo = "fo"
Uest = "Uest"
Dows = "Dows"
aquire = "aquire"

# typos that are programmatically observable
compileable = "compileable"
overlayed = "overlayed"

[default]
extend-ignore-identifiers-re = ["^[a-zA-Z][a-zA-Z]?[a-zA-Z]?[a-zA-Z]?$"]

DilumAluthge · 2023-10-14T16:09:37Z

I've added crate-ci/typos@* to the GitHub Actions allowlist for this repo.

.github/workflows/SpellCheck.yml

LilithHafner · 2023-10-14T16:12:55Z

I was wondering what was going on :)

I noticed you asked for a short timeout in #48359, do you want that here too?

DilumAluthge · 2023-10-14T16:15:40Z

I noticed you asked for a short timeout in #48359, do you want that here too?

Yeah, that would be good. Looks like this action runs pretty quickly, so a few minutes should be sufficient for a timeout.

.github/workflows/SpellCheck.yml

DilumAluthge

Overall, this seems like a good plan.

I'll do another thorough line-by-line review later.

We likely want to wait to merge this until the actual typos (the true positives) have been fixed. Lilith has a PR open for that.

.github/workflows/SpellCheck.yml

typos.toml

epage · 2023-10-17T00:08:28Z

If any of the false positives from abbreviations are from the standard library, I'd be interested in getting those baked into a julia-specific dictionary. Similar for highly common package names (e.g. flate2 in rust) or community api names (e.g. ser in rust). That can lead to a better experience for all Julia users.

LilithHafner · 2023-10-17T01:17:18Z

Here's a list of false positives that seem worth including in a Julia-specific dictionary:

Alls # All is a keyword
alls 
Egal # egal is a term for `===`
egal 
Egals
egals
Missings # Missing is a type (noun that can be pluralized)
missings # missing is a value (noun that can be pluralized)
ND # nd stands for number of dimensions, which is somewhat common
Nd
nd
somes # Some is a type (noun that can be pluralized)
splitted # `split` is a Base function so splitted is its past tense
modul # module and using are keywords, these abbreviations are used in Base and may be used elsewhere. The key detail is nobody will accidentally write modul or usig when they mean module or using because that would be caught much earlier than a typo check.
usig
whos # A function from before 1.0 that has since been removed but folks may still refer to it?

shttp # is an actuall protocall, and markdown blocks can be fenced with it as a label. This is not Julia specific.

Almost all of the remaining are abbreviations that happen to look like typos or deliberate typos used in testing.

epage · 2023-10-17T01:37:49Z

Alls # All is a keyword
alls
Missings # Missing is a type (noun that can be pluralized)
missings # missing is a value (noun that can be pluralized)
somes # Some is a type (noun that can be pluralized)
splitted # split is a Base function so splitted is its past tense

Rust also has Some but I never see somes used. Is that commonly in code or is it more a thing in documentation, like

// Here are several Somes

In Rust, I'd expect to see that written as

// Here are several `Some`s

which negates the problem.

ND # nd stands for number of dimensions, which is somewhat common
Nd
nd

For Python, we specifically approved NDArray. I was looking in docs, trying to find how this is used in Julia but couldn't. Mind giving me some pointers?

modul # module and using are keywords, these abbreviations are used in Base and may be used elsewhere. The key detail is nobody will accidentally write modul or usig when they mean module or using because that would be caught much earlier than a typo check.
usig

I'm assuming this is the same problem as Rust when we need to refer to a crate (keyword), we fallback to krate and these are the fully idomatic ways of doing so.

LilithHafner · 2023-10-17T12:43:38Z

somes comes up in exactly one code snippet:

# Test completion for a case when type inference returned `Union` of the same types
union_somes(a, b) = rand() < 0.5 ? Some(a) : Some(b)
let s = "union_somes(1, 1.0)."
    c, r, res = test_complete_context(s)
    @test res
    @test "value" in c
end

nd comes up much more often, and I think this is a representative example:

function _reverse(A::BitArray, d::Int)
    nd = ndims(A)
    1 ≤ d ≤ nd || throw(ArgumentError("dimension $d is not 1 ≤ $d ≤ $nd"))

I wouldn't call modul and usig as quite as idiomatic as you are describing, but similar concept.

LilithHafner · 2023-10-17T12:59:20Z

The design now (which I think may be the final design) is...

Look at each file that was edited and neither created nor deleted. Scan these all for typos longer than 4 characters and compute the union. This is our baseline of "false positives". Compile this list into a Python set.

Look at each file that was edited and not deleted. Scan these all for typos longer than 4 characters, skipping any typos that are in the set of false positives. For each remaining typo, provide a github annotation and, if the typo is longer than 6 characters (at least 94% true positivity rate for typos that long), mark the CI job as a failure but continue to report all remaining novel typos.

This achieves:

Close to the full sensitivity of the typos crate
Almost never erroneously fails CI
Few false positives reported (hopefully)
When there is a false positive, it is safe to simply ignore it and press merge, even if the typos CI check was failing.

Runtime: 6s total, 1s doing the actual checks, typos goes through only the files that have been changed (and goes through them twice) bash handles lists of length equal to the number of files edited, python runs in O(number of reported typos). 5 minute timeout.

KristofferC · 2023-10-17T13:21:16Z

Looking at the suggestions in #51707, spell-checking actual running code seems like a bad idea. Running it on comments and other types of text seems useful.

LilithHafner · 2023-10-17T14:06:54Z

If it were trivial to exclude running code, I would have done so. However, I do still think it is acceptable to spell check running code. Specifically, "overlayed" "compileable" and "supress" have all made it into running code. supress_output actually made it into public API. Further, false positive avoidance seems pretty good to me. None of the suggestions you were laughing at in #51707 would be flagged in this PR.

Usage of "overlayed" "compileable" and "supress"

test/namedtuple.jl
390:            @test eff.nonoverlayed
397:            @test eff.nonoverlayed

test/reflection.jl
1030:        @test !Core.Compiler.is_nonoverlayed(effects)

test/compiler/codegen.jl
799:@test Core.Compiler.get_compileable_sig(which(f48085, (Vararg{Any},)), Tuple{typeof(f48085), Vararg{Int}}, Core.svec()) === nothing
800:@test Core.Compiler.get_compileable_sig(which(f48085, (Vararg{Any},)), Tuple{typeof(f48085), Int, Vararg{Int}}, Core.svec()) === Tuple{typeof(f48085), Any, Vararg{Any}}

test/compiler/AbstractInterpreter.jl
23:@MethodTable OverlayedMT
24:CC.method_table(interp::MTOverlayInterp) = CC.OverlayMethodTable(CC.get_world_counter(interp), OverlayedMT)
35:@overlay OverlayedMT strangesin(x::Float64) = iszero(x) ? nothing : cos(x)
37:# inference should use the overlayed method table
45:# effect analysis should figure out that the overlayed method is used
48:end |> !Core.Compiler.is_nonoverlayed
51:end |> !Core.Compiler.is_nonoverlayed
62:        @test Base.infer_effects(callstrange_entry, (Any,); interp) |> !Core.Compiler.is_nonoverlayed
65:        @warn "`nonoverlayed` test for inference bailing out is skipped since the method match sort order is changed."
72:end |> Core.Compiler.is_nonoverlayed
75:end |> Core.Compiler.is_nonoverlayed
87:@overlay OverlayedMT overlay_match(::Int) = missing
109:# disable partial concrete evaluation when tainted by any overlayed call
141:# Should not concrete-eval overlayed methods in semi-concrete interpretation

stdlib/REPL/src/TerminalMenus/config.jl
130: - `supress_output::Bool=false`: Ignored legacy argument, pass `suppress_output` as a keyword argument to `request` instead.
144:                supress_output::Union{Nothing, Bool}=nothing,   # typo was documented, unfortunately
176:    supress_output isa Bool   && (CONFIG[:supress_output] = supress_output)
183:config(charset=:ascii, scroll=:nowrap, supress_output=false, ctrl_c_interrupt=true)

base/loading.jl
2306:        exit(125) # we define status = 125 means PrecompileableError

base/compiler/utilities.jl
157:function get_compileable_sig(method::Method, @nospecialize(atype), sparams::SimpleVector)
162:        mt, atype, sparams, method, #=int return_if_compileable=#1)
170:        mt, atype, sparams, method, #=int return_if_compileable=#0)
173:isa_compileable_sig(@nospecialize(atype), sparams::SimpleVector, method::Method) =
174:    !iszero(ccall(:jl_isa_compileable_sig, Int32, (Any, Any, Any), atype, sparams, method))

base/compiler/typeinfer.jl
349:        cache_the_tree = ci.inferred && (is_inlineable(ci) || isa_compileable_sig(linfo.specTypes, linfo.sparam_vals, def))

base/compiler/abstractinterpretation.jl
165:            csig = get_compileable_sig(method, sig, match.sparams)
454:        # ignore the `:nonoverlayed` property if `interp` doesn't use overlayed method table
456:        if !isoverlayed(method_table(interp))
457:            all_effects = Effects(all_effects; nonoverlayed=false)
848:            if is_nonoverlayed(interp) || is_nonoverlayed(effects)
851:            # disable concrete-evaluation if this function call is tainted by some overlayed
852:            # method since currently there is no easy way to execute overlayed methods
853:            add_remark!(interp, sv, "[constprop] Concrete eval disabled for overlayed methods")

base/compiler/ssair/irinterp.jl
18:        (is_nonoverlayed(interp) || is_nonoverlayed(effects)))

base/compiler/ssair/inlining.jl
812:function compileable_specialization(mi::MethodInstance, effects::Effects,
817:        new_atype = get_compileable_sig(method, atype, sparams)
839:function compileable_specialization(match::MethodMatch, effects::Effects,
842:    return compileable_specialization(mi, effects, et, info; compilesig_invokes)
894:        return compileable_specialization(mi, effects, et, info;
899:    src === nothing && return compileable_specialization(mi, effects, et, info;
1501:        return compileable_specialization(mi, result.effects, et, info;
1535:        return compileable_specialization(result.mi, result.effects, et, info;
1594:    case = compileable_specialization(match, Effects(), InliningEdgeTracker(state), info;

base/compiler/methodtable.jl
66:`overlayed` indicates if any of the matching methods comes from an overlayed method table.
115:        (match::Union{MethodMatch,Nothing}, valid_worlds::WorldRange, overlayed::Bool)
129:`overlayed` indicates if any of the matching methods comes from an overlayed method table.

base/compiler/effects.jl
45:- `nonoverlayed::Bool`: indicates that any methods that may be called within this method
46:  are not defined in an [overlayed method table](@ref OverlayMethodTable).
94:Additionally, if the `nonoverlayed` property is false, a red prime symbol (′) is displayed after the tuple.
104:    nonoverlayed::Bool
113:        nonoverlayed::Bool)
122:            nonoverlayed)
151:const EFFECTS_UNKNOWN  = Effects(ALWAYS_FALSE, ALWAYS_FALSE, false, false, false, ALWAYS_FALSE, ALWAYS_FALSE, true) # unknown mostly, but it's not overlayed at least (e.g. it's not a call)
162:    nonoverlayed::Bool = effects.nonoverlayed)
171:        nonoverlayed)
183:        merge_effectbits(old.nonoverlayed, new.nonoverlayed))
200:is_nonoverlayed(effects::Effects)        = effects.nonoverlayed
241:           ((e.nonoverlayed        % UInt32) << 12)

base/compiler/ssair/show.jl
1021:    e.nonoverlayed || printstyled(io, '′'; color=:red)

base/compiler/inferencestate.jl
302:            ipo_effects = Effects(ipo_effects; nonoverlayed=is_nonoverlayed(def))
323:is_nonoverlayed(m::Method) = !isdefined(m, :external_mt)
324:is_nonoverlayed(interp::AbstractInterpreter) = !isoverlayed(method_table(interp))
325:isoverlayed(::MethodTableView) = error("unsatisfied MethodTableView interface")
326:isoverlayed(::InternalMethodTable) = false
327:isoverlayed(::OverlayMethodTable) = true
328:isoverlayed(mt::CachedMethodTable) = isoverlayed(mt.table)

src/gf.c
802:            // there are insufficient given parameters for jl_isa_compileable_sig now to like this type
1004:JL_DLLEXPORT int jl_isa_compileable_sig(
1279:        // In most cases `!jl_isa_compileable_sig(tt, sparams, definition)`,
1291:    // TODO: maybe assert(jl_isa_compileable_sig(compilationsig, sparams, definition));
2591:                                                        int return_if_compileable)
2599:    int is_compileable = ((jl_datatype_t*)ti)->isdispatchtuple;
2602:        if (!is_compileable) {
2612:    if (!is_compileable)
2613:        is_compileable = jl_isa_compileable_sig(tt, env, m);
2615:    return (!return_if_compileable || is_compileable) ? (jl_value_t*)tt : jl_nothing;
2638:// return a MethodInstance for a compileable method_match
2702:// tries to find a method for which the requested signature is compileable.
2727:        // first, select methods for which `types` is compileable
2731:            if (jl_isa_compileable_sig(types, match1->sparams, match1->method))

src/julia.h
435:    //     uint8_t ipo_nonoverlayed        : 1;
444:    //     uint8_t nonoverlayed        : 1;
1564:JL_DLLEXPORT int jl_isa_compileable_sig(jl_tupletype_t *type, jl_svec_t *sparams, jl_method_t *definition);

src/precompile_utils.c
160:        if (jl_is_datatype(m->sig) && jl_isa_compileable_sig((jl_tupletype_t*)m->sig, jl_emptysvec, m)) {
252:            if (mi != jl_atomic_load_relaxed(&mi->def.method->unspecialized) && !jl_isa_compileable_sig((jl_tupletype_t*)mi->specTypes, mi->sparam_vals, mi->def.method))

src/jl_exported_funcs.inc
288:    XX(jl_isa_compileable_sig) \

It's hard to know how much false positives will be an issue without trying and finding out, but I suspect that the false positivity rate will be very low in practice. It can be bounded above by the reported false positivity rates in #51704 (comment), assuming PR text is more typo-ey than text already merged into master.

LilithHafner · 2023-10-17T14:19:17Z

See #51739 for an example of this action applied to the most recent PR with significant code additions.

Feel free to recommend any other open PR for me to test this out on. I'm optimistic it will perform well.

LilithHafner · 2023-10-21T13:38:10Z

I'm going to remove the test typos and get this ready for merge. We can always revert if there are too many false positives.

KristofferC · 2023-10-24T15:04:35Z

However, I do still think it is acceptable to spell check running code. Specifically, "overlayed" "compileable" and "supress" have all made it into running code. supress_output actually made it into public API.

That are indeed good examples why running it on code is in fact useful.

LilithHafner · 2023-11-03T17:29:13Z

Here's an example of catching a code typo pre-merge

1d90487

Try using typos in CI

dc102e2

DilumAluthge closed this Oct 14, 2023

DilumAluthge reopened this Oct 14, 2023