You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have been using the following abstraction over rle in some private code:
""" Run{T}An object which represents a "run" of repeated values in a vector, produced by [`get_runs`](@ref).See also: [`run_length`](@ref), [`run_value`](@ref), and [`run_indices`](@ref)."""struct Run{T}
val::T
indices::UnitRange{Int}end""" run_length(r::Run)Returns the length of a given run."""run_length(r::Run) =length(r.indices)
""" run_value(r::Run)Returns the value of a given run."""run_value(r::Run) = r.val
""" run_indices(r::Run)Returns the indices (into the original vector) associated to the given run."""run_indices(r::Run) = r.indices
""" get_runs(v) -> Vector{Run}Performs a run-length encoding, returning a sorted vector of [`Run`](@ref) objects, each which support [`run_length`](@ref), [`run_value`](@ref), and [`run_indices`](@ref)."""functionget_runs(v)
vals, lens =rle(v)
cs =cumsum(lens)
run_starts = cs .- lens .+1
run_stops = cs
return [Run(vals[i], run_starts[i]:run_stops[i])
for i ineachindex(vals, run_starts, run_stops)]
end
Would this be appropriate to upstream to StatsBase.jl (w/ tests)?
I have found this useful for manipulating boolean masks in-place, some examples below. I found the code written in terms of this api much simpler/easier to follow, since the lengths <-> index conversion is hidden away in get_runs, where it can be written/tested once and then used a bunch of times in downstream code.
Example usages
""" ranges_where_true(v) -> Vector{UnitRange{Int}}Returns the sorted collection of indices where `v` has elements `true`, assuming `v` is a boolean mask."""functionranges_where_true(v)
runs =get_runs(v)
filter!(run_value, runs) # only runs with value `true`returnrun_indices.(runs)
end""" merge_mask_gaps!(v, N::Integer) -> vGiven a boolean mask `v`, merges "gaps" (i.e. runs of falses surrounded by trues) of length up to `N`, modifying `v` in-place.## Example```jldoctestjulia> show(merge_mask_gaps!([true, false, true, false], 1))Bool[1, 1, 1, 0]julia> show(merge_mask_gaps!([true, false, false, true, false], 1))Bool[1, 0, 0, 1, 0]julia> show(merge_mask_gaps!([true, false, false, true, false], 2))Bool[1, 1, 1, 1, 0]```"""functionmerge_mask_gaps!(v, N::Integer)
@argcheck N >=0
N ==0&&return v
runs =get_runs(v)
# We want to find short 0 runs to flip. We only want to flip ones after the first non-zero run, and before the last non-zero run.
f =findfirst(run_value, runs)
l =findlast(run_value, runs)
if!isnothing(f) &&!isnothing(l)
for i in f:l
run = runs[i]
if!run_value(run) &&run_length(run) <= N
v[run_indices(run)] .=trueendendendreturn v
end
The text was updated successfully, but these errors were encountered:
I have been using the following abstraction over
rle
in some private code:Would this be appropriate to upstream to StatsBase.jl (w/ tests)?
I have found this useful for manipulating boolean masks in-place, some examples below. I found the code written in terms of this api much simpler/easier to follow, since the lengths <-> index conversion is hidden away in
get_runs
, where it can be written/tested once and then used a bunch of times in downstream code.Example usages
The text was updated successfully, but these errors were encountered: