diff --git a/html/404.html b/html/404.html
deleted file mode 100644
index 890f267..0000000
--- a/html/404.html
+++ /dev/null
@@ -1,162 +0,0 @@
-Page not found
-Super-duper dev mode. Basically, don't use this.
-Feature complete, full test coverage, all tests pass.
-All TODO items for 0.1.0 done including basic vignette / README. More
-optimizations and tests.
-[ ] How do I set up the type hierarchy?
-    a. How do I share common code as high in the tree as possible? (wait for new features of abstract types in 0.4?)
-    b. Can I make it a subtype of Vector and get lots of the Vector
-       API for free? Can I then use it in other places that take a
-       vector? Like a DataFrame column?
-[x] How do I represent the runs? length, end, start/end?
-    end allows for direct binarysearch for indexing and makes size a simple lookup
-    Gives 5X speedup for size, 40X for indexing on RLEVector(int([1:1:1e3]),int([1:1:1e3]))
-    19956X speedup over R (more efficient algo here though) for
-      foo = Rle( seq(1,1000,5), rep.int(5,200) )
-      l = 1:1e3; system.time( for(i in l) { foo[100] } )
-    vs.
-      foo = IntegerRle([ int(linspace(1,1000,200)) ], [ int(linspace(1,1000,200)) ])
-      @time for i in 1:1e3 foo[100] end
-    2000X speedup for foo + 4
-[ ] Is there a strictly increasing and positive int vector type I can leverage or make for the runs?
-    Maybe something that could be linked to the values? OrderedSet, IntSet?
-    For disjoin operations, it will be useful to know the unique runends in two+ sets of runs
-    Would be nice to have disjoin for RLEVector and RunEnds and IRanges and GRanges types
-[ ] What do I call the getters and setters? I want to use same getters for RLEs and GRanges and such.
-    begin, end and start are taken. first, step, and last make sense because of what they mean for ranges, but they would mean something else for a Vector
-    Maybe confusion between Ranges and Vector API means that I should just make my own and use rangestart, rangewidth, rangeend or rfirst, rwidth and rlast. With the latter, the 'r' could be range or run.
-    Maybe starts, widths, ends?
-[x] Is it a good idea to require two arg vectors to be the same length like this: function bob{T1,T1,N}(x::Vector{T1,N},y::Vector{T2,N}) ? Or just test the lengths and throw an ArgumentError?
-[x] Is 1 an appropriate start for an empty RLEVector? Does that imply that there is a value associated? Go to zero-based, half open (#can-of-worms)?. NO.
-show with ellipsis if length > 6, show runs and also expanded vector, use utils.repnrun
-step
-value
-RLEVectors is an alternate implementation of the Rle type from
-Bioconductor's IRanges package by H. Pages, P. Aboyoun and
-M. Lawrence. RLEVectors represent a vector with repeated values as the
-ordered set of values and repeat extents. In the field of genomics,
-data of various types measured across the ~3 billion letters in
-the human genome can often be represented in a few thousand runs. It
-is useful to know the bounds of genome regions covered by these runs,
-the values associated with these runs, and to be able to perform
-various mathematical operations on these values.
-Bioconductor has some widely used and extremely convenient types for
-working with collections of ranges, sometimes with
-associated data. IRanges represents a collection of arbitrary start,
-end pairs in [1,Inf). GRanges uses IRanges to represent locations
-on a genome and adds annotation of the chromosome and strand for each
-range. Children of GRanges add other annotations to the ranges. Rle
-represents the range [1:n] broken into arbitrary chunks or segments.
-RLEVectors differs from R's Rle in that we store the run values
-and run ends rather than the run values and run lengths. The run ends
-are convenient in that they allow for indexing into the vector by
-binary search (scalar indexing is O(log(n)) rather than O(n)).
-Additionally, length is O(1) rather than O(n) (it's the last run
-end rather than the sum of the run lengths). On the other hand,
-various operations do require the run lengths, which have to be
-calculated. See the benchmark directory and reports to see how
-this plays out.
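The trade-off described above can be sketched with a small stand-in type. This is a minimal illustration in plain Julia (1.x syntax, Base only), not the package's implementation; `ToyRLE`, `toylength`, `toyindex` and `toywidths` are hypothetical names introduced here.

```julia
# Hypothetical stand-in for a run-length encoded vector; NOT the RLEVectors.jl type.
struct ToyRLE{T}
    runvalues::Vector{T}
    runends::Vector{Int}   # strictly increasing, 1-based, inclusive run ends
end

# Length is a lookup of the last run end (0 when there are no runs).
toylength(x::ToyRLE) = isempty(x.runends) ? 0 : x.runends[end]

# Scalar indexing: binary search for the first run whose end is >= i.
function toyindex(x::ToyRLE, i::Integer)
    1 <= i <= toylength(x) || throw(BoundsError(x, i))
    run = searchsortedfirst(x.runends, i)
    return x.runvalues[run]
end

# Run widths are not stored, so they are derived from consecutive run ends.
toywidths(x::ToyRLE) = diff(vcat(0, x.runends))

x = ToyRLE([4, 5, 6], [3, 6, 9])
```

Here `toyindex(x, 7)` searches the three run ends rather than walking nine elements, while `toywidths(x)` has to do a pass over the run ends, which is the extra cost for operations that need run lengths.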
-RLEVectors can be created from a single vector or from a vector of values and a vector of run ends. In either case, runs of repeated values and zero-length runs are compressed out. RLEVectors can be expanded to a full vector, like a Range, with collect.
-x = RLEVector([1,1,2,2,3,3,4,4,4])
-x = RLEVector([4,5,6],[3,6,9])
-collect(x)
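As a rough sketch of what compressing and expanding mean here, the following plain-Julia functions (1.x syntax, Base only) encode a full vector into run values and run ends and expand them back. `toyencode` and `toyexpand` are hypothetical helpers for illustration, not functions from the package, and only the single-vector case is covered.

```julia
# Collapse consecutive equal values into (run values, run ends).
function toyencode(v::AbstractVector{T}) where {T}
    runvalues = T[]
    runends = Int[]
    for (i, val) in enumerate(v)
        if !isempty(runvalues) && isequal(runvalues[end], val)
            runends[end] = i          # extend the current run
        else
            push!(runvalues, val)     # start a new run ending at position i
            push!(runends, i)
        end
    end
    return runvalues, runends
end

# Expand (run values, run ends) back into a full vector.
function toyexpand(runvalues::AbstractVector, runends::AbstractVector{<:Integer})
    out = similar(runvalues, isempty(runends) ? 0 : Int(runends[end]))
    start = 1
    for (val, stop) in zip(runvalues, runends)
        fill!(view(out, start:stop), val)
        start = stop + 1
    end
    return out
end

vals, ends = toyencode([1, 1, 2, 2, 3, 3, 4, 4, 4])
full = toyexpand([4, 5, 6], [3, 6, 9])
```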
-RLEVectors implement the standard Vector API and also other methods for describing the ranges and values:
-length(x) # The full length of the vector, uncompressed
-nrun(x) # The number of runs in the vector
-rstart(x) # The index of the beginning of each run
-rwidth(x) # The width of each run
-rstart(x) # The index of the end of each run
-Naming for some of these functions is difficult given that many useful names are already taken, either as reserved words or as existing functions (end, start, last). Suggestions are welcome at this stage of development.
-RLEVectors can be treated as standard Vectors for arithmetic and collection operations. In many cases these operations are more efficient than operations on a standard vector.
-x = RLEVector([4,5,6],[3,6,9])
-x[2]
-x[7:9] = 10
-push!(x,6)
-x + 2x
-unique(x)
-findin(x,5)
-x > 4.2
-sort(x)
-median(x)
-RLEVectors has been extensively profiled and somewhat optimized. Please see the benchmarking section for the evolution over time and comparisons to equivalent operations in R.
-Data compression is a secondary benefit of RLEVectors, but it can be convenient. Generally run ends are stored as Int64. However, if further memory savings are desired, consider smaller and unsigned types. UInt32 (maximum 4,294,967,295) is sufficient to hold the length of the human genome and of any single human chromosome. UInt16 (maximum 65,535) is only suitable for much shorter vectors.
-RLEVector([5.1,2.9,100.7], UInt16[4,8,22])
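The example above stores its run ends as UInt16. One way to choose such a type is to pick the narrowest unsigned integer whose maximum covers the full vector length. The sketch below uses only Base Julia (1.x syntax); `smallest_uint` is a hypothetical helper, not part of RLEVectors.jl.

```julia
# Return the narrowest unsigned integer type able to hold a vector length n.
function smallest_uint(n::Integer)
    n >= 0 || throw(ArgumentError("length must be non-negative"))
    for T in (UInt8, UInt16, UInt32, UInt64)
        n <= typemax(T) && return T
    end
    throw(ArgumentError("length does not fit in UInt64"))
end

# Bytes used by three run ends under two storage types.
bytes_u16 = sizeof(UInt16[4, 8, 22])
bytes_i64 = sizeof(Int64[4, 8, 22])
```

For three run ends the saving is tiny, but it scales linearly with the number of runs.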
-The RLEVectors.jl package is licensed under the MIT License:
-This software is copyright (c) by Genentech.
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject
-to the following conditions:
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
-BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
-ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-