Class RedAmber::Vector represents a series of data in the DataFrame.
df = DataFrame.new(x: [1, 2, 3])
df[:x]
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000000f4ec>
[1, 2, 3]
vector = Vector.new([1, 2, 3])
# or
vector = Vector.new(1, 2, 3)
# or
vector = Vector.new(1..3)
# or
vector = Vector.new(Arrow::Array.new([1, 2, 3]))
# or
require 'arrow-numo-narray'
vector = Vector.new(Numo::Int8[1, 2, 3])
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000000f514>
[1, 2, 3]
Return indices in an Array.
It implicitly converts a Vector to an Array when required.
[1, 2] + Vector.new([3, 4])
# =>
[1, 2, 3, 4]
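The implicit conversion above relies on Ruby's `to_ary` protocol. The following is a minimal plain-Ruby sketch of that mechanism, not RedAmber's code; `TinyVector` is a hypothetical stand-in class.

```ruby
# Hypothetical stand-in for a Vector; only the implicit-conversion hook is modeled.
class TinyVector
  def initialize(values)
    @values = values
  end

  # Array#+ calls #to_ary on a non-Array argument before concatenating.
  def to_ary
    @values
  end
end

result = [1, 2] + TinyVector.new([3, 4])
p result # prints [1, 2, 3, 4]
```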
If block is not given, returns Enumerator.
n_nulls is an alias of n_nils.
Returns true if self has any nil. Otherwise returns false.
- limit sets size limit to display a long array.
vector = Vector.new((1..50).to_a)
# =>
#<RedAmber::Vector(:uint8, size=50):0x000000000000f528>
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ... ]
- Acceptable class for indices:
- Integer, Float
- Vector of integer or float
- Arrow::Array of integer or float
- Negative indices are also accepted, as with Ruby's built-in Array.
array = Vector.new(%w[A B C D E])
indices = Vector.new([0.1, -0.5, -5.1])
array.take(indices)
# or
array[indices]
# =>
#<RedAmber::Vector(:string, size=3):0x000000000000f820>
["A", "E", "A"]
- Acceptable class for booleans:
- An array of true, false, or nil
- Boolean Vector
- Arrow::BooleanArray
array = Vector.new(%w[A B C D E])
booleans = [true, false, nil, false, true]
array.filter(booleans)
# or
array[booleans]
# =>
#<RedAmber::Vector(:string, size=2):0x000000000000f21c>
["A", "E"]
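The boolean filtering shown above can be sketched in plain Ruby. This is an illustration only, assuming (as the example suggests) that nil is dropped like false; `filter_by_booleans` is a hypothetical helper, not a RedAmber method.

```ruby
# Keep only the values whose flag is exactly true; false and nil are dropped.
def filter_by_booleans(values, booleans)
  values.zip(booleans).select { |_value, flag| flag == true }.map(&:first)
end

filtered = filter_by_booleans(%w[A B C D E], [true, false, nil, false, true])
p filtered # prints ["A", "E"]
```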
filter and select also accept a block.
| Method | Boolean | Numeric | String | Options | Remarks |
|---|---|---|---|---|---|
| ✓ all? | ✓ | | | ✓ ScalarAggregate | alias all |
| ✓ any? | ✓ | | | ✓ ScalarAggregate | alias any |
| ✓ approximate_median | | ✓ | | ✓ ScalarAggregate | alias median |
| ✓ count | ✓ | ✓ | ✓ | ✓ Count | |
| ✓ count_distinct | ✓ | ✓ | ✓ | ✓ Count | alias count_uniq |
| [ ] index | [ ] | [ ] | [ ] | [ ] Index | |
| ✓ max | ✓ | ✓ | ✓ | ✓ ScalarAggregate | |
| ✓ mean | ✓ | ✓ | | ✓ ScalarAggregate | |
| ✓ min | ✓ | ✓ | ✓ | ✓ ScalarAggregate | |
| ✓ min_max | ✓ | ✓ | ✓ | ✓ ScalarAggregate | |
| [ ] mode | | [ ] | | [ ] Mode | |
| ✓ product | ✓ | ✓ | | ✓ ScalarAggregate | |
| ✓ quantile | | ✓ | | ✓ Quantile | Specify probability in (0..1) by a parameter (default=0.5) |
| ✓ sd | | ✓ | | | ddof: 1 at stddev |
| ✓ stddev | | ✓ | | ✓ Variance | ddof: 0 by default |
| ✓ sum | ✓ | ✓ | | ✓ ScalarAggregate | |
| [ ] tdigest | | [ ] | | [ ] TDigest | |
| ✓ var | | ✓ | | | ddof: 1 at variance, alias unbiased_variance |
| ✓ variance | | ✓ | | ✓ Variance | ddof: 0 by default |
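The ddof remarks for sd, stddev, var and variance refer to the divisor `size - ddof`. A minimal plain-Ruby sketch of that relationship follows; `variance_sketch` is a hypothetical helper and does not call RedAmber.

```ruby
# Variance with a configurable "delta degrees of freedom" (ddof).
# ddof: 0 gives the population variance, ddof: 1 the unbiased sample variance.
def variance_sketch(values, ddof: 0)
  mean = values.sum.to_f / values.size
  squared_deviations = values.sum { |v| (v - mean)**2 }
  squared_deviations / (values.size - ddof)
end

data = [1, 2, 3, 4]
population_variance = variance_sketch(data)          # divisor is 4
unbiased_variance = variance_sketch(data, ddof: 1)   # divisor is 3
unbiased_sd = Math.sqrt(unbiased_variance)
```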
Options can be used as follows. See the documentation of the corresponding C++ function for details.
double = Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
#=>
#<RedAmber::Vector(:double, size=6):0x000000000000f910>
[1.0, NaN, -Infinity, Infinity, nil, 0.0]
double.count #=> 5
double.count(mode: :only_valid) #=> 5, default
double.count(mode: :only_null) #=> 1
double.count(mode: :all) #=> 6
boolean = Vector.new([true, true, nil])
#=>
#<RedAmber::Vector(:boolean, size=3):0x000000000000f924>
[true, true, nil]
boolean.all #=> true
boolean.all(skip_nulls: true) #=> true
boolean.all(skip_nulls: false) #=> false
Returns true if function is a unary aggregation function. Otherwise returns false.
Spreads the return value of an aggregate function as if it were an element-wise function.
vec = Vector.new(1, 2, 3, 4)
vec.propagate(:mean)
# =>
#<RedAmber::Vector(:double, size=4):0x000000000001985c>
[2.5, 2.5, 2.5, 2.5]
#propagate also accepts a block to compute with a customized aggregation function yielding a scalar.
vec.propagate { |v| v.mean.round }
# =>
#<RedAmber::Vector(:uint8, size=4):0x000000000000cb98>
[3, 3, 3, 3]
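The idea behind propagation can be sketched in plain Ruby: compute one scalar, then repeat it for every position. `propagate_sketch` is a hypothetical helper for illustration, not the library's implementation.

```ruby
# Aggregate the whole series with the block, then spread the scalar to every position.
def propagate_sketch(values)
  aggregated = yield(values)
  Array.new(values.size, aggregated)
end

means = propagate_sketch([1, 2, 3, 4]) { |v| v.sum.to_f / v.size }
rounded = propagate_sketch([1, 2, 3, 4]) { |v| (v.sum.to_f / v.size).round }
p means   # prints [2.5, 2.5, 2.5, 2.5]
p rounded # prints [3, 3, 3, 3]
```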
| Method | Boolean | Numeric | String | Options | Remarks |
|---|---|---|---|---|---|
| ✓ -@ | | ✓ | | | as -vector |
| ✓ negate | | ✓ | | | -@ |
| ✓ abs | | ✓ | | | |
| ✓ acos | | ✓ | | | |
| ✓ asin | | ✓ | | | |
| ✓ atan | | ✓ | | | |
| ✓ bit_wise_not | | (✓) | | | integer only |
| ✓ ceil | | ✓ | | | |
| ✓ cos | | ✓ | | | |
| ✓ fill_nil_backward | ✓ | ✓ | ✓ | | |
| ✓ fill_nil_forward | ✓ | ✓ | ✓ | | |
| ✓ floor | | ✓ | | | |
| ✓ invert | ✓ | | | | !, alias not |
| ✓ ln | | ✓ | | | |
| ✓ log10 | | ✓ | | | |
| ✓ log1p | | ✓ | | | Compute natural log of (1+x) |
| ✓ log2 | | ✓ | | | |
| ✓ round | | ✓ | | ✓ Round (:mode, :n_digits) | |
| ✓ round_to_multiple | | ✓ | | ✓ RoundToMultiple :mode, :multiple | multiple must be an Arrow::Scalar |
| ✓ sign | | ✓ | | | |
| ✓ sin | | ✓ | | | |
| ✓ sort_indexes | ✓ | ✓ | ✓ | :order | alias sort_indices |
| ✓ tan | | ✓ | | | |
| ✓ trunc | | ✓ | | | |
Examples of options for #round:
- :n_digits The number of digits to show.
- :mode Specify rounding mode.
double = Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])
# => [15.15, 2.5, 3.5, -4.5, -5.5]
double.round
# => [15.0, 2.0, 4.0, -4.0, -6.0]
double.round(mode: :half_to_even)
# => Default. Same as double.round
double.round(mode: :towards_infinity)
# => [16.0, 3.0, 4.0, -5.0, -6.0]
double.round(mode: :half_up)
# => [15.0, 3.0, 4.0, -4.0, -5.0]
double.round(mode: :half_towards_zero)
# => [15.0, 2.0, 3.0, -4.0, -5.0]
double.round(mode: :half_towards_infinity)
# => [15.0, 3.0, 4.0, -5.0, -6.0]
double.round(mode: :half_to_odd)
# => [15.0, 3.0, 3.0, -5.0, -5.0]
double.round(n_digits: 0)
# => Default. Same as double.round
double.round(n_digits: 1)
# => [15.2, 2.5, 3.5, -4.5, -5.5]
double.round(n_digits: -1)
# => [20.0, 0.0, 0.0, -0.0, -10.0]
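For comparison, three of these tie-breaking rules are also available in Ruby's own `Float#round` through its `half:` keyword. The sketch below uses only core Ruby and does not call RedAmber. Note that Ruby's `half: :up` breaks ties away from zero, so it lines up with `:half_towards_infinity` in the listing above rather than `:half_up`.

```ruby
values = [15.15, 2.5, 3.5, -4.5, -5.5]

# Ties go to the nearest even integer (compare :half_to_even above).
half_to_even = values.map { |x| x.round(half: :even) }
# Ties go away from zero (compare :half_towards_infinity above).
half_away_from_zero = values.map { |x| x.round(half: :up) }
# Ties go towards zero (compare :half_towards_zero above).
half_towards_zero = values.map { |x| x.round(half: :down) }

p half_to_even        # prints [15, 2, 4, -4, -6]
p half_away_from_zero # prints [15, 3, 4, -5, -6]
p half_towards_zero   # prints [15, 2, 3, -4, -5]
```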
| Method | Boolean | Numeric | String | Options | Remarks |
|---|---|---|---|---|---|
| ✓ add | | ✓ | | | + |
| ✓ atan2 | | ✓ | | | |
| ✓ and_kleene | ✓ | | | | & |
| ✓ and_org | ✓ | | | | and in Red Arrow |
| ✓ and_not | ✓ | | | | |
| ✓ and_not_kleene | ✓ | | | | |
| ✓ bit_wise_and | | (✓) | | | integer only |
| ✓ bit_wise_or | | (✓) | | | integer only |
| ✓ bit_wise_xor | | (✓) | | | integer only |
| ✓ divide | | ✓ | | | / |
| ✓ equal | ✓ | ✓ | ✓ | | ==, alias eq |
| ✓ greater | ✓ | ✓ | ✓ | | >, alias gt |
| ✓ greater_equal | ✓ | ✓ | ✓ | | >=, alias ge |
| ✓ is_finite | | ✓ | | | |
| ✓ is_inf | | ✓ | | | |
| ✓ is_na | ✓ | ✓ | ✓ | | |
| ✓ is_nan | | ✓ | | | |
| [ ] is_nil | ✓ | ✓ | ✓ | [ ] Null | alias is_null |
| ✓ is_valid | ✓ | ✓ | ✓ | | |
| ✓ less | ✓ | ✓ | ✓ | | <, alias lt |
| ✓ less_equal | ✓ | ✓ | ✓ | | <=, alias le |
| ✓ logb | | ✓ | | | logb(b) Compute base b logarithm |
| [ ] mod | | [ ] | | | % |
| ✓ multiply | | ✓ | | | * |
| ✓ not_equal | ✓ | ✓ | ✓ | | !=, alias ne |
| ✓ or_kleene | ✓ | | | | \| |
| ✓ or_org | ✓ | | | | or in Red Arrow |
| ✓ power | | ✓ | | | ** |
| ✓ subtract | | ✓ | | | - |
| ✓ shift_left | | (✓) | | | <<, integer only |
| ✓ shift_right | | (✓) | | | >>, integer only |
| ✓ xor | ✓ | | | | ^ |
Returns a new array with distinct elements.
Compute counts of unique elements and return a Hash.
It returns almost the same result as Ruby's tally, except that these methods treat all NaNs as the same value.
array = [0.0/0, Float::NAN]
array.tally #=> {NaN=>1, NaN=>1}
vector = Vector.new(array)
vector.tally #=> {NaN=>2}
vector.value_counts #=> {NaN=>2}
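The difference comes from NaN never being equal to itself in Ruby, so each NaN object becomes its own Hash key. A plain-Ruby sketch of NaN-merging counting follows; `tally_nan_as_one` and the `:nan` key are hypothetical choices for illustration only.

```ruby
# Count occurrences, but map every NaN to one shared key so they are grouped.
def tally_nan_as_one(values)
  values.each_with_object(Hash.new(0)) do |value, counts|
    key = value.is_a?(Float) && value.nan? ? :nan : value
    counts[key] += 1
  end
end

counts = tally_nan_as_one([0.0 / 0, Float::NAN, 1.0])
nan_count = counts[:nan] # both NaNs are counted under one key
```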
Returns index of specified element.
quantiles(probs = [0.0, 0.25, 0.5, 0.75, 1.0], interpolation: :linear, skip_nils: true, min_count: 0)
Returns quantiles for specified probabilities in a DataFrame.
vector = Vector.new(1,2,3)
# =>
#<RedAmber::Vector(:uint8, size=3):0x00000000000decc4>
[1, 2, 3]
# Vector's `#*` method
vector * -1
# =>
#<RedAmber::Vector(:int16, size=3):0x00000000000e3698>
[-1, -2, -3]
# coerced calculation
-1 * vector
# =>
#<RedAmber::Vector(:int16, size=3):0x00000000000ea4ac>
[-1, -2, -3]
# `-@` operator
-vector
# =>
#<RedAmber::Vector(:uint8, size=3):0x00000000000ee7b4>
[255, 254, 253]
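Two mechanisms are visible in the examples above: Ruby's `coerce` protocol, which lets `-1 * vector` work, and unsigned 8-bit wrap-around, which is consistent with the `-vector` result. The sketch below models both in plain Ruby; `TinyVector` is a hypothetical class, and the modulo-256 step is an assumption inferred from the output shown, not a statement about RedAmber's internals.

```ruby
# Hypothetical stand-in for a numeric Vector.
class TinyVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def *(other)
    TinyVector.new(values.map { |v| v * other })
  end

  # Integer#* does not know TinyVector, so Ruby calls #coerce on the right
  # operand and retries with the returned pair. Swapping the operands is
  # safe here only because multiplication is commutative.
  def coerce(scalar)
    [self, scalar]
  end
end

product = -1 * TinyVector.new([1, 2, 3])
p product.values # prints [-1, -2, -3]

# Negating within an unsigned 8-bit range wraps around modulo 256.
wrapped = [1, 2, 3].map { |v| -v % 256 }
p wrapped # prints [255, 254, 253]
```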
- Accepts a Scalar, Range of Integer, Vector, Array, or Arrow::Array as a specifier.
- Accepts a Scalar, Vector, Array, or Arrow::Array as a replacer.
- Boolean specifiers mark the positions to replace with true.
- If booleans.any is false, no replacement happens and self is returned.
- Index specifiers give the positions to replace as indices.
- replacer specifies the replacement values.
- The number of true values in booleans must be equal to the length of replacer.
vector = Vector.new([1, 2, 3])
booleans = [true, false, true]
replacer = [4, 5]
vector.replace(booleans, replacer)
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
[4, 2, 5]
- A scalar value in replacer is broadcast.
replacer = 0
vector.replace(booleans, replacer)
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
[0, 2, 0]
- The returned data type is automatically upcast to fit the replacer.
replacer = 1.0
vector.replace(booleans, replacer)
# =>
#<RedAmber::Vector(:double, size=3):0x0000000000025d78>
[1.0, 2.0, 1.0]
- Position of nil in booleans is replaced with nil.
booleans = [true, false, nil]
replacer = -1
vector.replace(booleans, replacer)
# =>
#<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
[-1, 2, nil]
- replacer can have nil in it.
booleans = [true, false, true]
replacer = [nil]
vector.replace(booleans, replacer)
# =>
#<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
[nil, 2, nil]
- An example of replacing 'NA' with nil.
vector = Vector.new(['A', 'B', 'NA'])
vector.replace(vector == 'NA', nil)
# =>
#<RedAmber::Vector(:string, size=3):0x000000000000f8ac>
["A", "B", nil]
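The boolean-specifier rules above can be sketched in plain Ruby. This is an illustration under the stated rules only (type upcasting is not modeled); `replace_by_booleans` is a hypothetical helper.

```ruby
# true  -> take the next replacer value (or the scalar, broadcast)
# nil   -> result is nil
# false -> keep the original value
def replace_by_booleans(values, booleans, replacer)
  queue = replacer.is_a?(Array) ? replacer.dup : nil
  values.zip(booleans).map do |value, flag|
    if flag.nil?
      nil
    elsif flag
      queue ? queue.shift : replacer
    else
      value
    end
  end
end

by_array = replace_by_booleans([1, 2, 3], [true, false, true], [4, 5])
by_scalar = replace_by_booleans([1, 2, 3], [true, false, true], 0)
with_nil = replace_by_booleans([1, 2, 3], [true, false, nil], -1)
p by_array  # prints [4, 2, 5]
p by_scalar # prints [0, 2, 0]
p with_nil  # prints [-1, 2, nil]
```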
- Specifier in indices.
Specified indices are used in sorted order, so a position in indices may not correspond to the same position in replacer.
vector = Vector.new([1, 2, 3])
indices = [2, 1]
replacer = [4, 5]
vector.replace(indices, replacer)
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000000f244>
[1, 4, 5] # not [1, 5, 4]
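A plain-Ruby sketch of this "as sorted" behavior: the indices are sorted first, and the replacer values are then assigned in that order. `replace_by_indices` is a hypothetical helper, not the library's implementation.

```ruby
# Sort the indices, then pair them with replacer values position by position.
def replace_by_indices(values, indices, replacer)
  result = values.dup
  indices.sort.each_with_index do |index, position|
    result[index] = replacer[position]
  end
  result
end

replaced = replace_by_indices([1, 2, 3], [2, 1], [4, 5])
p replaced # prints [1, 4, 5]
```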
Propagate the last valid observation forward (or backward). A nil is preserved when no valid value precedes it (forward) or follows it (backward).
integer = Vector.new([0, 1, nil, 3, nil])
integer.fill_nil_forward
# =>
#<RedAmber::Vector(:uint8, size=5):0x000000000000f960>
[0, 1, 1, 3, 3]
integer.fill_nil_backward
# =>
#<RedAmber::Vector(:uint8, size=5):0x000000000000f974>
[0, 1, 3, 3, nil]
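The fill rules can be sketched in plain Ruby as follows. The helper names mirror the methods above but are standalone functions on Arrays, written for illustration only.

```ruby
# Carry the last non-nil value forward; leading nils stay nil.
def fill_nil_forward_sketch(values)
  last = nil
  values.map { |v| v.nil? ? last : (last = v) }
end

# Backward fill is forward fill on the reversed series.
def fill_nil_backward_sketch(values)
  fill_nil_forward_sketch(values.reverse).reverse
end

forward = fill_nil_forward_sketch([0, 1, nil, 3, nil])
backward = fill_nil_backward_sketch([0, 1, nil, 3, nil])
p forward  # prints [0, 1, 1, 3, 3]
p backward # prints [0, 1, 3, 3, nil]
```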
Choose values based on self. Self must be a boolean Vector.
true_choice and false_choice must be of the same type: scalar, array, or Vector.
nil values in cond will be promoted to the output.
This example will normalize negative indices to positive ones.
indices = Vector.new([1, -1, 3, -4])
array_size = 10
normalized_indices = (indices < 0).if_else(indices + array_size, indices)
# =>
#<RedAmber::Vector(:int16, size=4):0x000000000000f85c>
[1, 9, 3, 6]
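The selection rule, including nil promotion, can be sketched in plain Ruby. `if_else_sketch` is a hypothetical helper working on Arrays and scalars; it does not model Arrow type handling.

```ruby
# For each position: nil condition -> nil, true -> true_choice, false -> false_choice.
# Scalar choices are broadcast; Array choices are read element-wise.
def if_else_sketch(cond, true_choice, false_choice)
  cond.each_with_index.map do |flag, i|
    next nil if flag.nil?

    choice = flag ? true_choice : false_choice
    choice.is_a?(Array) ? choice[i] : choice
  end
end

indices = [1, -1, 3, -4]
array_size = 10
negative = indices.map { |i| i < 0 }
normalized = if_else_sketch(negative, indices.map { |i| i + array_size }, indices)
p normalized # prints [1, 9, 3, 6]
```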
For each element in self, return true if it is found in the given values, false otherwise.
By default, nulls are matched against the value set. (This will be changed in SetLookupOptions: not implemented.)
vector = Vector.new %W[A B C D]
values = ['A', 'C', 'X']
vector.is_in(values)
# =>
#<RedAmber::Vector(:boolean, size=4):0x000000000000f2a8>
[true, false, true, false]
values are cast to the same type as the Vector.
vector = Vector.new([1, 2, 255])
vector.is_in(1, -1)
# =>
#<RedAmber::Vector(:boolean, size=3):0x000000000000f320>
[true, false, true]
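The result above is consistent with -1 being cast to the Vector's unsigned 8-bit type, where it wraps to 255. A plain-Ruby sketch under that assumption follows; `is_in_uint8` is a hypothetical helper and the modulo-256 cast is inferred from the example, not taken from the library.

```ruby
# Cast candidates into the 0..255 range by wrap-around, then test membership.
def is_in_uint8(values, candidates)
  cast = candidates.map { |c| c % 256 }
  values.map { |v| cast.include?(v) }
end

membership = is_in_uint8([1, 2, 255], [1, -1])
p membership # prints [true, false, true]
```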
Shift vector's values by the specified amount. The shifted space is filled with the value fill.
vector = Vector.new([1, 2, 3, 4, 5])
vector.shift
# =>
#<RedAmber::Vector(:uint8, size=5):0x00000000000072d8>
[nil, 1, 2, 3, 4]
vector.shift(-2)
# =>
#<RedAmber::Vector(:uint8, size=5):0x0000000000009970>
[3, 4, 5, nil, nil]
vector.shift(fill: Float::NAN)
# =>
#<RedAmber::Vector(:double, size=5):0x0000000000011d3c>
[NaN, 1.0, 2.0, 3.0, 4.0]
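The shifting rule can be sketched in plain Ruby. `shift_sketch` is a hypothetical helper; it keeps the original element types, so it does not reproduce the promotion to double seen with `fill: Float::NAN`, and its handling of amounts larger than the size is an assumption.

```ruby
# Positive amounts shift right (padding at the head),
# negative amounts shift left (padding at the tail).
def shift_sketch(values, amount = 1, fill: nil)
  size = values.size
  pad = Array.new([amount.abs, size].min, fill)
  if amount >= 0
    pad + values.first(size - pad.size)
  else
    values.last(size - pad.size) + pad
  end
end

shifted_right = shift_sketch([1, 2, 3, 4, 5])
shifted_left = shift_sketch([1, 2, 3, 4, 5], -2)
p shifted_right # prints [nil, 1, 2, 3, 4]
p shifted_left  # prints [3, 4, 5, nil, nil]
```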
Split string type Vector with any ASCII whitespace as separator. Returns an Array of Vectors.
vector = Vector.new(['a b', 'c d', 'e f'])
vector.split_to_columns
#=>
[#<RedAmber::Vector(:string, size=3):0x00000000000363a8>
["a", "c", "e"]
,
#<RedAmber::Vector(:string, size=3):0x00000000000363bc>
["b", "d", "f"]
]
It will be used for column splitting in DataFrame.
df = DataFrame.new(year_month: %w[2022-01 2022-02 2022-03])
.assign(:year, :month) { year_month.split_to_columns('-') }
.drop(:year_month)
#=>
#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f974>
year month
<string> <string>
0 2022 01
1 2022 02
2 2022 03
Split string type Vector with any ASCII whitespace as separator. Returns a Vector flattened into rows.
vector = Vector.new(['a b', 'c d', 'e f'])
vector.split_to_rows
#=>
#<RedAmber::Vector(:string, size=6):0x000000000002ccf4>
["a", "b", "c", "d", "e", "f"]
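Both splitting styles have close plain-Ruby analogues on Arrays of Strings: splitting then transposing gives columns, and splitting then flattening gives rows. This sketch uses only core Ruby and does not call RedAmber.

```ruby
strings = ['a b', 'c d', 'e f']

# Columns: split each string on whitespace, then transpose rows into columns.
columns = strings.map(&:split).transpose
# Rows: split each string and flatten everything into one sequence.
rows = strings.flat_map(&:split)

# An explicit separator works the same way.
year, month = %w[2022-01 2022-02 2022-03].map { |s| s.split('-') }.transpose

p columns # prints [["a", "c", "e"], ["b", "d", "f"]]
p rows    # prints ["a", "b", "c", "d", "e", "f"]
```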
Merge String or other string Vector to self using a separator. Self must be a string Vector. Returns merged string Vector.
# with vector
vector = Vector.new(%w[a c e])
other = Vector.new(%w[b d f])
vector.merge(other)
#=>
#<RedAmber::Vector(:string, size=3):0x0000000000038b80>
["a b", "c d", "e f"]
If other is a String, it is broadcast.
# with string
vector = Vector.new(%w[a c e])
vector.merge('x')
#=>
#<RedAmber::Vector(:string, size=3):0x00000000000446b0>
["a x", "c x", "e x"]
You can specify separator string by :sep.
# with vector
vector = Vector.new(%w[a c e])
other = Vector.new(%w[b d f])
vector.merge(other, sep: '')
#=>
#<RedAmber::Vector(:string, size=3):0x0000000000038b80>
["ab", "cd", "ef"]
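The merge behavior, including String broadcasting and the separator, can be sketched in plain Ruby. `merge_strings` is a hypothetical helper on Arrays, with a single space assumed as the default separator to match the examples above.

```ruby
# Join elements pairwise; a single String on the right is broadcast to every element.
def merge_strings(left, other, sep: ' ')
  right = other.is_a?(Array) ? other : Array.new(left.size, other)
  left.zip(right).map { |a, b| "#{a}#{sep}#{b}" }
end

merged = merge_strings(%w[a c e], %w[b d f])
broadcast = merge_strings(%w[a c e], 'x')
joined = merge_strings(%w[a c e], %w[b d f], sep: '')
p merged    # prints ["a b", "c d", "e f"]
p broadcast # prints ["a x", "c x", "e x"]
p joined    # prints ["ab", "cd", "ef"]
```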
Concatenate other array-like to self and return a concatenated Vector.
- other is one of Vector, Array, Arrow::Array or Arrow::ChunkedArray.
- Different types will be 'resolved'.
Concatenate to string
string_vector
# =>
#<RedAmber::Vector(:string, size=2):0x00000000000037b4>
["A", "B"]
string_vector.concatenate([1, 2])
# =>
#<RedAmber::Vector(:string, size=4):0x0000000000003818>
["A", "B", "1", "2"]
Concatenate to integer
integer_vector
# =>
#<RedAmber::Vector(:uint8, size=2):0x000000000000382c>
[1, 2]
integer_vector.concatenate(["A", "B"])
# =>
#<RedAmber::Vector(:uint8, size=4):0x0000000000003840>
[1, 2, 65, 66]
Returns numerical rank of self.
- Nil values are considered greater than any value.
- NaN values are considered greater than any value but smaller than nil values.
- Ties are broken by order of appearance.
RankOptions in the C++ function is not implemented in C GLib yet. This method is currently fixed to the default behavior.
Returns 0-based rank of self (0...size in range) as a Vector.
Rank of float Vector
fv = Vector.new(0.1, nil, Float::NAN, 0.2, 0.1); fv
# =>
#<RedAmber::Vector(:double, size=5):0x000000000000c65c>
[0.1, nil, NaN, 0.2, 0.1]
fv.rank
# =>
#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
[0, 4, 3, 2, 1]
Rank of string Vector
sv = Vector.new("A", "B", nil, "A", "C"); sv
# =>
#<RedAmber::Vector(:string, size=5):0x0000000000003854>
["A", "B", nil, "A", "C"]
sv.rank
# =>
#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
[0, 2, 4, 1, 3]
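The ordering rules (regular values, then NaN, then nil, with ties broken by position) can be sketched in plain Ruby. `rank_sketch` is a hypothetical helper and not the Arrow implementation.

```ruby
# Sort positions by [group, value, position], where group 0 holds regular values,
# group 1 holds NaN and group 2 holds nil; then record each position's rank.
def rank_sketch(values)
  order = values.each_index.sort_by do |i|
    v = values[i]
    group = if v.nil?
              2
            elsif v.is_a?(Float) && v.nan?
              1
            else
              0
            end
    [group, group.zero? ? v : 0, i]
  end
  ranks = Array.new(values.size)
  order.each_with_index { |position, rank| ranks[position] = rank }
  ranks
end

float_ranks = rank_sketch([0.1, nil, Float::NAN, 0.2, 0.1])
string_ranks = rank_sketch(['A', 'B', nil, 'A', 'C'])
p float_ranks  # prints [0, 4, 3, 2, 1]
p string_ranks # prints [0, 2, 4, 1, 3]
```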
Pick up elements at random.
Returns a randomly selected element. This is an aggregation function.
v = Vector.new('A'..'H'); v
# =>
#<RedAmber::Vector(:string, size=8):0x0000000000011b20>
["A", "B", "C", "D", "E", "F", "G", "H"]
v.sample
# =>
"C"
Pick up n elements at random.
- Param n is the number of elements to pick. n is a positive Integer.
- If n is smaller than or equal to size, elements are picked without repetition.
- If n is greater than size, elements are picked repeatedly.
- Returns the sampled elements as a Vector.
- If n == 1 (in case of sample(1)), it returns a Vector of size == 1, not a scalar.
v.sample(1)
# =>
#<RedAmber::Vector(:string, size=1):0x000000000001a3b0>
["H"]
Sample same size of self: every element is picked in random order.
v.sample(8)
# =>
#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
["H", "D", "B", "F", "E", "A", "G", "C"]
Over sampling: "E", "A" and "H" are sampled repeatedly.
v.sample(9)
# =>
#<RedAmber::Vector(:string, size=9):0x000000000001d790>
["E", "E", "A", "D", "H", "C", "A", "F", "H"]
Pick up elements by proportion prop at random.
- prop is the proportion of elements to pick. prop is a positive Float.
- Absolute number of elements to pick: prop*size is rounded (by half: :up).
- If prop is smaller than or equal to 1.0, elements are picked without repetition.
- If prop is greater than 1.0, some elements are picked repeatedly.
- Returns the sampled elements as a Vector.
- If only one element is picked, it returns a Vector of size == 1, not a scalar.
Sample same size of self: every element is picked in random order.
v.sample(1.0)
# =>
#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
["D", "H", "F", "C", "A", "B", "E", "G"]
2 times over sampling.
v.sample(2.0)
# =>
#<RedAmber::Vector(:string, size=16):0x00000000000233e8>
["H", "B", "C", "B", "C", "A", "F", "A", "E", "C", "H", "F", "F", "A", ... ]
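The Integer and Float rules above can be combined into one plain-Ruby sketch. `sample_sketch` is a hypothetical helper; drawing each element independently when over-sampling is an assumption about how repetition works, chosen because the over-sampled outputs above omit some elements.

```ruby
# Integer argument: number of elements. Float argument: proportion of size.
def sample_sketch(values, n_or_prop, random: Random.new)
  n = n_or_prop.is_a?(Float) ? (n_or_prop * values.size).round : n_or_prop
  if n <= values.size
    values.sample(n, random: random)               # no element is repeated
  else
    Array.new(n) { values.sample(random: random) } # draws may repeat
  end
end

letters = ('A'..'H').to_a
one = sample_sketch(letters, 1)        # an Array of size 1, not a scalar
shuffled = sample_sketch(letters, 1.0) # every element, in random order
doubled = sample_sketch(letters, 2.0)  # 16 picks, some repeated
```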
Arrange values in Vector.
- :+, :ascending or no argument will sort in increasing order.
- :- or :descending will sort in decreasing order.
Vector.new(%w[B D A E C]).sort
# same as #sort(:+)
# same as #sort(:ascending)
# =>
#<RedAmber::Vector(:string, size=5):0x000000000000c134>
["A", "B", "C", "D", "E"]
Vector.new(%w[B D A E C]).sort(:-)
# same as #sort(:descending)
# =>
#<RedAmber::Vector(:string, size=5):0x000000000000c148>
["E", "D", "C", "B", "A"]