Vector

Class RedAmber::Vector represents a series of data in the DataFrame.

Constructor

Create from a column in a DataFrame

df = DataFrame.new(x: [1, 2, 3])
df[:x]
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000000f4ec>
[1, 2, 3]

New from an Array

vector = Vector.new([1, 2, 3])
# or
vector = Vector.new(1, 2, 3)
# or
vector = Vector.new(1..3)
# or
vector = Vector.new(Arrow::Array.new([1, 2, 3]))
# or
require 'arrow-numo-narray'
vector = Vector.new(Numo::Int8[1, 2, 3])

# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000000f514>
[1, 2, 3]

Properties

`to_s`

`values`, `to_a`, `entries`

`indices`, `indexes`, `indeces`

Return indices in an Array.

`to_ary`

It implicitly converts a Vector to an Array when required.

[1, 2] + Vector.new([3, 4])

# =>
[1, 2, 3, 4]
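
The implicit conversion relies on Ruby's `to_ary` protocol: `Array#+` calls `to_ary` on an operand that is not already an Array. The plain-Ruby sketch below illustrates the mechanism with a hypothetical `TinySeries` class. It is an illustration only and not RedAmber's implementation.

```ruby
# Hypothetical stand-in for a Vector-like container (not part of RedAmber).
class TinySeries
  def initialize(values)
    @values = values.to_a
  end

  # Ruby calls #to_ary when it needs an Array implicitly,
  # for example as the right-hand operand of Array#+.
  def to_ary
    @values.dup
  end
end

combined = [1, 2] + TinySeries.new([3, 4])
p combined
```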

`size`, `length`, `n_rows`, `nrow`

`empty?`

`type`

`boolean?`, `numeric?`, `string?`, `temporal?`

`type_class`

`each`, `map`, `collect`

If block is not given, returns Enumerator.

`n_nils`, `n_nans`

n_nulls is an alias of n_nils

`has_nil?`

Returns true if self has any nil. Otherwise returns false.

`inspect(limit: 80)`

`limit` sets the size limit used when displaying a long array.

vector = Vector.new((1..50).to_a)
# =>
#<RedAmber::Vector(:uint8, size=50):0x000000000000f528>
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ... ]

Selecting Values

`take(indices)`, `[](indices)`

Acceptable classes for indices:
- Integer, Float
- Vector of integer or float
- Arrow::Array of integer or float
Negative indices are also accepted, as with Ruby's built-in Array.

array = Vector.new(%w[A B C D E])
indices = Vector.new([0.1, -0.5, -5.1])
array.take(indices)
# or
array[indices]

# =>
#<RedAmber::Vector(:string, size=3):0x000000000000f820>
["A", "E", "A"]

`filter(booleans)`, `select(booleans)`, `[](booleans)`

Acceptable class for booleans:
- An array of true, false, or nil
- Boolean Vector
- Arrow::BooleanArray

array = Vector.new(%w[A B C D E])
booleans = [true, false, nil, false, true]
array.filter(booleans)
# or
array[booleans]

# =>
#<RedAmber::Vector(:string, size=2):0x000000000000f21c>
["A", "E"]
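
As the example shows, a nil in the booleans drops the element just as false does. A plain-Ruby sketch of that selection rule, using ordinary Arrays instead of Vectors (an illustration, not RedAmber's implementation):

```ruby
values   = %w[A B C D E]
booleans = [true, false, nil, false, true]

# Keep an element only where the mask is true; false and nil both drop it.
filtered = values.zip(booleans).select { |_value, keep| keep }.map(&:first)
p filtered
```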

`filter` and `select` also accept a block.

Functions

Unary aggregations: `vector.func => scalar`

Method	Boolean	Numeric	String	Options	Remarks
✓ `all?`	✓			✓ ScalarAggregate	alias `all`
✓ `any?`	✓			✓ ScalarAggregate	alias `any`
✓ `approximate_median`		✓		✓ ScalarAggregate	alias `median`
✓ `count`	✓	✓	✓	✓ Count
✓ `count_distinct`	✓	✓	✓	✓ Count	alias `count_uniq`
[ ]`index`	[ ]	[ ]	[ ]	[ ] Index
✓ `max`	✓	✓	✓	✓ ScalarAggregate
✓ `mean`	✓	✓		✓ ScalarAggregate
✓ `min`	✓	✓	✓	✓ ScalarAggregate
✓ `min_max`	✓	✓	✓	✓ ScalarAggregate
[ ]`mode`		[ ]		[ ] Mode
✓ `product`	✓	✓		✓ ScalarAggregate
✓ `quantile`		✓		✓ Quantile	Specify probability in (0..1) by a parameter (default=0.5)
✓ `sd`		✓			ddof: 1 at `stddev`
✓ `stddev`		✓		✓ Variance	ddof: 0 by default
✓ `sum`	✓	✓		✓ ScalarAggregate
[ ]`tdigest`		[ ]		[ ] TDigest
✓ `var`		✓			ddof: 1 at `variance` alias `unbiased_variance`
✓ `variance`		✓		✓ Variance	ddof: 0 by default

Options can be used as follows. See the documentation of the corresponding C++ function for details.

double = Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
#=>
#<RedAmber::Vector(:double, size=6):0x000000000000f910>
[1.0, NaN, -Infinity, Infinity, nil, 0.0]

double.count #=> 5
double.count(mode: :only_valid) #=> 5, default
double.count(mode: :only_null) #=> 1
double.count(mode: :all) #=> 6

boolean = Vector.new([true, true, nil])
#=>
#<RedAmber::Vector(:boolean, size=3):0x000000000000f924>
[true, true, nil]

boolean.all #=> true
boolean.all(skip_nulls: true) #=> true
boolean.all(skip_nulls: false) #=> false

Check if `function` is an aggregation function: `Vector.aggregate?(function)`

Return true if function is a unary aggregation function. Otherwise return false.

Treat aggregation function as an element-wise function: `propagate(function)`

Spread the return value of an aggregate function as if it were an element-wise function.

vec = Vector.new(1, 2, 3, 4)
vec.propagate(:mean)
# =>
#<RedAmber::Vector(:double, size=4):0x000000000001985c>
[2.5, 2.5, 2.5, 2.5]

#propagate also accepts a block to compute with a customized aggregation function yielding a scalar.

vec.propagate { |v| v.mean.round }
# =>
#<RedAmber::Vector(:uint8, size=4):0x000000000000cb98>                     
[3, 3, 3, 3]
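
The idea behind `propagate` can be sketched in plain Ruby: compute the aggregate once, then repeat the scalar for every position. The snippet uses ordinary Arrays and is only an illustration of the semantics, not RedAmber's implementation.

```ruby
values = [1, 2, 3, 4]

# Aggregate once, then repeat the scalar for every position.
mean = values.sum.to_f / values.size
propagated = Array.new(values.size, mean)

# The same idea with a custom aggregation (here: the rounded mean).
rounded = Array.new(values.size, mean.round)

p propagated
p rounded
```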

Unary element-wise: `vector.func => vector`

Method	Boolean	Numeric	String	Options	Remarks
✓ `-@`		✓			as `-vector`
✓ `negate`		✓			`-@`
✓ `abs`		✓
✓ `acos`		✓
✓ `asin`		✓
✓ `atan`		✓
✓ `bit_wise_not`		(✓)			integer only
✓ `ceil`		✓
✓ `cos`		✓
✓`fill_nil_backward`	✓	✓	✓
✓`fill_nil_forward`	✓	✓	✓
✓ `floor`		✓
✓ `invert`	✓				`!`, alias `not`
✓ `ln`		✓
✓ `log10`		✓
✓ `log1p`		✓			Compute natural log of (1+x)
✓ `log2`		✓
✓ `round`		✓		✓ Round (:mode, :n_digits)
✓ `round_to_multiple`		✓		✓ RoundToMultiple :mode, :multiple	multiple must be an Arrow::Scalar
✓ `sign`		✓
✓ `sin`		✓
✓`sort_indexes`	✓	✓	✓	:order	alias `sort_indices`
✓ `tan`		✓
✓ `trunc`		✓

Examples of options for #round:

:n_digits The number of digits to show.
:mode Specify the rounding mode.

double = Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])
# => [15.15, 2.5, 3.5, -4.5, -5.5]
double.round
# => [15.0, 2.0, 4.0, -4.0, -6.0]
double.round(mode: :half_to_even)
# => Default. Same as double.round
double.round(mode: :towards_infinity)
# => [16.0, 3.0, 4.0, -5.0, -6.0]
double.round(mode: :half_up)
# => [15.0, 3.0, 4.0, -4.0, -5.0]
double.round(mode: :half_towards_zero)
# => [15.0, 2.0, 3.0, -4.0, -5.0]
double.round(mode: :half_towards_infinity)
# => [15.0, 3.0, 4.0, -5.0, -6.0]
double.round(mode: :half_to_odd)
# => [15.0, 3.0, 3.0, -5.0, -5.0]

double.round(n_digits: 0)
# => Default. Same as double.round
double.round(n_digits: 1)
# => [15.2, 2.5, 3.5, -4.5, -5.5]
double.round(n_digits: -1)
# => [20.0, 0.0, 0.0, -0.0, -10.0]
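
For readers used to Ruby's own `Float#round`, three of the modes above have a direct counterpart in its `half:` keyword. The sketch below uses plain Ruby only. Judging from the outputs listed above, Ruby's `half: :up` (ties away from zero) corresponds to `:half_towards_infinity`, not to `:half_up`.

```ruby
values = [15.15, 2.5, 3.5, -4.5, -5.5]

# Ties go to the nearest even integer (compare mode: :half_to_even).
half_even = values.map { |x| x.round(half: :even).to_f }

# Ties go away from zero (compare mode: :half_towards_infinity).
half_up = values.map { |x| x.round(half: :up).to_f }

# Ties go towards zero (compare mode: :half_towards_zero).
half_down = values.map { |x| x.round(half: :down).to_f }

p half_even, half_up, half_down
```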

Binary element-wise: `vector.func(vector) => vector`

Method	Boolean	Numeric	String	Options	Remarks
✓ `add`		✓			`+`
✓ `atan2`		✓
✓ `and_kleene`	✓				`&`
✓ `and_org`	✓				`and` in Red Arrow
✓ `and_not`	✓
✓ `and_not_kleene`	✓
✓ `bit_wise_and`		(✓)			integer only
✓ `bit_wise_or`		(✓)			integer only
✓ `bit_wise_xor`		(✓)			integer only
✓ `divide`		✓			`/`
✓ `equal`	✓	✓	✓		`==`, alias `eq`
✓ `greater`	✓	✓	✓		`>`, alias `gt`
✓ `greater_equal`	✓	✓	✓		`>=`, alias `ge`
✓ `is_finite`		✓
✓ `is_inf`		✓
✓ `is_na`	✓	✓	✓
✓ `is_nan`		✓
[ ]`is_nil`	✓	✓	✓	[ ] Null	alias `is_null`
✓ `is_valid`	✓	✓	✓
✓ `less`	✓	✓	✓		`<`, alias `lt`
✓ `less_equal`	✓	✓	✓		`<=`, alias `le`
✓ `logb`		✓			logb(b) Compute base `b` logarithm
[ ]`mod`		[ ]			`%`
✓ `multiply`		✓			`*`
✓ `not_equal`	✓	✓	✓		`!=`, alias `ne`
✓ `or_kleene`	✓				`\|`
✓ `or_org`	✓				`or` in Red Arrow
✓ `power`		✓			`**`
✓ `subtract`		✓			`-`
✓ `shift_left`		(✓)			`<<`, integer only
✓ `shift_right`		(✓)			`>>`, integer only
✓ `xor`	✓				`^`

`uniq`

Returns a new array with distinct elements.

`tally` and `value_counts`

Compute counts of unique elements and return a Hash.

It returns almost the same result as Ruby's tally, except that these methods treat all NaNs as the same value.

array = [0.0/0, Float::NAN]
array.tally #=> {NaN=>1, NaN=>1}

vector = Vector.new(array)
vector.tally #=> {NaN=>2}
vector.value_counts #=> {NaN=>2}

`index(element)`

Returns index of specified element.

`quantiles(probs = [0.0, 0.25, 0.5, 0.75, 1.0], interpolation: :linear, skip_nils: true, min_count: 0)`

Returns quantiles for specified probabilities in a DataFrame.

`sort_indexes`, `sort_indices`, `array_sort_indices`

Coerce

vector = Vector.new(1,2,3)
# => 
#<RedAmber::Vector(:uint8, size=3):0x00000000000decc4>            
[1, 2, 3]                                                         

# Vector's `#*` method
vector * -1
# =>
#<RedAmber::Vector(:int16, size=3):0x00000000000e3698>            
[-1, -2, -3]                                                      

# coerced calculation
-1 * vector
# => 
#<RedAmber::Vector(:int16, size=3):0x00000000000ea4ac>            
[-1, -2, -3]

# `-@` operator
-vector
# =>
#<RedAmber::Vector(:uint8, size=3):0x00000000000ee7b4>
[255, 254, 253]
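
The last result suggests that unary minus keeps the `:uint8` type, so the values wrap around instead of becoming negative, while multiplication by -1 is up-cast to `:int16` first. The plain-Ruby sketch below reproduces the arithmetic under that assumption (wraparound modulo 256). It does not call RedAmber.

```ruby
modulus = 256 # number of values representable in 8 unsigned bits
values = [1, 2, 3]

# Negating within 8 unsigned bits wraps around: -1 becomes 255, and so on.
wrapped = values.map { |v| (-v) % modulus }

# With a signed, wider type the sign survives.
signed = values.map { |v| v * -1 }

p wrapped
p signed
```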

Update vector's value

`replace(specifier, replacer)` => vector

Accepts a Scalar, Range of Integer, Vector, Array or Arrow::Array as a specifier.
Accepts a Scalar, Vector, Array or Arrow::Array as a replacer.
A boolean specifier marks the positions to replace with true.
- If booleans.any is false, no replacement happens and self is returned.
An index specifier gives the positions to replace as indices.
replacer specifies the new values to put at those positions.
- The number of true values in booleans must equal the length of replacer.

vector = Vector.new([1, 2, 3])
booleans = [true, false, true]
replacer = [4, 5]
vector.replace(booleans, replacer)
# => 
#<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
[4, 2, 5]

A scalar replacer is broadcast to every selected position.

replacer = 0
vector.replace(booleans, replacer)
# => 
#<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
[0, 2, 0]

The returned data type is automatically up-cast to fit the replacer.

replacer = 1.0
vector.replace(booleans, replacer)
# => 
#<RedAmber::Vector(:double, size=3):0x0000000000025d78>
[1.0, 2.0, 1.0]

A position where booleans is nil is replaced with nil.

booleans = [true, false, nil]
replacer = -1
vector.replace(booleans, replacer)
# =>
#<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
[-1, 2, nil]
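
The boolean-specifier rules shown so far can be summarized in a plain-Ruby sketch. The helper `replace_by_mask` is hypothetical and works on ordinary Arrays. It ignores type up-casting and is not RedAmber's implementation.

```ruby
# Hypothetical helper: replace values where the mask is true.
def replace_by_mask(values, booleans, replacer)
  queue = replacer.is_a?(Array) ? replacer.dup : nil
  values.zip(booleans).map do |value, flag|
    if flag.nil?
      nil                             # nil in the mask yields nil
    elsif flag
      queue ? queue.shift : replacer  # take the next value, or broadcast the scalar
    else
      value                           # false keeps the original value
    end
  end
end

p replace_by_mask([1, 2, 3], [true, false, true], [4, 5])
p replace_by_mask([1, 2, 3], [true, false, true], 0)
p replace_by_mask([1, 2, 3], [true, false, nil], -1)
```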

replacer can have nil in it.

booleans = [true, false, true]
replacer = [nil]
vector.replace(booleans, replacer)
# =>
#<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
[nil, 2, nil]

An example that replaces 'NA' with nil.

vector = Vector.new(['A', 'B', 'NA'])
vector.replace(vector == 'NA', nil)
# =>
#<RedAmber::Vector(:string, size=3):0x000000000000f8ac>
["A", "B", nil]

Specifying positions by indices.

The specified indices are used in sorted order, so the order of the indices as given may not correspond to the order of the values in replacer.

vector = Vector.new([1, 2, 3])
indices = [2, 1]
replacer = [4, 5]
vector.replace(indices, replacer)
# =>
#<RedAmber::Vector(:uint8, size=3):0x000000000000f244>
[1, 4, 5] # not [1, 5, 4]
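
A plain-Ruby sketch of the 'as sorted' behavior, using a hypothetical helper `replace_by_indices` on ordinary Arrays (an illustration of the rule above, not RedAmber's implementation):

```ruby
# Hypothetical helper: indices are applied in ascending order,
# regardless of the order in which they were given.
def replace_by_indices(values, indices, replacer)
  result = values.dup
  indices.sort.zip(replacer).each { |index, new_value| result[index] = new_value }
  result
end

replaced = replace_by_indices([1, 2, 3], [2, 1], [4, 5])
p replaced
```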

`fill_nil_forward`, `fill_nil_backward` => vector

Propagate the last valid observation forward (or backward). A nil is preserved if no valid value precedes it (forward fill) or follows it (backward fill).

integer = Vector.new([0, 1, nil, 3, nil])
integer.fill_nil_forward
# =>
#<RedAmber::Vector(:uint8, size=5):0x000000000000f960>
[0, 1, 1, 3, 3]

integer.fill_nil_backward
# =>
#<RedAmber::Vector(:uint8, size=5):0x000000000000f974>
[0, 1, 3, 3, nil]
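
The fill rules can be sketched in plain Ruby on ordinary Arrays. The helpers `fill_forward` and `fill_backward` are hypothetical names used only for this illustration.

```ruby
# Carry the most recent non-nil value forward; leading nils stay nil.
def fill_forward(values)
  last = nil
  values.map { |v| v.nil? ? last : (last = v) }
end

# Backward fill is forward fill applied to the reversed sequence.
def fill_backward(values)
  fill_forward(values.reverse).reverse
end

p fill_forward([0, 1, nil, 3, nil])
p fill_backward([0, 1, nil, 3, nil])
```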

`boolean_vector.if_else(true_choice, false_choice)` => vector

Choose values based on self. Self must be a boolean Vector.

true_choice and false_choice must be of the same type; each may be a scalar, an Array or a Vector. nil values in self are carried over to the output.

This example will normalize negative indices to positive ones.

indices = Vector.new([1, -1, 3, -4])
array_size = 10
normalized_indices = (indices < 0).if_else(indices + array_size, indices)

# =>
#<RedAmber::Vector(:int16, size=4):0x000000000000f85c>
[1, 9, 3, 6]
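
The element-wise choice, including scalar choices and nil in the condition, can be sketched in plain Ruby. The helper `choose_if_else` is a hypothetical name working on ordinary Arrays. It is not RedAmber's implementation.

```ruby
# Hypothetical helper: pick from true_choice or false_choice per element.
# A choice may be an Array (indexed per position) or a scalar (broadcast).
def choose_if_else(cond, true_choice, false_choice)
  cond.each_with_index.map do |flag, i|
    next nil if flag.nil? # nil in the condition yields nil

    choice = flag ? true_choice : false_choice
    choice.is_a?(Array) ? choice[i] : choice
  end
end

indices = [1, -1, 3, -4]
array_size = 10
negative = indices.map { |i| i < 0 }
shifted = indices.map { |i| i + array_size }

normalized = choose_if_else(negative, shifted, indices)
p normalized
```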

`is_in(values)` => boolean vector

For each element in self, return true if it is found in the given values, false otherwise. By default, nulls are matched against the value set. (This could be changed through SetLookupOptions, which is not implemented.)

vector = Vector.new %W[A B C D]
values = ['A', 'C', 'X']
vector.is_in(values)

# =>
#<RedAmber::Vector(:boolean, size=4):0x000000000000f2a8>
[true, false, true, false]

values are cast to the same type as the Vector.

vector = Vector.new([1, 2, 255])
vector.is_in(1, -1)

# =>
#<RedAmber::Vector(:boolean, size=3):0x000000000000f320>
[true, false, true]

`shift(amount = 1, fill: nil)`

Shift the vector's values by the specified amount. The vacated positions are filled with the value of fill.

vector = Vector.new([1, 2, 3, 4, 5])
vector.shift

# =>
#<RedAmber::Vector(:uint8, size=5):0x00000000000072d8>  
[nil, 1, 2, 3, 4]

vector.shift(-2)

# =>
#<RedAmber::Vector(:uint8, size=5):0x0000000000009970>  
[3, 4, 5, nil, nil]

vector.shift(fill: Float::NAN)

# =>
#<RedAmber::Vector(:double, size=5):0x0000000000011d3c>                    
[NaN, 1.0, 2.0, 3.0, 4.0]
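
A plain-Ruby sketch of the shifting rule on ordinary Arrays. The helper `shift_values` is hypothetical. How it treats an amount larger than the size (everything becomes fill) is an assumption of this sketch, and the type up-cast seen with `Float::NAN` is not modeled.

```ruby
# Hypothetical helper: positive amounts shift right, negative amounts shift left.
def shift_values(values, amount = 1, fill: nil)
  kept = [values.size - amount.abs, 0].max
  padding = Array.new(values.size - kept, fill)
  amount.positive? ? padding + values.first(kept) : values.last(kept) + padding
end

p shift_values([1, 2, 3, 4, 5])
p shift_values([1, 2, 3, 4, 5], -2)
p shift_values([1, 2, 3, 4, 5], fill: 0)
```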

`split_to_columns(sep = ' ', limit = 0)`

Split a string Vector using any ASCII whitespace as the separator. Returns an Array of Vectors.

vector = Vector.new(['a b', 'c d', 'e f'])
vector.split_to_columns

#=> 
[#<RedAmber::Vector(:string, size=3):0x00000000000363a8>                                
["a", "c", "e"]                                    
,                                                  
 #<RedAmber::Vector(:string, size=3):0x00000000000363bc>
["b", "d", "f"]                                    
]

It can be used for column splitting in a DataFrame.

df = DataFrame.new(year_month: %w[2022-01 2022-02 2022-03])
  .assign(:year, :month) { year_month.split_to_columns('-') }
  .drop(:year_month)

#=>
#<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f974>
  year     month
  <string> <string>
0 2022     01
1 2022     02
2 2022     03

`split_to_rows(sep = ' ', limit = 0)`

Split a string Vector using any ASCII whitespace as the separator. Returns a Vector flattened into rows.

vector = Vector.new(['a b', 'c d', 'e f'])
vector.split_to_rows

#=>
#<RedAmber::Vector(:string, size=6):0x000000000002ccf4>
["a", "b", "c", "d", "e", "f"]
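
The difference between the two split methods can be sketched with ordinary Ruby Arrays: splitting into columns corresponds to a transpose of the split pieces, while splitting into rows corresponds to flattening them. This is only an illustration of the result shapes, not RedAmber's implementation.

```ruby
strings = ['a b', 'c d', 'e f']

# Like split_to_columns: split each string, then turn rows into columns.
# (Array#transpose requires every string to split into the same number of pieces.)
columns = strings.map(&:split).transpose

# Like split_to_rows: split each string and flatten the pieces into one sequence.
rows = strings.flat_map(&:split)

p columns
p rows
```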

`merge(other, sep: ' ')`

Merge a String or another string Vector to self using a separator. Self must be a string Vector. Returns the merged string Vector.

# with vector
vector = Vector.new(%w[a c e])
other = Vector.new(%w[b d f])
vector.merge(other)

#=>
#<RedAmber::Vector(:string, size=3):0x0000000000038b80>
["a b", "c d", "e f"]

If other is a String, it is broadcast to every element.

# with String
vector = Vector.new(%w[a c e])
vector.merge('x')

#=>
#<RedAmber::Vector(:string, size=3):0x00000000000446b0>
["a x", "c x", "e x"]

You can specify separator string by :sep.

# with vector
vector = Vector.new(%w[a c e])
other = Vector.new(%w[b d f])
vector.merge(other, sep: '')

#=>
#<RedAmber::Vector(:string, size=3):0x0000000000038b80>
["ab", "cd", "ef"]
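
The three cases above (Vector, broadcast String, custom separator) can be sketched in plain Ruby. The helper `merge_strings` is a hypothetical name working on ordinary Arrays.

```ruby
# Hypothetical helper: join each element with its counterpart.
# A single String is repeated for every element (broadcast).
def merge_strings(values, other, sep: ' ')
  others = other.is_a?(Array) ? other : Array.new(values.size, other)
  values.zip(others).map { |left, right| [left, right].join(sep) }
end

p merge_strings(%w[a c e], %w[b d f])
p merge_strings(%w[a c e], 'x')
p merge_strings(%w[a c e], %w[b d f], sep: '')
```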

`concatenate(other)` or `concat(other)`

Concatenate other array-like to self and return a concatenated Vector.

other is one of Vector, Array, Arrow::Array or Arrow::ChunkedArray.
Elements of a different type are 'resolved' to the type of self.

Concatenate to string

string_vector

# =>
#<RedAmber::Vector(:string, size=2):0x00000000000037b4>
["A", "B"]

string_vector.concatenate([1, 2])

# =>
#<RedAmber::Vector(:string, size=4):0x0000000000003818>
["A", "B", "1", "2"]

Concatenate to integer

integer_vector

# =>
#<RedAmber::Vector(:uint8, size=2):0x000000000000382c>
[1, 2]

integer_vector.concatenate(["A", "B"])
# =>
#<RedAmber::Vector(:uint8, size=4):0x0000000000003840>
[1, 2, 65, 66]

`rank`

Returns numerical rank of self.

Nil values are considered greater than any value.
NaN values are considered greater than any value but smaller than nil values.
Tiebreakers are ranked in order of appearance.
RankOptions in C++ function is not implemented in C GLib yet. This method is currently fixed to the default behavior.

Returns 0-based rank of self (0...size in range) as a Vector.

Rank of float Vector

fv = Vector.new(0.1, nil, Float::NAN, 0.2, 0.1); fv
# =>
#<RedAmber::Vector(:double, size=5):0x000000000000c65c>
[0.1, nil, NaN, 0.2, 0.1]

fv.rank
# =>
#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
[0, 4, 3, 2, 1]

Rank of string Vector

sv = Vector.new("A", "B", nil, "A", "C"); sv
# =>
#<RedAmber::Vector(:string, size=5):0x0000000000003854>
["A", "B", nil, "A", "C"]

sv.rank
# =>
#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
[0, 2, 4, 1, 3]
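
The ordering rules (nil last, NaN just before nil, ties in order of appearance) can be sketched in plain Ruby. The helpers `rank_group` and `rank_values` are hypothetical names working on ordinary Arrays. They are not RedAmber's implementation.

```ruby
# 0 for ordinary values, 1 for NaN, 2 for nil: NaN sorts after values, nil last.
def rank_group(value)
  return 2 if value.nil?
  return 1 if value.is_a?(Float) && value.nan?

  0
end

# Returns the 0-based rank of each element.
def rank_values(values)
  order = values.each_index.sort do |i, j|
    a = values[i]
    b = values[j]
    by_group = rank_group(a) <=> rank_group(b)
    next by_group unless by_group.zero?

    by_value = rank_group(a).zero? ? (a <=> b) : 0
    by_value.zero? ? i <=> j : by_value # ties keep order of appearance
  end

  ranks = Array.new(values.size)
  order.each_with_index { |index, rank| ranks[index] = rank }
  ranks
end

p rank_values([0.1, nil, Float::NAN, 0.2, 0.1])
p rank_values(['A', 'B', nil, 'A', 'C'])
```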

`sample(integer_or_proportion)`

Pick up elements at random.

`sample` : without argument

Return a randomly selected element. This is one of the aggregation functions.

v = Vector.new('A'..'H'); v
# =>
#<RedAmber::Vector(:string, size=8):0x0000000000011b20>
["A", "B", "C", "D", "E", "F", "G", "H"]

v.sample
# =>
"C"

`sample(n)` : n as an Integer

Pick up n elements at random.

n is the number of elements to pick.
n is a positive Integer.
If n is smaller than or equal to size, elements are picked without repetition.
If n is greater than size, some elements are picked repeatedly.
Returns the sampled elements as a Vector.
If n == 1 (in case of sample(1)), it returns a Vector of size == 1, not a scalar.

v.sample(1)
# =>
#<RedAmber::Vector(:string, size=1):0x000000000001a3b0>
["H"]

Sample same size of self: every element is picked in random order.

v.sample(8)
# =>
#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
["H", "D", "B", "F", "E", "A", "G", "C"]

Over sampling: "E", "A" and "H" are sampled repeatedly.

v.sample(9)
# =>
#<RedAmber::Vector(:string, size=9):0x000000000001d790>
["E", "E", "A", "D", "H", "C", "A", "F", "H"]

`sample(prop)` : prop as a Float

Pick up elements by proportion prop at random.

prop is the proportion of elements to pick.
prop is a positive Float.
The absolute number of elements to pick, prop * size, is rounded (half: :up).
If prop is smaller than or equal to 1.0, elements are picked without repetition.
If prop is greater than 1.0, some elements are picked repeatedly.
Returns the sampled elements as a Vector.
If only one element is picked, it returns a Vector of size == 1, not a scalar.

Sample same size of self: every element is picked in random order.

v.sample(1.0)
# =>
#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
["D", "H", "F", "C", "A", "B", "E", "G"]

2 times over sampling.

v.sample(2.0)
# =>
#<RedAmber::Vector(:string, size=16):0x00000000000233e8>
["H", "B", "C", "B", "C", "A", "F", "A", "E", "C", "H", "F", "F", "A", ... ]
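
The Integer and Float forms can be sketched together in plain Ruby. The helper `sample_sketch` is a hypothetical name working on ordinary Arrays. The way it draws repeated elements when over sampling is an assumption of this sketch and may differ from RedAmber's strategy.

```ruby
# Hypothetical helper: an Integer is a count, a Float is a proportion of the size.
def sample_sketch(values, n_or_prop)
  n = n_or_prop.is_a?(Float) ? (values.size * n_or_prop).round : n_or_prop
  if n <= values.size
    values.sample(n)               # picked without repetition
  else
    Array.new(n) { values.sample } # over sampling: repetition allowed (assumed strategy)
  end
end

letters = ('A'..'H').to_a
p sample_sketch(letters, 1)   # one element, still wrapped in an Array
p sample_sketch(letters, 8)   # every element, in random order
p sample_sketch(letters, 2.0) # 16 elements, with repetition
```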

`sort(order)`

Arrange values in Vector.

:+, :ascending or without argument will sort in increasing order.
:- or :descending will sort in decreasing order.

Vector.new(%w[B D A E C]).sort
# same as #sort(:+)
# same as #sort(:ascending)
# =>
#<RedAmber::Vector(:string, size=5):0x000000000000c134>
["A", "B", "C", "D", "E"]

Vector.new(%w[B D A E C]).sort(:-)
# same as #sort(:descending)
# =>
#<RedAmber::Vector(:string, size=5):0x000000000000c148>
["E", "D", "C", "B", "A"]
