v0.0.4 Documentation

As of v0.0.4, the Dan::Polars synopsis has been extended in multiple ways. This page is a vestigial version of the Dan::Polars documentation. For now it includes only features that are not covered in the Dan synopsis or the Dan::Polars synopsis.

Over time, the synopsis items will be added here in more detail.

This documentation should be read in conjunction with the Polars Book. The content is largely example-based and can be read alongside the Python and Rust examples given there.

TOC

The TOC is a subset of the Polars Book TOC.

Concepts

Contexts

Select

my \df1 = DataFrame.new(['Ray Type' => ["α", "β", "X", "γ"]]);
df1.show;

shape: (4, 1)
┌──────────┐
│ Ray Type │
│ ---      │
│ str      │
╞══════════╡
│ α        │
├╌╌╌╌╌╌╌╌╌╌┤
│ β        │
├╌╌╌╌╌╌╌╌╌╌┤
│ X        │
├╌╌╌╌╌╌╌╌╌╌┤
│ γ        │
└──────────┘

my \df2 = df1.drop(['Ray Type']);
df2.show;

shape: (0, 0)
┌┐
╞╡
└┘

say df2.is_empty; #True

Expressions

Casting

Numerics

my \df = DataFrame.new([
    integers            => [1, 2, 3, 4, 5],
    big_integers        => [1, 10000002, 3, 10000004, 4294967297],
    floats              => [4.0, 5.0, 6.0, 7.0, 8.0],
    floats_with_decimal => [4.532, 5.5, 6.5, 7.5, 8.5],
]);
df.show;

df.select([
    col("integers").cast("f32").alias("integers_as_floats"),
    col("floats").cast("i32").alias("floats_as_integers"),
    col("floats_with_decimal").cast("i32").alias("floats_with_decimal_as_integers"),
]).show;

Strings

my \dfs = DataFrame.new([
    integers         => [1, 2, 3, 4, 5],
    floats           => [4.0, 5.03, 6.0, 7.0, 8.0],
    strings          => <4.0 5.0 6.0 7.0 8.0>>>.Str.Array,
]);
dfs.show;

dfs.select([
    col("integers").cast("str"),
    col("floats").cast("str"),
    col("strings").cast("f32"),
]).show;

Booleans

my \dfs = DataFrame.new([
    integers => [-1, 0, 2, 3, 4],
    floats => [0.0, 1.0, 2.0, 3.0, 4.0],
    bools => [True, False, True, False, True],
]);
dfs.show;

dfs.select([
    col("integers").cast("bool"),
    col("floats").cast("bool"),
    col("bools").cast("i32"),
]).show;

Aggregation

Conditionals

my \df = DataFrame.new([
    nrs    => [1, 2, 3, 4, 5], 
    nrs2   => [2, 3, 4, 5, 6], 
    names  => ["foo", "ham", "spam", "egg", ""],
    random => [1.rand xx 5], 
    groups => ["A", "A", "B", "C", "B"],
]);
df.show;

#viz. https://pola-rs.github.io/polars-book/user-guide/expressions/operators/#logical
#(gt >, lt <, ge >=, le <=, eq ==, ne !=, and &&, or ||)
df.select([(col("nrs") > 2).alias("jones")]).head;
#df.select([(col("nrs") >= 2).alias("jones")]).head;
#df.select([(col("nrs") < 2).alias("jones")]).head;
#df.select([(col("nrs") <= 2).alias("jones")]).head;
#df.select([(col("nrs") == 2).alias("jones")]).head;
#df.select([(col("nrs") != 2).alias("jones")]).head;
#df.select([((col("nrs") >= 2) && (col("nrs2") == 5)) .alias("jones")]).head;
#df.select([((col("nrs") >= 2) || (col("nrs2") == 5)) .alias("jones")]).head;

Filter

The filter method applies to the entire DataFrame.

df.filter([(col("nrs") != 4)]).show;
shape: (4, 5)
┌─────┬──────┬───────┬──────────┬────────┐
│ nrs ┆ nrs2 ┆ names ┆ random   ┆ groups │
│ --- ┆ ---  ┆ ---   ┆ ---      ┆ ---    │
│ i32 ┆ i32  ┆ str   ┆ f64      ┆ str    │
╞═════╪══════╪═══════╪══════════╪════════╡
│ 1   ┆ 2    ┆ foo   ┆ 0.568035 ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2   ┆ 3    ┆ ham   ┆ 0.4602   ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3   ┆ 4    ┆ spam  ┆ 0.647715 ┆ B      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 5   ┆ 6    ┆       ┆ 0.991221 ┆ B      │
└─────┴──────┴───────┴──────────┴────────┘

Unlike .filter, DataFrame .grep is implemented by converting a Rust Dan::Polars::DataFrame to a raku Dan::DataFrame (a .flood), performing the grep with a raku block-style syntax and then converting back (a .flush). The implication is that the syntax is very rich, but the performance is lower than that of .filter.

# Grep (binary filter)
say ~df.grep( { .[1] < 0.5 } );                                # by 2nd column 
say ~df.grep( { df.ix[$++] eq <2022-01-02 2022-01-06>.any } ); # by index (multiple) 
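
The predicate passed to .grep is an ordinary raku block. As a rough illustration that does not use Dan::Polars at all (the row data below is made up), the same kind of block can be run against a plain array of rows:

```raku
# Plain-raku sketch: each row is a two-element array, [label, score] (made-up data).
my @rows = ['a', 0.3], ['b', 0.7], ['c', 0.1];

# Keep the rows whose 2nd element is below 0.5, as in the .grep block above.
my @kept = @rows.grep: { .[1] < 0.5 };

say @kept.map(*.[0]);   # labels of the rows that passed the test
```

Because the block is arbitrary raku code, any test that can be written against a row can be used, which is the richness (and the cost) described above.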

Sort

DataFrame Sort

Specify an Array[Str] of column names and an Array[Bool] of descending? options:

df.sort(["groups","names"],[False, True]).show;
shape: (5, 5)
┌─────┬──────┬───────┬──────────┬────────┐
│ nrs ┆ nrs2 ┆ names ┆ random   ┆ groups │
│ --- ┆ ---  ┆ ---   ┆ ---      ┆ ---    │
│ i32 ┆ i32  ┆ str   ┆ f64      ┆ str    │
╞═════╪══════╪═══════╪══════════╪════════╡
│ 2   ┆ 3    ┆ ham   ┆ 0.651383 ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1   ┆ 2    ┆ foo   ┆ 0.687945 ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3   ┆ 4    ┆ spam  ┆ 0.020684 ┆ B      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 5   ┆ 6    ┆       ┆ 0.961176 ┆ B      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4   ┆ 5    ┆ egg   ┆ 0.666724 ┆ C      │
└─────┴──────┴───────┴──────────┴────────┘
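
To make the meaning of the two arrays concrete, here is a plain-raku sketch (no Dan::Polars involved) of the same ordering: groups ascending, then names descending. The hash-per-row layout is an assumption made only for this illustration.

```raku
# Plain-raku sketch of a two-key sort: groups ascending, then names descending.
my @rows =
    { names => 'foo',  groups => 'A' },
    { names => 'ham',  groups => 'A' },
    { names => 'spam', groups => 'B' },
    { names => 'egg',  groups => 'C' },
    { names => '',     groups => 'B' };

# cmp returns Order::Same (falsy) on a tie, so || falls through to the next key.
# Swapping $^a and $^b in the second comparison makes that key descending.
my @sorted = @rows.sort: { $^a<groups> cmp $^b<groups> || $^b<names> cmp $^a<names> };

say @sorted.map({ .<groups> ~ ':' ~ .<names> }).join(' ');
```

The resulting row order matches the table above: ham and foo in group A, spam and the empty name in group B, then egg in group C.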

Or, if you prefer a more raku-oriented style, specify a Block:

df.sort( {df[$++]<random>} )[*].reverse^.show;
shape: (5, 5)
┌─────┬──────┬───────┬──────────┬────────┐
│ nrs ┆ nrs2 ┆ names ┆ random   ┆ groups │
│ --- ┆ ---  ┆ ---   ┆ ---      ┆ ---    │
│ i32 ┆ i32  ┆ str   ┆ f64      ┆ str    │
╞═════╪══════╪═══════╪══════════╪════════╡
│ 5   ┆ 6    ┆       ┆ 0.961176 ┆ B      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1   ┆ 2    ┆ foo   ┆ 0.687945 ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4   ┆ 5    ┆ egg   ┆ 0.666724 ┆ C      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2   ┆ 3    ┆ ham   ┆ 0.651383 ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3   ┆ 4    ┆ spam  ┆ 0.020684 ┆ B      │
└─────┴──────┴───────┴──────────┴────────┘

As set out in the Dan synopsis, DataFrame level sort is done like this:

# Sort
say ~df.sort: { .[1] };         # sort by 2nd col (ascending)
say ~df.sort: { -.[1] };        # sort by 2nd col (descending)
say ~df.sort: { df[$++]<C> };   # sort by col C
say ~df.sort: { df.ix[$++] };   # sort by index

Here is another example from the Dan::Polars Nutshell:

$obj .= sort( {$obj[$++]<species>, $obj[$++]<mass>} )[*].reverse^;

Unlike colspec sort, Block sort is implemented by converting a Rust Dan::Polars::DataFrame to a raku Dan::DataFrame (i.e. .flood), performing the sort with a raku block-style syntax and then converting back (i.e. .flush). The implication is that the syntax is very rich, but the performance is lower.

Expression Sort

The sort method on col Expressions in a select is independently applied to each col.

df.select([(col("names").alias("jones").sort),col("groups").alias("smith").sort,col("nrs").reverse]).head;
shape: (5, 3)
┌───────┬───────┬─────┐
│ jones ┆ smith ┆ nrs │
│ ---   ┆ ---   ┆ --- │
│ str   ┆ str   ┆ i32 │
╞═══════╪═══════╪═════╡
│       ┆ A     ┆ 5   │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ egg   ┆ A     ┆ 4   │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ foo   ┆ B     ┆ 3   │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ ham   ┆ B     ┆ 2   │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ spam  ┆ C     ┆ 1   │
└───────┴───────┴─────┘
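
The consequence of sorting each column independently is that values from one original row no longer stay together. A plain-raku sketch of the same effect, using three ordinary arrays in place of the DataFrame columns (an assumption made for illustration only):

```raku
# Plain-raku sketch: each column is sorted (or reversed) on its own.
my @names  = 'foo', 'ham', 'spam', 'egg', '';
my @groups = 'A', 'A', 'B', 'C', 'B';
my @nrs    = 1, 2, 3, 4, 5;

my @jones = @names.sort;      # string sort, the empty string comes first
my @smith = @groups.sort;
my @rev   = @nrs.reverse;

# Zip the three columns back into rows to compare with the table above.
.say for @jones Z @smith Z @rev;
```

Use a DataFrame-level sort instead when rows must stay intact.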

The sort method on col Expressions in a groupby is applied to the list result.

df.groupby(["groups"]).agg([col("nrs").sort]).head;
#df.groupby(["groups"]).agg([col("nrs").reverse]).head;
shape: (3, 2)
┌────────┬───────────┐
│ groups ┆ nrs       │
│ ---    ┆ ---       │
│ str    ┆ list[i32] │
╞════════╪═══════════╡
│ C      ┆ [4]       │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ A      ┆ [1, 2]    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ B      ┆ [3, 5]    │
└────────┴───────────┘
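
In other words, the rows are first collected per group and the sort then runs inside each group's list. A plain-raku sketch of that two-step idea, using the built-in classify on made-up (nrs, groups) tuples rather than Dan::Polars:

```raku
# Plain-raku sketch: (nrs, groups) tuples standing in for the two columns.
my @rows = (1, 'A'), (2, 'A'), (3, 'B'), (4, 'C'), (5, 'B');

# classify builds a Hash of group key => rows that belong to that group.
my %by-group = @rows.classify: *.[1];

# Sort the collected nrs values inside each group.
my %sorted;
for %by-group.kv -> $group, @members {
    %sorted{$group} = @members.map(*.[0]).sort.List;
}

for %sorted.sort(*.key) -> $p { say "{$p.key}: {$p.value}" }
```

Note that this sketch prints the groups in key order, whereas the group order in the Polars output above is not sorted.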

Missing_Data

In Dan::Polars, missing data is represented by the raku Type Object (Int, Bool, Str and so on) or by the raku Numeric special values (NaN, +/-Inf).

my \df = DataFrame.new([
    nrs    => [1, 2, 3, 4, 5], 
    nrs2   => [Num, NaN, 4, Inf, 8.3],
    names  => ["foo", Str, "spam", "egg", ""],
    random => [1.rand xx 5], 
    groups => ["A", "A", "B", "C", "B"],
    flags  => [True,True,False,True,Bool],
]);
df.show;

shape: (5, 6)
┌─────┬──────┬───────┬──────────┬────────┬───────┐
│ nrs ┆ nrs2 ┆ names ┆ random   ┆ groups ┆ flags │
│ --- ┆ ---  ┆ ---   ┆ ---      ┆ ---    ┆ ---   │
│ i32 ┆ f64  ┆ str   ┆ f64      ┆ str    ┆ bool  │
╞═════╪══════╪═══════╪══════════╪════════╪═══════╡
│ 1   ┆ null ┆ foo   ┆ 0.074586 ┆ A      ┆ true  │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2   ┆ NaN  ┆ null  ┆ 0.867919 ┆ A      ┆ true  │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3   ┆ 4.0  ┆ spam  ┆ 0.069183 ┆ B      ┆ false │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 4   ┆ inf  ┆ egg   ┆ 0.739191 ┆ C      ┆ true  │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 5   ┆ 8.3  ┆       ┆ 0.133729 ┆ B      ┆ null  │
└─────┴──────┴───────┴──────────┴────────┴───────┘

And, conversely, when cast back to a (non-Polars) Dan DataFrame:

say ~df.Dan-DataFrame;

    nrs  nrs2  names  random               groups  flags 
 0  1    Num   foo    0.9188127959571387   A       True  
 1  2    NaN   Str    0.08257029673307026  A       True  
 2  3    4     spam   0.0682447340762582   B       False 
 3  4    Inf   egg    0.3287371781756494   C       True  
 4  5    8.3          0.5133318112263049   B       Bool 

You can test for what you have with:

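| Sense | Truthiness | Definedness | Numberness | Finiteness |
|-------|------------|-------------|------------|-------------|
| so    | n/a        | is_null     | is_not_nan | is_finite   |
| not   | is_not     | is_not_null | is_nan     | is_infinite |
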
#`[
df.select([(col("nrs") > 2)]).head;
df.select([((col("nrs") > 2).is_not)]).head;
df.select([(col("nrs2").is_null)]).head;
df.select([(col("nrs2").is_not_null)]).head;
df.select([(col("nrs2").is_not_nan)]).head;
df.select([(col("nrs2").is_nan)]).head;
df.select([(col("nrs2").is_finite)]).head;
#]
df.select([(col("nrs2").is_infinite)]).head;

shape: (5, 1)
┌───────┐
│ nrs2  │
│ ---   │
│ bool  │
╞═══════╡
│ null  │
├╌╌╌╌╌╌╌┤
│ false │
├╌╌╌╌╌╌╌┤
│ false │
├╌╌╌╌╌╌╌┤
│ true  │
├╌╌╌╌╌╌╌┤
│ false │
└───────┘
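
For comparison, the same three kinds of test can be written in plain raku against an ordinary array. This is only a sketch of the idea (definedness, NaN-ness, finiteness), not of how Dan::Polars implements the expressions; the array below reuses the nrs2 values as Num.

```raku
# Plain-raku sketch: the same five values as the nrs2 column above.
my @nrs2 = Num, NaN, 4e0, Inf, 8.3e0;

# Definedness: a type object such as Num is undefined, which Polars shows as null.
my @is-null = @nrs2.map: { !.defined };

# NaN and infinity tests only make sense for defined values,
# so an undefined cell is left undefined here (null in Polars).
my @is-nan      = @nrs2.map: { .defined ?? ($_ === NaN)  !! Nil };
my @is-infinite = @nrs2.map: { .defined ?? (.abs == Inf) !! Nil };

say @is-null;
say @is-nan;
say @is-infinite;
```

As in the is_infinite output above, only the fourth value tests as infinite and the first stays undefined.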

Apply

General

In Rust Polars, map and apply functions are offered. In Dan::Polars, only apply is provided for user-defined functions. Per the Polars user guide:

Use cases for map in the group_by context are slim. They are only used for performance reasons, but can quite easily lead to incorrect results...

Luckily, apply works on the smallest logical elements for the operation:

  • select context -> single elements
  • group by context -> single groups

Dan::Polars apply aims to offer near-native Rust Polars performance on user-defined operations embedded in raku code. Long term, it is intended to be suitable for concurrent and parallel processing, so it could be faster than Python Polars. The operation is written in "Rust lambda slang" within your raku code; it is then JIT compiled and made available in a Rust library (libapply.so or equivalent) to be called from the Rust Polars library.

Monadic Apply Operations

Monadic - operations with one argument

Taking this example DataFrame:

my \df = DataFrame.new([
    nrs    => [1, 2, 3, 4, 5],
    nrs2   => [2, 3, 4, 5, 6],
    names  => ["foo", "ham", "spam", "egg", ""],
    random => [1.rand xx 5],
    groups => ["A", "A", "B", "C", "B"],
]);
df.show;

shape: (5, 5)
┌─────┬──────┬───────┬──────────┬────────┐
│ nrs ┆ nrs2 ┆ names ┆ random   ┆ groups │
│ --- ┆ ---  ┆ ---   ┆ ---      ┆ ---    │
│ i32 ┆ i32  ┆ str   ┆ f64      ┆ str    │
╞═════╪══════╪═══════╪══════════╪════════╡
│ 1   ┆ 2    ┆ foo   ┆ 0.455665 ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2   ┆ 3    ┆ ham   ┆ 0.961131 ┆ A      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3   ┆ 4    ┆ spam  ┆ 0.093231 ┆ B      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4   ┆ 5    ┆ egg   ┆ 0.570909 ┆ C      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 5   ┆ 6    ┆       ┆ 0.716256 ┆ B      │
└─────┴──────┴───────┴──────────┴────────┘

Here we add one to each i32 value in a groupby:

df.groupby(["groups"]).agg([col("nrs").apply("|a: i32| (a + 1) as i32").alias("jones")]).head;

shape: (3, 2)
┌────────┬───────────┐
│ groups ┆ jones     │
│ ---    ┆ ---       │
│ str    ┆ list[i32] │
╞════════╪═══════════╡
│ B      ┆ [4, 6]    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ A      ┆ [2, 3]    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ C      ┆ [5]       │
└────────┴───────────┘

The various parts of the raku source code with Rust lambda slang are described below:

    # Monadic Real
    # apply() is exported directly into client script and acts on the ExprC made by col()
    # its argument is a string in the form of a Rust lambda with |signature| (body) as rtn-type
    # the lambda takes variable 'a: type' if monadic or 'a: type, b: type' if dyadic
    # the body is a valid Rust expression 

#df.select([col("nrs").apply("|a: i32| (a + 1) as i32").alias("jones")]).head;
--- ------  ---------- -----  --------  -----  ------   -----            ----
 |     |        |        |        |       |      |        |                -> method head prints top lines of result
 |     |        |        |        |       |      |        |
 |     |        |        |        |       |      |        -> method alias returns a new Expr
 |     |        |        |        |       |      |
 |     |        |        |        |       |      -> lambda return type (Rust)
 |     |        |        |        |       |
 |     |        |        |        |       -> lambda expression using varname (Rust) 
 |     |        |        |        |
 |     |        |        |        -> lambda signature with varname : type (Rust) 
 |     |        |        |
 |     |        |        -> method apply returns a new Expr with the results of the lambda
 |     |        |
 |     |        -> method col(Str \colname) returns a new (empty) Expr
 |     |
 |     -> method select(Array \exprs) creates a LazyFrame, calls .select(exprs) then .collect
 |
 |
 -> DataFrame object with attributes of pointers to rust DataFrame and LazyFrame structures
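
Since the lambda is passed to apply as a plain string, its three parts (signature, body, return type) can be assembled in raku before the call. The helper below is hypothetical and not part of Dan::Polars; it only builds the string in the form described above and does not call apply.

```raku
# Hypothetical helper (not part of Dan::Polars): assemble the lambda string
# from a list of 'name: type' parameters, a body and a return type.
sub rust-lambda(@params, Str $body, Str $rtn --> Str) {
    '|' ~ @params.join(', ') ~ '| (' ~ $body ~ ') as ' ~ $rtn
}

my $monadic = rust-lambda(['a: i32'], 'a + 1', 'i32');
my $dyadic  = rust-lambda(['a: i32', 'b: i32'], 'a + b', 'i32');

say $monadic;   # |a: i32| (a + 1) as i32
say $dyadic;    # |a: i32, b: i32| (a + b) as i32
```

The monadic string is the one used in the example above. The dyadic string follows the 'a: type, b: type' form from the comments; how the second column is supplied to apply is not shown here.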

Dyadic Apply Operations

Dyadic - operations with two arguments

Taking this example DataFrame:

my \df2 = DataFrame.new([
    keys => ["a", "a", "b"],
    values => [10, 7, 1],
    ovalues => [10, 7, 1],
]);
df2.show;

shape: (3, 3)
┌──────┬────────┬─────────┐
│ keys ┆ values ┆ ovalues │
│ ---  ┆ ---    ┆ ---     │
│ str  ┆ i32    ┆ i32     │
╞══════╪════════╪═════════╡
│ a    ┆ 10     ┆ 10      │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ a    ┆ 7      ┆ 7       │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ b    ┆ 1      ┆ 1       │
└──────┴────────┴─────────┘

Here we add one to each i32 value in a groupby:

Transformations

In Dan::Polars, the two sections - Join and Concat - are related via these tables:

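Table 1: Combining functions for DataFrames

| Function | Description               | Dan                                      |
|----------|---------------------------|------------------------------------------|
| join     | Join on a column          | df1.join(df2, how=>'inner', on=>'col')   |
| concat   | Concatenate along an axis | df1.concat(df2, axis=>0/1)               |

Table 2: Combining functions for Series

| Function | Description                  | Dan                        |
|----------|------------------------------|----------------------------|
| concat   | Append one Series to another | series1.concat( series2 )  |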

The rationale for this solution is set out in Issue #10.

Join

Here is the signature of the Dan::Polars .join method:

subset JoinType of Str where <left inner outer cross>.any;
method join( DataFrame \right, Str :$on, JoinType :$how = 'outer' ) { ... }

  • use on => 'colname' to pass the column on which to do the join
    • Dan::Polars will guess the on column(s) if nothing is supplied
    • on_right and on_left are not provided
    • ignored if a cross join
  • use how => 'jointype' to specify how to do the join
    • default is outer
    • undefined cells are created as null
    • right is not implemented (swap method call if needed)
    • asof and semi are not yet implemented
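
The JoinType subset is what restricts how to the four supported values. A stand-alone sketch of that check follows; check-how is a hypothetical stand-in for the method's named parameter and performs no join.

```raku
subset JoinType of Str where <left inner outer cross>.any;

# Hypothetical stand-in for the :$how parameter; only the validation is shown.
sub check-how(JoinType :$how = 'outer') { $how }

say check-how();                  # falls back to the default
say check-how(how => 'inner');

# An unsupported value fails the subset constraint when the parameter is bound.
my $unsupported = 'right';
say (try check-how(how => $unsupported)) // 'rejected by JoinType';
```

This is why a right join has to be expressed by swapping the two DataFrames and using left.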

First some examples:

my \df_customers = DataFrame.new([
    customer_id => [1, 2, 3], 
    name => ["Alice", "Bob", "Charlie"],
]);
df_customers.show;

shape: (3, 2)
┌─────────────┬─────────┐
│ customer_id ┆ name    │
│ ---         ┆ ---     │
│ i32         ┆ str     │
╞═════════════╪═════════╡
│ 1           ┆ Alice   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2           ┆ Bob     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 3           ┆ Charlie │
└─────────────┴─────────┘

my \df_orders = DataFrame.new([
    order_id => ["a", "b", "c"],
    customer_id => [1, 2, 2], 
    amount => [100, 200, 300],
]);
df_orders.show;

shape: (3, 3)
┌──────────┬─────────────┬────────┐
│ order_id ┆ customer_id ┆ amount │
│ ---      ┆ ---         ┆ ---    │
│ str      ┆ i32         ┆ i32    │
╞══════════╪═════════════╪════════╡
│ a        ┆ 1           ┆ 100    │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b        ┆ 2           ┆ 200    │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ c        ┆ 2           ┆ 300    │
└──────────┴─────────────┴────────┘

df_customers.join(df_orders, on => "customer_id", how => "inner").show;

shape: (3, 4)
┌─────────────┬───────┬──────────┬────────┐
│ customer_id ┆ name  ┆ order_id ┆ amount │
│ ---         ┆ ---   ┆ ---      ┆ ---    │
│ i32         ┆ str   ┆ str      ┆ i32    │
╞═════════════╪═══════╪══════════╪════════╡
│ 1           ┆ Alice ┆ a        ┆ 100    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2           ┆ Bob   ┆ b        ┆ 200    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2           ┆ Bob   ┆ c        ┆ 300    │
└─────────────┴───────┴──────────┴────────┘

df_customers.join(df_orders).show;    #outer join relying on defaults

shape: (4, 4)
┌─────────────┬─────────┬──────────┬────────┐
│ customer_id ┆ name    ┆ order_id ┆ amount │
│ ---         ┆ ---     ┆ ---      ┆ ---    │
│ i32         ┆ str     ┆ str      ┆ i32    │
╞═════════════╪═════════╪══════════╪════════╡
│ 1           ┆ Alice   ┆ a        ┆ 100    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2           ┆ Bob     ┆ b        ┆ 200    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2           ┆ Bob     ┆ c        ┆ 300    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3           ┆ Charlie ┆ null     ┆ null   │
└─────────────┴─────────┴──────────┴────────┘

df_customers.join(df_orders, on => "customer_id", how => "left").show;
^^ same as above (in this example)
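
To see why inner gives three rows while left and outer give four here, the matching can be spelled out in plain raku. This is a sketch of the join semantics on tuples only, not of how Dan::Polars or Polars performs the join; the tuple layouts are assumptions made for illustration.

```raku
# Plain-raku sketch: tuples standing in for the two DataFrames above.
my @customers = (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');       # (customer_id, name)
my @orders    = ('a', 1, 100), ('b', 2, 200), ('c', 2, 300);    # (order_id, customer_id, amount)

# Inner join: keep only the customer/order pairs whose customer_id matches.
my @inner = gather for @customers -> ($cid, $name) {
    for @orders -> ($oid, $ocid, $amount) {
        take ($cid, $name, $oid, $amount) if $cid == $ocid;
    }
}

# Left join: a customer with no order still yields one row, with
# undefined cells where Polars would show null.
my @left = gather for @customers -> ($cid, $name) {
    my @hits = @orders.grep: *.[1] == $cid;
    if @hits { take ($cid, $name, .[0], .[2]) for @hits }
    else     { take ($cid, $name, Any, Any) }
}

say @inner.elems;   # number of rows in the inner result
say @left.elems;    # number of rows in the left result
```

Charlie has no orders, so that row is dropped by the inner join and kept, with undefined cells, by the left join. In this data every order has a matching customer, which is why left and outer coincide.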

For cross join:

my \df_colors = DataFrame.new([
    color => ["red", "blue", "green"],
]);
df_colors.show;

shape: (3, 1)
┌───────┐
│ color │
│ ---   │
│ str   │
╞═══════╡
│ red   │
├╌╌╌╌╌╌╌┤
│ blue  │
├╌╌╌╌╌╌╌┤
│ green │
└───────┘

my \df_sizes = DataFrame.new([
    size => ["S", "M", "L"],
]);
df_sizes.show;

shape: (3, 1)
┌──────┐
│ size │
│ ---  │
│ str  │
╞══════╡
│ S    │
├╌╌╌╌╌╌┤
│ M    │
├╌╌╌╌╌╌┤
│ L    │
└──────┘

df_colors.join( df_sizes, :how<cross> ).show;

shape: (9, 2)
┌───────┬──────┐
│ color ┆ size │
│ ---   ┆ ---  │
│ str   ┆ str  │
╞═══════╪══════╡
│ red   ┆ S    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ red   ┆ M    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ red   ┆ L    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ blue  ┆ S    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ ...   ┆ ...  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ blue  ┆ L    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ green ┆ S    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ green ┆ M    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ green ┆ L    │
└───────┴──────┘
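
A cross join pairs every row on the left with every row on the right. Raku has a built-in cross-product operator, X, that does the same for plain lists; the sketch below uses it on ordinary arrays rather than DataFrames.

```raku
my @colors = <red blue green>;
my @sizes  = <S M L>;

# X is raku's cross-product operator: every left item paired with every right item.
my @pairs = @colors X @sizes;

say @pairs.elems;   # 3 colors times 3 sizes
.say for @pairs;
```

The pairs come out in the same order as the rows above, including the one that the table display elides.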

Concat

DataFrames

Here is the signature of the Dan::Polars DataFrame .concat method:

method concat( DataFrame:D $dfr, :ax(:$axis) is copy ) { ... }

given $axis {
    when ! .so || /^r/ || /^v/ { 0 }
    when   .so || /^c/ || /^h/ { 1 }
}
  • ax is an alias for axis
  • default (False) is vertical
  • as values you can use
    • False | True
    • 0 | 1
    • anything with initial char [r]ow or [c]olumn
    • anything with initial char [v]ertical or [h]orizontal
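
The accepted values can be summarised with a small stand-alone function. The sketch below is a simplified re-statement of the bullet list, written for illustration; axis-to-int is a hypothetical name and this is not the code used inside Dan::Polars.

```raku
# Hypothetical helper: map the accepted axis values to 0 (vertical) or 1 (horizontal).
sub axis-to-int($axis?) {
    return 0 without $axis;                               # nothing supplied: vertical
    return 0 if $axis ~~ Str && $axis ~~ /^ <[rv]> /;     # [r]ow or [v]ertical
    return $axis ?? 1 !! 0;                               # any other true value: horizontal
}

say axis-to-int();               # default
say axis-to-int('row');
say axis-to-int('horizontal');
say axis-to-int(True);
```

A bare :axis adverb passes True, which is why it selects the horizontal (column-wise) case.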

First, some example data:

my \dfa = DataFrame.new(
        [['a', 1], ['b', 2]],
        columns => <letter number>,
);

my \dfb = DataFrame.new(
        [['c', 3], ['d', 4]],
        columns => <letter number>,
);

my \dfc = DataFrame.new(
        [['cat', 4], ['dog', 4]],
        columns => <animal legs>,
);
dfa.concat(dfb).show;               # vertical is default

shape: (4, 2)
┌────────┬────────┐
│ letter ┆ number │
│ ---    ┆ ---    │
│ str    ┆ i32    │
╞════════╪════════╡
│ a      ┆ 1      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b      ┆ 2      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ c      ┆ 3      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ d      ┆ 4      │
└────────┴────────┘

dfa.concat(dfc, :axis).show;     # horizontal or column-wise

shape: (2, 4)
┌────────┬────────┬────────┬──────┐
│ letter ┆ number ┆ animal ┆ legs │
│ ---    ┆ ---    ┆ ---    ┆ ---  │
│ str    ┆ i32    ┆ str    ┆ i32  │
╞════════╪════════╪════════╪══════╡
│ a      ┆ 1      ┆ cat    ┆ 4    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ b      ┆ 2      ┆ dog    ┆ 4    │
└────────┴────────┴────────┴──────┘

Series

my \s = Series.new( [b=>1, a=>0, c=>2] );
my \t = Series.new( [f=>1, e=>0, d=>2] );

my $u = s.concat: t;                # concatenate
$u.show;

shape: (6,)
Series: 'anon' [i32]
[
	1
	0
	2
	1
	0
	2
]

Copyright(c) 2022-2023 Henley Cloud Consulting Ltd.