From 7d8e4b125bae46363fe6af3709166b74cfec3cf7 Mon Sep 17 00:00:00 2001
From: Kevin Gurney
Date: Thu, 17 Aug 2023 11:11:06 -0400
Subject: [PATCH] GH-37210: [Docs][MATLAB] Update MATLAB `README.md` to mention support for new MATLAB APIs (e.g. `RecordBatch`, `Field`, `Schema`, etc.) (#37215)

### Rationale for this change

Over the last few months, a number of new user-facing APIs have been added or changed in the MATLAB Interface.

## Examples:

### Construction Functions

- `arrow.array`
- `arrow.field`
- `arrow.schema`
- `arrow.recordbatch`
- `arrow.boolean`, `arrow.string`, `arrow.timestamp`, etc. (type construction functions)

### Classes

- `arrow.type.Field`
- `arrow.tabular.Schema`
- `arrow.tabular.RecordBatch`
- `arrow.array.StringArray`
- `arrow.array.TimestampArray`

### Indexing Methods

- `arrow.tabular.Schema.field(i)`
- `arrow.tabular.RecordBatch.column(i)`

### Static Construction Methods

- `arrow.tabular.RecordBatch.fromArrays(a1, ..., aN)`
- `arrow.array.StringArray.fromMATLAB(a)`

---

This pull request updates the [`README.md` for the MATLAB Interface](https://github.com/apache/arrow/blob/main/matlab/README.md) to reflect these changes.

### What changes are included in this PR?

Updated MATLAB `README.md`:

1. Updated **Status** section to mention support for new Arrow types and functionality.
2. Added usage examples for `Type`, `Field`, `Schema`, and `RecordBatch`.
3. Updated Feather v1 usage examples to illustrate support for types like `String` and `Boolean`.

### Are these changes tested?

N/A. This is purely a documentation change.

I manually reviewed the visual rendering of the `README.md` Markdown on GitHub.

### Are there any user-facing changes?

Yes.

1. The MATLAB `README.md` now includes more usage examples which illustrate how to use the MATLAB interface.

### Future Directions

1. Add comprehensive documentation for all classes and methods, including supported input argument types and name-value pairs.
2. Add more detailed information on future development plans and development status to the `README.md`.
3. Add a table of contents to the MATLAB `README.md`.
4. Keep the `README.md` up to date more proactively.

### Notes

1. Thank you @ sgilmore10 for your help with this pull request!
2. My apologies in advance if this is too much documentation content to review at once. We will focus on being more incremental about keeping the MATLAB documentation up to date in the future.
3. #37211 also involves updating the MATLAB `README.md`, so there is a chance we may need to rebase before merging these changes in.

* Closes: #37210

Lead-authored-by: Kevin Gurney
Co-authored-by: Kevin Gurney
Co-authored-by: Sutou Kouhei
Co-authored-by: Sarah Gilmore
Signed-off-by: Kevin Gurney
---
 matlab/README.md | 432 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 417 insertions(+), 15 deletions(-)

diff --git a/matlab/README.md b/matlab/README.md
index 1e8565ece4e8c..a070984a9e38f 100644
--- a/matlab/README.md
+++ b/matlab/README.md
@@ -27,17 +27,17 @@ This is a very early stage MATLAB interface to the Apache Arrow C++ libraries.
 
 Currently, the MATLAB interface supports:
 
-1. Creating a subset of Arrow `Array` types (e.g. numeric and boolean) from MATLAB data
-2. Reading and writing numeric types from/to Feather v1 files.
+1. Converting between a subset of Arrow `Array` types and MATLAB array types (see table below)
+2. Converting between MATLAB `table`s and `arrow.tabular.RecordBatch`s
+3. Creating Arrow `Field`s, `Schema`s, and `Type`s
+4. Reading and writing Feather V1 files
 
 Supported `arrow.array.Array` types are included in the table below.
 
-**NOTE**: All Arrow `Array` classes are part of the `arrow.array` package (e.g. `arrow.array.Float64Array`).
+**NOTE**: All Arrow `Array` classes listed below are part of the `arrow.array` package (e.g. `arrow.array.Float64Array`).
 
 | MATLAB Array Type | Arrow Array Type |
 | ----------------- | ---------------- |
-| `single`          | `Float32Array`   |
-| `double`          | `Float64Array`   |
 | `uint8`           | `UInt8Array`     |
 | `uint16`          | `UInt16Array`    |
 | `uint32`          | `UInt32Array`    |
@@ -46,7 +46,11 @@ Supported `arrow.array.Array` types are included in the table below.
 | `int16`           | `Int16Array`     |
 | `int32`           | `Int32Array`     |
 | `int64`           | `Int64Array`     |
+| `single`          | `Float32Array`   |
+| `double`          | `Float64Array`   |
 | `logical`         | `BooleanArray`   |
+| `string`          | `StringArray`    |
+| `datetime`        | `TimestampArray` |
 
 ## Prerequisites
 
@@ -134,7 +138,7 @@ matlabArray =
 
      1     2     3
 
->> arrowArray = arrow.array.Float64Array(matlabArray)
+>> arrowArray = arrow.array(matlabArray)
 
 arrowArray =
 
@@ -148,7 +152,7 @@ arrowArray =
 #### Create a MATLAB `logical` array from an Arrow `BooleanArray`
 
 ```matlab
->> arrowArray = arrow.array.BooleanArray([true, false, true])
+>> arrowArray = arrow.array([true, false, true])
 
 arrowArray =
 
@@ -190,7 +194,7 @@ validElements =
    1   0   1   0   1
 
 % Specify which values are Null/Valid by supplying a logical validity "mask"
->> arrowArray = arrow.array.Int8Array(matlabArray, Valid=validElements)
+>> arrowArray = arrow.array(matlabArray, Valid=validElements)
 
 arrowArray =
 
@@ -203,20 +207,418 @@ arrowArray =
 ]
 ```
 
+### Arrow `RecordBatch` class
+
+#### Create an Arrow `RecordBatch` from a MATLAB `table`
+
+```matlab
+>> matlabTable = table(["A"; "B"; "C"], [1; 2; 3], [true; false; true])
+
+matlabTable =
+
+  3x3 table
+
+    Var1    Var2    Var3
+    ____    ____    _____
+
+    "A"      1      true
+    "B"      2      false
+    "C"      3      true
+
+>> arrowRecordBatch = arrow.recordbatch(matlabTable)
+
+arrowRecordBatch =
+
+Var1:   [
+    "A",
+    "B",
+    "C"
+  ]
+Var2:   [
+    1,
+    2,
+    3
+  ]
+Var3:   [
+    true,
+    false,
+    true
+  ]
+```
+
+#### Create a MATLAB `table` from an Arrow `RecordBatch`
+
+```matlab
+>> arrowRecordBatch
+
+arrowRecordBatch =
+
+Var1:   [
+    "A",
+    "B",
+    "C"
+  ]
+Var2:   [
+    1,
+    2,
+    3
+  ]
+Var3:   [
+    true,
+    false,
+    true
+  ]
+
+>> matlabTable = table(arrowRecordBatch)
+
+matlabTable =
+
+  3x3 table
+
+    Var1    Var2    Var3
+    ____    ____    _____
+
+    "A"      1      true
+    "B"      2      false
+    "C"      3      true
+```
+
+#### Create an Arrow `RecordBatch` from multiple Arrow `Array`s
+
+
+```matlab
+>> stringArray = arrow.array(["A", "B", "C"])
+
+stringArray =
+
+[
+  "A",
+  "B",
+  "C"
+]
+
+>> timestampArray = arrow.array([datetime(1997, 01, 01), datetime(1998, 01, 01), datetime(1999, 01, 01)])
+
+timestampArray =
+
+[
+  1997-01-01 00:00:00.000000,
+  1998-01-01 00:00:00.000000,
+  1999-01-01 00:00:00.000000
+]
+
+>> booleanArray = arrow.array([true, false, true])
+
+booleanArray =
+
+[
+  true,
+  false,
+  true
+]
+
+>> arrowRecordBatch = arrow.tabular.RecordBatch.fromArrays(stringArray, timestampArray, booleanArray)
+
+arrowRecordBatch =
+
+Column1:   [
+    "A",
+    "B",
+    "C"
+  ]
+Column2:   [
+    1997-01-01 00:00:00.000000,
+    1998-01-01 00:00:00.000000,
+    1999-01-01 00:00:00.000000
+  ]
+Column3:   [
+    true,
+    false,
+    true
+  ]
+```
+
+#### Extract a column from a `RecordBatch` by index
+
+```matlab
+>> arrowRecordBatch = arrow.tabular.RecordBatch.fromArrays(stringArray, timestampArray, booleanArray)
+
+arrowRecordBatch =
+
+Column1:   [
+    "A",
+    "B",
+    "C"
+  ]
+Column2:   [
+    1997-01-01 00:00:00.000000,
+    1998-01-01 00:00:00.000000,
+    1999-01-01 00:00:00.000000
+  ]
+Column3:   [
+    true,
+    false,
+    true
+  ]
+
+>> timestampArray = arrowRecordBatch.column(2)
+
+timestampArray =
+
+[
+  1997-01-01 00:00:00.000000,
+  1998-01-01 00:00:00.000000,
+  1999-01-01 00:00:00.000000
+]
+```
+
+### Arrow `Type` classes (i.e. `arrow.type.`)
+
+#### Create an Arrow `Int8Type` object
+
+```matlab
+>> type = arrow.int8()
+
+type =
+
+  Int8Type with properties:
+
+    ID: Int8
+```
+
+#### Create an Arrow `TimestampType` object with a specific `TimeUnit` and `TimeZone`
+
+```matlab
+>> type = arrow.timestamp(TimeUnit="Second", TimeZone="Asia/Kolkata")
+
+type =
+
+  TimestampType with properties:
+
+          ID: Timestamp
+    TimeUnit: Second
+    TimeZone: "Asia/Kolkata"
+```
+
+
+#### Get the type enumeration `ID` for an Arrow `Type` object
+
+```matlab
+>> type.ID
+
+ans =
+
+  ID enumeration
+
+    Timestamp
+
+>> type = arrow.string()
+
+type =
+
+  StringType with properties:
+
+    ID: String
+
+>> type.ID
+
+ans =
+
+  ID enumeration
+
+    String
+```
+
+### Arrow `Field` class
+
+#### Create an Arrow `Field` with type `Int8Type`
+
+```matlab
+>> field = arrow.field("Number", arrow.int8())
+
+field =
+
+Number: int8
+
+>> field.Name
+
+ans =
+
+    "Number"
+
+>> field.Type
+
+ans =
+
+  Int8Type with properties:
+
+    ID: Int8
+
+```
+
+#### Create an Arrow `Field` with type `StringType`
+
+```matlab
+>> field = arrow.field("Letter", arrow.string())
+
+field =
+
+Letter: string
+
+>> field.Name
+
+ans =
+
+    "Letter"
+
+>> field.Type
+
+ans =
+
+  StringType with properties:
+
+    ID: String
+```
+
+#### Extract an Arrow `Field` from an Arrow `Schema` by index
+
+```matlab
+>> arrowSchema
+
+arrowSchema =
+
+Letter: string
+Number: double
+
+% Specify the field to extract by its index (i.e. 2)
+>> field = arrowSchema.field(2)
+
+field =
+
+Number: double
+```
+
+#### Extract an Arrow `Field` from an Arrow `Schema` by name
+
+```matlab
+>> arrowSchema
+
+arrowSchema =
+
+Letter: string
+Number: double
+
+% Specify the field to extract by its name (i.e. "Letter")
+>> field = arrowSchema.field("Letter")
+
+field =
+
+Letter: string
+```
+
+### Arrow `Schema` class
+
+#### Create an Arrow `Schema` from multiple Arrow `Field`s
+
+```matlab
+>> letter = arrow.field("Letter", arrow.string())
+
+letter =
+
+Letter: string
+
+>> number = arrow.field("Number", arrow.int8())
+
+number =
+
+Number: int8
+
+>> schema = arrow.schema([letter, number])
+
+schema =
+
+Letter: string
+Number: int8
+```
+
+#### Get the `Schema` of an Arrow `RecordBatch`
+
+```matlab
+>> matlabTable = table(["A"; "B"; "C"], [1; 2; 3], VariableNames=["Letter", "Number"])
+
+matlabTable =
+
+  3x2 table
+
+    Letter    Number
+    ______    ______
+
+     "A"        1
+     "B"        2
+     "C"        3
+
+>> arrowRecordBatch = arrow.recordbatch(matlabTable)
+
+arrowRecordBatch =
+
+Letter:   [
+    "A",
+    "B",
+    "C"
+  ]
+Number:   [
+    1,
+    2,
+    3
+  ]
+
+>> arrowSchema = arrowRecordBatch.Schema
+
+arrowSchema =
+
+Letter: string
+Number: double
+```
+
 ### Feather V1
 
-#### Write a MATLAB table to a Feather v1 file
+#### Write a MATLAB table to a Feather V1 file
 
 ``` matlab
->> t = array2table(rand(10, 10));
->> filename = 'table.feather';
->> featherwrite(filename,t);
+>> t = table(["A"; "B"; "C"], [1; 2; 3], [true; false; true])
+
+t =
+
+  3×3 table
+
+    Var1    Var2    Var3
+    ____    ____    _____
+
+    "A"      1      true
+    "B"      2      false
+    "C"      3      true
+
+>> filename = "table.feather";
+
+>> featherwrite(filename, t)
 ```
 
-#### Read a Feather v1 file into a MATLAB table
+#### Read a Feather V1 file into a MATLAB table
 
 ``` matlab
->> filename = 'table.feather';
->> t = featherread(filename);
+>> filename = "table.feather";
+
+>> t = featherread(filename)
+
+t =
+
+  3×3 table
+
+    Var1    Var2    Var3
+    ____    ____    _____
+
+    "A"      1      true
+    "B"      2      false
+    "C"      3      true
 ```