- Nothing for the moment
- Added faster torch.Tensor functions to fill/stat functions for speed
- Added mutate function to Dataseries
__index__
access for Df_Array- More complete documentation for Df_Array and specs
- Df_Dict elements can be accessed using
myDict[index]
ormyDict["$colname"]
- Df_Dict
key
property available. It list the Df_Dict's keys - Df_Dict
length
property available. It list by key, the length of its content - Df_Dict
check_length()
checks if all elements have the same length - Df_Dict
set_keys(table)
replaces every keys by the given table (must be the same size) - More complete documentation for Df_Dict and specs
- More complete documentation for Df_Tbl and specs
- Internal methods
_infer_csvigo_schema()
and_infer_data_schema()
renamed to_infer_schema()
- Type inference is now based on type frequences but if it encounter a single double/float in a integer column it will consider the column as double/float
- it is now possible to directly set a schema for a Dataframe without any checks with
set_schema()
. Use it wisely - Possibility to init a Dataframe with a schema, a column order and a number of rows with internal method
_init_with_schema()
- Added
bulk_load_csv()
method wich loads large CSVs files using threads but without checking missing values or data integrity. To use with caution. See #28 - Added
load_threadcsv()
- Added the possiblity to create empty Dataseries
- Added Dataseries
load()
method to directly load a tensor or tds.Vec in memory without any check - Added iris dataset in
/specs/data
- New specs structure
- Fixed csv loading when no header and test case according to it
- Changed
assert_is_index
return value totrue
on success instead ofself
- The get_max_value/get_min_value use torch.max/min when no missing data is present in the column
- Fixed upgrade_frame bug
- Fixed bug with saving CSV-files when they contain boolean values
- The data is now stored in Dataseries that handles all the manipulations, statistics, categoricals, etc internally. The data backend is either a tensor or a tds.Vec in order to better accomodate large datasets.
- The self.columns has been dropped and there is now only self.column_order that keeps track of column order.
- Most functions now use either tds.Hash or tds.Vec for returning values instead of regular tables.
- The data types are now more sophisticate with boolean, integer, long, double, and string. The first and the last are internally stored as
tds.Vec
while the remaining are in the form of torch tensors. - Since conversions are more restricted with the new column types the is a boolean2tensor and boolean2categorical that help converting boolean columns into numerical.
- The
Dataframe.schema
property has been removed as it now resides in the series. The same information can be retrieved usingget_schema()
. - There is now a custom busted assertion that can compare tensors, tds, and Dataseries.
- The csv data is entered using csvigo's
large
mode thus circumventing the memory limit for large csv's. - The to_/from_categorical now always return a single value when a single value is entered.
- Add column now takes a Dataseries instead of a Df_Array
- Generalized the argcheck by adding string.split for
|
separated arguments - Multiple minor bug-fixes with non-local variables
- Added new subset and batch loading functionality (issue #22)
- Added metatable functionality (issue #18)
- The as_categorical can now receive levels, labels and exclusions (issue #23)
- The unique sorts the results before returning, thereby preventing the order to depend on any irrelevant changes in the original table order. IMPORTANT: This will affect the numbers corresponding to categoricals (part of issue #23)
- Added compatibility with the torchnet infrastructure via inheritance and a custom iterator that allows utilizing the internal permutation logistic (issue #24)
- The Batchframe can now have default data/load options allowing a simpler
to_tensor
call - The Batchframe now supports common transformations that may be required for the label
via the
label_shape
. See the ntwrk_implementation_spec file that contains basic examples. - Added so that
__init
parameters can be passed along the subset line primary for allowing default Batchframe load/data arguments - Insert now takes an index argument allowing insertion of rows. Backward compatibility retained.
- Added append that does the same as index previously did, i.e. adding a row at the bottom of the table
- Added rbind (row-bind) as an append alias
- Added cbind (column-bind)
- Added wide2long for converting wide datasets to long
- Added a version function
- Added upgrade_frame that handles upgrades from previous Dataframe versions
- The statistics can now return dataframe that is also the default (allows nicer printing)
- The add_column can now take a position argument and updates the schema + columns
- The init-constructor for with a table argument now also accepts column_order argument
- The column order can now be specified using the
pos_column_order
and manipulated usingswap_column_order
. - The tostring now has a more advanced printing that aims at total table width instead of just making sure that certain columns didn't end up too wide. The previous Dataframe.print default arguments for printing have been moved to Dataframe.tostring_defaults
- The helper classes Df_Array, Df_Dict and Df_Tbl now have a metatable
__len__
option - The as_batchframe has been renamed to frame_type that defaults to current frame type
- The set now changes all matching values instead of only the first occurrence
- Fixed bug with outputting categorical columns
- Fixed bug related to boolean columns. Note: columns that are created using the csv-option are currently not being converted to boolean columns but will remain as strings with 'true' and 'false'
- Converted all functions to use argcheck
- API-changes due to above
- Full test suite
- Load batch support
- Categorical data support