dataformat 0.1 - a collection of modules for reading and writing
machine learning data sets
The goal of this project is to provide code for reading and writing
machine learning data sets in as many programming languages as
possible. With this code, it should become much easier for programs
written in different languages to exchange data with each other.
Currently, we are focusing on the ARFF file format
(http://weka.sourceforge.net/wekadoc/index.php/en:ARFF_%283.5.6%29),
developed in the Weka project (http://www.cs.waikato.ac.nz/~ml/).
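
For readers new to ARFF, here is a rough sketch of what a small file looks
like and how its header and data sections can be read. This is not the API
of the modules in this project: the name parse_arff and the sample data are
made up for illustration, and the sketch assumes unquoted attribute names
and dense, comma-separated rows.

```python
# Hypothetical sketch, not part of dataformat: a minimal dense-ARFF reader.
SAMPLE = """\
% comment lines start with a percent sign
@relation weather
@attribute temperature numeric
@attribute outlook {sunny,rainy}
@data
21.5,sunny
17.0,rainy
"""


def parse_arff(text):
    """Return (relation, attributes, rows) for a simple dense ARFF string."""
    relation = None
    attributes = []  # list of (name, declared type) pairs
    rows = []        # list of lists of raw string values
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            # Naive split: assumes no value contains a comma.
            rows.append(line.split(","))
            continue
        lower = line.lower()  # header keywords are case-insensitive
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, type_ = line.split(None, 2)
            attributes.append((name, type_))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
```

All values come back as strings; converting them according to the declared
attribute type is left out to keep the sketch short.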
The languages covered so far are:
- Python
- Ruby
- MATLAB
- Java
C and C++ are next on the list.
ARFF is covered except for:
- sparse features
- date attributes
- relational attributes
- missing values
- instance weights
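
To check whether a file uses any of the features listed above before
reading it, something like the following sketch could be used. It is a
heuristic only and not part of this project: the name find_unsupported is
hypothetical, and it assumes unquoted attribute names, sparse rows that
start with "{", missing values written as "?", and instance weights
written as a trailing "{weight}" on a dense row.

```python
# Hypothetical sketch, not part of dataformat: flag ARFF features that the
# readers in this project do not cover. Heuristic, not a full parser.
def find_unsupported(text):
    """Return a set of labels for unsupported features seen in the text."""
    found = set()
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                parts = lower.split()
                # For unquoted names the declared type is the third token.
                if len(parts) >= 3 and parts[2] == "date":
                    found.add("date attributes")
                if len(parts) >= 3 and parts[2] == "relational":
                    found.add("relational attributes")
            elif lower.startswith("@data"):
                in_data = True
            continue
        if line.startswith("{"):
            found.add("sparse features")    # e.g. {1 3, 4 x}
        elif line.endswith("}"):
            found.add("instance weights")   # e.g. 1,2,{5}
        if "?" in [value.strip() for value in line.split(",")]:
            found.add("missing values")
    return found
```

A caller could refuse to load a file, or at least print a warning, when the
returned set is not empty.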
Some things might not work as expected, in particular:
- strings with commas in them
But we're working on that.
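
To illustrate the problem with commas: splitting a data row on every comma
breaks quoted strings apart. One possible workaround in Python is to let
the standard csv module do the splitting. This is only a sketch and not
how the modules here work; the name split_values is hypothetical, and it
assumes values are quoted with single quotes and contain no escaped quotes.

```python
import csv


# Hypothetical sketch, not part of dataformat: split one ARFF data row
# while keeping commas inside single-quoted strings.
def split_values(line):
    """Split a dense ARFF data row into its values."""
    reader = csv.reader([line], quotechar="'", skipinitialspace=True)
    return next(reader)


row = "'hello, world',3"
naive = row.split(",")       # the quoted string is cut in two
quoted = split_values(row)   # the quoted string stays in one piece
```

Here naive has three elements because the quoted string is cut at its
comma, while quoted has the expected two.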