DataSets are weird... #55

dktcoding · 2017-01-07T04:21:56Z

kronenthaler · 2017-01-07T13:13:42Z

I know these classes need tons of work. They kind of grew organically from the C4.5 implementation into something else while I was investigating the Bayes Network implementations.

Some comments on some of your points:

I might agree with the rename from DiscreteAttribute to CategoricalAttribute. However, adding a new DiscreteAttribute seems superfluous, as it would be a subset of ContinuousAttribute (just using the integer part). But then it wouldn't really be discrete anymore, as the integers would be handled like continuous values.
Frequencies over intervals, for both continuous and discrete attributes, are implemented on the DataSet via getFrequencies(int lo, int hi, int index).
Incomplete/dirty removal: I think it's the responsibility of whoever traverses the data set. For instance, C4.5 has ways to deal with such rows (which I haven't implemented yet). I would be more inclined to have a flag or a special method that does it for you if needed, but certainly not as a default option.
The problem with filtering options is that they can lead to incredibly complex code. Keep in mind that any library might expect certain kinds of input, and more often than not the inputs have to be pre-processed before they can be fed to it. Because of this, I think filtering should be part of that pre-processing step.
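
To illustrate the last two points, here is a rough sketch of what an opt-in clean-up could look like when done as a pre-processing step, before the rows ever reach the DataSet. The class and method names (PreprocessSketch, isComplete, dropIncomplete) are hypothetical and not part of the library, and it assumes that a missing value shows up as null or "?" in the raw rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library: cleans raw rows before they
// are handed to a DataSet-like structure.
class PreprocessSketch {

    // Assumption: a missing value is represented as null or "?" in the raw rows.
    static boolean isComplete(String[] row) {
        for (String value : row) {
            if (value == null || value.trim().equals("?")) {
                return false;
            }
        }
        return true;
    }

    // Returns a new list holding only the complete rows; the input is not modified.
    static List<String[]> dropIncomplete(List<String[]> rows) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (isComplete(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> raw = Arrays.asList(
            new String[] {"sunny", "85", "no"},
            new String[] {"rainy", "?", "yes"},
            new String[] {"overcast", "83", "yes"});

        // The caller decides explicitly whether to drop incomplete rows.
        List<String[]> clean = dropIncomplete(raw);
        System.out.println(clean.size() + " of " + raw.size() + " rows kept");
    }
}
```

The point of keeping this outside the DataSet is that the caller opts in explicitly, and an algorithm such as C4.5 can still receive the untouched rows if it wants to handle missing values itself.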

I would rather focus on this later, as it will be part of a bigger architectural change that might affect several other components (C4.5 & Bayes), and I want to assess the scope of the change first.
