Unexplained IndexOutOfRangeException #25

Open
normanhh3 opened this issue Apr 29, 2015 · 10 comments

@normanhh3

Hi Seth,

I had an idea to make a REALLY simple attempt at learning a function that I would ordinarily have implemented as a switch statement, just for the mind-bending. :-)

The code was written to be run in LINQPad.

```csharp
// Written as a LINQPad "C# Program" query. Namespaces to import
// (Query Properties > Namespace Imports): System.Reflection, numl,
// numl.Model, numl.Supervised.NeuralNetwork
void Main()
{
    Assembly.GetAssembly(typeof(Learner)).Dump();

    var gen = new numl.Supervised.NeuralNetwork.NeuralNetworkGenerator();
    gen.Descriptor = Descriptor.Create<WindDirection>();

    // 16/20 is integer division in C# and evaluates to 0; 16.0 / 20 (0.8)
    // trains on 80% of the examples as intended.
    var learned = Learner.Learn(WindDirection.TrainingData(), 16.0 / 20, 1, gen);

    var model = learned.Model;
    var accuracy = learned.Accuracy.Dump();

    var windDir = new WindDirection(350, null);
    // In LINQPad, append .Dump("Prediction") to display the prediction.
    model.Predict(windDir);
}

// Define other methods and classes here
public class WindDirection {
    [Feature]
    public double Degrees { get; set; }

    [StringLabel]
    public string Direction { get; set; }

    public WindDirection(double degrees, string direction)
    {
        this.Degrees = degrees;
        this.Direction = direction;
    }

    public static WindDirection[] TrainingData()
    {
        return new[] {
            // Training Values
            new WindDirection(0,     "N"  ),
            new WindDirection(22.5,  "NNE"),
            new WindDirection(45,    "NE" ),
            new WindDirection(67.5,  "ENE"),
            new WindDirection(90,    "E"  ),
            new WindDirection(112.5, "ESE"),
            new WindDirection(135,   "SE" ),
            new WindDirection(157.5, "SSE"),
            new WindDirection(180,   "S"  ),
            new WindDirection(202.5, "SSW"),
            new WindDirection(225,   "SW" ),
            new WindDirection(247.5, "WSW"),
            new WindDirection(270,   "W"  ),
            new WindDirection(292.5, "WNW"),
            new WindDirection(315,   "NW" ),
            new WindDirection(337.5, "NNW"),

            // Testing Values
            new WindDirection(22.5, "NNE"),
            new WindDirection(112.5, "ESE"),
            new WindDirection(11.25, "N"),
            new WindDirection(359-11.25, "N")
        };
    }
}
```

However, running the above Main function results in the following IndexOutOfRangeException.

```
   at numl.Model.StringProperty.Convert(Double val) in c:\projects\numl\numl\Model\StringProperty.cs:line 109
   at numl.Learner.GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable`1 examples, Double trainingPct) in c:\projects\numl\numl\Learner.cs:line 169
   at numl.Learner.<>c__DisplayClasse.<Learn>b__d(Int32 i) in c:\projects\numl\numl\Learner.cs:line 110
   at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass11.<ExecuteSelfReplicating>b__10(Object param0)
```

I have looked at the relevant source files here, but I can't find the cause of the exception at the lines indicated.

I am getting this exception with the 0.8.17.0 NuGet build.

Have I failed to follow the documentation?

Thoughts?

@sethjuarez
Owner

What you are doing looks perfect. I suspect this is related to #24.

@normanhh3
Author

So, I tried out the changes. Bad news. :-( Looks like this is still occurring.

After debugging the `private static LearningModel GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable<object> examples, double trainingPct, int total)` method in _Learner.cs_, I found that when it was attempting to validate the model, I got a NaN value back from the call to `model.Predict(features);`:

```csharp
// make prediction
var features = descriptor.Convert(o, false).ToVector();

var p = model.Predict(features);
var pred = descriptor.Label.Convert(p);
```

Passing that NaN value into the `Convert` method is what causes the IndexOutOfRangeException.

In the `Convert` method in _StringProperty.cs_, `AsEnum` is set to `true`, so it looks the value up in `Dictionary` and croaks:

```csharp
if (AsEnum)
    return Dictionary[(int)val];
else
    return val.ToString();
```
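For context: the C# specification leaves the result of casting NaN to `int` in an unchecked context unspecified, so `(int)val` can produce an index that falls outside the array. NaN also compares false with every number, so a plain range check like `val < 0 || val >= Dictionary.Length` would not catch it; it has to be tested with `double.IsNaN` explicitly. Below is a minimal sketch of such a guard. `LabelGuard.TryConvert` is a hypothetical name, not numl API, and it assumes the label dictionary is a `string[]`, which the IndexOutOfRangeException (rather than a KeyNotFoundException) suggests.

```csharp
// Hypothetical helper, not part of numl: maps a raw model output back to a
// string label and reports failure instead of letting the array lookup throw.
public static class LabelGuard
{
    public static bool TryConvert(double val, string[] dictionary, out string label)
    {
        label = null;

        // NaN fails every comparison, so it must be rejected explicitly;
        // infinities are rejected here as well.
        if (double.IsNaN(val) || double.IsInfinity(val))
            return false;

        // Range-check on the double before casting, so huge outputs (like the
        // perceptron values mentioned later in this thread) never reach the cast.
        if (val < 0 || val >= dictionary.Length)
            return false;

        // Same truncating cast as StringProperty.Convert.
        label = dictionary[(int)val];
        return true;
    }
}
```

A caller can then decide what a failed conversion means, for example counting it as a wrong prediction during validation.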

That is as far as I have time for tonight. I'm going to update my sample code above with what I did to create this issue. I expanded my example slightly.

@sethjuarez
Owner

It might also be the case that the string in question has never been seen by the classifier (this would be a problem).

@bdschrisk
Collaborator

Issue reopened. The proposed solution is to use a weighted feature hashing/extraction algorithm for string types.
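To make the idea concrete, here is a minimal sketch of the basic (unweighted) hashing trick for string values; the weighting mentioned above is left out. `FeatureHasher`, `Fnv1a` and `Hash` are illustrative names, not numl API. Each string is hashed into a fixed number of buckets, so a string that was never seen during training still maps to a valid slot instead of a missing dictionary entry.

```csharp
using System;
using System.Text;

// Sketch of the hashing trick for string features (illustration only).
public static class FeatureHasher
{
    // 32-bit FNV-1a over the UTF-8 bytes of the string. string.GetHashCode()
    // is avoided because on .NET Core it is randomized per process, so slots
    // would not be stable between training and prediction runs.
    public static uint Fnv1a(string s)
    {
        uint hash = 2166136261;               // FNV offset basis
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            hash ^= b;
            hash = unchecked(hash * 16777619u); // FNV prime
        }
        return hash;
    }

    // Vector of length `buckets` with 1.0 in the slot the string hashes to.
    public static double[] Hash(string value, int buckets)
    {
        if (buckets <= 0)
            throw new ArgumentOutOfRangeException(nameof(buckets));

        var vector = new double[buckets];
        vector[Fnv1a(value) % (uint)buckets] = 1.0;
        return vector;
    }
}
```

FNV-1a is used only because it is short and deterministic. Distinct strings can collide in the same bucket, and collisions become more likely as the bucket count shrinks.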

@normanhh3
Author

Hey @bdschrisk, I just pulled master and tried it out. It looks like the NeuralNetworkGenerator winds up returning NaN values that can't be converted into a valid index. I also tried the PerceptronGenerator and got values that were WAY outside the range of possible indices for the Dictionary. Since this is happening in the GenerateModel call stack, we should just trap the exception (or check for failure by extending StringProperty) and use that result when determining whether the model is predicting correctly. If the values returned from the model are so far off that they cannot be converted back into valid labels, isn't that an indication of poor model fit to the data?

@bdschrisk
Collaborator

Thanks @normanhh3, we plan to add a default value on the Property object to cover this scenario.
Without going into detail: yes, it would indicate poor performance, but the current approach doesn't really allow unknown values, whereas a feature hashing method would make string handling instance-agnostic.

@normanhh3
Author

Sounds like a good solution then.

@sethjuarez
Owner

Shall we close this?

@bdschrisk
Collaborator

If we implement stratification in the labels at training time, we can; that will resolve the issue for the most part. Once feature hashing is added, that will resolve any further issues down the track.
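One possible reading of "stratification in the labels at training time" is a train/test split done per label, so that every label contributes at least one training example and the label dictionary covers every class. A minimal sketch under that assumption; `Stratified.Split` is a hypothetical helper and not how numl's `Learner` necessarily splits data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of a label-stratified train/test split (illustration only).
public static class Stratified
{
    public static (List<T> Train, List<T> Test) Split<T, TLabel>(
        IEnumerable<T> examples, Func<T, TLabel> label, double trainingPct, int seed)
    {
        // Rejects a training share of 0, e.g. the result of integer division 16/20.
        if (trainingPct <= 0 || trainingPct > 1)
            throw new ArgumentOutOfRangeException(nameof(trainingPct));

        var rng = new Random(seed);
        var train = new List<T>();
        var test = new List<T>();

        foreach (var group in examples.GroupBy(label))
        {
            // Shuffle within the label group, then take the training share,
            // rounding up so each label contributes at least one example.
            var shuffled = group.OrderBy(_ => rng.Next()).ToList();
            int take = (int)Math.Ceiling(shuffled.Count * trainingPct);
            train.AddRange(shuffled.Take(take));
            test.AddRange(shuffled.Skip(take));
        }
        return (train, test);
    }
}
```

The trade-off shows up with the wind-direction data: at an 80% split, labels with only one or two examples land entirely in the training set, leaving nothing of them for validation.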

@sethjuarez
Owner

sethjuarez commented Oct 19, 2016

Could you give an example of what you mean? In the DT (decision tree), for example, I have a default `Hint` value on the model that represents what it should select if it gets into a confused state (it just returns the `Hint`). Should we codify this in the generic `IModel`/`Model` class so this issue is resolved across all models? Or are you referring to something else?
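A minimal sketch of what codifying the `Hint` in a shared base class might look like, assuming each model emits a raw label index as a `double`. `FallbackModel`, `PredictRaw`, `LabelCount` and `ConstantModel` are hypothetical names for illustration, not numl's `IModel`/`Model` API.

```csharp
// Hypothetical base class: every model inherits the same fallback guard.
public abstract class FallbackModel
{
    // Value returned whenever the underlying model produces an unusable output.
    public double Hint { get; set; }

    // Number of valid label indices; outputs outside [0, LabelCount) fall back.
    public int LabelCount { get; set; }

    // Implemented by each concrete model (neural network, perceptron, ...).
    protected abstract double PredictRaw(double[] features);

    // Template method: the guard lives in one place instead of in every model.
    public double Predict(double[] features)
    {
        double raw = PredictRaw(features);
        bool usable = !double.IsNaN(raw) && !double.IsInfinity(raw)
                      && raw >= 0 && raw < LabelCount;
        return usable ? raw : Hint;
    }
}

// Toy model used to exercise the fallback: always returns a fixed output.
public sealed class ConstantModel : FallbackModel
{
    private readonly double _output;
    public ConstantModel(double output) { _output = output; }
    protected override double PredictRaw(double[] features) => _output;
}
```

Keeping the guard in the base class means individual generators cannot forget it. The cost is that a silent fallback can mask a badly fitting model, so it may be worth counting how often `Hint` is returned during validation.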

@bdschrisk bdschrisk removed the feature label Apr 7, 2017