
Use 1/0 labels for binary classification instead of 1/-1 #9

Open
benmccann opened this issue May 14, 2016 · 3 comments

Comments

@benmccann
Contributor

The loss function used in this library for binary classification is a hinge-loss function assuming labels +1 or -1:

case 1 =>
  1 - Math.signum(pred * label)

However, the predictions being made lie in the range (0, 1):

case 1 =>
  1.0 / (1.0 + Math.exp(-pred))

The 1/0 convention used in predictions should be preferred over the 1/-1 convention expected by the loss function: in spark.mllib the negative label is represented by 0 rather than -1, for consistency with multiclass labeling.

The loss function should be changed to be more like the way Spark does it.
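For comparison, a logistic (cross-entropy) loss works directly with 0/1 labels and matches the sigmoid-based predictions above. This is a minimal standalone sketch of that convention (the object and method names here are illustrative, not from this library or from Spark):

```scala
object LogLossSketch {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Log loss for a 0/1 label, where `pred` is the raw margin and
  // sigmoid(pred) is the predicted probability of the positive class.
  def logLoss(pred: Double, label: Double): Double = {
    val p = sigmoid(pred)
    -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
  }

  // Gradient of the log loss with respect to the margin: sigmoid(pred) - label.
  def logLossGradient(pred: Double, label: Double): Double =
    sigmoid(pred) - label

  def main(args: Array[String]): Unit = {
    // A confident correct prediction has low loss; a confident wrong one, high loss.
    println(logLoss(4.0, 1.0))
    println(logLoss(4.0, 0.0))
  }
}
```

With this formulation, both the loss and the gradient agree on the same 0/1 label encoding, avoiding any implicit transform.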

@benmccann
Contributor Author

Ahh, looks like it does a transform. But I think this is a very non-standard way of doing things, since the goal is to upstream this and have it merged into Spark's mllib. I believe they use the 1/0 representation internally and we shouldn't change that.

val data = task match {
  case 0 => // regression: use labels as-is
    input.map(l => (l.label, l.features)).persist()
  case 1 => // classification: remap 0/1 labels to -1/+1
    input.map(l => (if (l.label > 0) 1.0 else -1.0, l.features)).persist()
}

@benmccann benmccann changed the title Incorrect loss function for binary classification Use 1/0 labels for binary classification instead of 1/-1 May 16, 2016
@zdx

zdx commented Feb 10, 2017

conclusion?

@willysys

willysys commented Oct 11, 2018

In the classification case, why does the code compute the gradient using logistic loss but report the loss using hinge loss?
The gradient is computed in the code as follows:

val mult = task match {
  case 0 =>
    pred - label
  case 1 =>
    -label * (1.0 - 1.0 / (1.0 + Math.exp(-label * pred)))
}

The loss is computed in the code as follows:

task match {
  case 0 =>
    (pred - label) * (pred - label)
  case 1 =>
    1 - Math.signum(pred * label) // hinge loss
}
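Note that the gradient in `case 1` above is in fact the logistic-loss gradient written for -1/+1 labels, and it agrees exactly with the 0/1 formulation `sigmoid(pred) - label` once labels are remapped. A small standalone check (names here are illustrative, not from the library):

```scala
object GradientCheck {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Gradient as written in the library, with labels in {-1, +1}.
  def gradPm1(pred: Double, label: Double): Double =
    -label * (1.0 - 1.0 / (1.0 + math.exp(-label * pred)))

  // Equivalent log-loss gradient with labels in {0, 1}.
  def grad01(pred: Double, label01: Double): Double =
    sigmoid(pred) - label01

  def main(args: Array[String]): Unit = {
    for (pred <- Seq(-2.0, 0.0, 3.5); label01 <- Seq(0.0, 1.0)) {
      val labelPm1 = if (label01 > 0) 1.0 else -1.0
      // The two formulations give identical gradients after the label remap.
      assert(math.abs(gradPm1(pred, labelPm1) - grad01(pred, label01)) < 1e-12)
    }
    println("gradients match")
  }
}
```

So the inconsistency is only in the reported loss value (hinge-style 0/2 loss), not in the optimization itself, which follows the logistic-loss gradient.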
