Modifying the usage to use weighted train data, instead of individual train data points #109

Open
ekta1007 opened this issue Sep 13, 2017 · 11 comments

Comments

@ekta1007

Say you have these data points in the training set:

X1, X2, X2 -> Y1
X1, X2, X2 -> Y1
X1, X2, X2 -> Y1
X1, X2, X3 -> Y1'
X1, X2, X3 -> Y1'

Training on them gives fm.w0_, fm.w_ and fm.V_ as the learnt model params.

Instead of treating them as 5 separate points (which increases the size of the train data set), is it possible to use weights, so that we still train on the full sample, but on aggregated data points, with the number of occurrences as weights?

X1, X2, X2 -> Y1 - weight 3
X1, X2, X3 -> Y1' - weight 2

The goal is that training still gives the same fm.w0_, fm.w_ and fm.V_ as if the model were trained on the 5 samples above.
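
The equivalence being asked for can be illustrated without fastFM. For an objective that sums per-sample losses, weighting each unique row by its count gives the same objective value as repeating the row. Below is a minimal sketch in plain Python, assuming integer counts and a squared-error loss; the helper names (`aggregate`, `expand`, `predict`, `loss`) and the toy numbers are made up for illustration and are not part of fastFM.

```python
from collections import Counter

def aggregate(rows, targets):
    """Collapse duplicate (row, target) pairs into unique pairs plus counts."""
    counts = Counter(zip(map(tuple, rows), targets))
    uniq = list(counts)  # first-seen order is preserved
    return [list(r) for r, _ in uniq], [t for _, t in uniq], [counts[k] for k in uniq]

def expand(rows, targets, weights):
    """Inverse of aggregate: repeat each row `weight` times (integer weights only)."""
    out_rows, out_targets = [], []
    for r, t, w in zip(rows, targets, weights):
        out_rows.extend([list(r)] * w)
        out_targets.extend([t] * w)
    return out_rows, out_targets

def predict(w0, w, row):
    # Plain linear model as a stand-in; an FM would add pairwise interaction terms.
    return w0 + sum(wi * xi for wi, xi in zip(w, row))

def loss(w0, w, rows, targets, weights=None):
    """Weighted sum of squared errors; weights default to 1 per row."""
    weights = weights or [1] * len(rows)
    return sum(wt * (predict(w0, w, r) - t) ** 2
               for r, t, wt in zip(rows, targets, weights))

# Toy data mirroring the example: three copies of one point, two of another.
X = [[1.0, 2.0, 2.0]] * 3 + [[1.0, 2.0, 3.0]] * 2
y = [1.0, 1.0, 1.0, 0.0, 0.0]

Xa, ya, wa = aggregate(X, y)
full = loss(0.1, [0.2, -0.3, 0.4], X, y)            # 5 individual rows
weighted = loss(0.1, [0.2, -0.3, 0.4], Xa, ya, wa)  # 2 rows with counts
```

Because both datasets define the same objective for any parameter values, a solver that accepts sample weights should in principle arrive at the same minimizer (stochastic solvers may still differ from run to run). While weights are unsupported, calling something like `expand` on the aggregated data before fitting is the obvious workaround, at the cost of memory.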

@ibayer
Owner

ibayer commented Sep 13, 2017

Sample weights are not yet supported but I plan to add this feature with the next major release.

@zeeraktalat

zeeraktalat commented May 8, 2019

Is there an update on this? I imagine it could be handled by conforming to sklearn's sample_weight parameter, which can be provided when fitting a model (see https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit)

@ibayer
Owner

ibayer commented Sep 10, 2019

@ZeerakW
The problem is that the current solver doesn't support sample weights yet. A complete rewrite which does support sample weights is close to completion, but I can't give you a release date yet (think months, not days).

A sample_weight parameter will be made available with the new release.

@zeeraktalat

@ibayer Oh that sounds amazing! Thanks for your efforts!

@jwasserman2

Hi @ibayer, thank you for your work on this. Do you have an updated estimate of when it will be available?

@ibayer
Owner

ibayer commented May 7, 2021

@jwasserman2

I can give an update but not an (estimated) release date.

For regression we have already released C++ code supporting sample weights, but the Python interface doesn't support it yet. The current plan is to add more solvers (classification is probably next) before making sample weights available in Python.

However, feel free to open a feature request on https://github.com/palaimon/fastfm2 to help us with prioritization.

@jwasserman2

@ibayer Congrats on the new package and updating the c++ code! I was specifically thinking about using weights for classification using the SGD solver. Are you still planning on having the 3 solvers ALS, SGD, and MCMC?

@ibayer
Owner

ibayer commented May 7, 2021

@jwasserman2

> Are you still planning on having the 3 solvers ALS, SGD, and MCMC?

Yes, BUT imo sgd is the least interesting solver and implemented more for completeness. ALS / coordinate descent is in general both faster and easier to use for FMs.

What's your motivation for preferring sgd?

@jwasserman2

While testing the different solvers, I was running into a data input error (if I remember correctly) using als that I was not seeing when using sgd. I mainly just knew that I didn't want to use MCMC for speed purposes, but between the two I did not have priors on which would be better for my use case.

If it would be helpful for me to recreate the als error for v2 of your package let me know!

@ibayer
Owner

ibayer commented May 7, 2021

@jwasserman2

This makes sense. fastfm uses probit regression (same as libfm) for als classification, which is less stable than the sigmoid transform used for the sgd classification.
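
For reference, the two link functions mentioned here can be compared directly. This is a plain-Python sketch using only the standard library, not fastFM's internals, and the function names are illustrative. It shows that the probit link (the standard normal CDF) saturates much faster in the tails than the logistic sigmoid, so probabilities for confident predictions get close to 0 or 1 sooner. Whether that is what caused the error reported above is not established in this thread.

```python
import math

def probit(z):
    """Standard normal CDF, Phi(z), written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), computed stably for either sign of z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Both map a real-valued score to (0, 1), but the probit tails decay much faster.
for z in (-8.0, -4.0, 0.0, 4.0, 8.0):
    print(f"z={z:+.1f}  probit={probit(z):.3e}  sigmoid={sigmoid(z):.3e}")
```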

> If it would be helpful for me to recreate the als error for v2 of your package let me know!

Thanks for the offer. I hope it's not needed since v2 is a complete rewrite and uses iteratively reweighted least squares for als classification.
The new approach is expected to be more stable and shouldn't have the issue you observed.
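
To make the term concrete: iteratively reweighted least squares (IRLS) fits a classifier by repeatedly solving a weighted least-squares problem on a linearized target. The sketch below is a textbook IRLS for a plain one-feature logistic regression, not an FM and not fastfm2 code; all names and the toy data are assumptions for illustration. It also shows that per-sample weights fit in naturally, since they simply multiply the IRLS weights.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def weighted_least_squares(x, z, s):
    """Solve min sum_i s_i * (z_i - b0 - b1 * x_i)^2 in closed form."""
    sw = sum(s)
    mx = sum(si * xi for si, xi in zip(s, x)) / sw
    mz = sum(si * zi for si, zi in zip(s, z)) / sw
    sxx = sum(si * (xi - mx) ** 2 for si, xi in zip(s, x))
    sxz = sum(si * (xi - mx) * (zi - mz) for si, xi, zi in zip(s, x, z))
    b1 = sxz / sxx
    return mz - b1 * mx, b1

def irls_logistic(x, y, sample_weight=None, n_iter=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by iteratively reweighted least squares."""
    c = sample_weight or [1.0] * len(x)
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        p = [sigmoid(e) for e in eta]
        # Curvature weights, floored to avoid division by zero when p saturates.
        v = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        # Working response: the link linearized around the current fit.
        z = [e + (yi - pi) / vi for e, yi, pi, vi in zip(eta, y, p, v)]
        # Sample weights simply multiply the IRLS weights.
        s = [ci * vi for ci, vi in zip(c, v)]
        b0, b1 = weighted_least_squares(x, z, s)
    return b0, b1

# Toy, non-separable data: aggregated with counts vs. fully expanded.
x_agg, y_agg, counts = [0.0, 0.0, 1.0, 1.0], [0, 1, 0, 1], [3, 1, 1, 3]
x_all = [xi for xi, n in zip(x_agg, counts) for _ in range(n)]
y_all = [yi for yi, n in zip(y_agg, counts) for _ in range(n)]

fit_weighted = irls_logistic(x_agg, y_agg, sample_weight=counts)
fit_expanded = irls_logistic(x_all, y_all)
```

On this toy data the weighted fit on 4 aggregated rows and the unweighted fit on the 8 expanded rows agree up to floating-point noise.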

edit: I recommend starring https://github.com/palaimon/fastfm2 and opening an issue requesting sample weight support.
This way you get a notification as soon as we add the feature.

@jwasserman2

Awesome, will do. Thank you again for taking the time to add this functionality, looking forward to its release!
