-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fit callbacks #145
base: feature_jsonSerialization
Are you sure you want to change the base?
Fit callbacks #145
Conversation
fastFM/ffm2.pyx
Outdated
# In essence it wraps the python function so it can be used in C++ space. | ||
# Convert the python function from a pointer back into a python object and invoke | ||
# with other parameters. (For now just one, `current_iter`) | ||
cdef void fit_callback_wrapper(int current_iter, void* python_function): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts on this:
- callback is not None. If callback is None inform C++ side somehow so no cycles are wasted invoking the callback. This might be harder than it sounds.
- callback corresponds to
fit_callback_t
. One way to check is to catchTypeError
which is raised if a function is invoked with less or more than the required number of arguments. - passing all parameters as
dict
and invoking the python callback via **kwargs for greater flexibility
I don't think invoking the callback will have any noticeable overhead if we do this ones per iterations (at least for any non super small toy examples) if I'm wrong we can fix it later. Let's not risk premature optimization here. If you are sure it's a problem then let's do an experiment first to proof it.
I don't yet see when it's useful to pass data from cpp to python via call back function. However it could be useful if the return of the python function could be used to stop the solver (return continue true/false).
I would just keep the function as simple as possible for now. |
I changed my mind a bit about passing parameter from cpp to python. |
The broad idea I had with such a cpp sends json string -> python parses json into dict is to have support for all the ideas you mentioned like MCMC traces, RMSE plots, stopping early, etc with a single fit callback instead of having, for example, two fit callbacks, one for ALS and for MCMC. We can't type check at compile-time level anyways because everything is
In any possible setup the function can raise exceptions:
Once these exceptions happen, unless caught, cause the program to end abruptly. The possible issue here is memory management:
Anyways, in essence, I'm saying we must sanitize input and deal with 3rd party errors on one hand and on another hand, we must be flexible so we don't duplicate code and effort in C++ side. |
I agree that's pretty cool.
Exceptions can cause all kind of trouble as you have pointed out. Do we really need them? We could simply return an error string if c++ can't deal with the python input.
We only take a single serialized json string as input and send error sting back if json isn't valid or values can't be used.
I don't think there is much we can do if the user changes the data dimensions or similar things.
If python crashes then the memory allocated by the extension should be realized from the OS as well, or? |
kwargs = json.loads(json_str.c_str()) | ||
|
||
print("Arguments from C++: " + str(kwargs)) | ||
except json.JSONDecodeError as e: # should never happen but just in case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, now I understand better what you mean. I'm still not to happy with the use of exceptions. We could raise a warning instead and not sanitize the input. We can deal with it on the cpp side.
Okay, I think I got a bit confused let's summarized it again.
message can also be used by cpp to send error message, bool decides if iterations should be continued
message can be used to request solver specific information |
C++ is OK anyways the problem is the callback function implementation. Let's say there are two possible callback types, one for MCMC which takes traces and one for ALS which takes the current iteration index and RMSE/accuracy. Problem 1:
We must ensure that the solver remains in a stable state and that the user is warned of the issues their callback has. The definition of a stable state is what I need. Should the solver stop because an error was found or continue pretending there is no issue in the callback while the error message is printed. Problem 2:
Problem is that for MCMC my_callback should have signature
I don't know, probably but IIRC the C standard (and we are dealing with Cython which is C) does not specify. I think Linux does release the memory, don't know for other OSs. |
Thanks for spelling out the details! Problem 1
Keep solver going whenever possible (or return false and finish the cpp call properly). We catch all exceptions and turn them into warnings. Problem 2
Parsing a message string would avoid the need to specify the signature explicitly. I know the flexibility comes at a cost but it gives us the full flexibility to explore what can be done using callbacks. We can still refactor later and use strict typing. |
Pull request for Python and Cython changes for fit_callbacks.