Fit callbacks #145

Open
wants to merge 3 commits into
base: feature_jsonSerialization

@ghost ghost commented Oct 16, 2018

Pull request for Python and Cython changes for fit_callbacks.

fastFM/ffm2.pyx Outdated
# In essence it wraps the python function so it can be used in C++ space.
# Convert the python function from a pointer back into a python object and invoke
# with other parameters. (For now just one, `current_iter`)
cdef void fit_callback_wrapper(int current_iter, void* python_function):
Author

My thoughts on this:

  • callback is not None. If callback is None inform C++ side somehow so no cycles are wasted invoking the callback. This might be harder than it sounds.
  • callback corresponds to fit_callback_t. One way to check is to catch TypeError which is raised if a function is invoked with less or more than the required number of arguments.
  • passing all parameters as dict and invoking the python callback via **kwargs for greater flexibility
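
The arity check in the second bullet could also be done once, before the solver starts, instead of catching `TypeError` in the middle of a fit. A minimal sketch, assuming a plain Python callback; the helper name `accepts_positional_args` is hypothetical and not part of fastFM:

```python
import inspect


def accepts_positional_args(callback, n_args):
    """Return True if `callback` can be called with `n_args` positional arguments."""
    if callback is None or not callable(callback):
        return False
    try:
        # bind() applies the normal call rules without invoking the function.
        inspect.signature(callback).bind(*([None] * n_args))
    except TypeError:
        return False
    except ValueError:
        # Some builtins expose no signature; assume they are fine.
        return True
    return True


def on_iteration(current_iter):
    print("iteration", current_iter)


print(accepts_positional_args(on_iteration, 1))  # one argument fits
print(accepts_positional_args(on_iteration, 2))  # too many arguments
print(accepts_positional_args(None, 1))          # no callback given
```

This only validates the argument count; it says nothing about argument types, which still surface at runtime.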

@ibayer
Owner

ibayer commented Oct 16, 2018

None inform C++ side somehow so no cycles are wasted invoking the callback.

I don't think invoking the callback will have any noticeable overhead if we do this once per iteration (at least for any non super small toy examples). If I'm wrong we can fix it later. Let's not risk premature optimization here. If you are sure it's a problem then let's do an experiment first to prove it.

One way to check is to catch TypeError which is raised if a function is invoked with less or more than the required number of arguments.

I don't yet see when it's useful to pass data from cpp to python via the callback function. However, it could be useful if the return value of the python function could be used to stop the solver (return continue true/false).

passing all parameters as dict and invoking the python callback via **kwargs for greater flexibility

I would just keep the function as simple as possible for now.

@ibayer
Owner

ibayer commented Oct 16, 2018

I changed my mind a bit about passing parameters from cpp to python.
Passing a string would be very flexible as this would allow us to serialize json. We already do this to pass parameters from python to cpp. It's a bit dirty, no type checking etc., but very flexible.

@ghost
Author

ghost commented Oct 17, 2018

It's a bit dirty, no type checking etc., but very flexible.

The broad idea I had with such a cpp sends json string -> python parses json into dict is to have support for all the ideas you mentioned like MCMC traces, RMSE plots, stopping early, etc. with a single fit callback instead of having, for example, two fit callbacks, one for ALS and one for MCMC.
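
A rough sketch of that idea, with the C++ side stood in for by `json.dumps`. The payload keys (`solver`, `iteration`, `rmse`, `traces`) are assumptions for illustration, not an agreed message format:

```python
import json

history = []


def fit_callback(json_str):
    """Single callback for every solver; the payload decides what is recorded."""
    kwargs = json.loads(json_str)
    record = {"iteration": kwargs["iteration"], "rmse": kwargs.get("rmse")}
    if kwargs.get("solver") == "mcmc":
        # Only MCMC messages carry traces in this sketch.
        record["traces"] = kwargs.get("traces", {})
    history.append(record)


# Stand-ins for the strings the C++ solver would send.
fit_callback(json.dumps({"solver": "als", "iteration": 0, "rmse": 1.25}))
fit_callback(json.dumps({"solver": "mcmc", "iteration": 1, "rmse": 1.1,
                         "traces": {"alpha": 0.5}}))
print(history)
```

The same function handles both solvers because the differences live in the message rather than in the signature.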

We can't type check at compile time anyway, because everything is void* and object, so checking has to be deferred to runtime. At runtime, if the types do not match, an exception will be raised, which brings me to my second thought.

callback corresponds to fit_callback_t. One way to check is to catch TypeError which is raised if a function is invoked with less or more than the required number of arguments.

In any possible setup the function can raise exceptions:

  • For type errors or not having enough or too many arguments.
  • For whatever reasons we can't control due to 3rd party users doing weird things in their callbacks.

Once these exceptions happen, unless caught, they cause the program to end abruptly. The possible issue here is memory management:

    # Callback raises exception and python crashes probably.
    # Exception raised when fit_callback_wrapper executes in C++ space.
    cpp_ffm.fit_with_callback(s, m, d, fit_callback_wrapper, (<void*> callback))

    # Never gets called => Memory leak.
    del d
    del s
    del m

Anyway, in essence, I'm saying that on the one hand we must sanitize input and deal with 3rd party errors, and on the other hand we must be flexible so we don't duplicate code and effort on the C++ side.
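
One way to keep the cleanup reachable is `try`/`finally`, which Cython also supports. A minimal pure-Python sketch, where `FakeResource` and `fit_with_callback` are hypothetical stand-ins for the C++ objects and for `cpp_ffm.fit_with_callback`. How an exception actually travels through the C++ frames depends on how the wrapper is declared in Cython, which this sketch does not model:

```python
released = []


class FakeResource:
    """Stand-in for a heap-allocated C++ object (settings, model, data)."""

    def __init__(self, name):
        self.name = name

    def release(self):
        released.append(self.name)


def fit_with_callback(callback, n_iter=3):
    """Stand-in for the C++ solver loop."""
    for i in range(n_iter):
        callback(i)


def fit(callback):
    s, m, d = FakeResource("s"), FakeResource("m"), FakeResource("d")
    try:
        fit_with_callback(callback)
    finally:
        # Runs whether or not the callback raised.
        for resource in (d, m, s):
            resource.release()


def bad_callback(current_iter):
    raise IndexError("user bug in callback")


try:
    fit(bad_callback)
except IndexError as exc:
    print("callback failed:", exc)
print("released:", released)
```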

@ibayer
Owner

ibayer commented Oct 17, 2018

The broad idea I had with such a cpp sends json string -> python parses json into dict is to have support ...

I agree that's pretty cool.

In any possible setup the function can raise exceptions:

Exceptions can cause all kinds of trouble as you have pointed out. Do we really need them? We could simply return an error string if c++ can't deal with the python input.

For type errors or not having enough or too many arguments.

We only take a single serialized json string as input and send an error string back if the json isn't valid or the values can't be used.

For whatever reasons we can't control due to 3rd party users doing weird things in their callbacks.

I don't think there is much we can do if the user changes the data dimensions or similar things.

# Callback raises exception and python crashes probably.

If python crashes then the memory allocated by the extension should be released by the OS as well, shouldn't it?

kwargs = json.loads(json_str.c_str())

print("Arguments from C++: " + str(kwargs))
except json.JSONDecodeError as e: # should never happen but just in case
Owner

Okay, now I understand better what you mean. I'm still not too happy with the use of exceptions. We could raise a warning instead and not sanitize the input. We can deal with it on the cpp side.

@ibayer
Owner

ibayer commented Oct 17, 2018

Okay, I think I got a bit confused, so let's summarize it again.

python: call_back(string message) -> bool

message can also be used by cpp to send an error message; the bool decides whether iterations should continue

cpp: call_back(string message) -> void

message can be used to request solver specific information
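
The Python half of this summary could look roughly like the following. The solver loop is a pure-Python stand-in for the C++ side, and the JSON keys `iteration` and `rmse` are assumed, not a settled format:

```python
import json


def run_solver(callback, n_iter=10):
    """Stand-in for the C++ loop: send a message, stop when callback returns False."""
    completed = 0
    for i in range(n_iter):
        rmse = 1.0 / (i + 1)  # fake, steadily improving error
        message = json.dumps({"iteration": i, "rmse": rmse})
        completed += 1
        if not callback(message):
            break
    return completed


def early_stopping(message):
    """Return True to continue, False once the error is small enough."""
    info = json.loads(message)
    return info["rmse"] > 0.3


print(run_solver(early_stopping))
```

Returning the number of completed iterations is only there to make the early stop visible in the sketch.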

@ghost
Author

ghost commented Oct 17, 2018

Exceptions can cause all kinds of trouble as you have pointed out. Do we really need them? We could simply return an error string if c++ can't deal with the python input.

C++ is OK anyway; the problem is the callback function implementation.

Let's say there are two possible callback types, one for MCMC which takes traces and one for ALS which takes the current iteration index and RMSE/accuracy.

Problem 1:

def my_callback(iteration, rmse):
     # do something which throws an exception, here indexing issues
     arr = []
     arr[5] = 3

We must ensure that the solver remains in a stable state and that the user is warned of the issues their callback has. The definition of a stable state is what I need. Should the solver stop because an error was found, or continue pretending there is no issue in the callback while the error message is printed?

Problem 2:

def my_callback(iteration, rmse):
     # everything is ok here
     plot_data.append((iteration, rmse))

mcmc_model = mcmc.FMRegressor()
mcmc_model.fit(X, y, my_callback)

The problem is that for MCMC, my_callback should have the signature (traces_last: Dict[str, float], iteration: int, rmse: float), so calling the callback in the wrapper raises a TypeError.

If python crashes then the memory allocated by the extension should be released by the OS as well, shouldn't it?

I don't know. Probably, but IIRC the C standard (and we are dealing with Cython, which compiles to C) does not specify this. I think Linux does release the memory; I don't know about other OSs.

@ibayer
Owner

ibayer commented Oct 17, 2018

Thanks for spelling out the details!

Problem 1

Should the solver stop because an error was found, or continue pretending there is no issue in the callback while the error message is printed?

Keep solver going whenever possible (or return false and finish the cpp call properly). We catch all exceptions and turn them into warnings.
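
A minimal sketch of that policy, assuming the callback takes a message string and returns a bool as summarized earlier in the thread. `safe_callback` is a hypothetical helper name, and treating a missing return value as "continue" is an assumption of this sketch:

```python
import warnings


def safe_callback(callback, keep_going_on_error=True):
    """Wrap `callback` so exceptions become warnings instead of propagating."""

    def wrapper(message):
        try:
            result = callback(message)
        except Exception as exc:  # deliberately broad: user code can raise anything
            warnings.warn(f"fit callback failed: {exc!r}", RuntimeWarning)
            return keep_going_on_error
        # A callback that returns nothing is treated as "continue".
        return True if result is None else bool(result)

    return wrapper


def broken(message):
    arr = []
    arr[5] = 3  # IndexError, as in Problem 1


wrapped = safe_callback(broken)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = wrapped('{"iteration": 0}')

print(result, [str(w.message) for w in caught])
```

Passing `keep_going_on_error=False` gives the alternative mentioned above, where the wrapper returns false and the cpp call finishes properly.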

Problem 2

def my_callback(iteration, rmse):

def my_callback(message):
     # everything is ok here
     d = parse(message)
     plot_data.append((d['iteration'], d['rmse']))

Parsing a message string would avoid the need to specify the signature explicitly. I know this comes at a cost, but it gives us full flexibility to explore what can be done using callbacks. We can still refactor later and use strict typing.
