feat: single row predictor #20
Conversation
Thank you for your contribution!
hybrid = 2

class PredictorSingle:
Can't we merge this class with the Model one?
Yes, I could see that this could work. I wasn't sure to begin with that it would work without major backward incompatibility:
- There would simply be different predict functions to choose from.
- The init function could be made to work with the same signature as the current model object, but with task maps added as additional optional parameters. If task maps are given, the model gets loaded automatically (Model.load()) in order to validate them.
- The introspection functions of the single row predictor object that retrieve task counts and model types would be added to the model object. Calling them would mean the model needs to be loaded, if not already done (Model.load()).
- Task map handling would be part of the model init function, but would of course have no effect if there were no task maps.
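The merged-init idea above could be sketched roughly as follows. All names here (Model, load, class_task_map, num_class_tasks) are illustrative assumptions based on this discussion, not the actual MELLODDY-Predictor API:

```python
# Hedged sketch of a merged Model class with lazy loading.
# Names are assumptions, not the real MELLODDY-Predictor API.
class Model:
    def __init__(self, model_dir, class_task_map=None):
        self.model_dir = model_dir
        self.class_task_map = class_task_map
        self._loaded = False
        # Task maps can only be validated against the loaded network,
        # so passing one triggers an eager load.
        if class_task_map is not None:
            self.load()

    def load(self):
        if self._loaded:
            return
        # ... read network weights and config from self.model_dir here ...
        self._loaded = True

    def num_class_tasks(self):
        # Introspection forces a load if not already done, as discussed above.
        self.load()
        return len(self.class_task_map or {})
```

Callers without task maps would keep the current lazy behavior; only task-map users or introspection calls pay the loading cost up front.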
Some things are, however, handled slightly differently in the two classes and need to be addressed before we can attempt merging them:
- Dropout: the single row predictor applies it directly to the network upon initialization, while in the current model object dropout is handled by the sparsechem predict function. The question is whether, if we apply dropout directly to the network at initialization, applying it a second time in the sparsechem predict function has some unwanted effect. If not, we could go ahead and simply apply it at the initialization stage.
- The single row predictor object doesn't support catalogue heads. But in your case the catalogue head mapping y_cat_columns also needs to be provided externally, so the solution could be to do the same for the single row predictor. One could possibly create a function extracting this from a T8c file.
- Both classes memorize the statistics for inverse normalization in slightly different ways, especially with respect to when they are converted to a numpy array (immediately upon model initialization in the single row predictor class, versus upon calling predict in the Model class). To keep the prediction path of the single row predictor as lean as possible, I would prefer to do everything that can be done during initialization and model loading at that stage, including computing the standard deviation from the variance.
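The last point (precomputing the inverse-normalization statistics at load time) could look roughly like this. The stats layout and function names are assumptions for illustration, not the actual on-disk format or API:

```python
import numpy as np

# Sketch of the "do it once at load time" preference above: convert the
# stored stats to numpy and precompute std = sqrt(var) during model
# loading, so predict() only has to multiply and add.
def prepare_inverse_norm(mean, var):
    mean = np.asarray(mean, dtype=np.float32)
    std = np.sqrt(np.asarray(var, dtype=np.float32))
    return mean, std

def inverse_normalize(y_norm, mean, std):
    # Undo the (y - mean) / std scaling applied during training.
    return y_norm * std + mean
```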
Thanks for this advice. I won't have time to do the merge, but I'll create an issue with your remarks.
…alization flag

The flag inverse_normalization does not exist in sparsechem/melloddy config files, in spite of inverse norm being done directly in code.

* reworked t8df_to_task_map
* added more tests
* switched over to key provider as in the latest prepare branch from MELLODDY Tuner
Force-pushed from 00ae640 to 75f9aa5
Signed-off-by: Fabien Gelus <[email protected]>
Hello @AnsgarSchuffenhauer, thanks again for your contribution. I did some work to make the lints pass in the CI. Could you please merge melloddy/MELLODDY-TUNER#3 before merging this one? (You can then re-run the failing GitHub Actions, which should pass after merging the MDY-Tuner PR.)
tests/test_single_predictor.py
def test_trunk_output(test_preds, srprep, input_smiles_df, ref_output_trunk):
    for mtype, my_pred in test_preds.items():
        assert np.allclose(
            np.concatenate(
                [
                    my_pred.predict_trunk_from_tensor(srprep.process_smiles(smi))
                    for k, smi in input_smiles_df.set_index("input_compound_id")["smiles"].items()
                ]
            ),
            ref_output_trunk[mtype],
        )
On my side the test doesn't pass because of a slight delta:
E assert False
E + where False = <function allclose at 0x10f937ee0>(array([[ 0.08066268, 0.00406612, 0.0004338 , ..., 0.04019926,\n -0.07572228, 0.01461065],\n [ 0.0980792...3],\n [ 0.12118611, -0.02552076, 0.04818719, ..., 0.09924001,\n 0.09236208, 0.09422375]], dtype=float32), array([[ 0.08066268, 0.00406611, 0.0004338 , ..., 0.04019927,\n -0.07572226, 0.01461066],\n [ 0.0980792...3],\n [ 0.12118611, -0.02552078, 0.04818719, ..., 0.09924001,\n 0.09236208, 0.09422375]], dtype=float32))
E + where <function allclose at 0x10f937ee0> = np.allclose
E + and array([[ 0.08066268, 0.00406612, 0.0004338 , ..., 0.04019926,\n -0.07572228, 0.01461065],\n [ 0.0980792...3],\n [ 0.12118611, -0.02552076, 0.04818719, ..., 0.09924001,\n 0.09236208, 0.09422375]], dtype=float32) = <function concatenate at 0x10f725040>([array([[ 8.06626752e-02, 4.06611711e-03, 4.33795154e-04,\n -2.49629468e-03, 6.31332397e-03, -9.51325372e-02,...178, -0.07470306,\n -0.11199417, 0.02171535, 0.02768455, 0.13399841, 0.08680404]],\n dtype=float32), ...])
E + where <function concatenate at 0x10f725040> = np.concatenate
tests/test_single_predictor.py:209: AssertionError
If it's not a problem, I added an rtol.
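As an illustration of why a relative tolerance helps here (a minimal sketch with simulated values, not the actual test data): float32 trunk outputs can differ in the last digits across platforms or BLAS builds, which exceeds np.allclose's defaults (rtol=1e-5, atol=1e-8).

```python
import numpy as np

# Two of the values from the failure above, with a simulated
# cross-platform delta of ~5e-6 added.
ref = np.array([0.08066268, -0.07572228], dtype=np.float32)
got = ref + np.float32(5e-6)

assert not np.allclose(ref, got)         # default tolerances reject the delta
assert np.allclose(ref, got, rtol=1e-4)  # a looser rtol absorbs it
```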
Signed-off-by: Fabien Gelus <[email protected]>
This adds a predictor that accepts a single descriptor and feeds it to the models
Description
This adds an alternative predictor class that is meant for single row predictions. It makes use of the single row descriptor preparator added in the following pull request for MELLODDY Tuner, melloddy/MELLODDY-TUNER#3, and feeds the sparse torch tensor directly to the neural network.
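The per-molecule flow described above can be sketched as follows; process_smiles and predict_from_tensor are illustrative names based on this description, not the exact Tuner/Predictor API:

```python
# Hedged sketch: each molecule is prepared and predicted independently,
# so a failure on one SMILES cannot abort a whole batch.
def predict_one(predictor, preparator, smiles):
    x_row = preparator.process_smiles(smiles)    # single sparse descriptor row
    return predictor.predict_from_tensor(x_row)  # fed straight to the network
```

Error handling then reduces to a try/except around a single call per structure, which is the leanness argument made in the motivation below.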
Motivation and Context
This fulfills the need to run feature generation and prediction in a non-batched atomic mode for each structure independently, which makes error handling much leaner.
How Has This Been Tested?
This has been tested in conjunction with the changed MELLODDY Tuner and, on the test data provided with MELLODDY-Predictor, returns equal results.
Ping Reviewers
@melloddy/predictor-users
Please check if the PR fulfills these requirements