
feat: single row predictor #20

Open: wants to merge 23 commits into base: main
Conversation

@AnsgarSchuffenhauer AnsgarSchuffenhauer commented Dec 13, 2022

TODO:

This adds a predictor that accepts a single descriptor and feeds it to the models.

Description

This adds an alternative predictor class that is meant for single row predictions. It makes use of the single row descriptor preparator added in the following pull request for MELLODDY Tuner, melloddy/MELLODDY-TUNER#3, and feeds the sparse torch tensor directly to the neural network.

Motivation and Context

This fulfills the need to run feature generation and prediction in a non-batched atomic mode for each structure independently, which makes error handling much leaner.
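As an illustration of why atomic, per-structure prediction makes error handling leaner, a minimal sketch of such a loop is below; the function names (`prepare_descriptor`, `predict_row`, `predict_each`) are hypothetical, not the PR's actual API:

```python
# Sketch of an atomic, per-structure prediction loop: one bad structure
# is recorded as an error instead of aborting a whole batch.
# All names here are illustrative, not the actual MELLODDY-Predictor API.
def predict_each(smiles_list, prepare_descriptor, predict_row):
    results, errors = {}, {}
    for smi in smiles_list:
        try:
            x = prepare_descriptor(smi)    # single-row feature generation
            results[smi] = predict_row(x)  # single-row forward pass
        except Exception as exc:
            errors[smi] = str(exc)         # failure is isolated to this structure
    return results, errors
```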

How Has This Been Tested?

This has been tested in conjunction with the changed MELLODDY Tuner, and returns results equal to the reference results on the test data provided with MELLODDY-Predictor.

Ping Reviewers

@melloddy/predictor-users

Please check if the PR fulfills these requirements

  • If necessary, the changelog has been updated
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

@Fabien-GELUS (Collaborator) left a comment:

Thank you for your contribution!

melloddy_predictor/predictor_single.py
tests/test_single_predictor.py
hybrid = 2


class PredictorSingle:
Collaborator:
Can't we merge this class with the Model one?

@AnsgarSchuffenhauer (Author) commented Feb 23, 2023:

Yes, I could see that this could work. I wasn't sure at the outset that it would work without major backward incompatibility:

  • There would simply be different predict functions to choose from.
  • The init function could keep the signature of the current Model object, with task maps added as additional optional parameters. If task maps are provided, the model is loaded automatically (Model.load()) in order to validate them.
  • The introspection functions of the single row predictor object for retrieving task counts and model types would be added to the Model object. Calling them would require the model to be loaded, if it is not already (Model.load()).
  • Task map handling would become part of the model init function, but would of course have no effect if no task maps were given.
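The init change sketched in these points could look roughly as follows; this is a minimal sketch, and the parameter names (class_task_map, regr_task_map) and the stubbed load() are assumptions for illustration, not the actual MELLODDY-Predictor API:

```python
# Hypothetical merged Model init: task maps are optional, and providing
# them triggers an automatic load() so they can be validated against the
# loaded model. All names below are illustrative.
class Model:
    def __init__(self, model_path, class_task_map=None, regr_task_map=None):
        self.model_path = model_path
        self.class_task_map = class_task_map
        self.regr_task_map = regr_task_map
        self._loaded = False
        if class_task_map is not None or regr_task_map is not None:
            self.load()  # task maps can only be validated on a loaded model

    def load(self):
        # placeholder for the real checkpoint loading and task-map validation
        self._loaded = True
```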

Some things are, however, handled slightly differently in the two classes and need to be addressed before we can attempt merging them:

  • Dropout: the single row predictor object applies it directly to the network upon initialization, while in the current Model object dropout is handled by the sparsechem predict function. The question here is whether, if we apply dropout directly to the network upon initialization, applying it a second time in the sparsechem predict function has some unwanted effect. If not, we could go ahead and simply apply it at the initialization stage.
  • The single row predictor object doesn't support catalogue heads. But in your case, too, the catalogue head mapping y_cat_columns needs to be provided externally, so the solution could be to do the same for the single row predictor. One could possibly create a function extracting this from a T8c file.
  • Both classes memorize the statistics for inverse normalization in slightly different ways, especially with respect to when they are converted to a numpy array (immediately upon model initialization in the single row predictor class, versus upon calling predict in the Model class). To keep the prediction step of the single row predictor as lean as possible, I would prefer to do everything that can be done at initialization and model loading at that stage, including computing the standard deviation from the variance.
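The last point, precomputing the inverse-normalization statistics (including the standard deviation from the variance) once at initialization so the per-row predict stays lean, could be sketched as follows; the class and attribute names are illustrative, not the actual implementation:

```python
import numpy as np

# Sketch: convert the stored statistics to numpy and derive the standard
# deviation from the variance once, at init, so predict() only does a
# cheap elementwise denormalization. Names here are illustrative.
class InverseNormalizer:
    def __init__(self, mean, var):
        self.mean = np.asarray(mean, dtype=np.float32)
        self.std = np.sqrt(np.asarray(var, dtype=np.float32))  # computed once

    def __call__(self, y_normed):
        # invert y_normed = (y - mean) / std
        return y_normed * self.std + self.mean
```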

Collaborator:

Thanks for this advice. I won't have time to do this merge, but I'll create an issue with your remarks.

melloddy_predictor/predictor_single.py
tests/test_single_predictor.py
@Fabien-GELUS Fabien-GELUS force-pushed the single_row_predictor branch from 00ae640 to 75f9aa5 Compare April 26, 2023 08:45
Signed-off-by: Fabien Gelus <[email protected]>
@Fabien-GELUS Fabien-GELUS changed the title Single row predictor feat: Single row predictor May 12, 2023
@Fabien-GELUS Fabien-GELUS changed the title feat: Single row predictor feat: single row predictor May 12, 2023
@Fabien-GELUS (Collaborator) left a comment:

Hello @AnsgarSchuffenhauer, thanks again for your contribution. I did some work to make the lints pass in the CI. Could you please merge melloddy/MELLODDY-TUNER#3 before merging this one? (You can then re-run the failing GitHub Actions, which should work after merging the MELLODDY-Tuner PR.)


def test_trunk_output(test_preds, srprep, input_smiles_df, ref_output_trunk):
    smiles = input_smiles_df.set_index("input_compound_id")["smiles"]
    for mtype, my_pred in test_preds.items():
        trunk = np.concatenate(
            [my_pred.predict_trunk_from_tensor(srprep.process_smiles(smi)) for _, smi in smiles.items()]
        )
        assert np.allclose(trunk, ref_output_trunk[mtype])
Collaborator:

On my side the test doesn't pass because of a slight delta:

E           assert False
E            +  where False = <function allclose at 0x10f937ee0>(array([[ 0.08066268,  0.00406612,  0.0004338 , ...,  0.04019926,\n        -0.07572228,  0.01461065],\n       [ 0.0980792...3],\n       [ 0.12118611, -0.02552076,  0.04818719, ...,  0.09924001,\n         0.09236208,  0.09422375]], dtype=float32), array([[ 0.08066268,  0.00406611,  0.0004338 , ...,  0.04019927,\n        -0.07572226,  0.01461066],\n       [ 0.0980792...3],\n       [ 0.12118611, -0.02552078,  0.04818719, ...,  0.09924001,\n         0.09236208,  0.09422375]], dtype=float32))
E            +    where <function allclose at 0x10f937ee0> = np.allclose
E            +    and   array([[ 0.08066268,  0.00406612,  0.0004338 , ...,  0.04019926,\n        -0.07572228,  0.01461065],\n       [ 0.0980792...3],\n       [ 0.12118611, -0.02552076,  0.04818719, ...,  0.09924001,\n         0.09236208,  0.09422375]], dtype=float32) = <function concatenate at 0x10f725040>([array([[ 8.06626752e-02,  4.06611711e-03,  4.33795154e-04,\n        -2.49629468e-03,  6.31332397e-03, -9.51325372e-02,...178, -0.07470306,\n        -0.11199417,  0.02171535,  0.02768455,  0.13399841,  0.08680404]],\n      dtype=float32), ...])
E            +      where <function concatenate at 0x10f725040> = np.concatenate

tests/test_single_predictor.py:209: AssertionError

Collaborator:

If it's not a problem, I added an rtol.
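For context, numpy's default relative tolerance (rtol=1e-05) can reject float32 outputs that differ only in the last bits across platforms, which a looser rtol absorbs; a small sketch with illustrative values:

```python
import numpy as np

# Two float32 values differing by about 2e-05, as last-bit float32
# discrepancies between platforms can produce (values are illustrative).
a = np.array([1.000000], dtype=np.float32)
b = np.array([1.000020], dtype=np.float32)

assert not np.allclose(a, b)         # default rtol=1e-05 rejects the delta
assert np.allclose(a, b, rtol=1e-4)  # a looser rtol accepts it
```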

@AlexandrePicosson AlexandrePicosson removed their request for review June 2, 2023 15:14