
feat: single row predictor #20

Open: wants to merge 23 commits into base: main
Conversation

@AnsgarSchuffenhauer AnsgarSchuffenhauer commented Dec 13, 2022

TODO:

This adds a predictor that accepts a single descriptor and feeds it to the models.

Description

This adds an alternative predictor class that is meant for single row predictions. It makes use of the single row descriptor preparator added in the following pull request for MELLODDY Tuner, melloddy/MELLODDY-TUNER#3, and feeds the sparse torch tensor directly to the neural network.

Motivation and Context

This fulfills the need to run feature generation and prediction in a non-batched atomic mode for each structure independently, which makes error handling much leaner.
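As an illustration of why atomic, per-structure prediction makes error handling leaner, a minimal sketch of such a loop is below; the function names (`prepare_descriptor`, `predict_row`, `predict_each`) are hypothetical, not the PR's actual API:

```python
# Sketch of an atomic, per-structure prediction loop: one bad structure
# is recorded as an error instead of aborting a whole batch.
# All names here are illustrative, not the actual MELLODDY-Predictor API.
def predict_each(smiles_list, prepare_descriptor, predict_row):
    results, errors = {}, {}
    for smi in smiles_list:
        try:
            x = prepare_descriptor(smi)    # single-row feature generation
            results[smi] = predict_row(x)  # single-row forward pass
        except Exception as exc:
            errors[smi] = str(exc)         # failure is isolated to this structure
    return results, errors
```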

How Has This Been Tested?

This has been tested in conjunction with the changed MELLODDY Tuner, and returns results equal to the reference results on the test data provided with MELLODDY-Predictor.

Ping Reviewers

@melloddy/predictor-users

Please check if the PR fulfills these requirements

  • If necessary, the changelog has been updated
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

@Fabien-GELUS (Collaborator) left a comment:

Thank you for your contribution!

melloddy_predictor/predictor_single.py
tests/test_single_predictor.py
hybrid = 2


class PredictorSingle:
Collaborator:
Can't we merge this class with the Model one?

@AnsgarSchuffenhauer (Author) commented Feb 23, 2023:

Yes, I could see that this could work. I wasn't sure at the outset that it would work without major backward incompatibility:

  • There would simply be different predict functions to choose from.
  • The init function could keep the signature of the current Model object, with task maps added as additional optional parameters. If task maps are provided, the model is loaded automatically (Model.load()) in order to validate them.
  • The introspection functions of the single row predictor object for retrieving task counts and model types would be added to the Model object. Calling them would require the model to be loaded, if it is not already (Model.load()).
  • Task map handling would become part of the model init function, but would of course have no effect if no task maps were given.
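The init change sketched in these points could look roughly as follows; this is a minimal sketch, and the parameter names (class_task_map, regr_task_map) and the stubbed load() are assumptions for illustration, not the actual MELLODDY-Predictor API:

```python
# Hypothetical merged Model init: task maps are optional, and providing
# them triggers an automatic load() so they can be validated against the
# loaded model. All names below are illustrative.
class Model:
    def __init__(self, model_path, class_task_map=None, regr_task_map=None):
        self.model_path = model_path
        self.class_task_map = class_task_map
        self.regr_task_map = regr_task_map
        self._loaded = False
        if class_task_map is not None or regr_task_map is not None:
            self.load()  # task maps can only be validated on a loaded model

    def load(self):
        # placeholder for the real checkpoint loading and task-map validation
        self._loaded = True
```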

Some things are, however, handled slightly differently in the two classes and need to be addressed before we can attempt merging them:

  • Dropout: the single row predictor object applies it directly to the network upon initialization, while in the current Model object dropout is handled by the sparsechem predict function. The question here is whether, if we apply dropout directly to the network upon initialization, applying it a second time in the sparsechem predict function has some unwanted effect. If not, we could go ahead and simply apply it at the initialization stage.
  • The single row predictor object doesn't support catalogue heads. But in your case, too, the catalogue head mapping y_cat_columns needs to be provided externally, so the solution could be to do the same for the single row predictor. One could possibly create a function extracting this from a T8c file.
  • Both classes memorize the statistics for inverse normalization in slightly different ways, especially with respect to when they are converted to a numpy array (immediately upon model initialization in the single row predictor class, versus upon calling predict in the Model class). To keep the prediction step of the single row predictor as lean as possible, I would prefer to do everything that can be done at initialization and model loading at that stage, including computing the standard deviation from the variance.
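The last point, precomputing the inverse-normalization statistics (including the standard deviation from the variance) once at initialization so the per-row predict stays lean, could be sketched as follows; the class and attribute names are illustrative, not the actual implementation:

```python
import numpy as np

# Sketch: convert the stored statistics to numpy and derive the standard
# deviation from the variance once, at init, so predict() only does a
# cheap elementwise denormalization. Names here are illustrative.
class InverseNormalizer:
    def __init__(self, mean, var):
        self.mean = np.asarray(mean, dtype=np.float32)
        self.std = np.sqrt(np.asarray(var, dtype=np.float32))  # computed once

    def __call__(self, y_normed):
        # invert y_normed = (y - mean) / std
        return y_normed * self.std + self.mean
```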

Collaborator:

Thanks for this advice. I won't have time to do this merge, but I'll create an issue with your remarks.

melloddy_predictor/predictor_single.py
tests/test_single_predictor.py
@Fabien-GELUS Fabien-GELUS force-pushed the single_row_predictor branch from 00ae640 to 75f9aa5 Compare April 26, 2023 08:45
Signed-off-by: Fabien Gelus <[email protected]>
@Fabien-GELUS Fabien-GELUS changed the title Single row predictor feat: Single row predictor May 12, 2023
@Fabien-GELUS Fabien-GELUS changed the title feat: Single row predictor feat: single row predictor May 12, 2023
@Fabien-GELUS (Collaborator) left a comment:

Hello @AnsgarSchuffenhauer, thanks again for your contribution. I did some work to make the lints pass in the CI. Could you please merge melloddy/MELLODDY-TUNER#3 before merging this one? (You can then re-run the failing GitHub Actions, which should work after merging the MELLODDY-Tuner PR.)


def test_trunk_output(test_preds, srprep, input_smiles_df, ref_output_trunk):
    smiles = input_smiles_df.set_index("input_compound_id")["smiles"]
    for mtype, my_pred in test_preds.items():
        trunk = np.concatenate(
            [my_pred.predict_trunk_from_tensor(srprep.process_smiles(smi)) for _, smi in smiles.items()]
        )
        assert np.allclose(trunk, ref_output_trunk[mtype])
Collaborator:

On my side the test doesn't pass because of a slight delta:

E           assert False
E            +  where False = <function allclose at 0x10f937ee0>(array([[ 0.08066268,  0.00406612,  0.0004338 , ...,  0.04019926,\n        -0.07572228,  0.01461065],\n       [ 0.0980792...3],\n       [ 0.12118611, -0.02552076,  0.04818719, ...,  0.09924001,\n         0.09236208,  0.09422375]], dtype=float32), array([[ 0.08066268,  0.00406611,  0.0004338 , ...,  0.04019927,\n        -0.07572226,  0.01461066],\n       [ 0.0980792...3],\n       [ 0.12118611, -0.02552078,  0.04818719, ...,  0.09924001,\n         0.09236208,  0.09422375]], dtype=float32))
E            +    where <function allclose at 0x10f937ee0> = np.allclose
E            +    and   array([[ 0.08066268,  0.00406612,  0.0004338 , ...,  0.04019926,\n        -0.07572228,  0.01461065],\n       [ 0.0980792...3],\n       [ 0.12118611, -0.02552076,  0.04818719, ...,  0.09924001,\n         0.09236208,  0.09422375]], dtype=float32) = <function concatenate at 0x10f725040>([array([[ 8.06626752e-02,  4.06611711e-03,  4.33795154e-04,\n        -2.49629468e-03,  6.31332397e-03, -9.51325372e-02,...178, -0.07470306,\n        -0.11199417,  0.02171535,  0.02768455,  0.13399841,  0.08680404]],\n      dtype=float32), ...])
E            +      where <function concatenate at 0x10f725040> = np.concatenate

tests/test_single_predictor.py:209: AssertionError

Collaborator:

If it's not a problem, I added an rtol.
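For context, numpy's default relative tolerance (rtol=1e-05) can reject float32 outputs that differ only in the last bits across platforms, which a looser rtol absorbs; a small sketch with illustrative values:

```python
import numpy as np

# Two float32 values differing by about 2e-05, as last-bit float32
# discrepancies between platforms can produce (values are illustrative).
a = np.array([1.000000], dtype=np.float32)
b = np.array([1.000020], dtype=np.float32)

assert not np.allclose(a, b)         # default rtol=1e-05 rejects the delta
assert np.allclose(a, b, rtol=1e-4)  # a looser rtol accepts it
```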

@AlexandrePicosson AlexandrePicosson removed their request for review June 2, 2023 15:14