Post-SMTB cleanup #31

ilsenatorov · 2024-08-16T15:21:15Z

Most important changes:

Added unit testing with pytest
Added github action to run the tests
Cleaned up redundant code
Rearranged some files here and there
Improved config/training logic for finetuning

Old-Shatterhand

Applies to all files: Comment the code. This will be a project with more than 5 people working on and maintaining it, so we need to set some documentation standards to avoid getting completely lost before Christmas.

smtb/model.py

.github/workflows/run_test.yaml

smtb/tests/test_data.py

smtb/tests/test_finetune.py

ilsenatorov · 2024-08-16T16:39:14Z

Applies to all files: Comment the code. This will be a project with more than 5 people working on and maintaining it, so we need to set some documentation standards to avoid getting completely lost before Christmas.

I would suggest comments/docstrings on code that is not self-explanatory. For example:

    def setup(self, stage: str | None = None):
        """Create train, val, test datasets."""
        self.train = DownstreamDataset(self.data_dir / "train", self.layer_num)
        self.valid = DownstreamDataset(self.data_dir / "valid", self.layer_num)
        self.test = DownstreamDataset(self.data_dir / "test", self.layer_num)

    def _get_dataloader(self, dataset: DownstreamDataset, shuffle: bool = False) -> torch.utils.data.DataLoader:
        """Create a DataLoader for a given dataset."""
        return DataLoader(
            dataset, batch_size=self.batch_size, num_workers=self.num_workers, shuffle=shuffle, collate_fn=collate_fn
        )

    def train_dataloader(self) -> DataLoader:
        return self._get_dataloader(self.train, shuffle=True)

The first 2 functions are not super obvious, so some comments are necessary.

The last one is pretty self-explanatory and requires no documentation (IMO).

But since we are in a phase of active development, I wouldn't enforce huge docstrings on every function. Type hints and good variable names should be enough for 95% of the code.

sdkaraban · 2024-08-22T19:07:58Z

smtb/model.py

Are we going to add those classifications as models?

sdkaraban · 2024-08-22T19:13:04Z

smtb/pooling.py

+        queries = self.linear_query(x)
+        attention_scores = torch.matmul(queries, keys.transpose(-2, -1))
+        attention_weights = self.softmax(attention_scores)
+        pooled_output = torch.matmul(attention_weights, x).sum(dim=1)


Maybe add normalization here?
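
One reading of "normalization" here is to scale the attention scores by the square root of the embedding dimension before the softmax, and to average rather than sum over the sequence so the pooled output does not grow with sequence length. A minimal sketch under those assumptions follows. The class name `ScaledAttentionPooling`, the `linear_key` layer, and the tensor shapes are hypothetical, since the diff does not show how `keys` is computed; this is not the code in smtb/pooling.py.

```python
import math

import torch
from torch import nn


class ScaledAttentionPooling(nn.Module):
    """Hypothetical attention pooling with scaled scores (illustration only)."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.linear_query = nn.Linear(embed_dim, embed_dim)
        # Assumption: keys come from a second linear projection of x.
        self.linear_key = nn.Linear(embed_dim, embed_dim)
        self.softmax = nn.Softmax(dim=-1)
        self.scale = math.sqrt(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, embed_dim).
        queries = self.linear_query(x)
        keys = self.linear_key(x)
        # Dividing by sqrt(embed_dim) keeps the scores in a range where
        # the softmax does not saturate for large embedding sizes.
        attention_scores = torch.matmul(queries, keys.transpose(-2, -1)) / self.scale
        attention_weights = self.softmax(attention_scores)
        # Averaging over the sequence axis (instead of summing) keeps the
        # output magnitude independent of the sequence length.
        pooled_output = torch.matmul(attention_weights, x).mean(dim=1)
        return pooled_output
```

For an input of shape (batch, seq_len, embed_dim) this returns a tensor of shape (batch, embed_dim). A `nn.LayerNorm` on the pooled output would be another option if the comment meant feature normalization instead.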

danielkorkin · 2024-08-27T13:19:06Z

.github/workflows/run_test.yaml

+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11"]


Are we just testing for 3.11? Maybe add other versions like 3.10 or even 3.9.

Related to this comment. I'd test all Python versions that are actively developed, such as 3.12. But if we fix versions in the requirements, I don't see why we should test multiple versions. The installed packages will have the same versions across different tested Python versions.

danielkorkin · 2024-08-27T13:21:43Z

requirements.txt

 torch
-pytorch-lightning
+lightning
 torchmetrics
+rich
 fair-esm
-jsonargparse
 wandb
 tokenizers
 transformers
 beartype
 datasets
 transformers[torch]
+pytest
+pytest-cov


Should we start specifying versions to avoid breaking changes and use dependabot to update and review version changes?

This should be easily accomplished by installing requirements and then running pip freeze
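
Note that pip freeze pins every installed package, including transitive dependencies. If only the top-level entries of requirements.txt should be pinned, a rough sketch with the standard-library `importlib.metadata` could look like the following. The function name `pin_requirements` is hypothetical, and entries with extras such as `transformers[torch]` are not handled here and are passed through unpinned.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_requirements(names: list[str]) -> list[str]:
    """Return 'name==version' for each installed package in `names`.

    Packages that are not installed in the current environment are
    returned unchanged so that nothing is silently dropped.
    """
    pinned = []
    for name in names:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed (or not a plain distribution name): leave unpinned.
            pinned.append(name)
    return pinned
```

Calling `pin_requirements(["torch", "lightning", "pytest"])` in the project environment would give lines that can be written back to requirements.txt; the exact versions depend on what is installed.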

danielkorkin · 2024-08-27T13:23:16Z

setup.py

+    url="https://github.com/kalininalab/SMTB2024",
+    packages=["smtb"],
+    install_requires=requirements,
+    python_requires=">=3.11",


Again, is 3.11 maybe too high? Maybe 3.10?

I don't think 3.11 is too high. I have another project with similar dependencies that runs on 3.11 and works fine, so I wouldn't expect any problems. But we can downgrade to 3.10 if you want.

Yes, maybe >=3.10

ilsenatorov added 13 commits August 16, 2024 13:05

added pytest stuff

a13a591

moved scripts to a separate folder

5d4854d

wrapped into a package

152e443

added pooling tests

f6b5b6a

added data testing

7e15094

model tests

25bc677

cleaned up the training script

d96e808

moved training loop into the module

9b7041f

actions for pytest

697926b

fixed ruff misbehaving

dd44b6f

added workflow env caching

bf48266

fixed devices

6b10de7

Merge remote-tracking branch 'origin/main' into dev

8830bdb

ilsenatorov requested review from Old-Shatterhand, danielkorkin and smiling-dino August 16, 2024 15:21

ilsenatorov self-assigned this Aug 16, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

3a44a68

for more information, see https://pre-commit.ci

Old-Shatterhand requested changes Aug 16, 2024

smtb/model.py Outdated

.github/workflows/run_test.yaml Outdated

smtb/tests/test_data.py Outdated

smtb/tests/test_finetune.py Outdated

ilsenatorov added 2 commits August 16, 2024 21:56

updated documentation and tests

9167693

Merge branch 'dev' of github.com:kalininalab/SMTB2024 into dev

dcf30f3

ilsenatorov requested a review from Old-Shatterhand August 17, 2024 06:38

sdkaraban reviewed Aug 22, 2024

sdkaraban reviewed Aug 22, 2024

sdkaraban self-requested a review August 24, 2024 11:12

sdkaraban approved these changes Aug 24, 2024

danielkorkin reviewed Aug 27, 2024

Old-Shatterhand and others added 2 commits September 29, 2024 20:41

Refactoring of tests and restructuring of requirements

4e2df12

[pre-commit.ci] auto fixes from pre-commit.com hooks

99ae8e1

for more information, see https://pre-commit.ci

Old-Shatterhand and others added 6 commits September 29, 2024 23:31

Comments for smtb module

0406e4f

Merge branch 'dev' of github.com:kalininalab/SMTB2024 into dev

85ff675

Minor fix

48298ab

Another update for strange reasons

8049deb

Merge branch 'main' into dev

a5d34fa

[pre-commit.ci] auto fixes from pre-commit.com hooks

23dcfae

for more information, see https://pre-commit.ci

ilsenatorov commented Aug 16, 2024

Old-Shatterhand left a comment

ilsenatorov commented Aug 16, 2024

sdkaraban Aug 22, 2024

sdkaraban Aug 22, 2024

danielkorkin Aug 27, 2024

Old-Shatterhand Sep 1, 2024

danielkorkin Aug 27, 2024

ilsenatorov Aug 28, 2024

danielkorkin Nov 20, 2024

danielkorkin Aug 27, 2024

ilsenatorov Aug 28, 2024

Old-Shatterhand Sep 1, 2024

danielkorkin Nov 20, 2024
