-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Post-SMTB cleanup #31
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applies to all files: Comment the code. It will be a project with >5 people working on and maintaining it, we need to set some documentation standards to not get completely lost before christmas.
I would suggest comments/docstring on code that is not self-explanatory. For example: def setup(self, stage: str | None = None):
"""Create train, val, test datasets."""
self.train = DownstreamDataset(self.data_dir / "train", self.layer_num)
self.valid = DownstreamDataset(self.data_dir / "valid", self.layer_num)
self.test = DownstreamDataset(self.data_dir / "test", self.layer_num)
def _get_dataloader(self, dataset: DownstreamDataset, shuffle: bool = False) -> torch.utils.data.DataLoader:
"""Create a DataLoader for a given dataset."""
return DataLoader(
dataset, batch_size=self.batch_size, num_workers=self.num_workers, shuffle=shuffle, collate_fn=collate_fn
)
def train_dataloader(self) -> DataLoader:
return self._get_dataloader(self.train, shuffle=True) The first 2 functions are not super obvious, so some comments are necessary. The last one is pretty self-explanatory and requires no documentation (IMO). But since we are in a phase of active development, I wouldn't enforce huge docstrings on every function. Type hints and good variable names should be enough for 95% of the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we going to add those classifications as models?
smtb/pooling.py
Outdated
queries = self.linear_query(x) | ||
attention_scores = torch.matmul(queries, keys.transpose(-2, -1)) | ||
attention_weights = self.softmax(attention_scores) | ||
pooled_output = torch.matmul(attention_weights, x).sum(dim=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add normalization here?
.github/workflows/run_test.yaml
Outdated
runs-on: ubuntu-latest | ||
strategy: | ||
matrix: | ||
python-version: ["3.11"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we just testing for 3.11? Maybe add other versions like 3.10 or eeven 3.9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to this comment. I'd test all Python versions that are actively developed, such as 3.12. But if we fix versions in the requirements, I don't see why we should test multiple versions. The installed packages will have the same versions across different tested Python versions.
requirements.txt
Outdated
torch | ||
pytorch-lightning | ||
lightning | ||
torchmetrics | ||
rich | ||
fair-esm | ||
jsonargparse | ||
wandb | ||
tokenizers | ||
transformers | ||
beartype | ||
datasets | ||
transformers[torch] | ||
pytest | ||
pytest-cov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we start specifying versions to avoid breaking changes and use dependabot to update andreview version changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be easily accomplished by installing requirements and then running pip freeze
setup.py
Outdated
url="https://github.com/kalininalab/SMTB2024", | ||
packages=["smtb"], | ||
install_requires=requirements, | ||
python_requires=">=3.11", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again is 3.11 maybe too high, maybe 3.10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think, 3.11 is too high. I have another project with similar dependencies that is run on 3.11 and it works fine, so, I wouldn't expect any problems. But we can downgrade to 3.10 if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe >=3.10
for more information, see https://pre-commit.ci
Most important changes: