
Feature: Masked Dataset #151

Open · wants to merge 2 commits into main

Conversation

@JakobEliasWagner (Collaborator) commented Jul 24, 2024

Feature: Masked Dataset

Description

Not all datasets are consistent in the number of sensors and evaluations. Simulations or measurements are not only performed on a multitude of different grids, but may also contain different numbers of samples in both function sets. To reflect this, this PR introduces the MaskedOperatorDataset class, which can process datasets with a varying number of sensors or evaluations.

Which issue does this PR tackle?

  • The OperatorDataset class can only handle a uniform number of sensors and evaluations.

How does it solve the problem?

  • Implements the MaskedOperatorDataset class to allow for masked sensors and evaluations.
  • Moves the transformation method to the dataset base class.

How are the changes tested?

  • Introduced 10 new unit tests.

Notes

  • The current masking strategy pads all samples to the size of the largest sample (see the sketch after this list).
    • Other strategies are possible as well: cutting down to the size of the smallest sample by random choice, or a middle ground between the two techniques.
    • In datasets where the sample sizes vary strongly, padding to the largest sample can be inefficient.
  • Currently, this only works for one-dimensional size tensors.
  • The _get_item_ method is specific to the OperatorDataset and MaskedOperatorDataset classes to keep their item retrieval separated.
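
As a rough illustration of the padding strategy described above, a minimal sketch, assuming hypothetical per-sample tensors of shape (dim, n_i) with a varying number of sensors n_i; names and shapes are illustrative and not taken from this PR:

import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical sensor values for three samples with 7, 5, and 9 sensors.
u_list = [torch.rand(3, 7), torch.rand(3, 5), torch.rand(3, 9)]

# pad_sequence pads along the first dimension, so move the sensor dimension
# to the front, pad to the largest sample, and move it back afterwards.
u_padded = pad_sequence(
    [u.transpose(0, 1) for u in u_list], batch_first=True, padding_value=0
).transpose(1, 2)  # shape: (n_samples, dim, n_max)

# Boolean mask marking the real (non-padded) sensor positions of each sample.
n_max = max(u.shape[1] for u in u_list)
mask = torch.stack([torch.arange(n_max) < u.shape[1] for u in u_list])  # (n_samples, n_max)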

Checklist for Contributors

  • Scope: This PR tackles exactly one problem.
  • Conventions: The branch follows the feature/title-slug convention.
  • Conventions: The PR title follows the Bugfix: Title convention.
  • Coding style: The code passes all pre-commit hooks.
  • Documentation: All changes are well-documented.
  • Tests: New features are tested and all tests pass successfully.
  • Changelog: Updated CHANGELOG.md for new features or breaking changes.
  • Review: A suitable reviewer has been assigned.

Checklist for Reviewers:

  • The PR solves the issue it claims to solve and only this one.
  • Changes are tested sufficiently and all tests pass.
  • Documentation is complete and well-written.
  • Changelog has been updated, if necessary.

@JakobEliasWagner self-assigned this Jul 24, 2024
@JakobEliasWagner added the enhancement label Jul 24, 2024
from continuiti.transforms import Transform
from continuiti.operators.shape import OperatorShapes, TensorShape


class OperatorDatasetBase(td.Dataset, ABC):
    """Abstract base class of a dataset for operator training."""

    shapes: OperatorShapes

    def __init__(self, shapes: OperatorShapes, n_observations: int) -> None:
Review comment (Collaborator):

Suggested change
def __init__(self, shapes: OperatorShapes, n_observations: int) -> None:
def __init__(self, shapes: OperatorShapes, n_observations: int):

"""Applies class transformations to four tensors.

Args:
src:
Review comment (Collaborator):

Suggested change
src:
src: List of tuples containing a tensor and a transformation to apply to it.

Comment on lines +40 to +41
continue
out.append(transformation(src_tensor))
Review comment (Collaborator):

Suggested change
continue
out.append(transformation(src_tensor))
else:
out.append(transformation(src_tensor))
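
The loop around these two lines is not part of the excerpt; a hypothetical reconstruction of what the suggestion amounts to (helper name and structure assumed, not taken verbatim from the PR):

def _apply_transformations(src):
    # src: list of (tensor, transformation) pairs; the transformation may be None.
    out = []
    for src_tensor, transformation in src:
        if transformation is None:
            out.append(src_tensor)  # keep the tensor unchanged
        else:
            out.append(transformation(src_tensor))
    return out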

self, x: torch.Tensor, u: torch.Tensor, y: torch.Tensor, v: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
"""Applies class transformations to four tensors.
return tensors[0], tensors[1], tensors[2], tensors[3]
Review comment (Collaborator):

Suggested change
return tensors[0], tensors[1], tensors[2], tensors[3]
return tuple(tensors)

"""A dataset for operator training containing masks in addition to tensors describing the mapping.

Data, especially described on unstructured grids, can vary in the number of evaluations or sensors. Even
measurements of phenomena do not always contain the same number of sensors and or evaluations. This dataset is able
Review comment (Collaborator):

Suggested change
measurements of phenomena do not always contain the same number of sensors and or evaluations. This dataset is able
measurements of phenomena do not always contain the same number of sensors and/or evaluations. This dataset is able

Comment on lines +280 to +282
assert not any(
    [torch.any(torch.isinf(mi)) for mi in member]
), "Expects domain to be truncated in finite space."
Review comment (Collaborator):

Why is this assertion necessary? Someone might come up with a good reason for using infs in the data, do we have to prevent that?

Review comment (Collaborator):

Ah, I see we're using inf for padding. However, does it hurt to have more infs (non-masked) in the dataset?
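
A minimal sketch of why the finiteness assertion matters when inf is the padding value (variable names are illustrative): an inf that legitimately appears in the data becomes indistinguishable from padding once the mask is derived by comparing against inf.

import torch

member = torch.tensor([1.0, float("inf"), 2.0])            # a real inf in the data
padded = torch.cat([member, torch.full((2,), torch.inf)])  # padded to length 5 with inf

mask = padded != torch.inf  # intended to mark real entries
print(mask)  # tensor([ True, False,  True, False, False]) -- the real inf is masked out too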

padding_value=torch.inf,
).transpose(1, 2)
values_padded = pad_sequence(
[vi.transpose(0, 1) for vi in values], batch_first=True, padding_value=0
Review comment (Collaborator):

Suggested change
[vi.transpose(0, 1) for vi in values], batch_first=True, padding_value=0
[vi.transpose(0, 1) for vi in values],
batch_first=True,
padding_value=0,

Review comment (Collaborator):

Why do we pad once with inf and once with 0? Seems arbitrary

Comment on lines +293 to +296
mask = member_padded != torch.inf
member_padded[
    ~mask
] = 0  # mask often applied by adding a tensor with -inf values in masked locations (e.g. in scaled dot product).
Review comment (Collaborator):

What is different here? Why not just use 0 for padding directly, as in l. 287?
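
For context on the inline comment about -inf masks: a minimal sketch of how such a boolean mask is typically consumed downstream, e.g. by PyTorch's scaled_dot_product_attention (PyTorch >= 2.0); the shapes are illustrative and not taken from continuiti:

import torch
import torch.nn.functional as F

q = torch.rand(1, 4, 10, 8)                            # (batch, heads, queries, dim)
k = v = torch.rand(1, 4, 12, 8)                        # (batch, heads, sensors, dim)
key_mask = torch.zeros(1, 1, 1, 12, dtype=torch.bool)
key_mask[..., :9] = True                               # first 9 sensor positions are real, the rest is padding

# With a boolean attn_mask, masked positions receive -inf attention scores internally,
# which is why the padded *values* can safely be set to 0 beforehand.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=key_mask)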


return sample["x"], sample["u"], sample["y"], sample["v"]
return tensors[0], tensors[1], tensors[2], tensors[3], ipt_mask, opt_mask
Review comment (Collaborator):

Suggested change
return tensors[0], tensors[1], tensors[2], tensors[3], ipt_mask, opt_mask
return *tuple(tensors), ipt_mask, opt_mask

dataloader = DataLoader(dataset, batch_size=self.batch_size)

for x, u, y, v, ipt_mask, opt_mask in dataloader:
    assert True
Review comment (Collaborator):

Can we do more here?
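
In reply to "Can we do more here?", a sketch of assertions the loop could make, assuming u and v are laid out as (batch, dim, sensors/evaluations) and the masks as (batch, sensors/evaluations); this extends the test fragment above and is not the exact test in the PR:

for x, u, y, v, ipt_mask, opt_mask in dataloader:
    # masks should be boolean and match the batch dimension
    assert ipt_mask.dtype == torch.bool and opt_mask.dtype == torch.bool
    assert ipt_mask.shape[0] == x.shape[0] and opt_mask.shape[0] == x.shape[0]
    # all tensors should be finite after padding and masking
    for tensor in (x, u, y, v):
        assert torch.all(torch.isfinite(tensor))
    # padded sensor values should be zeroed out where the input mask is False
    assert torch.all(u.transpose(1, 2)[~ipt_mask] == 0)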
