Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds the StreamingBinWriter class that allows bin data to be streamed… #29

Merged
merged 2 commits into from
Apr 14, 2020

Conversation

mscf
Copy link
Collaborator

@mscf mscf commented Apr 7, 2020

… to disk instead of stored in memory.

To facilitate this, the Writer class was updated as follows:

  1. The output filename and extension are now member variables initialized with the an object of the class.
  2. Creation of the directory was moved to an externally visible method to allow a user of objects of the Writer class to create the directory explicitly (StreamingBinWriter needs a valid path to write to on initialization).

… to disk instead of stored in memory.

To facilitate this, the Writer class was updated as follows:
  1. The output filename and extension are now member variables initialized with the an object of the class.
  2. Creation of the directory was moved to an externally visible method to allow a user of objects of the Writer class to create the directory explicitly (StreamingBinWriter needs a valid path to write to on initialization).
@mscf mscf requested review from ViridianForge and ephathaway April 7, 2020 15:49
@ephathaway
Copy link
Member

Hey @mscf, the code looks solid to me, however when I run the tests on the stream_recording branch on my machine, I get this

(mff-pub) Evans-MacBook-Pro:mffpy-public admin$ make test
mypy --ignore-missing-imports mffpy
mffpy/bin_writer.py:150: error: Incompatible types in assignment (expression has type "FileIO", variable has type "BytesIO")
make: *** [test] Error 1

I checked out the line that caused the error

self.stream = FileIO(join(mffdir, self.default_filename), mode='w')

Could it be that self.stream cannot take on a new type in the subclass after it is assigned a BytesIO type in the superclass?

@ViridianForge
Copy link
Contributor

I think you're right @ephathaway - @mscf this could be because Justus developed this project using mypy and enforcing stricter typing.

Copy link
Contributor

@ViridianForge ViridianForge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting some comments on your review to start with - thanks for busting this out so quick @mscf !

@@ -110,3 +111,58 @@ def test_writer_exports_JSON():
remove(filename)
except BaseException:
raise AssertionError(f"""Clean-up failed of '{filename}'.""")

##START##
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start streaming bin writer testing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that to denote where I started adding tests and shortly thereafter stopped seeing it at all. I will delete it.

assert layout.name == device
# cleanup
try:
remove(join(dirname, 'info.xml'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit picking - but would rmtree on the dirname accomplish the same thing in a single line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I had just copied the test case for the non-streaming and only changed those things I needed to. I can make this change to.

Subclass of BinWriter to support streaming bin file to disk.
"""

def __init__(self, sampling_rate: int, mffdir: str, data_type: str = 'EEG'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are then any limitations / expectations for sampling_rate, mffdir, or data_type that are non-obvious from the casual read?

(i.e. We don't support smb file paths or some such)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments.

Specifically, as regards the mffdir value, the directory must exist before the object is created, and this is noted. As far as limitations to where the folder can exist: this utilizes the same underlying functionality as the builtin open() and so has the same characteristics. These characteristics are shared by the BinWriter class as it uses open() directly to write the file.

num_channels = 256
sampling_rate = 128
# create an mffpy.Writer and add a file info, and the binary file
W = Writer(dirname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning is obvious within this context, but let's avoid single letter variable names as a matter of practice (as annoying and self-referential that kind of is for such aptly named classes). Maybe
test_writer for W, str_bin_writer for b, test_reader for R.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, another case of just copying the testcase and only changing what needed to be changed.

@@ -34,25 +34,31 @@ def __init__(self, filename: str):
self.filename = filename
self.files: Dict[str, Any] = {}
self._bin_file_added = False
self.mffdir, self.ext = splitext(self.filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in guessing that the intent here is to populate / prepare the filename variable at the time the Writer is instantiated rather than when a call is made to actually write the file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because in order to stream a file to a directory that directory needs to be created first, so I needed some means of accessing the name of the directory to pass to the bin writer.

mffdir += '.mff'
makedirs(mffdir, exist_ok=False)

self.create_directory()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not disagreeing, but why break out this functionality into its own function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because currently it only creates the folder when write is called. Write needs to be called only after all of the data is collected, but in order to stream the EEG data to a file, the directory needs to already exist.

Updates code/comments to address issues raised by code review.
Copy link
Member

@ephathaway ephathaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the tests on the updated branch and they are all passing! Besides the one comment I left, I would not make any changes.

@@ -39,6 +39,7 @@ def __init__(self, filename: str):
self.file_created = False

def create_directory(self):
"""Creates the directory for the recording."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the .mff directory contains a lot of extra info files besides just the signals, we might want to edit the description to say we are creating the directory for all parts of the .mff file.

@ephathaway
Copy link
Member

@mscf do you think that the StreamingBinWriter would provide a work-around for this issue #30 ?

@mscf mscf merged commit cb8c03b into master Apr 14, 2020
@jusjusjus jusjusjus deleted the stream_recording branch September 3, 2020 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants