Open source URMP dataset pipeline #458

Open
wants to merge 1 commit into
base: main

copybara-service[bot]
Contributor

Open source URMP dataset pipeline

PiperOrigin-RevId: 460754854
@cyrusvahidi

Looks good to me.

The only thing I noticed is that prepare_urmp_dataset_lib.parse_example expects TFRecords. Will these TFRecords, or the code to produce them, also be made available?

tf.io.FixedLenSequenceFeature([],
dtype=tf.float32,
allow_missing=True),
'sequence':

How is 'sequence' extracted from the source URMP dataset? What is it?

I figured out that you can just create a NoteSequence from URMP's notes files.
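As a rough illustration of that idea, here is a minimal sketch of reading a URMP notes file into note events. It assumes each line of the notes file holds three whitespace-separated numbers (onset in seconds, pitch in Hz, duration in seconds) with no header row. That layout is an assumption and should be checked against the dataset's own README. `Note`, `hz_to_midi`, and `parse_notes_text` are hypothetical names used only for this example; they are not part of the PR.

```python
import math
from typing import List, NamedTuple


class Note(NamedTuple):
    """One note event (hypothetical container, not part of the PR)."""
    start_time: float  # seconds
    end_time: float    # seconds
    pitch: int         # MIDI note number


def hz_to_midi(freq_hz: float) -> int:
    """Round a frequency in Hz to the nearest MIDI note (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def parse_notes_text(text: str) -> List[Note]:
    """Parse the contents of a notes file into a list of Note events.

    Assumed line layout: onset_seconds  pitch_hz  duration_seconds
    """
    notes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        onset, freq_hz, duration = (float(x) for x in fields[:3])
        if freq_hz <= 0 or duration <= 0:
            continue  # skip entries that cannot be mapped to a pitched note
        notes.append(Note(onset, onset + duration, hz_to_midi(freq_hz)))
    return notes


if __name__ == "__main__":
    example = "0.5\t440.0\t1.0\n1.5\t261.63\t0.25\n"
    for note in parse_notes_text(example):
        print(note)
```

Each resulting event carries the start time, end time, and pitch that a NoteSequence note needs. Copying them into an actual NoteSequence proto and serializing it into the 'sequence' feature is left out here, since the exact format the pipeline expects is not stated in this thread.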

@cyrusvahidi cyrusvahidi left a comment

One bug to fix.

In general, it would be helpful to have a script that extracts the necessary, correctly formatted features from the raw URMP dataset and dumps them to TFRecord. I've written one and can open a new PR. Alternatively, the 48 kHz TFRecord without the metadata and DDSP features could be uploaded somewhere.

2 participants