
Can't reproduce paper #142

Open

dkun7944 opened this issue Sep 23, 2024 · 5 comments

Comments

@dkun7944

I'm trying to replicate the ICASSP 2022 paper result (A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation).

Having some trouble getting the model to converge. I used all the hyperparameters mentioned in the paper, but the loss plateaus at roughly 0.35 for note loss, 0.4 for onset loss, and 0.3 for contour loss. The code has a learning rate scheduler, but that doesn't seem to help – eventually early stopping kicks in.

I've tried:

  • adjusting the learning rate up and down by an order of magnitude
  • training on subsets of the total dataset (only MAESTRO, only GuitarSet, etc.)
  • training on CPU (M1 Max) and GPU (1x A10)
  • training with and without contours

But all yield the same result.

The paper mentions a weighted binary crossentropy:

"Binary cross entropy is used as the loss function for each
output, and the total loss is the sum of the three losses. However,
for Yo, there is a heavy class imbalance that drives models to output
Yo = 0 everywhere. As a countermeasure, we use a class-balanced
cross entropy loss, where the weight for the negative class is 0.05
and the positive is 0.95"

So I also enabled this in the training arguments. I had to fix the weighted_transcription_loss function in models.py since it was outputting the wrong dimension. I'm about 95% sure I got it right:

def weighted_transcription_loss(
    y_true: tf.Tensor, y_pred: tf.Tensor, label_smoothing: float, positive_weight: float = 0.5
) -> tf.Tensor:
    """The transcription loss where the positive and negative true labels are balanced by a weighting factor.

    Args:
        y_true: The true labels.
        y_pred: The predicted labels.
        label_smoothing: Smoothing factor. Squeezes labels towards 0.5.
        positive_weight: Weighting factor for the positive labels.

    Returns:
        The weighted transcription loss.
    """
    # Per-element weights: positive_weight where the label is 1, (1 - positive_weight) elsewhere.
    weights = tf.where(tf.equal(y_true, 1), positive_weight, 1 - positive_weight)
    # Expand the last axis so binary_crossentropy's reduction over it yields a
    # per-element loss, then apply the class weights and average.
    bce = tf.keras.losses.binary_crossentropy(
        tf.expand_dims(y_true, -1), tf.expand_dims(y_pred, -1), label_smoothing=label_smoothing
    )
    return tf.reduce_mean(weights * bce)

But regardless of whether I use weighted or unweighted loss, I get the same result.
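For reference, here's a minimal standalone sketch of the class-balanced cross entropy the paper describes, in plain NumPy (a hypothetical helper for comparison, not the repo's implementation), with the 0.95/0.05 positive/negative weighting applied:

```python
import numpy as np

def class_balanced_bce(y_true, y_pred, positive_weight=0.95, eps=1e-7):
    """Class-balanced binary cross entropy, per the paper's description for Yo."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Per-element binary cross entropy.
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    # Positives weighted by positive_weight, negatives by (1 - positive_weight),
    # so the model isn't driven to predict Yo = 0 everywhere.
    weights = np.where(y_true == 1, positive_weight, 1 - positive_weight)
    return float(np.mean(weights * bce))
```

With these weights, a missed positive costs 19x more than an equally confident false positive, which is what's supposed to keep the onset output from collapsing to all zeros.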

Any advice?

@dkun7944
Author

dkun7944 commented Oct 2, 2024

@drubinstein ?

@drubinstein
Contributor

@rabitt

@dkun7944
Author

dkun7944 commented Oct 2, 2024

Just found @bgenchel's TensorBoard screenshot (#136), where the total loss converges at ~0.99. That's basically the result I'm getting, so maybe the code is working as intended? I guess I just expected the loss to go lower.

@bgenchel
Collaborator

bgenchel commented Oct 4, 2024

Hey Daniel, one possible explanation for your results (and mine) is that neither of us is training on the full set of data used in the paper, parts of which are not publicly available. Could you list which of the datasets you're currently using to train?

@dkun7944
Author

dkun7944 commented Oct 4, 2024

@bgenchel I have tried MAESTRO and GuitarSet. I didn't realize the paper used non-public data. What proportion of the training set in the paper is proprietary? My goal is to train on a self-generated synthetic dataset, but I first want to validate that the training code is working properly.

3 participants