training using custom dataset stops with error and the intermediate result is quite noisy #40

emjay73 · 2024-08-15T16:28:38Z

Training on a custom dataset stops with an error, and the intermediate result is quite noisy.
The loss seems stuck oscillating around 0.3 rather than decreasing.
Here is my intermediate result.
Any suggestions?

This is the error I encountered:

329/500 [3:17:03<1:42:25, 35.94s/it]
Traceback (most recent call last):
  File "~/shape-of-motion/run_training.py", line 254, in <module>
    main(tyro.cli(TrainConfig))
  File "~/shape-of-motion/run_training.py", line 152, in main
    loss = trainer.train_step(batch)
  File "~/shape-of-motion/flow3d/trainer.py", line 170, in train_step
    loss, stats, num_rays_per_step, num_rays_per_sec = self.compute_losses(batch)
  File "~/shape-of-motion/flow3d/trainer.py", line 376, in compute_losses
    track_2d_loss = masked_l1_loss(
  File "~/shape-of-motion/flow3d/loss_utils.py", line 32, in masked_l1_loss
    (sum_loss < torch.quantile(sum_loss, quantile)).squeeze(-1)
RuntimeError: quantile() input tensor must be non-empty
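One plausible reading of this traceback (an assumption, not confirmed anywhere in this thread): `masked_l1_loss` calls `torch.quantile` on the per-track L1 errors, and `torch.quantile` refuses an empty tensor. A batch that contains no 2D track samples would then crash exactly at this line. The sketch below is a hypothetical guarded variant, `guarded_masked_l1_loss`, written for illustration only. It is not the repository's actual `masked_l1_loss`, and its name, signature, and default `quantile` value are assumptions.

```python
import torch
import torch.nn.functional as F


def guarded_masked_l1_loss(
    pred: torch.Tensor,
    gt: torch.Tensor,
    mask: torch.Tensor,
    quantile: float = 0.98,  # assumed default, not taken from the repo
) -> torch.Tensor:
    """Hypothetical masked L1 loss with quantile-based outlier trimming.

    Returns a zero loss that stays attached to `pred` when there is nothing
    to supervise, instead of calling torch.quantile on an empty tensor.
    """
    # Per-sample L1 error averaged over the last (coordinate) dim: shape (N, 1)
    per_elem = F.l1_loss(pred, gt, reduction="none").mean(dim=-1, keepdim=True)
    if per_elem.numel() == 0:
        # Empty batch: keep the autograd graph so loss.backward() still works.
        return pred.sum() * 0.0
    if quantile < 1.0:
        # Keep errors at or below the quantile; the largest ones are treated as outliers.
        keep = (per_elem <= torch.quantile(per_elem, quantile)).squeeze(-1)
    else:
        keep = torch.ones(per_elem.shape[:-1], dtype=torch.bool, device=per_elem.device)
    weighted = (per_elem * mask)[keep]
    # Normalize by the number of valid (masked-in) samples that survived trimming.
    return weighted.sum() / (mask[keep].sum() + 1e-8)
```

If the empty-batch case turns out to be the cause, the guard only hides the symptom. It may still be worth checking whether some frames of the custom sequence end up with very few or no valid 2D tracks, which could also contribute to the noisy results.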


zhengmiao1996 · 2024-08-27T11:18:37Z

I then ran:
python run_training.py --work-dir ./outdir data:custom --data.seq-name seq1 --data.root-dir /mnt/SOM/data/
After training, I only get a checkpoint and no other output.
