Wrong checkpoints loaded when multiple projects exist in the artifact_library #6

a-akram · 2022-12-16T16:34:11Z

I have found strange behaviour in TrainTrack when resuming training from a checkpoint. Say we have two projects, GNNStudy and DNNStudy, and we ran our pipeline once for each. The artifact_library (lightning_models/lightning_checkpoints) then contains checkpoints under GNNStudy/version_0 and DNNStudy/version_0.

If I resume training for GNNStudy with resume_id: version_0, TrainTrack sometimes jumps to DNNStudy/version_0 instead of GNNStudy/version_0. It seems that load_config() runs os.walk with artifact_library as the root and picks the first version_0 it encounters. Maybe the search should be restricted to artifact_library/project when looking for a specific run, where project (GNNStudy or DNNStudy) comes from model_config.
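
To illustrate the suggestion, here is a minimal sketch of a project-scoped lookup. It is not TrainTrack's actual code: `find_run_dir` and its parameters are hypothetical names, and the sketch assumes that each run directory sits directly inside a directory named after its project somewhere below artifact_library.

```python
import os


def find_run_dir(artifact_library, project, resume_id):
    """Return the run directory `<...>/<project>/<resume_id>`, or None.

    Hypothetical helper (not part of TrainTrack): a run only matches
    when its parent directory is named after the requested project.
    """
    for root, dirs, _files in os.walk(artifact_library):
        # Sort in place so the traversal order is deterministic.
        dirs.sort()
        if os.path.basename(root) == project and resume_id in dirs:
            return os.path.join(root, resume_id)
    return None
```

With a check like this, resuming GNNStudy with resume_id: version_0 could no longer resolve to DNNStudy/version_0, regardless of which project os.walk visits first.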
