Wrong checkpoints loaded when multiple projects exist in the artifact_library #6

Open
a-akram opened this issue Dec 16, 2022 · 0 comments

Collaborator

a-akram commented Dec 16, 2022

I have found strange behaviour in TrainTrack when resuming training from a checkpoint. Say we have two projects, GNNStudy and DNNStudy, and we run the pipeline once for each. The checkpoints then end up in GNNStudy/version_0 and DNNStudy/version_0 under the artifact_library (lightning_models/lightning_checkpoints).

If I resume training for GNNStudy with resume_id: version_0, TrainTrack sometimes picks up DNNStudy/version_0 instead of GNNStudy/version_0. It seems that load_config() runs os.walk with artifact_library as the root and takes the first version_0 it encounters. Maybe the search should be restricted to artifact_library/project, where project (GNNStudy or DNNStudy) comes from model_config.
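
To illustrate the suspected cause and the proposed fix, here is a minimal sketch. It is not TrainTrack's actual code: `find_run_dir` and its `project` argument are hypothetical names, and the lookup is assumed to return the first directory matching the resume ID during an `os.walk` traversal. The directory names are sorted only to make the demo reproducible; without sorting, the order depends on the filesystem, which would explain why the wrong project is picked only sometimes.

```python
import os
import tempfile


def find_run_dir(artifact_library, resume_id, project=None):
    """Return the first directory named `resume_id` below the search root.

    Hypothetical helper for illustration only. When `project` is given,
    the search is restricted to artifact_library/project.
    """
    root = os.path.join(artifact_library, project) if project else artifact_library
    for dirpath, dirnames, _ in os.walk(root):
        # Sort in place so the traversal order is deterministic in this demo.
        dirnames.sort()
        if resume_id in dirnames:
            return os.path.join(dirpath, resume_id)
    return None


with tempfile.TemporaryDirectory() as artifact_library:
    # Reproduce the layout from the report: one version_0 per project.
    for name in ("GNNStudy", "DNNStudy"):
        os.makedirs(os.path.join(artifact_library, name, "version_0"))

    # Unscoped search: walks the whole library and stops at the first match.
    unscoped = find_run_dir(artifact_library, "version_0")
    # Scoped search: only looks inside the requested project.
    scoped = find_run_dir(artifact_library, "version_0", project="GNNStudy")

    unscoped_rel = os.path.relpath(unscoped, artifact_library)
    scoped_rel = os.path.relpath(scoped, artifact_library)

print("unscoped:", unscoped_rel)
print("scoped:  ", scoped_rel)
```

In this sketch the unscoped search resolves to the DNNStudy run even though GNNStudy was intended, while the scoped search resolves to the GNNStudy run. Passing the project from model_config into the lookup would be one way to remove the ambiguity.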
