I have found strange behaviour in TrainTrack when resuming training from a checkpoint. Say we have two projects, GNNStudy and DNNStudy, and we ran our pipeline once for each. We then have checkpoints under GNNStudy/version_0 and DNNStudy/version_0 in artifact_library: lightning_models/lightning_checkpoints.
If I resume training for GNNStudy with resume_id: version_0, TrainTrack sometimes jumps to DNNStudy/version_0 rather than GNNStudy/version_0. It seems that load_config() uses os.walk with artifact_library as the root and picks the first version_0 it encounters. Maybe the search should start from a path like artifact_library/project instead, so that it looks for the run inside a specific project, where project: GNNStudy/DNNStudy comes from model_config.
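To illustrate what I suspect is happening, here is a minimal sketch. It is not TrainTrack's actual code: find_run_naive and find_run_scoped are hypothetical helpers I wrote for this issue, and the directory layout is recreated in a temporary folder. The naive version walks from artifact_library and returns the first directory named resume_id, so the result depends on the order in which the file system lists the project folders. The scoped version starts the walk at artifact_library/project.

```python
import os
import tempfile


def find_run_naive(artifact_library, resume_id):
    """Hypothetical: return the first directory named resume_id under artifact_library."""
    for root, dirs, _files in os.walk(artifact_library):
        if resume_id in dirs:
            return os.path.join(root, resume_id)
    return None


def find_run_scoped(artifact_library, project, resume_id):
    """Hypothetical fix: only search under artifact_library/project."""
    project_root = os.path.join(artifact_library, project)
    for root, dirs, _files in os.walk(project_root):
        if resume_id in dirs:
            return os.path.join(root, resume_id)
    return None


# Recreate the layout from this issue in a temporary directory.
artifact_library = tempfile.mkdtemp()
for project in ("GNNStudy", "DNNStudy"):
    os.makedirs(os.path.join(artifact_library, project, "version_0"))

# Which project this lands in depends on directory listing order.
naive = find_run_naive(artifact_library, "version_0")

# Scoping the search by project removes the ambiguity.
scoped_gnn = find_run_scoped(artifact_library, "GNNStudy", "version_0")
scoped_dnn = find_run_scoped(artifact_library, "DNNStudy", "version_0")

print("naive :", naive)
print("scoped:", scoped_gnn)
print("scoped:", scoped_dnn)
```

With the scoped search, resuming GNNStudy can only ever resolve to GNNStudy/version_0, and a missing project folder gives None instead of silently picking another project's run.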