AttributeError: Can't pickle local object 'partialclass.<locals>.NewCls' #54

Open
adeelferozmirza opened this issue Jun 28, 2024 · 4 comments

Comments

@adeelferozmirza

Hey, when I run it on Windows I get this error. How can I solve it? Can you help, please?

F:\SSCAmea\RVT-master>python train.py model=rnndet dataset=gen1 dataset.path=F:\SSCAmea\gen1 wandb.project_name=RVT wandb.group_name=gen1 +experiment/gen1=base.yaml hardware.gpus=0 batch_size.train=8 batch_size.eval=8 hardware.num_workers.train=6 hardware.num_workers.eval=2
Using python-based detection evaluation
Set MaxViTRNN backbone (height, width) to (256, 320)
Set partition sizes: (8, 10)
Set num_classes=2 for detection head
------ Configuration ------
reproduce:
  seed_everything: null
  deterministic_flag: false
  benchmark: false
training:
  precision: 16
  max_epochs: 10000
  max_steps: 400000
  learning_rate: 0.0002
  weight_decay: 0
  gradient_clip_val: 1.0
  limit_train_batches: 1.0
  lr_scheduler:
    use: true
    total_steps: ${..max_steps}
    pct_start: 0.005
    div_factor: 20
    final_div_factor: 10000
validation:
  limit_val_batches: 1.0
  val_check_interval: null
  check_val_every_n_epoch: 1
batch_size:
  train: 8
  eval: 8
hardware:
  num_workers:
    train: 6
    eval: 2
  gpus: 0
  dist_backend: nccl
logging:
  ckpt_every_n_epochs: 1
  train:
    metrics:
      compute: false
      detection_metrics_every_n_steps: null
    log_model_every_n_steps: 5000
    log_every_n_steps: 500
    high_dim:
      enable: true
      every_n_steps: 5000
      n_samples: 4
  validation:
    high_dim:
      enable: true
      every_n_epochs: 1
      n_samples: 8
wandb:
  wandb_runpath: null
  artifact_name: null
  artifact_local_file: null
  resume_only_weights: false
  group_name: gen1
  project_name: RVT
dataset:
  name: gen1
  path: F:\SSCAmea\gen1
  train:
    sampling: mixed
    random:
      weighted_sampling: false
    mixed:
      w_stream: 1
      w_random: 1
  eval:
    sampling: stream
  data_augmentation:
    random:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.8
        zoom_in:
          weight: 8
          factor:
            min: 1
            max: 1.5
        zoom_out:
          weight: 2
          factor:
            min: 1
            max: 1.2
    stream:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.5
        zoom_out:
          factor:
            min: 1
            max: 1.2
  ev_repr_name: stacked_histogram_dt=50_nbins=10
  sequence_length: 21
  resolution_hw:
  - 240
  - 304
  downsample_by_factor_2: false
  only_load_end_labels: false
model:
  name: rnndet
  backbone:
    name: MaxViTRNN
    compile:
      enable: false
      args:
        mode: reduce-overhead
    input_channels: 20
    enable_masking: false
    partition_split_32: 1
    embed_dim: 64
    dim_multiplier:
    - 1
    - 2
    - 4
    - 8
    num_blocks:
    - 1
    - 1
    - 1
    - 1
    T_max_chrono_init:
    - 4
    - 8
    - 16
    - 32
    stem:
      patch_size: 4
    stage:
      downsample:
        type: patch
        overlap: true
        norm_affine: true
      attention:
        use_torch_mha: false
        partition_size:
        - 8
        - 10
        dim_head: 32
        attention_bias: true
        mlp_activation: gelu
        mlp_gated: false
        mlp_bias: true
        mlp_ratio: 4
        drop_mlp: 0
        drop_path: 0
        ls_init_value: 1.0e-05
      lstm:
        dws_conv: false
        dws_conv_only_hidden: true
        dws_conv_kernel_size: 3
        drop_cell_update: 0
    in_res_hw:
    - 256
    - 320
  fpn:
    name: PAFPN
    compile:
      enable: false
      args:
        mode: reduce-overhead
    depth: 0.67
    in_stages:
    - 2
    - 3
    - 4
    depthwise: false
    act: silu
  head:
    name: YoloX
    compile:
      enable: false
      args:
        mode: reduce-overhead
    depthwise: false
    act: silu
    num_classes: 2
  postprocess:
    confidence_threshold: 0.1
    nms_threshold: 0.45

Disabling PL seed everything because of unresolved issues with shuffling during training on streaming datasets
new run: generating id ba4dy0ts
wandb: Currently logged in as: adeelferozmirza1 (adeelferozmirza). Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.17.3
wandb: Run data is saved locally in F:\SSCAmea\RVT-master\wandb\run-20240627_152506-ba4dy0ts
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run golden-feather-1
wandb: View project at https://wandb.ai/adeelferozmirza/RVT
wandb: View run at https://wandb.ai/adeelferozmirza/RVT/runs/ba4dy0ts
wandb: logging graph, to disable use wandb.watch(log_graph=False)
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default ModelSummary callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
[Train] Local batch size for:
stream sampling: 4
random sampling: 4
[Train] Local num workers for:
stream sampling: 3
random sampling: 3
creating rnd access train datasets: 1458it [00:03, 419.82it/s]
creating streaming train datasets: 1458it [00:09, 160.31it/s]
num_full_sequences=317
num_splits=1141
num_split_sequences=5492
creating streaming val datasets: 429it [00:01, 399.27it/s]
num_full_sequences=429
num_splits=0
num_split_sequences=0
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type          | Params
-------------------------------------------------
0 | mdl            | YoloXDetector | 18.5 M
1 | mdl.backbone   | RNNDetector   | 12.8 M
2 | mdl.fpn        | YOLOPAFPN     | 3.9 M
3 | mdl.yolox_head | YOLOXHead     | 1.9 M
-------------------------------------------------
18.5 M    Trainable params
0         Non-trainable params
18.5 M    Total params
37.073    Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 20 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Using python-based detection evaluation
Using python-based detection evaluation
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3588.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Epoch 0: : 0it [00:00, ?it/s]Using python-based detection evaluation
Using python-based detection evaluation
Using python-based detection evaluation
== Timing statistics ==
== Timing statistics ==
Error executing job with overrides: ['model=rnndet', 'dataset=gen1', 'dataset.path=F:\SSCAmea\gen1', 'wandb.project_name=RVT', 'wandb.group_name=gen1', '+experiment/gen1=base.yaml', 'hardware.gpus=0', 'batch_size.train=8', 'batch_size.eval=8', 'hardware.num_workers.train=6', 'hardware.num_workers.eval=2']
Traceback (most recent call last):
File "F:\SSCAmea\RVT-master\train.py", line 138, in main
trainer.fit(model=module, ckpt_path=ckpt_path, datamodule=data_module)
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 603, in fit
call._call_and_handle_interrupt(
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 645, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1098, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1177, in _run_stage
self._run_train()
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1200, in _run_train
self.fit_loop.run()
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\loop.py", line 194, in run
self.on_run_start(*args, **kwargs)
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 161, in on_run_start
_ = iter(data_fetcher) # creates the iterator inside the fetcher
^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\utilities\fetching.py", line 179, in __iter__
self._apply_patch()
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\utilities\fetching.py", line 120, in _apply_patch
apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\utilities\fetching.py", line 156, in loader_iters
return self.dataloader_iter.loader_iters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\supporters.py", line 555, in loader_iters
self._loader_iters = self.create_loader_iters(self.loaders)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\supporters.py", line 595, in create_loader_iters
return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\lightning_utilities\core\apply_func.py", line 52, in apply_to_collection
return _apply_to_collection_slow(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\lightning_utilities\core\apply_func.py", line 104, in _apply_to_collection_slow
v = _apply_to_collection_slow(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\lightning_utilities\core\apply_func.py", line 96, in _apply_to_collection_slow
return function(data, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\pytorch_lightning\trainer\supporters.py", line 177, in __iter__
self._loader_iter = iter(self.loader)
^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
return self._get_iterator()
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
w.start()
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\popen_spawn_win32.py", line 95, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 172, in __reduce_ex__
return super().__reduce_ex__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 347, in __getstate__
value = pickle.dumps(self._datapipe)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't pickle local object 'partialclass.<locals>.NewCls'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Using python-based detection evaluation
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adeel\anaconda3\envs\events_signals\Lib\multiprocessing\spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
== Timing statistics ==
wandb: View run golden-feather-1 at: https://wandb.ai/adeelferozmirza/RVT/runs/ba4dy0ts
wandb: View project at: https://wandb.ai/adeelferozmirza/RVT
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: .\wandb\run-20240627_152506-ba4dy0ts\logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with wandb.require("core")! See https://wandb.me/wandb-core for more information.
== Timing statistics ==
Epoch 0: : 0it [00:28, ?it/s]

@magehrig
Contributor

magehrig commented Jul 8, 2024

This appears to be a limitation of pickle on Windows. Specifically, here we dynamically define/overwrite the __init__ function inside a class, and pickle (on Windows) does not like that. So I think the following should work instead (replace the function linked above with this one):

def partialclass(cls, *args, **kwargs):
    class NewCls(cls):
        def __init__(self, *more_args, **more_kwargs):
            full_args = args + more_args
            full_kwargs = {**kwargs, **more_kwargs}
            super().__init__(*full_args, **full_kwargs)

    return NewCls

I have not tested it but this should work.

This should also resolve #45
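
One point on the snippet above: the class is still created inside a function, so its qualified name remains partialclass.<locals>.NewCls, which is the name pickle reports in the error. A minimal sketch of one picklable alternative follows, assuming the object returned by partialclass is only ever called to build instances (never subclassed or used in isinstance checks). It uses functools.partial on an importable class. SimpleNamespace, partialclass_local and can_pickle are illustrative stand-ins, not names from the RVT code base, and this has not been tested against RVT itself.

```python
import functools
import pickle
from types import SimpleNamespace


def partialclass_local(cls, *args, **kwargs):
    """Pattern from the thread: the subclass is created inside a function."""
    class NewCls(cls):
        def __init__(self, *more_args, **more_kwargs):
            super().__init__(*args, *more_args, **{**kwargs, **more_kwargs})

    return NewCls


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Older Pythons raise AttributeError for local objects,
        # newer ones raise PicklingError.
        return False


# SimpleNamespace is only a stand-in for the real datapipe class.
local_factory = partialclass_local(SimpleNamespace, mode="train")
partial_factory = functools.partial(SimpleNamespace, mode="train")

print("local class picklable:", can_pickle(local_factory))
print("functools.partial picklable:", can_pickle(partial_factory))

# The partial survives a pickle round trip and still applies the bound arguments.
restored = pickle.loads(pickle.dumps(partial_factory))
sample = restored(sequence_length=21)
print(sample)
```

The design choice here is to avoid creating any class or function at call time: pickle stores classes and functions by their import path, so anything whose qualified name contains <locals> cannot be found again in the worker process.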

@ACC-Tony

ACC-Tony commented Aug 6, 2024

I've also encountered this problem when running this project via PyCharm on Windows.

I've already modified the code in 'data/genx_utils/dataset_streaming.py':
image
However, the problem remains.

Is it because pickle cannot work with lexical closures?
Does this project work well on Linux? I may move to Linux.

The command I use is:

python .\train.py model=rnndet dataset=gen1 dataset.path=V:/gen1/ wandb.project_name=RVT wandb.group_name=gen1 +experiment/gen1="tiny.yaml" hardware.gpus=[0] batch_size.train=16 batch_size.eval=16 hardware.num_workers.train=6 hardware.num_workers.eval=6

And the log in the console is:

Using cpp-based detection evaluation
Set MaxViTRNN backbone (height, width) to (256, 320)
Set partition sizes: (8, 10)
Set num_classes=2 for detection head
------ Configuration ------
reproduce:
  seed_everything: null
  deterministic_flag: false
  benchmark: false
training:
  precision: 16
  max_epochs: 10000
  max_steps: 400000
  learning_rate: 0.0002
  weight_decay: 0
  gradient_clip_val: 1.0
  limit_train_batches: 1.0
  lr_scheduler:
    use: true
    total_steps: ${..max_steps}
    pct_start: 0.005
    div_factor: 20
    final_div_factor: 10000
validation:
  limit_val_batches: 1.0
  val_check_interval: null
  check_val_every_n_epoch: 1
batch_size:
  train: 16
  eval: 16
hardware:
  num_workers:
    train: 6
    eval: 6
  gpus:
  - 0
  dist_backend: nccl
logging:
  ckpt_every_n_epochs: 1
  train:
    metrics:
      compute: false
      detection_metrics_every_n_steps: null
    log_model_every_n_steps: 5000
    log_every_n_steps: 500
    high_dim:
      enable: true
      every_n_steps: 5000
      n_samples: 4
  validation:
    high_dim:
      enable: true
      every_n_epochs: 1
      n_samples: 8
wandb:
  wandb_runpath: null
  artifact_name: null
  artifact_local_file: null
  resume_only_weights: false
  group_name: gen1
  project_name: RVT
dataset:
  name: gen1
  path: V:/gen1/
  train:
    sampling: mixed
    random:
      weighted_sampling: false
    mixed:
      w_stream: 1
      w_random: 1
  eval:
    sampling: stream
  data_augmentation:
    random:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.8
        zoom_in:
          weight: 8
          factor:
            min: 1
            max: 1.5
        zoom_out:
          weight: 2
          factor:
            min: 1
            max: 1.2
    stream:
      prob_hflip: 0.5
      rotate:
        prob: 0
        min_angle_deg: 2
        max_angle_deg: 6
      zoom:
        prob: 0.5
        zoom_out:
          factor:
            min: 1
            max: 1.2
  ev_repr_name: stacked_histogram_dt=50_nbins=10
  sequence_length: 21
  resolution_hw:
  - 240
  - 304
  downsample_by_factor_2: false
  only_load_end_labels: false
model:
  name: rnndet
  backbone:
    name: MaxViTRNN
    compile:
      enable: false
      args:
        mode: reduce-overhead
    input_channels: 20
    enable_masking: false
    partition_split_32: 1
    embed_dim: 32
    dim_multiplier:
    - 1
    - 2
    - 4
    - 8
    num_blocks:
    - 1
    - 1
    - 1
    - 1
    T_max_chrono_init:
    - 4
    - 8
    - 16
    - 32
    stem:
      patch_size: 4
    stage:
      downsample:
        type: patch
        overlap: true
        norm_affine: true
      attention:
        use_torch_mha: false
        partition_size:
        - 8
        - 10
        dim_head: 32
        attention_bias: true
        mlp_activation: gelu
        mlp_gated: false
        mlp_bias: true
        mlp_ratio: 4
        drop_mlp: 0
        drop_path: 0
        ls_init_value: 1.0e-05
      lstm:
        dws_conv: false
        dws_conv_only_hidden: true
        dws_conv_kernel_size: 3
        drop_cell_update: 0
    in_res_hw:
    - 256
    - 320
  fpn:
    name: PAFPN
    compile:
      enable: false
      args:
        mode: reduce-overhead
    depth: 0.33
    in_stages:
    - 2
    - 3
    - 4
    depthwise: false
    act: silu
  head:
    name: YoloX
    compile:
      enable: false
      args:
        mode: reduce-overhead
    depthwise: false
    act: silu
    num_classes: 2
  postprocess:
    confidence_threshold: 0.1
    nms_threshold: 0.45

---------------------------
Disabling PL seed everything because of unresolved issues with shuffling during training on streaming datasets
new run: generating id 56gjza94
wandb: Currently logged in as: ***(***). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.17.5 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.14.0
wandb: Run data is saved locally in D:\RVT-master\wandb\run-20240806_104650-56gjza94
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run gallant-elevator-26
wandb:  View project at https://wandb.ai/***/RVT
wandb:  View run at https://wandb.ai/***/RVT/runs/56gjza94
wandb: logging graph, to disable use `wandb.watch(log_graph=False)`
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
[Train] Local batch size for:
stream sampling:        8
random sampling:        8
[Train] Local num workers for:
stream sampling:        3
random sampling:        3
creating rnd access train datasets: 1458it [00:42, 34.05it/s]
creating streaming train datasets: 1458it [02:16, 10.66it/s]
num_full_sequences=317
num_splits=1141
num_split_sequences=5492
creating streaming val datasets: 429it [00:18, 23.08it/s]
num_full_sequences=429
num_splits=0
num_split_sequences=0
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type          | Params
-------------------------------------------------
0 | mdl            | YoloXDetector | 4.4 M
1 | mdl.backbone   | RNNDetector   | 3.2 M
2 | mdl.fpn        | YOLOPAFPN     | 710 K
3 | mdl.yolox_head | YOLOXHead     | 474 K
-------------------------------------------------
4.4 M     Trainable params
0         Non-trainable params
4.4 M     Total params
8.810     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Using cpp-based detection evaluation
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Epoch 0: : 0it [00:00, ?it/s]Using cpp-based detection evaluation                                                                                                                                                                  
Using cpp-based detection evaluation
Using cpp-based detection evaluation
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
== Timing statistics ==
Error executing job with overrides: ['model=rnndet', 'dataset=gen1', 'dataset.path=V:/gen1/', 'wandb.project_name=RVT', 'wandb.group_name=gen1', '+experiment/gen1=tiny.yaml', 'hardware.gpus=[0]', 'batch_size.train=16', 'batch_size.eval=16', 'hardware.num_workers.train=6', 'hardware.num_workers.eval=6']
Traceback (most recent call last):
  File "D:\RVT-master\train.py", line 138, in main
    trainer.fit(model=module, ckpt_path=ckpt_path, datamodule=data_module)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1098, in _run
    results = self._run_stage()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1177, in _run_stage
    self._run_train()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1200, in _run_train
    self.fit_loop.run()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\loop.py", line 194, in run
    self.on_run_start(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 161, in on_run_start
    _ = iter(data_fetcher)  # creates the iterator inside the fetcher
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 179, in __iter__
    self._apply_patch()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 120, in _apply_patch
    apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 156, in loader_iters
    return self.dataloader_iter.loader_iters
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 555, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 595, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\lightning_utilities\core\apply_func.py", line 52, in apply_to_collection
    return _apply_to_collection_slow(
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\lightning_utilities\core\apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\lightning_utilities\core\apply_func.py", line 96, in _apply_to_collection_slow
    return function(data, *args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 177, in __iter__
    self._loader_iter = iter(self.loader)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\dataloader.py", line 1043, in __init__
    w.start()
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 167, in __reduce_ex__
    return super().__reduce_ex__(*args, **kwargs)
  File "C:\Users\***\.conda\envs\rvt\lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 333, in __getstate__
    value = pickle.dumps(self._datapipe)
AttributeError: Can't pickle local object 'build_streaming_train_dataset.<locals>.partialclass'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
Using cpp-based detection evaluation
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\***\.conda\envs\rvt\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
== Timing statistics ==
wandb:  View run gallant-elevator-26 at: https://wandb.ai/***/RVT/runs/56gjza94
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: .\wandb\run-20240806_104650-56gjza94\logs
== Timing statistics ==
Epoch 0: : 0it [00:26, ?it/s]

@magehrig
Contributor

Unfortunately, I can't test this because I don't have a Windows machine. Yes, the project was written and tested on Linux (specifically Ubuntu), so moving to Linux is one solution.
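
For context on why the platform matters: the tracebacks above fail in multiprocessing's popen_spawn_win32.py at reduction.dump(process_obj, to_child), i.e. while pickling the worker process object. Windows only offers the spawn start method, which sends the worker's state to the child through pickle. Linux has historically defaulted to fork (before Python 3.14), which copies the parent's memory and skips that step. A small sketch for checking which case applies on a given machine; needs_pickling is a hypothetical helper written for illustration, not part of any library:

```python
import multiprocessing as mp


def needs_pickling(start_method):
    """spawn and forkserver pickle the Process object for the child; fork does not."""
    return start_method in ("spawn", "forkserver")


# Default start method of the current platform and Python version.
method = mp.get_start_method()
print(f"default start method: {method}")
print(f"worker state must be picklable: {needs_pickling(method)}")
```

If this reports that pickling is required, everything a DataLoader worker receives, including the dataset and any class factories stored on it, has to be importable by name.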

@halotyy

halotyy commented Oct 18, 2024

I have the same problem. Is there a solution now?
