Incorrect positional argument #2

Open

LongerZrLong opened this issue Oct 16, 2022 · 2 comments

Comments

@LongerZrLong

LongerZrLong commented Oct 16, 2022

print(Parted_COO_Graph(self.name, i, num_parts, self.preprocess_for))

I tried to run prepare_data.py and noticed that the positional arguments here are used incorrectly. The correct way to invoke the constructor is

Parted_COO_Graph(self.name, i, num_parts, preprocess_for=self.preprocess_for)

Otherwise, self.preprocess_for will be passed to the device argument of Parted_COO_Graph.
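
For illustration, here is a minimal sketch of why the positional call goes wrong, assuming a simplified constructor whose fourth parameter is device (a hypothetical signature; the real Parted_COO_Graph constructor may take different or additional parameters):

class Parted_COO_Graph:
    # Hypothetical, simplified signature for illustration only.
    def __init__(self, name, rank, num_parts, device='cpu', preprocess_for=None):
        self.name = name
        self.rank = rank
        self.num_parts = num_parts
        self.device = device                  # fourth positional slot
        self.preprocess_for = preprocess_for  # intended to be bound by keyword

# Buggy call: the fourth positional argument lands in `device`,
# and `preprocess_for` silently keeps its default (example values).
g = Parted_COO_Graph("cora", 0, 4, "GCN")
print(g.device, g.preprocess_for)   # -> GCN None

# Fixed call: bind preprocess_for explicitly by keyword.
g = Parted_COO_Graph("cora", 0, 4, preprocess_for="GCN")
print(g.device, g.preprocess_for)   # -> cpu GCN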

@BearBiscuit05

Thank you for the issue; I fixed this problem as well, but I ran into another one. When I run the program on 2 GPUs, it fails with the following error, and I don't know how to resolve it. If you have run this program successfully, could you give some advice?

Traceback (most recent call last):
  File "/home/light-dist-gnn/main.py", line 37, in <module>
    torch.multiprocessing.spawn(process_wrapper, process_args, args.nprocs)
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/light-dist-gnn/main.py", line 24, in process_wrapper
    func(env, args)
  File "/home/light-dist-gnn/dist_train.py", line 71, in main
    train(g, env, total_epoch=args.epoch)
  File "/home/light-dist-gnn/dist_train.py", line 39, in train
    outputs = model(g.features)
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/light-dist-gnn/models/cached_gcn.py", line 105, in forward
    hidden_features = F.relu(DistGCNLayer.apply(features, self.weight1, self.g.adj_parts, 'L1'))
  File "/home/light-dist-gnn/models/cached_gcn.py", line 75, in forward
    z_local = cached_broadcast(adj_parts, features, 'Forward'+tag)
  File "/home/light-dist-gnn/models/cached_gcn.py", line 56, in cached_broadcast
    dist.broadcast(feature_bcast, src=src)
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1159, in broadcast
    work = default_pg.broadcast([tensor], opts)
RuntimeError: Tensors must be CUDA and dense

@LongerZrLong
Author

It has been a while since I last ran the code, and I am not sure whether I ran it with GPUs or only on the CPU. I would recommend first running it with the CPU only to see whether the code works, since the issue in your stack trace is likely related to CUDA in torch.
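
For what it's worth, "Tensors must be CUDA and dense" usually means a CPU tensor was handed to a collective while the process group uses the NCCL backend, which only accepts dense CUDA tensors. Below is a minimal, self-contained sketch of that distinction; the worker, tensor, and setup are invented for illustration and are not this repository's code.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size, backend):
    # Rendezvous via environment variables (the default env:// init method).
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    feature_bcast = torch.zeros(8)  # starts on the CPU
    if backend == 'nccl':
        # NCCL collectives require dense CUDA tensors, one GPU per rank.
        torch.cuda.set_device(rank)
        feature_bcast = feature_bcast.cuda(rank)

    dist.broadcast(feature_bcast, src=0)  # with 'gloo', a CPU tensor is fine
    dist.destroy_process_group()

if __name__ == '__main__':
    # Use 'gloo' for a CPU-only sanity check, 'nccl' for a multi-GPU run.
    mp.spawn(worker, args=(2, 'gloo'), nprocs=2)

With the Gloo backend the same CPU tensor broadcasts fine, which is why a CPU-only run is a quick way to separate the distributed logic from the CUDA placement problem.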
