CUDA out of memory #38

tomasonjo · 2021-12-30T17:18:06Z

I am following the Cora link prediction example from the example Colab: https://colab.research.google.com/drive/1ycdlJuse7l2De7wi51lFd_nCuaWgVABc?usp=sharing

Instead of the Cora dataset, I am using a subset of the Pokec dataset with 1 million nodes and 10 million relationships. Each node has only two properties, so I expected the graph to fit in memory. My code is basically identical to the example; the only change is the input graph, which is created from a PyG graph:

args = {
    "device" : 'cuda' if torch.cuda.is_available() else 'cpu',
    "hidden_dim" : 128,
    "epochs" : 50,
}

#pyg_dataset = Planetoid('./tmp/cora', 'Cora')
graph = Graph.pyg_to_graph(pyg_graph)

dataset = GraphDataset(
    graph,
    task='link_pred',
    edge_train_mode="disjoint"
)
datasets = {}
datasets['train'], datasets['val'], datasets['test'] = dataset.split(
    transductive=True, split_ratio=[0.85, 0.05, 0.1])
input_dim = datasets['train'].num_node_features
num_classes = datasets['train'].num_edge_labels

model = LinkPredModel(input_dim, args["hidden_dim"]).to(args["device"])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

dataloaders = {split: DataLoader(
    ds, collate_fn=Batch.collate([]),
    batch_size=1, shuffle=(split=='train'))
    for split, ds in datasets.items()}
best_model = train(model, dataloaders, optimizer, args)
log = "Train: {:.4f}, Val: {:.4f}, Test: {:.4f}"
best_train_roc = test(best_model, dataloaders['train'], args)
best_val_roc = test(best_model, dataloaders['val'], args)
best_test_roc = test(best_model, dataloaders['test'], args)
print(log.format(best_train_roc, best_val_roc, best_test_roc))

However, I get the following error:

RuntimeError Traceback (most recent call last)
in
27 batch_size=1, shuffle=(split=='train'))
28 for split, ds in datasets.items()}
---> 29 best_model = train(model, dataloaders, optimizer, args)
30 log = "Train: {:.4f}, Val: {:.4f}, Test: {:.4f}"
31 best_train_roc = test(best_model, dataloaders['train'], args)

in train(model, dataloaders, optimizer, args)
9 model.train()
10 optimizer.zero_grad()
---> 11 pred = model(batch)
12 loss = model.loss(pred, batch.edge_label.type(pred.dtype))
13

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

in forward(self, batch)
19
20 nodes_first = torch.index_select(x, 0, edge_label_index[0,:].long())
---> 21 nodes_second = torch.index_select(x, 0, edge_label_index[1,:].long())
22 pred = torch.sum(nodes_first * nodes_second, dim=-1)
23 return pred

RuntimeError: CUDA out of memory. Tried to allocate 1.75 GiB (GPU 0; 8.00 GiB total capacity; 5.14 GiB already allocated; 281.56 MiB free; 5.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
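
For scale, the failed 1.75 GiB request can be related to the number of label edges gathered by torch.index_select in forward(). The sketch below is only a back-of-the-envelope estimate: it assumes the node embeddings are float32 and have width hidden_dim = 128 at that point, neither of which is confirmed by the traceback, and the helper names are hypothetical. The size in the error message is also rounded, so the result is approximate.

```python
# Rough size estimate for one torch.index_select output in forward().
# ASSUMPTIONS (not confirmed by the report): embeddings are float32 and
# have width hidden_dim = 128 where the allocation fails.
GIB = 1024 ** 3
BYTES_PER_FLOAT32 = 4


def gathered_bytes(num_label_edges: int, embedding_dim: int) -> int:
    """Bytes for one [num_label_edges, embedding_dim] float32 tensor."""
    return num_label_edges * embedding_dim * BYTES_PER_FLOAT32


def edges_for_allocation(alloc_bytes: float, embedding_dim: int) -> int:
    """Number of rows that fit in an allocation of alloc_bytes."""
    return int(alloc_bytes // (embedding_dim * BYTES_PER_FLOAT32))


failed_alloc = 1.75 * GIB  # size reported in the error message (rounded)
num_edges = edges_for_allocation(failed_alloc, 128)
print(num_edges)  # 3670016
```

Under these assumptions, each of nodes_first and nodes_second holds roughly 3.7 million rows, so the two gathered tensors together need about 3.5 GiB on top of what is already allocated.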

When I had a similar issue in PyTorch Geometric, I just added the non_blocking parameter:

graph.to(device, non_blocking=True)

but here it doesn't seem to help at all.
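
One point that may explain this: non_blocking only controls whether the host-to-device copy is asynchronous; it does not change how much GPU memory the tensors need. A possible way to lower the peak memory of the scoring step is to process the label edges in chunks, as in the sketch below. This is a minimal sketch, not the Colab's implementation: score_edges_in_chunks and chunk_size are hypothetical names, and it runs under torch.no_grad(), so it applies to evaluation only. During training, autograd keeps the gathered tensors of every chunk for the backward pass, so chunking alone would not reduce the peak.

```python
import torch


@torch.no_grad()
def score_edges_in_chunks(x, edge_label_index, chunk_size=100_000):
    """Dot-product edge scores computed chunk by chunk (evaluation only).

    x:                [num_nodes, dim] node embeddings
    edge_label_index: [2, num_edges] source/target node indices
    chunk_size:       hypothetical tuning knob, edges scored per step
    """
    scores = []
    num_edges = edge_label_index.size(1)
    for start in range(0, num_edges, chunk_size):
        idx = edge_label_index[:, start:start + chunk_size].long()
        # Only chunk_size rows are gathered at a time instead of all edges.
        src = x.index_select(0, idx[0])
        dst = x.index_select(0, idx[1])
        scores.append((src * dst).sum(dim=-1))
    return torch.cat(scores) if scores else x.new_empty(0)
```

The result matches the unchunked computation in forward(); only the size of the temporary gathered tensors changes, from num_edges rows to at most chunk_size rows.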
