diff --git a/examples/sampling/graphbolt/rgcn/README.md b/examples/sampling/graphbolt/rgcn/README.md
index b4948ad4bc2f..cb3e5af8969a 100644
--- a/examples/sampling/graphbolt/rgcn/README.md
+++ b/examples/sampling/graphbolt/rgcn/README.md
@@ -47,10 +47,15 @@ python3 hetero_rgcn.py --dataset ogb-lsc-mag240m --num_gpus 1
 ### Resource usage and time cost
 Below results are roughly collected from an AWS EC2 **g4dn.metal**, 384GB RAM, 96 vCPUs(Cascade Lake P-8259L), 8 NVIDIA T4 GPUs(16GB RAM). CPU RAM usage is the peak value of `used` field of `free` command which is a bit rough. Please refer to `RSS`/`USS`/`PSS` which are more accurate. GPU RAM usage is the peak value recorded by `nvidia-smi` command.
 
+> **Note:**
+> `buffer/cache` is heavily used during training and peaks at about 300GB. If more RAM is available, more `buffer/cache` will be consumed, since the graph is about 55GB and the feature data is about 350GB.
+> Also note that the first epoch is quite slow because `buffer/cache` has not been warmed up yet. For GPU training, the first epoch takes **34:56min, 1.93s/it**.
+> Time consumption also varies across the following epochs.
+
 | Dataset Size | CPU RAM Usage | Num of GPUs | GPU RAM Usage | Time Per Epoch(Training) | Time Per Epoch(Inference: train/val/test set) |
 | ------------ | ------------- | ----------- | ---------- | --------- | --------------------------- |
 | ~404GB | ~55GB | 0 | 0GB | ~3min25s(1087it, 5.29it/s) | ~2min26s(272it, 1.86it/s) + ~0min20s(34it, 1.62it/s) + ~0min13s(22it, 1.68it/s) |
-| ~404GB | ~55GB | 1 | 7GB | ~2min41s(1087it, 6.73it/s) | ~1min52s(272it, 2.41it/s) + ~0min17s(34it, 1.93it/s) + ~0min11s(22it, 1.99it/s) |
+| ~404GB | ~55GB | 1 | 7GB | ~1min59s(1087it, 9.11it/s) | ~1min52s(272it, 2.41it/s) + ~0min17s(34it, 1.93it/s) + ~0min11s(22it, 1.99it/s) |
 
 ### Accuracies
 ```
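
To reproduce the peak-memory numbers in the table, a small polling script run alongside training is enough. The sketch below is illustrative only (it is not part of this PR, and the helper names are made up): it samples the `used` column of `free -m` and `memory.used` from `nvidia-smi`, and keeps the maxima observed until interrupted.

```python
import subprocess
import time


def cpu_used_mb():
    # `used` is the third column of the "Mem:" row printed by `free -m`.
    for line in subprocess.check_output(["free", "-m"], text=True).splitlines():
        if line.startswith("Mem:"):
            return int(line.split()[2])
    return 0


def gpu_used_mb():
    # Per-GPU memory usage in MiB; report the busiest device.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return max(int(v) for v in out.split())


if __name__ == "__main__":
    peak_cpu, peak_gpu = 0, 0
    try:
        while True:  # sample every few seconds until interrupted (Ctrl+C)
            peak_cpu = max(peak_cpu, cpu_used_mb())
            peak_gpu = max(peak_gpu, gpu_used_mb())
            time.sleep(5)
    except KeyboardInterrupt:
        print(f"peak CPU used: {peak_cpu} MB, peak GPU used: {peak_gpu} MB")
```

Start it in a second terminal before launching `python3 hetero_rgcn.py ...` and stop it with Ctrl+C once the run finishes; note that the `used` field of `free` is a coarse measure, as the README itself points out.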