I ran into the following issue on the catalyst cluster:
$ python demo/demo_chameleon.py
========== Search Configuration ==========
max num threadblock graph op: 9
max num kernel_graph op: 7
max num threadblock graphs: 1
max num threadblock graph inputs: 3
max num threadblock graph outputs: 2
search_thread: 8
imaps to explore:
imap combs to explore:
omaps to explore:
grid dims to explore:
block dims to explore:
fmaps to explore:
franges to explore:4 16 64
[Search] States: 901, Random tests: 1, Valid mugraphs: 0, Time: 4.766257
[Search] First step finished. Time elapsed: 4.766711sec
[Search] States: 743301, Random tests: 4307, Valid mugraphs: 16, Time: 172.965338
[Search] Second step finished. Time elapsed: 172.990405sec
[Search] Total states explored: 743323
[Search] Random tests performed: 4307
[Serach] Valid kernel graphs explored: 16
Transpiling muGraph 0...
Profiling muGraph 0 performance (ms) = 0.06813900756835937
Transpiling muGraph 1...
Profiling muGraph 1 performance (ms) = 0.06811238098144531
Transpiling muGraph 2...
python: /home/mengdiwu/mirage/src/threadblock/element_binary.cc:51: mirage::threadblock::STensor mirage::threadblock::Graph::elementbinary(const mirage::threadblock::STensor&, const mirage::threadblock::STensor&, mirage::type::TBOperatorType): Assertion `op != nullptr' failed.
Aborted (core dumped)
A potential reason is an inconsistency between the memory check in the search and the one in the transpiler: we modify the graph before transpilation, which may increase its memory usage, so a graph that passed the check during search can then run out of memory (OOM).
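To make the suspected mismatch concrete, here is a toy sketch. None of these names are Mirage APIs; `ToyBlockGraph`, `fits`, `rewrite_before_transpile`, the buffer sizes, and the 96 KiB limit are all made up for illustration. It only shows how a graph that passes a memory check during search could fail the same check after a pre-transpilation rewrite adds a buffer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBlockGraph:
    """Hypothetical stand-in for a threadblock graph: only buffer sizes in bytes."""
    buffer_bytes: List[int] = field(default_factory=list)

    def shared_memory_usage(self) -> int:
        # Total bytes required by all buffers in this toy graph.
        return sum(self.buffer_bytes)


def fits(graph: ToyBlockGraph, limit_bytes: int) -> bool:
    """Memory check: does the graph stay within the budget?"""
    return graph.shared_memory_usage() <= limit_bytes


def rewrite_before_transpile(graph: ToyBlockGraph) -> ToyBlockGraph:
    """Hypothetical pre-transpilation rewrite that introduces one extra buffer."""
    return ToyBlockGraph(graph.buffer_bytes + [16 * 1024])


LIMIT = 96 * 1024  # assumed budget, not a real hardware or Mirage value

g = ToyBlockGraph([48 * 1024, 40 * 1024])
accepted_by_search = fits(g, LIMIT)      # checked on the original graph

rewritten = rewrite_before_transpile(g)
still_fits = fits(rewritten, LIMIT)      # re-checked on the graph actually transpiled

print(accepted_by_search, still_fits)
```

If this is what happens, one option would be to run the memory check on the rewritten graph as well (or to apply the same rewrite during search), so that both stages agree on which graphs are valid.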