NoC Simulator for simulating intra-chip data flow in Neural Network Accelerator

Konkuk University (Kyle Jonghyuk Park, Ko Ryeowook)

Introduction

This project analyzes SIMBA, NVIDIA's neural network (NN) accelerator, and its Network-on-Chip (NoC) structure to uncover its strengths and weaknesses. It also addresses the need for more capable simulators: we developed a simulator that supports both multicast and unicast data transmission, a combination missing from existing tools. The goal is to support neural network accelerator research by making on-chip dataflow easier to study.

Project Summary

  • Developed a 2D Mesh NoC simulator in Verilog to verify different tile structures and efficient dataflow of a neural network accelerator
    • Flit-based flow control: wormhole
    • Virtual Channel
    • Lookahead routing pipeline (4-cycle)
    • Credit-based buffer backpressure
  • Implemented the Base Routing Conformed Path (BRCP) model to support both unicast and multicast
    • "Forward & Absorb" multicast mechanism
    • Avoids multicast-unicast routing deadlock because multicast and unicast packets share the same network paths
    • Unicast vs. multicast: multicast has priority
  • Applied an advanced multicast algorithm: the Advanced Hierarchical Leader-Based (HL) scheme (NoC Simulator (HL))
    • Takes fewer cycles than the original HL scheme
  • Parameterized the simulator's options to improve usability (define.v, parameters.v)
    • Routing Algorithm: XY / YX DOR
    • Arbiter Configurations: Fixed priority / Round robin priority
    • Priority Configuration
    • Flit Configuration
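
The XY / YX dimension-order routing (DOR) option listed above can be illustrated with a small behavioral model. This is a minimal Python sketch, not the project's Verilog: the port names, the coordinate convention (x grows to the east, y grows to the south), and the function names `dor_route` and `path` are assumptions made for illustration.

```python
# Port names are hypothetical labels for this sketch.
LOCAL, EAST, WEST, NORTH, SOUTH = "LOCAL", "EAST", "WEST", "NORTH", "SOUTH"


def dor_route(cur, dst, order="XY"):
    """Return the output port for one hop of dimension-order routing.

    Assumes x grows to the east and y grows to the south.
    """
    if order not in ("XY", "YX"):
        raise ValueError("order must be 'XY' or 'YX'")
    (cx, cy), (dx, dy) = cur, dst
    x_port = EAST if dx > cx else WEST if dx < cx else None
    y_port = SOUTH if dy > cy else NORTH if dy < cy else None
    # XY resolves the X offset first; YX resolves the Y offset first.
    first, second = (x_port, y_port) if order == "XY" else (y_port, x_port)
    return first or second or LOCAL


def path(src, dst, order="XY"):
    """Follow dor_route hop by hop and list the visited nodes."""
    step = {EAST: (1, 0), WEST: (-1, 0), SOUTH: (0, 1), NORTH: (0, -1)}
    nodes, cur = [src], src
    while (port := dor_route(cur, dst, order)) != LOCAL:
        cur = (cur[0] + step[port][0], cur[1] + step[port][1])
        nodes.append(cur)
    return nodes
```

For example, `path((0, 0), (2, 1), "XY")` moves along X before turning, while the same call with `"YX"` moves along Y first. Both reach the destination in the same number of hops.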

Block Diagram

Top

top

Router

router

  • Virtual channel used (Logical data path)
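
Each virtual channel in the router is paired with the credit-based buffer backpressure mentioned in the project summary. The Python sketch below models one virtual channel between an upstream sender and a downstream input buffer. The class name `CreditLink` and its methods are hypothetical, and the model ignores the credit round-trip latency that a real pipeline would have.

```python
from collections import deque


class CreditLink:
    """One virtual channel between an upstream sender and a downstream buffer."""

    def __init__(self, depth):
        self.credits = depth      # free downstream slots as seen by the sender
        self.buffer = deque()     # downstream input buffer for this VC

    def can_send(self):
        return self.credits > 0

    def send(self, flit):
        """Send one flit if a credit is available; return True on success."""
        if not self.can_send():
            return False          # backpressure: the sender must stall
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self):
        """Downstream forwards one flit and returns a credit upstream."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```

With a buffer depth of 2, a third `send` is refused until `drain` frees a slot, which is the stall behavior that backpressure is meant to produce.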

Routing Computation & Crossbar

rc cb

  • Forward & Absorb

  • The routing computation logic determines the status of each input packet and sends it to the mux controller in the crossbar

    • Three statuses: Unicast / Multicast & Forward / Multicast & Absorb

    status

  • Depending on the status of the packet, the crossbar behaves differently

  • Contention between a Multicast & Absorb packet and other packets is resolved by the multab_ct signal
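
The three packet statuses can be sketched as a small classification function. This is a Python illustration under stated assumptions, not the project's routing computation logic: it assumes that "Absorb" means the current router is one of the packet's destinations, so a copy goes to the local port while the packet keeps moving if destinations remain further along the path. The names `Status`, `classify`, and `crossbar_outputs` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    UNICAST = "unicast"
    MULTICAST_FORWARD = "multicast & forward"
    MULTICAST_ABSORB = "multicast & absorb"


def classify(is_multicast, cur, dests):
    """Classify a packet arriving at router `cur` (assumed semantics)."""
    if not is_multicast:
        return Status.UNICAST
    # Absorb when this router is one of the multicast destinations.
    return Status.MULTICAST_ABSORB if cur in dests else Status.MULTICAST_FORWARD


def crossbar_outputs(status, route_port, more_dests_ahead):
    """List the output ports the crossbar would drive for this packet."""
    if status is Status.MULTICAST_ABSORB:
        # Copy to the local port, and keep forwarding if destinations remain.
        return ["LOCAL"] + ([route_port] if more_dests_ahead else [])
    # Unicast and Multicast & Forward both use the single routed port.
    return [route_port]
```

Under these assumptions, a multicast packet travelling east along a row is forwarded by routers that are not destinations, and copied to the local port by each router that is.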

PE Cycle module

simba pe2 pe cycle

Left: PE architecture proposed in SIMBA. Right: PE cycle module.

  • PE cycle module is only used for cycle simulation
  • 1 MAC takes 8 cycles; 128 MACs in total take 1024 cycles
    • Refer to SIMBA paper: ResNet-50 (res4a_branch1)
  • If you don't need a cycle simulation for PE computation, you can delete this module. The NoC Simulator will still work.
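
The cycle count above is a plain product, which the short Python sketch below reproduces. It assumes the MAC operations are counted back to back, which is what 128 × 8 = 1024 implies. The constant and function names are hypothetical.

```python
CYCLES_PER_MAC = 8   # per-MAC latency stated in this README
NUM_MACS = 128       # total MAC count stated in this README


def pe_compute_cycles(num_macs=NUM_MACS, cycles_per_mac=CYCLES_PER_MAC):
    """Cycles a PE stays busy, assuming the MACs are counted back to back."""
    return num_macs * cycles_per_mac
```

Calling `pe_compute_cycles()` with the defaults gives the 1024-cycle figure used by the PE cycle module.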

Original HL scheme

original hl

  • Multicast algorithm for sending data from one source to multiple destinations
  • The HL scheme was proposed in the same paper that introduced the BRCP model ("Multidestination message passing in wormhole k-ary n-cube networks with base routing conformed paths")

Our Advanced HL scheme

advanced hl

  1. Divide the mesh NoC into four quadrants and determine L1 and L2 using the algorithm specified for the quadrant in which the source is located.
  2. Using the U-mesh algorithm, send the data to one of the L2s as the first destination.
  3. Proceed with the multicast in the column and row directions specified for the quadrant in which the source is located.
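
The first step depends on knowing which quadrant holds the source. The Python sketch below shows only that lookup; the per-quadrant rules for choosing L1 and L2 are not described in this README and are not modeled here. The quadrant numbering (0 = upper-left, 1 = upper-right, 2 = lower-left, 3 = lower-right), the coordinate convention, and the choice to place a middle row or column in the lower or right half on odd-sized meshes are all assumptions.

```python
def source_quadrant(src, width, height):
    """Return the quadrant index (0..3) of node `src` in a width x height mesh.

    Hypothetical numbering: 0 = upper-left, 1 = upper-right,
    2 = lower-left, 3 = lower-right. x grows to the right, y grows downward.
    """
    x, y = src
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("source lies outside the mesh")
    right = x >= width // 2     # right half of the mesh
    lower = y >= height // 2    # lower half of the mesh
    return (2 if lower else 0) + (1 if right else 0)
```

In a 4x4 mesh, for instance, a source at (1, 2) falls in the lower-left quadrant under this numbering.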

Advanced HL vs Original HL

table graph

Simulation Result

Case 1: NoC Simulator (HL)

Simulation Scenario

case1

Case 1 result

result

wave

Case 2: NoC Simulator (2VC, PE cycle)

Simulation Scenario

case2

  • Contention between row-by-row multicast transmissions and the partial-sum (PSUM) unicast transmissions that PEs send after their internal MAC operations in the SIMBA dataflow.
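
One way to picture how this kind of contention could be arbitrated is sketched below in Python. It combines the multicast priority and the round-robin arbiter option listed in the project summary: multicast requests win over unicast requests, and requests of the same class are served round robin. This is an illustrative model with hypothetical names (`MulticastPriorityArbiter`, `grant`), not the simulator's arbiter.

```python
class MulticastPriorityArbiter:
    """Grant one requester per cycle: multicast first, round robin within a class."""

    def __init__(self, num_ports):
        self.n = num_ports
        self.last = num_ports - 1   # so that port 0 is checked first

    def grant(self, requests):
        """`requests` maps port index -> 'multicast' or 'unicast'.

        Returns the granted port index, or None if nobody requests.
        """
        for kind in ("multicast", "unicast"):
            # Scan ports starting just after the most recently granted one.
            for offset in range(1, self.n + 1):
                port = (self.last + offset) % self.n
                if requests.get(port) == kind:
                    self.last = port
                    return port
        return None
```

In this model, a PSUM unicast request waits for as long as an IA multicast request targets the same output, then is served in round-robin order with the other unicast requesters.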

Case 2 result

Row 2: IA Multicast & PE MAC cycle

case1

Row 3: IA Multicast & PSUM contention

case1

Simulator log and options

result2

  • Simulation logs & Simulator Configuration options (define.v, parameters.v)

Reference

  • "Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture"
  • D. K. Panda, S. Singal and R. Kesavan, "Multidestination message passing in wormhole k-ary n-cube networks with base routing conformed paths," in IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 1, pp. 76-96, Jan. 1999, doi: 10.1109/71.744844.
  • NoCGEN: "An open-source on-chip router model originally developed for [Matsutani_HPCA09]"

Helpful books

  • On-Chip Networks, Second Edition (Natalie Enright Jerger, Tushar Krishna, Li-Shiuan Peh)
  • Principles and Practices of Interconnection Networks (William James Dally, Brian Towles)