GL simulation failing #252

Open
mole99 opened this issue Nov 21, 2024 · 5 comments

@mole99
Contributor

mole99 commented Nov 21, 2024

Hello, I've been trying to get gate level simulation working with FABulous but have been unsuccessful so far. My goal is to run the same simulation that is generated in the fabric under the Test/ folder, but by substituting the RTL of the LUT4AB with its GL representation (as a start).

But when I do that, the output of the fabric no longer matches the golden reference:

...
fabric(I_top) = 0xXXxxxxX gold = 0x48c017c, fabric(T_top) = 0xffffffe gold = 0xffffffe
fabric(I_top) = 0xXXxxxxX gold = 0x48c0180, fabric(T_top) = 0xffffffe gold = 0xffffffe
fabric(I_top) = 0xXXxxxxX gold = 0x48c0184, fabric(T_top) = 0xffffffe gold = 0xffffffe
fabric(I_top) = 0xXXxxxxX gold = 0x48c0188, fabric(T_top) = 0xffffffe gold = 0xffffffe
fabric(I_top) = 0xXXxxxxX gold = 0x48c018c, fabric(T_top) = 0xffffffe gold = 0xffffffe
FATAL: tmp/fabulous_tb.v:88: 
       Time: 207060000  Scope: fab_tb
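
To separate unknown (X) outputs from genuine value mismatches in a log like the one above, a small Python sketch can classify each comparison line. It assumes the exact line format printed by the testbench above; the names `classify` and `sample_log` are hypothetical and not part of FABulous.

```python
import re
from collections import Counter

# Matches the I_top comparison printed by the testbench, e.g.
# "fabric(I_top) = 0xXXxxxxX gold = 0x48c017c, ..."
# The fabric value may contain x/z digits, the golden value may not.
LINE_RE = re.compile(
    r"fabric\(I_top\) = 0x(?P<fabric>[0-9a-fA-FxXzZ]+) "
    r"gold = 0x(?P<gold>[0-9a-fA-F]+)"
)


def classify(line):
    """Return 'unknown', 'match', 'mismatch', or None for unrelated lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fabric = m.group("fabric").lower()
    gold = m.group("gold").lower()
    # Any x or z digit means the fabric output is not fully defined.
    if "x" in fabric or "z" in fabric:
        return "unknown"
    return "match" if int(fabric, 16) == int(gold, 16) else "mismatch"


# Two lines copied from the log above, plus the FATAL line.
sample_log = [
    "fabric(I_top) = 0xXXxxxxX gold = 0x48c017c, fabric(T_top) = 0xffffffe gold = 0xffffffe",
    "fabric(I_top) = 0xXXxxxxX gold = 0x48c0180, fabric(T_top) = 0xffffffe gold = 0xffffffe",
    "FATAL: tmp/fabulous_tb.v:88: ",
]

counts = Counter(kind for kind in map(classify, sample_log) if kind is not None)
print(dict(counts))
```

If every failing line is classified as `unknown`, the fabric is producing undefined values rather than wrong ones, which points at X-propagation or initialization rather than at a functional difference.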

Reproducing the Issue

Simply create a new fabric with default settings using the main branch of FABulous (the development branch gives the same result). Use the latest version of the sky130 PDK and OpenLane 2 and harden the LUT4AB tile into a macro (skip STA).

The OpenLane 2 configuration is

{
  "meta": {
    "version": 2,
    "flow": "Classic",
    "substituting_steps": {
      "OpenROAD.STAPrePNR": null,
      "OpenROAD.STAMidPNR*": null,
      "OpenROAD.STAPostPNR": null
    }
  },
  "DESIGN_NAME": "LUT4AB",
  "VERILOG_FILES": [
      "dir::LUT4AB.v",
      "dir::LUT4AB_ConfigMem.v",
      "dir::LUT4AB_switch_matrix.v",
      "dir::LUT4c_frame_config_dffesr.v",
      "dir::MUX8LUT_frame_config_mux.v",
      "dir::../../Fabric/models_pack.v",
      "dir::../../custom.v"
  ],
  "CLOCK_PERIOD": 25,
  "CLOCK_PORT": "UserCLK",
  "FP_SIZING": "absolute",
  "DIE_AREA": [0, 0, 1000, 1000]
}

custom.v simply contains:

module clk_buf(input A, output X);
assign X = A;
endmodule

module break_comb_loop(input A, output X);
assign X = A;
endmodule

After OpenLane 2 has finished, copy the final GL netlist (LUT4AB.nl.v) to the Test/tmp directory (make sure tmp is not deleted after the last simulation run) and delete the RTL files for LUT4AB. The script run_simulation.sh needs to be updated to load the stdcells and set FUNCTIONAL=1 and UNIT_DELAY=#0:

#!/usr/bin/env bash
set -ex
DESIGN=counter
BITSTREAM=test_design/${DESIGN}.bin
VERILOG=../../fabric_generator/verilog_output
MAX_BITBYTES=16384

#rm -rf tmp
#mkdir tmp
#for i in $(find ../Tile -type f -name "*.v") $(find ../Fabric -type f -name "*.v")
#do 
#    cp $i tmp/
#done

iverilog -s fab_tb -o fab_tb.vvp tmp/* test_design/${DESIGN}.v fabulous_tb.v ${PDK_ROOT}/${PDK}/libs.ref/sky130_fd_sc_hd/verilog/sky130_fd_sc_hd.v ${PDK_ROOT}/${PDK}/libs.ref/sky130_fd_sc_hd/verilog/primitives.v -D FUNCTIONAL=1 -D UNIT_DELAY=#0
python3 makehex.py $BITSTREAM $MAX_BITBYTES bitstream.hex
vvp fab_tb.vvp
#rm -rf tmp

To quickly reproduce the issue, I have gathered all the required files and attached them to this report in a zip folder.

reproduce.zip

To run the simulation, unpack the zip file, cd into it and simply do:

iverilog -s fab_tb -o fab_tb.vvp tmp/* counter.v fabulous_tb.v -D FUNCTIONAL=1 -D UNIT_DELAY=#0
vvp fab_tb.vvp

And you should get the same output as above.


I assume the FABulous team has done GL simulations for the MPW submissions before? I would really appreciate it if you could give this a try and let me know what I'm doing wrong or whether there's an issue in FABulous.

Thanks!

@mole99
Contributor Author

mole99 commented Dec 9, 2024

@KelvinChung2000 @IAmMarcelJung @EverythingElseWasAlreadyTaken

Hi, sorry to ping you, but I was wondering if one of you has an idea what could be the problem here?

I'm at the end of my rope. I'm literally using the upstream FABulous repository, converting the LUT4AB to a GL netlist, and simulating it. This should work without any problems...

I've used the same approach on a larger RISC-V core and was able to simulate its GL netlist without issue.

@IAmMarcelJung
Collaborator

Hi @mole99,

No need to be sorry, we are all just a bit busy at the moment :) I know that @EverythingElseWasAlreadyTaken was looking into the GL simulation, but I don't know what he found out. He is currently unavailable, but I think he will be back in the course of this week. Sorry that we/I can't help any further right now!

@mole99
Contributor Author

mole99 commented Dec 10, 2024

Thanks for letting me know :)

I look forward to any news on this issue!

@EverythingElseWasAlreadyTaken
Collaborator

Hi @mole99,
Sorry for the late reply. I spent some time trying to get the GL simulation working, but I couldn't get it right and got distracted by other things. The main problem for the GL simulation seems to be X-propagation; more specifically, the standard cell implementation handles X values a bit differently from our behavioral implementation. I've tested it quite extensively at different levels of abstraction and tried to understand where our behavioral RTL and GL netlists differ.

I can say that our synthesized GL netlist and behavioral RTL should behave the same for defined values, but as soon as X values are involved, the behavior differs quite a lot, especially in multiplexer implementations.
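
As a rough illustration of how multiplexers can diverge under X, here is a toy three-valued model in Python. It is only a sketch: it compares the standard Verilog semantics of `sel ? a : b` and of `if (sel) ... else ...` with a multiplexer assembled from plain AND/OR/NOT gates. It does not reproduce the actual sky130 cell models or the FABulous RTL, and all function names are made up for this example.

```python
X = "x"  # unknown value in this toy three-valued model (0, 1, X)


def v_not(a):
    return X if a == X else 1 - a


def v_and(a, b):
    # A 0 on either input dominates, otherwise any X makes the result X.
    if a == 0 or b == 0:
        return 0
    if a == X or b == X:
        return X
    return 1


def v_or(a, b):
    # A 1 on either input dominates, otherwise any X makes the result X.
    if a == 1 or b == 1:
        return 1
    if a == X or b == X:
        return X
    return 0


def mux_ternary(sel, a, b):
    """Verilog `sel ? a : b`: an unknown sel merges both inputs."""
    if sel == X:
        return a if a == b else X
    return a if sel == 1 else b


def mux_if_else(sel, a, b):
    """Verilog `if (sel) y = a; else y = b;`: unknown sel takes the else branch."""
    return a if sel == 1 else b


def mux_and_or(sel, a, b):
    """Gate-level (sel & a) | (~sel & b) built from discrete gates."""
    return v_or(v_and(sel, a), v_and(v_not(sel), b))


# With an unknown select, the three styles disagree.
for a, b in [(0, 0), (1, 1), (1, 0)]:
    print(
        f"sel=x a={a} b={b}: "
        f"ternary={mux_ternary(X, a, b)} "
        f"if_else={mux_if_else(X, a, b)} "
        f"and_or={mux_and_or(X, a, b)}"
    )
```

For defined select values all three functions agree, which is consistent with the observation above. With an unknown select, the `if`/`else` style silently picks a defined value, while the gate-based version can return X even when both data inputs are 1.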

Sadly, Icarus Verilog has no real support for handling X-propagation, so I tried CVC, which sadly seems to be kinda buggy for our use cases. Most simulations end with a segmentation fault, and whenever I try to use any X-propagation feature, use commands that initialize my X values with 0/1, or just run the simulation as 2-state, it freezes during compilation or execution. I would guess that FPGA designs with this massive amount of logic loops could be too complex for CVC to handle and are more of a corner case for most simulators. Maybe commercial ones could handle this better, or it can at least be constrained somehow.

If you have any ideas, we could maybe set up a call and try to brainstorm a bit.

@mole99
Contributor Author

mole99 commented Jan 14, 2025

Hi @EverythingElseWasAlreadyTaken, thanks for your reply and for taking a look at the issue!

I thought I was going crazy ^^ But your observation that X-propagation behaves differently in the stdcells than in the behavioral code makes a lot of sense!

I'll try to read up on the subject and maybe we can have a short call next week?
You can reach me at: [email protected]
