diff --git a/docs/source/Building fabric.rst b/docs/source/Building fabric.rst index d5a3f800..649aa774 100644 --- a/docs/source/Building fabric.rst +++ b/docs/source/Building fabric.rst @@ -78,7 +78,7 @@ The above command will generate the configuration storage for the ``LUT4AB`` til The above command will generate the actual tiles for the ``LUT4AB`` tile and the ``RAM_IO`` tile. -All the files generated will be located in the respective tile directory. i.e RTL for ``LUT4AB`` will be in ``Tile/LUT4AB/`` +All generated files will be located in the respective tile directory, e.g. the RTL for ``LUT4AB`` will be in ``Tile/LUT4AB/``. We will need to run the above commands for all the tiles to get all the RTL of all the tiles, which is quite tedious to do. As a result, the following command will generate all the RTL for all the tiles in the fabric including all the super @@ -114,7 +114,7 @@ tiles within the fabric. gen_model_vpr -#. Generate the meta data list for FASM --> Bitstream +#. Generate the metadata list for FASM → Bitstream .. prompt:: bash FABulous> diff --git a/docs/source/FPGA-to-bitstream/Bitstream generation.rst b/docs/source/FPGA-to-bitstream/Bitstream generation.rst index b0244423..c28c504b 100644 --- a/docs/source/FPGA-to-bitstream/Bitstream generation.rst +++ b/docs/source/FPGA-to-bitstream/Bitstream generation.rst @@ -23,7 +23,7 @@ the user had also run synthesis and place and route for the design, which genera To generate the bitstream, the user can call the ``gen_bitstream_binary `` command from the CLI, where the ``design.fasm`` is the ``.fasm`` file generated by synthesis and place and route. -The resulting bitstream is placed in the same directory as where the ``fasm`` file is located and named as +The resulting bitstream is placed in the same directory as the ``.fasm`` file and is named ``design.bin``.
Manually generate bitstream diff --git a/docs/source/FPGA-to-bitstream/Nextpnr compilation.rst b/docs/source/FPGA-to-bitstream/Nextpnr compilation.rst index ca55066d..62caacbf 100644 --- a/docs/source/FPGA-to-bitstream/Nextpnr compilation.rst +++ b/docs/source/FPGA-to-bitstream/Nextpnr compilation.rst @@ -1,7 +1,8 @@ Nextpnr compilation =================== -Compile JSON to FASM by nextpnr <-- bels.txt + pips.txt +Nextpnr can compile a JSON description of a circuit to FASM [#]_ using the +architectural description in ``bels.txt`` and ``pips.txt``. Our nextpnr implementation uses nextpnr-generic for place and route. @@ -144,7 +145,8 @@ The following example is a 16-bit counter output to Block_RAM, and then Block_RA endmodule +Footnotes +--------- - - - +.. [#] The FPGA Assembly format, describing a concrete list of features to be + enabled on a specific FPGA fabric. diff --git a/docs/source/FPGA-to-bitstream/VPR compilation.rst b/docs/source/FPGA-to-bitstream/VPR compilation.rst index 9c125596..07918e84 100644 --- a/docs/source/FPGA-to-bitstream/VPR compilation.rst +++ b/docs/source/FPGA-to-bitstream/VPR compilation.rst @@ -1,10 +1,11 @@ VPR compilation =============== -Compile BLIF to FASM by VPR <-- architecture.xml + routing_resources.xml - - -VPR (Versatile Place and Route) is a place and route tool from the VTR project that can be used to program a fabric generated by FABulous, using either Yosys or ODIN II for the logic synthesis. The VTR genfasm tool can then be used to generate an FPGA Assembly (FASM) file from which the bitstream can be generated. +VPR (Versatile Place and Route) is a place and route tool from the VTR project +that can be used to program a fabric generated by FABulous, using either Yosys +or ODIN II for the logic synthesis. VPR implements a Berkeley Logic Interchange +Format (BLIF) netlist on the fabric, and the VTR genfasm tool then generates an +FPGA Assembly (FASM) file from the result, from which the bitstream is generated.
To generate the necessary materials to program using VPR, run ``$FAB_ROOT/fabric_generator/fabric_gen.py`` with the -genVPRModel flag. In the ``$FAB_ROOT/fabric_generator/vproutput`` directory, two files will be created - ``architecture.xml`` and ``routing_resources.xml``. @@ -18,7 +19,7 @@ To use Yosys (recommended with FABulous for improved functionality), follow the When generating the VPR model, FABulous will print out a maximum width for routing channels in the form ``Max Width: ``. This number should be noted, as it will be used as an argument when calling VPR. -To run the VPR flow (with VPR 8.1.0 installed) , the following command can be used: +To run the VPR flow (with VPR 8.1.0 installed), the following command can be used: .. code-block:: console diff --git a/docs/source/FPGA_CAD-tools/index.rst b/docs/source/FPGA_CAD-tools/index.rst index 59d700d9..93865c40 100644 --- a/docs/source/FPGA_CAD-tools/index.rst +++ b/docs/source/FPGA_CAD-tools/index.rst @@ -1,3 +1,5 @@ +.. _fpga_cad_tool_parametrization: + FPGA CAD-tool parameterization ============================== diff --git a/docs/source/FPGA_CAD-tools/vpr.rst b/docs/source/FPGA_CAD-tools/vpr.rst index 32e5f8b4..b07b0e6d 100644 --- a/docs/source/FPGA_CAD-tools/vpr.rst +++ b/docs/source/FPGA_CAD-tools/vpr.rst @@ -1,7 +1,7 @@ VPR models ========== -To generate the necessary materials to program using VPR, run ``$FAB_ROOT/fabric_generator/fabric_gen.py`` with the -genVPRModel flag followed by the location of your custom information XML file (an description an example of which can be found below). +To generate the necessary materials to program using VPR, run ``$FAB_ROOT/fabric_generator/fabric_gen.py`` with the ``-genVPRModel`` flag followed by the location of your custom information XML file (a description and an example of which can be found below).
In the ``$FAB_ROOT/fabric_generator/vproutput`` directory, two files will be created: ``architecture.xml`` and ``routing_resources.xml``. architecture.xml contains a description of the various tiles, ports and BELs - everything in the architecture except for the routing resources. @@ -16,7 +16,7 @@ The custom XML file should open and close with ```` and `` content <\bel_pb>** ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This tag should contain the exact XML that should be inserted to define the second-level ``pb_type`` that represents this bel, including the ``<pb_type>`` tag itself. This should represent only one instance of the BEL (i.e. ``num_pb`` should be 1) as different instances are now represented by FABulous as individual subtiles, each of which has the ``pb_type`` as its equivalent site. Your XML will be automatically inserted inside a top-level wrapper ``pb_type``, and all inputs/outputs will be routed through into your description - therefore, it is required that your custom ``pb_type`` has at least the inputs and outputs described in your HDL model. +This tag should contain the exact XML that should be inserted to define the second-level ``pb_type`` that represents this BEL, including the ``<pb_type>`` tag itself. This should represent only one instance of the BEL (i.e. ``num_pb`` should be 1) as different instances are now represented by FABulous as individual subtiles, each of which has the ``pb_type`` as its equivalent site. Your XML will be automatically inserted inside a top-level wrapper ``pb_type``, and all inputs/outputs will be routed through into your description - therefore, it is required that your custom ``pb_type`` has at least the inputs and outputs described in your HDL model. ** content <\bel_model>** ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -153,7 +153,7 @@ Notes for developers The ptc number provided for each node in the routing resource (RR) graph represents the pin, track or class of the node.
With SOURCE, SINK, IPIN and OPIN nodes, this is the ptc of the appropriate pin in the block type definition, however with CHANY and CHANX nodes it is more arbitrary. Here, each wire's ptc number should be different from any wire it overlaps with **anywhere along its length**. Previously, every wire had a separate PTC number, but this was recently updated so that no horizontal wire has the same number as any vertical wire, no two horizontal wires in the same row share a number, and no two vertical wires in the same column share a number. More information on the meaning of the PTC number can be found in `this Google Group discussion `_. -Although no meaningful routing connections are specified in the architecture.xml file, it is important that all pins do not have an Fc value of 0. This is because VPR uses the Fc value to gauge how well connected to the fabric a pin is, and so will not be able to find any routing candidates with 0 Fc pins. Currently FABulous is set up with a default fractional Fc of 1 such that all pins are connected to the fabric and are viable candidates. +Although no meaningful routing connections are specified in the ``architecture.xml`` file, it is important that no pin has an Fc value of 0. This is because VPR uses the Fc value to gauge how well connected to the fabric a pin is, and so will not find any routing candidates among pins with an Fc of 0. Currently, FABulous is set up with a default fractional Fc of 1 such that all pins are connected to the fabric and are viable candidates. Due to the techmapping complexity, the multiplexers in the LUT4AB tiles are currently ignored and it is assumed each LUT is routed to a separate output - at the time of writing, the same assumption is made for the nextpnr model.
diff --git a/docs/source/Usage.rst b/docs/source/Usage.rst index 47d125df..1d20dcaf 100644 --- a/docs/source/Usage.rst +++ b/docs/source/Usage.rst @@ -24,13 +24,13 @@ The following packages need to be installed for generating fabric HDLs pip3 install -r requirements.txt -This will also require to install `Tkinter` for the TCL facilities. To install `Tkinter` on Ubuntu, run: +This also requires installing ``Tkinter`` for the Tcl facilities. To install ``Tkinter`` on Ubuntu, run: .. code-block:: console sudo apt-get install python3-tk -The following packages need to be installed for the CAD toolchain +The following packages need to be installed for the CAD toolchain: :`Yosys `_: version > 0.26+0 diff --git a/docs/source/conf.py b/docs/source/conf.py index 4f4cd127..58bd8ea7 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -1,36 +1,38 @@ import os import sys + # Configuration file for the Sphinx documentation builder. # -- Project information -project = 'FABulous Documentation' -copyright = '2021, University of Manchester' -author = 'Jing, Nguyen, Bea, Bardia, Dirk' +project = "FABulous Documentation" +copyright = "2021, University of Manchester" +author = "Jing, Nguyen, Bea, Bardia, Dirk" -release = '0.1' -version = '0.1.0' +release = "0.1" +version = "0.1.0" # -- General configuration extensions = [ - 'sphinx.ext.duration', - 'sphinx.ext.doctest', - 'sphinx.ext.autodoc', - 'sphinx.ext.autosummary', - 'sphinx.ext.intersphinx', - 'sphinxcontrib.bibtex', - 'sphinx.ext.napoleon', - 'sphinx-prompt' + "sphinx.ext.duration", + "sphinx.ext.doctest", + "sphinx.ext.autodoc", + "sphinx.ext.autosummary", + "sphinx.ext.intersphinx", + "sphinxcontrib.bibtex", + "sphinx.ext.napoleon", + "sphinx-prompt", + "sphinx.ext.todo", ] intersphinx_mapping = { - 'python': ('https://docs.python.org/3/', None), - 'sphinx': ('https://www.sphinx-doc.org/en/master/', None), + "python": ("https://docs.python.org/3/", None), + "sphinx": ("https://www.sphinx-doc.org/en/master/", None),
} -intersphinx_disabled_domains = ['std'] +intersphinx_disabled_domains = ["std"] -templates_path = ['_templates'] +templates_path = ["_templates"] sys.path.append(os.getcwd() + "/../../") @@ -52,12 +54,12 @@ # -- Options for HTML output -html_theme = 'sphinx_materialdesign_theme' +html_theme = "sphinx_materialdesign_theme" -html_logo = 'figs/FAB_logo.png' +html_logo = "figs/FAB_logo.png" # -- Options for EPUB output -epub_show_urls = 'footnote' +epub_show_urls = "footnote" -bibtex_bibfiles = ['publications.bib'] +bibtex_bibfiles = ["publications.bib"] diff --git a/docs/source/fabric_definition.rst b/docs/source/fabric_definition.rst index 7aaa2c22..32b149f1 100644 --- a/docs/source/fabric_definition.rst +++ b/docs/source/fabric_definition.rst @@ -21,7 +21,7 @@ The following figure shows a small fabric, which we will model throughout this s The full model of a fabric is described by the following files: * A file :ref:`fabric_csv` providing the :ref:`fabric_layout`, some global settings, and the descriptions of the :ref:`tiles`- -* A set of list files (\*.list) desribing the adjacency list of the switch matrix for each of the used tiles or the corresponding adjacency matrix as a CSV file +* A set of list files (\*.list) describing the adjacency list of the switch matrix for each of the used tiles or the corresponding adjacency matrix as a CSV file * A set of optional bitstream mapping CSV files * A set of primitives used @@ -67,7 +67,7 @@ Fabric CSV description * Empty lines will be ignored as well as everything that follows a ``#`` \(the **comment** symbol in all FABulous descriptions\). -* Parameters that relate to the fabric specification are encapsulated between the key words ``ParametersBegin`` and ``ParametersEnd``. +* Parameters that relate to the fabric specification are encapsulated between the keywords ``ParametersBegin`` and ``ParametersEnd``. Parameters that relate to the flow are passed as command line arguments. 
@@ -77,7 +77,7 @@ Fabric CSV description * ``ConfigBitMode``, ``[frame_based|FlipFlopChain]`` - FABulous can write to the configuration bits in a frame-based organisation, similarly to most commercial FPGAs. This supports partial reconfiguration and is (except for in tiny fabrics) superior in any sense (configuration speed, resource cost, power consumption) over flip flop scan chain configuration (the option selected by most other open source FPGA frameworks). + FABulous can write to the configuration bits in a frame-based organisation, similarly to most commercial FPGAs. This supports partial reconfiguration and is (except in tiny fabrics) superior in every respect (configuration speed, resource cost, power consumption) to flip-flop scan chain configuration (the option selected by most other open-source FPGA frameworks). Configuration readback is not currently supported, as it was considered ineffective for embedded FPGA use cases. @@ -85,9 +85,9 @@ Fabric CSV description In frame-based configuration mode, FABulous will build a configuration frame register over the height of the fabric and provide the specified number of data bits per row. This will generate frame_data wires in the fabric, which correspond to bitlines in a memory organisation. - Note that the specified size corresponds to the width of the parallel configuraton port and 32 bits is the most sensible configuration for most systems. + Note that the specified size corresponds to the width of the parallel configuration port, and 32 bits is the most sensible choice for most systems. - Currently, we set ``FrameBitsPerRow`` globally for all rows but we plan to extend this to allow for resource-type specific adjustments in future versions. + Currently, we set ``FrameBitsPerRow`` globally for all rows, but we plan to extend this to allow for resource-type-specific adjustments in future versions.
For instance, the tiles at the north border of a fabric may only provide some fixed U-turn routing without the need of any configuration bits, which could be reflected by removing all frame_data wires in the top row. This extension may include an automatic adjustment mode. * ``MaxFramesPerCol``, ``unsigned_int`` @@ -100,10 +100,10 @@ Fabric CSV description FABulous will generate the specified number of vertical frame_strobe wires in the fabric, which correspond to wordlines in memory organisation. - ``FrameBitsPerRow`` and ``MaxFramesPerCol`` should be around the same number to minimize the wiring resources for driving the configuration bits into the fabric. In most cases, only ``MaxFramesPerCol`` will be adjusted to a number that can accomodate the number of configuration bits needed. + ``FrameBitsPerRow`` and ``MaxFramesPerCol`` should be around the same number to minimize the wiring resources for driving the configuration bits into the fabric. In most cases, only ``MaxFramesPerCol`` will be adjusted to a number that can accommodate the number of configuration bits needed. Currently, we set ``MaxFramesPerCol`` globally for all resource types (e.g., LUTs and DSP block columns) but we plan to extend this to allow for resource-type specific adjustments. - This feature may include an automatic adjustment mode. + This feature may include an automatic adjustment mode. * ``Package``, ``string`` @@ -113,7 +113,9 @@ Fabric CSV description This option will annotate the specified time in ps to all switch matrix multiplexers. This ignored for synthesis but allows simulation of the fabric in the case of configured loops (e.g., ring-oscillators). - * ``MultiplexerStyle``, ``[custom|TODO]`` + * ``MultiplexerStyle``, ``[custom]`` + + .. todo:: Add missing multiplexer styles. FABulous can generate the switch matrix multiplexers in different styles including behavioral RTL, instantiating standard cell primitives and instantiation of full custom multiplexers. 
@@ -149,7 +151,7 @@ The following figure shows the fabric.csv representation of our example fabric a NULL, S_term, S_term, S_term, S_term, NULL FabricEnd -* The fabric layout is encapsulated between the key words ``FabricBegin`` and ``FabricEnd``. +* The fabric layout is encapsulated between the keywords ``FabricBegin`` and ``FabricEnd``. The specified tiles are references to tile descriptors (see :ref:`tiles`). The tiles form a coordinate system with the origin in the top-left: @@ -185,7 +187,7 @@ A tile is the smallest unit in a fabric and a tile provides * A central configuration storage module A tile typically hosts primitives like a CLB with LUTs or an I/O block. -Multiple smaller tiles can be combined into :ref:`supertiles` to accomodate complex blocks like DSPs. +Multiple smaller tiles can be combined into :ref:`supertiles` to accommodate complex blocks like DSPs. Each tile that is referred to in the :ref:`fabric_layout` requires specification of the corresponding tile description in the fabric.csv file that has the following format: .. code-block:: python @@ -236,7 +238,7 @@ specifying: For instance, a single wire in NORTH direction should use names such as *N1Beg* to *N1End* or *N1b* to *N1e*. The destination name refers to two ports: a port on the target tile and an expected port on the destination tile. This reflects that wires route between tiles and that the begin and end ports of a tile connect to different wires. - However, while this works for tiles inside the fabric (like CLBs), the tiles at the border do usually not extend to antennae outside the fabric but instead route wires back into the fabric as shown in the following figure: + However, while this works for tiles inside the fabric (like CLBs), the tiles at the border usually do not extend wires as antennae outside the fabric but instead route them back into the fabric, as shown in the following figure: ..
figure:: figs/east_terminate.* :alt: Basic tile illustration @@ -278,7 +280,7 @@ specifying: In this example, the CPU interface is located at the west border of the fabric. The fabric provides three slots, each being two CLB columns wide. The operands are routed into the fabric using double wires (so, each slot receives the operands at exactly the same position, which makes modules relocatable among the slots). The results are routed to the CPU using nested hex wires (again resulting in a homogeneous routing scheme that enables module relocation). The CPU therefore has access to the results of each slot and will multiplex results into the register file in case a custom instruction requires it to do so. For simplicity, the figure does not show the west termination tiles, which simply connect the internal routing wires to the top-level fabric wrapper that, in turn, is used to connect to the CPU. In summary, the example shows how a termination tile can be used to provide more complex interface blocks and all this can be easily modelled and implemented with FABulous. - .. note:: The ``destination_name`` is refering to the port name used at the destination tile. FABulous will throw an error if the destination tile does not provide that port name. + .. note:: The ``destination_name`` refers to the port name used at the destination tile. FABulous will throw an error if the destination tile does not provide that port name. Aside from ``BEGIN`` and ``END``, there also exist ``MID`` ports, which can be used for wires spanning more than two tiles. Although they route over two tiles, they also have a tap on the middle tile. @@ -310,7 +312,7 @@ specifying: FABulous will index the wires of each entry starting from [0]. -A metric that is important for FPGA ASIC implementations is the channel *cut* number, which denotes the number of wires that must be accomodated between two adjacent tiles.
The cut number is an indicator for the congestion to be expected when stitching together the fabric. Let us take the following example: +A metric that is important for FPGA ASIC implementations is the channel *cut* number, which denotes the number of wires that must be accommodated between two adjacent tiles. The cut number is an indicator for the congestion to be expected when stitching together the fabric. Let us take the following example: .. code-block:: python :emphasize-lines: 1 @@ -392,7 +394,7 @@ is equivalent to S2BEG2,S2END2 # extend double wires in each direction W2BEG2,W2END2 # extend double wires in each direction -The example shows how port names can be composed from string segments that can alternatively be provided in list form. The lists will be recursively unwrapped, which allows it to use multiple list operators together. +The example shows how port names can be composed from string segments that can alternatively be provided in list form. The lists will be recursively unwrapped, which allows multiple list operators to be used together. An error message is generated if the number of composed port names differs for the number of input_ports and output_ports or if ports are not found. A warning will be generated if FABulous tries to set a connection that has already been specified. @@ -437,7 +439,7 @@ For the rows, this denotes the size of the multiplexers (e.g., MUX4) and by chec .. note:: The multiplexers in the switch matrices are controlled by configuration bits only. - The multiplexers in :ref:`primitives` can either be controlled by configuration bits (e.g., to select if a LUT output is to be routed to a primitive output pin or through a flop) or by the user logic (e.g., to cascade adjacent LUTs for implementing larger LUTs (like the F7MUX and F8MUX multiplexers in Xilinx FPGAs with LUT6).
+ The multiplexers in :ref:`primitives` can either be controlled by configuration bits (e.g., to select whether a LUT output is routed to a primitive output pin or through a flop) or by user logic (e.g., to cascade adjacent LUTs to implement larger LUTs, like the F7MUX and F8MUX multiplexers in Xilinx FPGAs with LUT6s). .. note:: Defining the adjacency of a switch matrix (and the wires) is a difficult task. Too many connections and wires are expensive to implement and will result in poor density and potentially in poor performance. However, too few connections and wires may lead to an inability to implement the intended user circuits on the fabric in the first place. The latter issue is not easily solvable by leaving primitives unused because that requires, for example, the use of more CLBs. That, in turn, requires more wires between the tiles, and will therefore jeopardize the approach of underutilising the CLBs. @@ -452,7 +454,7 @@ For the rows, this denotes the size of the multiplexers (e.g., MUX4) and by chec Primitives ~~~~~~~~~~ -Primitives are used to manipulate, store and input/output data. Examples for primitives include LUTs, slices (a cluster of LUTs that share a clock and that can be cascaded for arithmetic), flip flops, individual gates or multiplexers, and complex blocks like DSPs, ALUs or BRAMs. A tile may have no primitives (e.g., the north and south terminate tiles in our example fabric) or as many as needed. +Primitives are used to manipulate, store and input/output data. Examples of primitives include LUTs, slices (a cluster of LUTs that share a clock and that can be cascaded for arithmetic), flip-flops, individual gates or multiplexers, and complex blocks like DSPs, ALUs or BRAMs. A tile may have no primitives (e.g., the north and south terminate tiles in our example fabric) or as many as needed.
Primitives are added with ``BEL`` statements (BEL stands for Basic Element of Logic and the phrase is adopted from Xilinx), as shown in the following tile definition fragment: @@ -470,7 +472,16 @@ Primitives are added with ``BEL`` statements (BEL stands for Basic Element of Lo MATRIX, LUT4AB_switch_matrix.vhdl EndTILE -FABulous simply adds primitives as RTL code blocks. This is a different philosophy than the usual VPR approach where primitives are generated by models. While the VPR path has advantages to drive automated design space exploration, the FABulous way is more convenient when modeling an existing fabric. However, this requires some adaptations to the FPGA CAD tools as described in TODO. Complex blocks are usually not inferred by VHDL or Verilog constructs but through direct primitive instantiations, which is common for all commercial FPGAs. Nevertheless, Yosys can implement arrays specified in RTL automatically to BRAMs and the Verilog multiply operator directly to our DSP blocks. +FABulous simply adds primitives as RTL code blocks. This is a different +philosophy from the usual VPR approach, where primitives are generated by +models. While the VPR path has advantages for driving automated design space +exploration, the FABulous way is more convenient when modeling an existing +fabric. However, this requires some adaptations to the FPGA CAD tools as +described in :ref:`fpga_cad_tool_parametrization`. +Complex blocks are usually not inferred from VHDL or Verilog constructs but +through direct primitive instantiations, which is common for all commercial +FPGAs. Nevertheless, Yosys can automatically map arrays specified in RTL +to BRAMs and the Verilog multiply operator directly to our DSP blocks. The BEL statements in the previous example instantiate a LUT4 in VHDL: @@ -525,7 +536,12 @@ FABulous defines the following coding rules for BELs: * We use directives (provided as comments) to control the code generation semantics.
The supported directives include: - * ``EXTERNAL``: ports flagged with this directive are not connected to the switch matrix but are exported through the tile entity to the top-level fabric wrapper. The corresponding port will be exported with a tile prefix and, if provided in the BEL statement, the instance prefix. The following two blocks provide an OutBlock tile with two BEL statements and the corresponding Out_Pad module: + * ``EXTERNAL``: Ports flagged with this directive are not connected to the + switch matrix but are exported through the tile entity to the top-level + fabric wrapper. The corresponding port will be exported with a tile prefix + and, if provided in the BEL statement, the instance prefix. The following + two blocks provide an OutBlock tile with two BEL statements and the + corresponding Out_Pad module: .. code-block:: python :emphasize-lines: 1,4,5 @@ -541,19 +557,21 @@ FABulous defines the following coding rules for BELs: entity Out_Pad is Generic ( NoConfigBits : integer := 0 ); -- has to be adjusted manually Port ( -- IMPORTANT: this has to be in a dedicated line - I : in STD_LOGIC; -- LUT inputs + I : in STD_LOGIC; -- input from fabric O_pin : out STD_LOGIC; -- EXTERNAL + UserCLK : in STD_LOGIC -- EXTERNAL SHARED_PORT ); end entity Out_Pad; - If the provided RTL code are Verilog + If the provided RTL code is Verilog: .. code-block:: verilog module Out_Pad (I0, O_pin); parameter NoConfigBits = 19 ; // has to be adjusted manually - input I; // LUT inputs + input I; // input from fabric (* FABulous, EXTERNAL *) output O_pin; + (* FABulous, EXTERNAL, SHARED_PORT *) input UserCLK; ... endmodule @@ -577,7 +595,7 @@ FABulous defines the following coding rules for BELs: Tile_X0Y2_B_O_pin : out STD_LOGIC; -- EXTERNAL ... ); - end entity Out_Pad; + end entity eFPGA; If generating for Verilog output @@ -598,9 +616,11 @@ FABulous defines the following coding rules for BELs: ... endmodule - TODO + .. todo:: Unknown TODO. Probably for GLOBAL?
- * ``SHARED_PORT``: this directive can only be used together optionally with ``EXTERNAL``. If a port is set ``EXTERNAL`` but not ``SHARED_PORT``, then , a TODO ( shared ports flagged with this directive are not connected to the switch matrix but are exported through the tile entity to the top-level fabric wrapper. + * ``SHARED_PORT``: This optional directive can only be used together with + ``EXTERNAL``. It allows multiple BELs to share the same port, e.g., + for exporting a clock signal to the top-level fabric wrapper. .. _bitstream: @@ -608,7 +628,7 @@ Bitstream remapping ~~~~~~~~~~~~~~~~~~~ FABulous will take care when implementing the configuration logic and bitstream encoding and the mapping of this into configuration bitstreams. This can be done automatically. -However, users can influence the mapping of configuration bits into the bitstream. For our first chip, we used remapping to create a human readable bitstream which is more convenient to modify in a hex editor, as described later in this subsection. +However, users can influence the mapping of configuration bits into the bitstream. For our first chip, we used remapping to create a human-readable bitstream that is more convenient to modify in a hex editor, as described later in this subsection. In the code example for a LUT, it was shown that the configuration bits are exported into the LUT interface: @@ -620,6 +640,9 @@ In the code example for a LUT, it was shown that the configuration bits are expo Port ( -- IMPORTANT: this has to be in a dedicated line ... ConfigBits : in STD_LOGIC_VECTOR( NoConfigBits -1 downto 0 ) -- These are the configuration bits + ... + ); + end entity LUT4; Exporting configuration bits is a requirement for any primitive or switch matrix that uses configuration bits.
The tile configuration bitstream is formed by concatenating first the primitive configuration bits (if primitives are available and use configuration bits) and then the switch matrix configuration bits (again, only if the switch matrix uses configuration bits) into one long tile configuration word. This is done in the order that the primitives are declared by ``BEL`` entries in the tile definition. Configuration bitstream vectors are defined in the *downto* direction and the first BEL primitive configuration bits will be placed at the LSB side of the tile bitstream and the configuration switch matrix at the MSB side. @@ -671,7 +694,7 @@ The following example is the FABulous-generated mapping file of the CLB implemen frame18, 18, 0, 0000_0000_0000_0000_0000_0000_0000_0000, frame19, 19, 0, 0000_0000_0000_0000_0000_0000_0000_0000, -FABulous are will generate a default _ConfigMem.csv, and users are not required to modify the _ConfigMem.csv file. However, if FABulous finds a file called _ConfigMem.csv before generating it, it will use the bitstream mapping provided instead. The following example shows the basic idea that was used to provide a human-readable bitstream encoding. It is not intended to understand the example in detail. The basic idea is to align configuration LUT function tables, settings and the switch matrix multiplexer encoding to be nibble aligned such that they are easy to find in a hex editor. For instance, in the example below, the first 8 frames are mostly encoding the LUTs where the 16 MSBs are the LUT tables and the next two nibbles are encoding a flop and carry-chain mode: +FABulous will generate a default _ConfigMem.csv, and users are not required to modify the _ConfigMem.csv file. However, if FABulous finds a file called _ConfigMem.csv before generating it, it will use the bitstream mapping provided instead. The following example shows the basic idea that was used to provide a human-readable bitstream encoding. 
Readers do not need to understand the example in detail. The basic idea is to nibble-align the LUT function tables, the configuration settings, and the switch matrix multiplexer encodings so that they are easy to find in a hex editor. For instance, in the example below, the first 8 frames mostly encode the LUTs, where the 16 MSBs hold the LUT tables and the next two nibbles encode a flop and carry-chain mode: .. code-block:: python @@ -690,7 +713,10 @@ FABulous are will generate a default _ConfigMem.csv, and users frame9,9,32,1111_1111_1111_1111_1111_1111_1111_1111,397:394,401:398,405:402,409:406,413:410,417:414,421:418,425:422 ... -The more important use case of bitstream remapping is to optimize the physical implementation of the configuration tiles. FABulous includes a corresponding optimizer that generates the bitstream remapping files automatically. The process is described in detail in TODO FPGA_2022_paper. +The more important use case of bitstream remapping is to optimize the physical +implementation of the configuration tiles. FABulous includes a corresponding +optimizer that generates the bitstream remapping files automatically. The +process is described in detail in Chung et al. :cite:`10.1145/3490422.3502371`. .. _supertiles: @@ -700,7 +726,7 @@ Supertiles Supertiles are grouping together multiple basic :ref:`tiles`. Basic tiles are the smallest tile exposed to users providing a switch matrix, wires to the surrounding, and usually one or more primitives (like in a CLB tile). Supertiles are needed for blocks that require more logic and/or more wires to the routing fabric (e.g., as needed for DSP blocks). Therefore, supertiles will normally provide as many switch matrices as they integrate basic tiles.
-However, larger supertiles (e.g., hosting a CPU or similar) may only provide switch matrices in basic tiles located at the border of such a supertile +However, larger supertiles (e.g., hosting a CPU or similar) may only provide switch matrices in basic tiles located at the border of such a supertile. In any case: supertiles must provide wire interfaces that match the surroundings when stitching them into a fabric. Modelling @@ -715,7 +741,7 @@ Supertiles are modelled from elementary tiles in a spreadsheet/csv file similar myZ_00, NULL myZ_01, myZ_11 NULL, myZ_12 - EndSuperTILE # this is case insensitive + EndSuperTILE # this is case-insensitive SuperTILE, my_I # define supertile name my_top @@ -736,13 +762,13 @@ Supertiles are modelled from elementary tiles in a spreadsheet/csv file similar Supertiles will be instantiated in the fabric (VHDL or Verilog) file, and supertiles themselves instantiate basic tiles (e.g., the ones shown in the figure). Therefore, supertiles define wires and switch matrices through their instantiated basic tiles. -Supertiles have an **anchor tile**, which is used to specify their position in the fabric. The anchor tile is determined by a row-by-row scan over the basic tiles and it will be the first non-NULL tile found. All other basic tiles will be placed relatively to the anchor tile. The anchor tiles in the figure above have been marked using a bold font. So far, anchor tiles are only used internally in FABulous but it is planned to allow placing supertiles through their anchor tiles in the fabric layout, rather than through their basic tiles. +Supertiles have an **anchor tile**, which is used to specify their position in the fabric. The anchor tile is determined by a row-by-row scan over the basic tiles: it is the first non-NULL tile found. All other basic tiles are placed relative to the anchor tile. The anchor tiles in the figure above have been marked using a bold font.
So far, anchor tiles are only used internally in FABulous, but support for placing supertiles in the fabric layout through their anchor tiles, rather than through their basic tiles, is planned. If a basic tile has a **border to the outside world** (i.e. the surrounding fabric), the interface to that border is exported to the supertile interface (i.e. the Entity in VHDL). Those borders are marked blue in the figure above. Internal edges are connected inside the supertile wrapper according to the entire tile specification. A basic tile instantiated in a supertile may not implement interfaces to all NORTH, EAST, SOUTH, WEST directions. For instance, a supertile may include basic terminate tiles if the supertile is supposed to be placed at the border of the fabric. -Tile ports that are declared ``EXTERNAL`` in the basic tiles will be exported all the way to the top-level, in the same wayas is done for :ref:`tiles` +Tile ports that are declared ``EXTERNAL`` in the basic tiles will be exported all the way to the top level, in the same way as for :ref:`tiles`. ..
code-block:: VHDL :emphasize-lines: 1 diff --git a/docs/source/publications.bib b/docs/source/publications.bib index fc994c1e..0a0a881c 100644 --- a/docs/source/publications.bib +++ b/docs/source/publications.bib @@ -26,3 +26,21 @@ @inproceedings{10.1145/3431920.3439302 location = {Virtual Event, USA}, series = {FPGA '21} } + +@inproceedings{10.1145/3490422.3502371, +author = {Chung, King Lok and Dao, Nguyen and Yu, Jing and Koch, Dirk}, +title = {How to Shrink My FPGAs — Optimizing Tile Interfaces and the Configuration Logic in FABulous FPGA Fabrics}, +year = {2022}, +isbn = {9781450391498}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +url = {https://doi.org/10.1145/3490422.3502371}, +doi = {10.1145/3490422.3502371}, +abstract = {Commercial FPGAs from major vendors are extensively optimized, and fabrics use many hand-crafted custom cells, including switch matrix multiplexers and configuration memory cells. The physical design optimizations commonly improve area, latency (=speed), and power consumption together. This paper is dedicated to improving the physical implementation of FPGA tiles and the configuration storage in SRAM FPGAs. This paper proposes to remap configuration bits and interface wires to implement tightly packed tiles. Using the FABulous FPGA framework, we show that our optimizations are virtually for free but can save over 20\% in area and improve latency at the same time. We will evaluate our approach in different scenarios by changing the available metal layers or the requested channel capacity. Our optimizations consider all tiles and we propose a flow that resolves dependencies between the CLBs and other tiles. 
Moreover, we will show that frame-based reconfiguration is, in almost all cases, better than shift register configuration.}, booktitle = {Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays}, pages = {13–23}, numpages = {11}, keywords = {optimization, open source, open hardware, fpgas}, location = {Virtual Event, USA}, series = {FPGA '22} +} diff --git a/docs/source/simulation/emulation.rst b/docs/source/simulation/emulation.rst index f0232d0c..872d4138 100644 --- a/docs/source/simulation/emulation.rst +++ b/docs/source/simulation/emulation.rst @@ -1,7 +1,7 @@ Emulation setup =============== -(Emulation function is under built) +.. note:: The emulation functionality is implemented but needs more testing. The script ``bit_gen.py`` in :ref:`bitstream generation` diff --git a/docs/source/simulation/simulation.rst b/docs/source/simulation/simulation.rst index 5e2ec38a..82d886c3 100644 --- a/docs/source/simulation/simulation.rst +++ b/docs/source/simulation/simulation.rst @@ -30,9 +30,9 @@ FABulous comes with 3 different simulation methods _`configuration module`, We drive s_clk and s_data. On each rising edge of s_clock, we sample data and on the falling edge, we sample control. - Both values get shifted in a separate register. If the control register sees the bit-pattern x”FAB0” it samples the data shift register into a hold register and issues a one-cycle strobe output (active 1). + Each value is shifted into its own shift register. If the control register sees the bit pattern x"FAB1", it samples the data shift register into a hold register and issues a one-cycle strobe output (active high). - The next figure shows the enable generation (and input sampling) for generating the enable signals for + The next figure shows the generation (and input sampling) of the enable signals for * the control shift register and * the data shift register.
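To see how data and control bits are interleaved on ``s_data``, the bit-bang protocol can be modelled behaviourally. The following Python is a minimal sketch, not the FABulous RTL: it assumes a 32-bit data shift register, a 16-bit control shift register, and MSB-first shifting, and the class and function names are hypothetical.

```python
CONTROL_PATTERN = 0xFAB1  # control bit pattern that triggers the strobe


class BitbangReceiver:
    """Behavioural sketch of the bit-bang configuration receiver.

    Assumed, not taken from the FABulous RTL: a 32-bit data shift register,
    a 16-bit control shift register, and MSB-first shifting.
    """

    def __init__(self, data_width: int = 32, ctrl_width: int = 16) -> None:
        self.data_width = data_width
        self.data_mask = (1 << data_width) - 1
        self.ctrl_mask = (1 << ctrl_width) - 1
        self.data_reg = 0  # data shift register
        self.ctrl_reg = 0  # control shift register
        self.hold_reg = 0  # captured configuration word

    def rising_edge(self, s_data: int) -> None:
        """Rising edge of s_clk: shift s_data into the data register."""
        self.data_reg = ((self.data_reg << 1) | (s_data & 1)) & self.data_mask

    def falling_edge(self, s_data: int) -> bool:
        """Falling edge of s_clk: shift s_data into the control register.

        Returns True (the one-cycle strobe) when the control register matches
        CONTROL_PATTERN; the data register is then copied to hold_reg.
        """
        self.ctrl_reg = ((self.ctrl_reg << 1) | (s_data & 1)) & self.ctrl_mask
        if self.ctrl_reg == CONTROL_PATTERN:
            self.hold_reg = self.data_reg
            return True
        return False


def send_word(rx: BitbangReceiver, word: int) -> list[int]:
    """Drive one word MSB first; the control stream is zeros followed by the pattern.

    Returns the clock cycles (0-based) in which the strobe fired.
    """
    strobes = []
    for cycle, i in enumerate(range(rx.data_width - 1, -1, -1)):
        rx.rising_edge((word >> i) & 1)  # data bit, sampled on the rising edge
        if rx.falling_edge((CONTROL_PATTERN >> i) & 1):  # control bit, falling edge
            strobes.append(cycle)
    return strobes
```

With this model, ``send_word(BitbangReceiver(), 0x12345678)`` reports the strobe only in the last clock cycle (cycle 31), after which ``hold_reg`` contains ``0x12345678``. The leading zero control bits flush any earlier pattern out of the control register, so in this sketch the strobe fires exactly once per word.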
@@ -40,6 +40,3 @@ FABulous comes with 3 different simulation methods _`configuration module`, .. figure:: ../figs/bitbang2.* :alt: Bitbang schematic :align: center - - -