[major] Fine-grind parameters and capabilities for memories #96

Open: wants to merge 6 commits into base: main
Changes from 5 commits
1 change: 1 addition & 0 deletions revision-history.yaml
@@ -4,6 +4,7 @@ revisionHistory:
# populated using the "version" that the Makefile grabs from git. Notable
# additions to the specification should append entries here.
thisVersion:
- Added fine-grained parameters and capabilities for memories.

# Information about the old versions. This should be static.
oldVersions:
190 changes: 120 additions & 70 deletions spec.md
@@ -1671,40 +1671,52 @@ by the following parameters.
2. A positive integer literal representing the number of elements in the
memory.

3. A variable number of named ports, each being a read port, a write port, or
readwrite port.

4. A non-negative integer literal indicating the read latency, which is the
number of cycles after setting the port's read address before the
corresponding element's value can be read from the port's data field.

5. A positive integer literal indicating the write latency, which is the number
of cycles after setting the port's write address and data before the
corresponding element within the memory holds the new value.

6. A read-under-write flag indicating the behavior when a memory location is
3. A variable number of named ports, each having the following parameters:
Collaborator: "each having the following parameters in this order:"

Author: Thanks!

Author (@CircuitCoder, Apr 25, 2023): Fixed in e6311a7.

1. A read flag indicating the read capability of this port.
2. A write flag indicating the write capability of this port.
3. If the port can read, a non-negative integer literal indicating the read
latency, which is the number of cycles after setting the port's read
address before the corresponding element's value can be read from the
port's rdata field.
4. If the port can write, a positive integer literal indicating the write
latency, which is the number of cycles after setting the port's write
address and wdata before the corresponding element within the memory
holds the new value.
4. A read-under-write flag indicating the behavior when a memory location is
written to while a read to that location is in progress.
5. An optional type representing the custom port of this memory. This custom
port is intended for post-synthesis flows, and should be ignored in
behavioral simulation.

Integer literals for the number of elements and the read/write latencies _may
not be string-encoded integer literals_.
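Here is a small illustration of this rule. It is a sketch that assumes the quoted hexadecimal form is how this version of the spec writes string-encoded integer literals. The first line is accepted and the second is not:

```firrtl
depth => 256      ; accepted: plain integer literal
depth => "h100"   ; rejected: string-encoded integer literal (256 in hex)
```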

The following example demonstrates instantiating a memory containing 256 complex
numbers, each with 16-bit signed integer fields for its real and imaginary
components. It has two read ports, `r1`{.firrtl} and `r2`{.firrtl}, and one
write port, `w`{.firrtl}. It is combinationally read (read latency is zero
cycles) and has a write latency of one cycle. Finally, its read-under-write
write port, `w`{.firrtl}, with the ability to do partial writes using a mask.
All of its read ports are combinationally read (read latency is zero cycles) and
its write port has a write latency of one cycle. Finally, its read-under-write
behavior is undefined.

``` firrtl
mem mymem :
  data-type => {real:SInt<16>, imag:SInt<16>}
  depth => 256
  reader => r1
  reader => r2
  writer => w
  read-latency => 0
  write-latency => 1
  read-under-write => undefined
  port r1:
```

Collaborator: It's weird to mix `=>` syntax and `:` syntax.

Author: The motivation for using `:` is to distinguish an indented block from a single configuration value. Using a different delimiter might make writing a parser a little bit easier?

``` firrtl
    read => yes
    write => no
    read-latency => 0
  port r2:
    read => yes
    write => no
    read-latency => 0
  port w:
    read => no
    write => with-mask
    write-latency => 1
  custom-port => {a:UInt<4>, flip b:UInt<2>}
```

Collaborator: Shouldn't this match syntax more? Something like: `port custom: {a:UInt<4>, flip b:UInt<2>}`

Author (@CircuitCoder, Apr 25, 2023): Indeed. Maybe we should use another keyword (e.g. `custom-port`) to avoid syntax ambiguity when parsing: `custom-port custom_a => {a:UInt<4>, flip b:UInt<2>}`

The motivation behind the `custom-port =>` syntax is that there should be at most one custom port for a single memory. But this may cause an implicit naming collision with other ports, so let's also include the port name here.

Using `=>` keeps it in line with the other single-value configurations. See below.

Author: Updated in e6311a7 to use `custom-port custom => {a:UInt<4>, flip b:UInt<2>}`.

Collaborator: Why should there be at most one custom port?

Author (@CircuitCoder, May 2, 2023): With the `custom-port custom => ...` syntax, this limitation is lifted, and there can now be multiple custom ports. However, the limitation might still be desirable.

If people require multiple custom ports, they can always aggregate them into a single one. Limiting the number of custom ports may simplify compilers.

In the example above, the type of `mymem`{.firrtl} is:
@@ -1713,76 +1725,97 @@ In the example above, the type of `mymem`{.firrtl} is:
``` firrtl
{flip r1: {addr: UInt<8>,
en: UInt<1>,
clk: Clock,
flip data: {real: SInt<16>, imag: SInt<16>}},
flip rdata: {real: SInt<16>, imag: SInt<16>}},
flip r2: {addr: UInt<8>,
en: UInt<1>,
clk: Clock,
flip data: {real: SInt<16>, imag: SInt<16>}},
flip rdata: {real: SInt<16>, imag: SInt<16>}},
flip w: {addr: UInt<8>,
en: UInt<1>,
clk: Clock,
data: {real: SInt<16>, imag: SInt<16>},
mask: {real: UInt<1>, imag: UInt<1>}}}
wdata: {real: SInt<16>, imag: SInt<16>},
wmask: {real: UInt<1>, imag: UInt<1>}},
custom: {a:UInt<4>, flip b:UInt<2>}}
```

The following sections describe how a memory's field types are calculated and
the behavior of each type of memory port.

### Read Ports
### Ports

If a memory is declared with element type `T`{.firrtl} and has a size less than or
equal to $2^N$, then its read ports have type:
Ports can have one of the following read capabilities:

``` firrtl
{addr: UInt<N>, en: UInt<1>, clk: Clock, flip data: T}
```
- `no`{.firrtl}: This port cannot read from the memory
- `yes`{.firrtl}: This port can read from the memory

If the `en`{.firrtl} field is high, then the element value associated with the
address in the `addr`{.firrtl} field can be retrieved by reading from the
`data`{.firrtl} field after the appropriate read latency. If the `en`{.firrtl}
field is low, then the value in the `data`{.firrtl} field, after the appropriate
read latency, is undefined. The port is driven by the clock signal in the
`clk`{.firrtl} field.
Ports can have one of the following write capabilities:

### Write Ports
- `no`{.firrtl}: This port cannot write into the memory
- `no-mask`{.firrtl}: This port can only do full writes into the memory
- `with-mask`{.firrtl}: This port can do partial writes into the memory
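The following sketch shows how these capabilities combine. The hypothetical memory `spram` declares a fully capable port `rw` and a write-only port `w` without a mask. The names, element type, depth, latencies, and read-under-write setting are illustrative assumptions. The syntax follows the `port` blocks in this PR's `mymem` example:

```firrtl
mem spram :
  data-type => UInt<8>[4]
  depth => 1024
  read-under-write => old
  port rw:
    read => yes
    write => with-mask   ; partial (per-byte) writes allowed
    read-latency => 1
    write-latency => 1
  port w:
    read => no           ; so no read-latency is given
    write => no-mask     ; full writes only
    write-latency => 1
```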

If a memory is declared with element type `T`{.firrtl} and has a size less than or
equal to $2^N$, then its write ports have type:
equal to $2^N$, then a port with maximum capability
(`read => yes`{.firrtl}, `write => with-mask`{.firrtl}) has type:

``` firrtl
{addr: UInt<N>, en: UInt<1>, clk: Clock, data: T, mask: M}
{
clk: Clock,
en: UInt<1>,
addr: UInt<N>,
wmode: UInt<1>,

flip rdata: T,
wdata: T,
wmask: M
}
```

where `M`{.firrtl} is the mask type calculated from the element type

Collaborator: Does the mask actually help anywhere? It seems to be a feature that only really makes sense under a certain compilation strategy (lowering aggregate memories to multiple memories, in which case the mask becomes the enable flag).

Author: IMO, this is an abstraction over different possible hardware memory implementations (some SRAMs allow bit or byte masks). Aggregate memory lowering might be part of memory synthesis, which one might want to delay until the backend flow.

Also, this might be good for behavioral simulation performance?

`T`{.firrtl}. Intuitively, the mask type mirrors the aggregate structure of the
element type except with all ground types replaced with a single bit unsigned
integer type. The *non-masked portion* of the data value is defined as the set
of data value leaf sub-elements where the corresponding mask leaf sub-element is
high.
high, or the entire data value if no mask is present.
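Applying the rule above to a few element types gives the pairs below. This is a hand-derived sketch, not compiler output. Each line pairs an element type `T` with its mask type `M`, which is shown in a comment:

```firrtl
; element type T                    ; mask type M
UInt<32>                            ; UInt<1>
UInt<8>[4]                          ; UInt<1>[4]
{real: SInt<16>, imag: SInt<16>}    ; {real: UInt<1>, imag: UInt<1>}
{a: UInt<32>, b: SInt<8>[2]}        ; {a: UInt<1>, b: UInt<1>[2]}
```

Take `UInt<8>[4]` as an example. Its mask type `UInt<1>[4]` lets a single write update any subset of the four bytes. This roughly matches a byte-masked SRAM.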

If the `en`{.firrtl} field is high, then the non-masked portion of the
`data`{.firrtl} field value is written, after the appropriate write latency, to
the location indicated by the `addr`{.firrtl} field. If the `en`{.firrtl} field
is low, then no value is written after the appropriate write latency. The port
is driven by the clock signal in the `clk`{.firrtl} field.
Some of these fields are absent when a port's capabilities are reduced. Their
purposes and the conditions under which they are present are as follows:

### Readwrite Ports
- `clk`{.firrtl}: Always present.

Finally, the readwrite ports have type:
The clock driving this port.
- `en`{.firrtl}: Always present.

``` firrtl
{addr: UInt<N>, en: UInt<1>, clk: Clock, flip rdata: T, wmode: UInt<1>,
wdata: T, wmask: M}
```
When high at a clock edge, a read or write operation initiates on this port at
that edge. Otherwise, no operation initiates at that clock edge.
- `addr`{.firrtl}: Always present.

The address of the read or write operation initiated at the current clock edge.
- `wmode`{.firrtl}: Only present if the read capability is `yes`{.firrtl} and
the write capability is `no-mask`{.firrtl} or `with-mask`{.firrtl}.

This field decides whether a port with both read and write capability
functions as a read port or a write port at a given clock edge. If
`en`{.firrtl} is high and `wmode`{.firrtl} is low, a read operation initiates.
If `en`{.firrtl} is high and `wmode`{.firrtl} is high, a write operation
initiates.
- `rdata`{.firrtl}: Only present if the read capability is `yes`{.firrtl}.

Reading this field gives the element value associated with the address of the
read operation that initiated on this port `read-latency`{.firrtl} cycles
earlier. If no read operation initiated at that clock edge (including the case
where a write operation initiated instead), the value of this field is undefined.
- `wdata`{.firrtl}: Only present if the write capability is `no-mask`{.firrtl} or
`with-mask`{.firrtl}.

When a write operation initiates, the non-masked portion of the
`wdata`{.firrtl} field value is written, after `write-latency`{.firrtl} cycles, to
the location indicated by the address of the write operation.
- `wmask`{.firrtl}: Only present if the write capability is
`with-mask`{.firrtl}.

A readwrite port is a single port that, on a given cycle, can be used either as
a read or a write port. If the readwrite port is not in write mode (the
`wmode`{.firrtl} field is low), then the `rdata`{.firrtl}, `addr`{.firrtl},
`en`{.firrtl}, and `clk`{.firrtl} fields constitute its read port fields, and
should be used accordingly. If the readwrite port is in write mode (the
`wmode`{.firrtl} field is high), then the `wdata`{.firrtl}, `wmask`{.firrtl},
`addr`{.firrtl}, `en`{.firrtl}, and `clk`{.firrtl} fields constitute its write
port fields, and should be used accordingly.
The value of this field acts as the mask for the write operation initiating
that cycle, if any.
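For instance, consider a hypothetical memory with element type `UInt<32>`{.firrtl} and depth 1024, so that $N = 10$. Applying the presence rules above to the maximum-capability type gives the port types below for two capability combinations. This is a hand-derived sketch, and the field order simply follows the maximum-capability type:

```firrtl
; read => yes, write => no-mask: wmode is present, wmask is absent
{
  clk: Clock,
  en: UInt<1>,
  addr: UInt<10>,
  wmode: UInt<1>,

  flip rdata: UInt<32>,
  wdata: UInt<32>
}

; read => no, write => no-mask: wmode, rdata, and wmask are all absent
{
  clk: Clock,
  en: UInt<1>,
  addr: UInt<10>,
  wdata: UInt<32>
}
```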

### Read Under Write Behavior

@@ -1807,12 +1840,12 @@ memory after delaying the read address by the appropriate read latency.
If the read-under-write flag is set to `undefined`{.firrtl}, then the value held
by the read port after the appropriate read latency is undefined.

For the purpose of defining such collisions, an "active write port" is a write
port or a readwrite port that is used to initiate a write operation on a given
clock edge, where `en`{.firrtl} is set and, for a readwriter, `wmode`{.firrtl}
is set. An "active read port" is a read port or a readwrite port that is used to
For the purpose of defining such collisions, an "active write port" is a port

Collaborator: This paragraph made more sense for a single clock. When each port can have a different clock, what is "a given clock edge"?

Author (@CircuitCoder, Apr 25, 2023): That's a great point.

This description mostly came from the original read-under-write specification. The original also has a separate paragraph, after this one, specific to independently clocked ports, which has been kept.

After a closer look, the original specification seems to define the semantics of a read or write very imprecisely. Its first part seems to imply that all ports share a single, globally agreed-on clock. It also treats writes to memories as atomic with respect to a globally agreed-on memory content, and reads as atomic too.

There is also the problem of overlapping writes. The original spec omits this possibility entirely, since it models writes as happening atomically after write-latency.

  • If we would like to formally define the semantics of ALL possible overlapping transactions, it might be more convenient and precise to use the language of observability, similar to that used in defining memory orderings.
  • If we are happy with atomic writes, then we can just change mentions of "cycle" to "edge w.r.t. the port's clock", and everything will be fine ™️.

If we settle on the language of observability of read / write transactions, the original specification seems to intend the following semantics:

  • If the writer and the reader have the same clock, and the write transaction and the read transaction overlap in time (w.r.t. wall time):
    • If the read-under-write flag is old, and the read transaction starts before the entire write transaction (w.r.t. wall time), the read SHOULD NOT observe the write (see footnote 1).
    • If the read-under-write flag is new, and the read transaction ends after the entire write transaction (w.r.t. wall time), the read SHOULD observe the write.
    • Otherwise, the result of the read is entirely undefined. This is stricter than saying that whether the read observes the write is undefined: it may read part of the write, or some other random bits out of nowhere.
  • If the writer and the reader have different clocks, and the write transaction and the read transaction overlap in time (w.r.t. wall time), the result of the read is entirely undefined.

Would this be a more precise definition of the read-under-write policy? If we decide to use this definition, we may keep a shorter version of the original paragraph as a more intuitive explanation of the formal semantics above.

Footnotes

  1. This description differs from the original spec: if a read starts inside an active write, the original spec says the read SHOULD NOT observe the write (the write happens atomically after write-latency). Here we change it to undefined.

Collaborator: I'm not well versed in what real memories prefer to do and how they deal with some of these issues. Things like "what happens when you clock gate in the middle of a write window?" and "is there an internal clock all transactions are referenced to (e.g. 'wall-clock' time above)?". (Hopefully) obviously, FIRRTL semantics will pick a reasonable subset of all possible behaviors, with an eye towards the 90% use case.

Incidentally, talking to some FPGA folks, they classically build their own bypass logic around simpler memories to get the read-under-write behavior they want.

Author (@CircuitCoder, May 2, 2023): @sequencer What are your thoughts on these?

Member: So basically there is already an annotation designed for the custom port which makes MBIST work. However, the annotation exports it directly to the top level. This may satisfy an IP vendor, but for those who use Chisel to design the full chip, it introduces additional problems.
We all agree that MBIST is necessary for ASIC designers. However, it's always provided by vendors, so a custom port connecting each memory to the BIST controller is also necessary.
The main issue is that we cannot model it in FIRRTL, since each memory IP vendor's MBIST design might be different. Thus I think it's necessary to provide a custom port attribute in FIRRTL, but forbid it from generating a behavioral model.
Sorry for the late reply...

with write capability that is used to initiate a write operation on a given
clock edge, where `en`{.firrtl} is set and, if present, `wmode`{.firrtl}
is set. An "active read port" is a port with read capability that is used to
initiate a read operation on a given clock edge, where `en`{.firrtl} is set and,
for a readwriter, `wmode`{.firrtl} is not set. Each operation is defined to be
if present, `wmode`{.firrtl} is not set. Each operation is defined to be
"active" for the number of cycles set by its corresponding latency, starting
from the cycle where its inputs were provided to its associated port. Note that
this excludes combinational reads, which are simply modeled as combinationally
@@ -1833,8 +1866,20 @@ same cycle, the stored value is undefined.
### Constant memory type

A memory with a constant data-type represents a ROM and may not have
write-ports. It is beyond the scope of this specification how ROMs are
initialized.
ports with write capability. It is beyond the scope of this specification how
ROMs are initialized.
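Under the proposed port syntax, a ROM would be a memory with a constant data type whose ports all declare `write => no`{.firrtl}. The minimal sketch below assumes the `const` type prefix. The name `lut`, the depth, and the latency are hypothetical. Contents are not shown, since initialization is beyond the scope of the specification:

```firrtl
mem lut :
  data-type => const UInt<8>
  depth => 16
  read-under-write => undefined
  port r:
    read => yes
    write => no
    read-latency => 0
```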


### Custom port

Custom ports are intended for post-synthesis flows that require memory instances
to have additional control signals. Behavioral simulators should ignore these
ports as follows:

- All input signals (into the memory) should be treated as dangling
wires.
- All output signals (from the memory) have unspecified (implementation defined)
values.
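The sketch below shows a single-port SRAM that carries extra test signals through a custom port. It follows the `custom-port => type` form in this diff's grammar; the review thread above discusses a later, named form. The memory name and the fields `bist_en` and `bist_mode` are hypothetical:

```firrtl
mem sram :
  data-type => UInt<32>
  depth => 1024
  read-under-write => undefined
  port rw:
    read => yes
    write => no-mask
    read-latency => 1
    write-latency => 1
  custom-port => {bist_en: UInt<1>, bist_mode: UInt<2>}
```

Under the rules above, a behavioral simulator would treat any of these fields that are inputs to the memory as dangling wires. Any that are outputs would get implementation-defined values. This sketch does not pin down the direction of each field.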

## Instances

@@ -3599,15 +3644,20 @@ ref_expr = ( "probe" | "rwprobe" ) , "(" , static_reference , ")"

(* Memory *)
ruw = ( "old" | "new" | "undefined" ) ;
read_cap = ( "no" | "yes" ) ;
write_cap = ( "no" | "no-mask" | "with-mask" ) ;
memory_port = "port" , id , ":" , [ info ] , newline , indent ,
"read" , "=>" , read_cap , newline ,
"write" , "=>" , write_cap , newline ,
[ "read-latency" , "=>" , int , newline ] ,
[ "write-latency" , "=>" , int , newline ] ,
dedent ;
memory = "mem" , id , ":" , [ info ] , newline , indent ,
"data-type" , "=>" , type , newline ,
"depth" , "=>" , int , newline ,
"read-latency" , "=>" , int , newline ,
"write-latency" , "=>" , int , newline ,
"read-under-write" , "=>" , ruw , newline ,
{ "reader" , "=>" , id , newline } ,
{ "writer" , "=>" , id , newline } ,
{ "readwriter" , "=>" , id , newline } ,
{ memory_port } ,
[ "custom-port" , "=>" , type , newline ] ,
dedent ;

(* Force and Release *)