[major] Fine-grind parameters and capabilities for memories #96

CircuitCoder · 2023-04-17T13:49:34Z

This PR changes memory statements in FIRRTL to allow more fine-grained control over memory parameters, allowing:

Per-port read / write latency
Write ports with no masks

In short, a memory in the syntax proposed in this PR would look like this:

mem mymem:
  data-type => {real:SInt<16>, imag:SInt<16>}
  depth => 256
  read-under-write => undefined
  port r1:
    read => yes
    write => no
    read-latency => 0
  port r2:
    read => yes
    write => no
    read-latency => 0
  port w:
    read => no
    write => with-mask
    write-latency => 1

Note that this will be a breaking change. Although it would be easy to make the old port declaration syntax (e.g. reader => r1 ) a shorthand for the new one, signal names are also different in this new approach (e.g. wdata vs data for writers). Having different signal names for ports with different sets of capabilities would add a lot of burden to the spec.

This is a draft PR. The ideas presented here are still in an early design stage, ~~and we've not yet added changelogs for it.~~

Pending problems

Port-pairwise read-under-write policy

In theory, there might be memories that have different read-under-write policies between different pairs of ports. If we would like to characterize such behavior, we would need to allow specifying RUW policies for all (ordered) pairs of ports in the form of, let's say, a matrix.

Though possible, we believe this would be a highly uncommon scenario. So another way of settling this is to allow a special value indicating a non-existent or non-standard global read-under-write policy for the memory instance, and allow compilers and tools to specify those policies in their own implementation-dependent way:

mem m1:
  read-under-write => non-standard

Another way is to simply provide no support for such memories. Users would need to resort to using extmodules.
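To make the matrix option above more concrete, here is a minimal Python sketch of a per-pair lookup. It is not part of the proposal: the names `ruw_policy` and `RUW_VALUES` are hypothetical, and it assumes the matrix is sparse, falling back to a memory-wide default for pairs that are not listed.

```python
# Hypothetical sketch: a sparse read-under-write "matrix" keyed by
# ordered (reader, writer) port pairs, with a memory-wide default.
RUW_VALUES = {"old", "new", "undefined"}

def ruw_policy(matrix, default, reader, writer):
    """Return the policy for one ordered (reader, writer) pair."""
    policy = matrix.get((reader, writer), default)
    if policy not in RUW_VALUES:
        raise ValueError(f"unknown read-under-write policy: {policy!r}")
    return policy

# Only the pairs that differ from the default need to be listed.
matrix = {("r1", "w"): "old", ("r2", "w"): "new"}
```

With this shape, a memory with a single global policy is just an empty matrix plus a default.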

Alternative read / write latency syntax

Currently, from a purely syntactic POV, we don't require ports that have read capability to have a read-latency setting. We may enforce this requirement syntactically by requiring users to provide latencies when specifying capabilities, e.g.:

mem m1:
  port rw1:
    read: yes, 1
    write: no-mask, 2
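If the latency stays a separate field instead, the requirement has to be checked after parsing. The following Python sketch shows what such a check might look like. The `Port` class and `check_port` function are hypothetical and only mirror the fields used in the examples in this PR; they are not taken from any existing FIRRTL tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Port:
    """Hypothetical in-memory form of one `port` block."""
    name: str
    read: bool = False
    write: str = "no"  # "no", "no-mask", or "with-mask"
    read_latency: Optional[int] = None
    write_latency: Optional[int] = None

def check_port(port: Port) -> list[str]:
    """Return the problems that the grammar alone would not catch."""
    problems = []
    if port.write not in ("no", "no-mask", "with-mask"):
        problems.append(f"{port.name}: unknown write capability {port.write!r}")
    if port.read and port.read_latency is None:
        problems.append(f"{port.name}: read port without read-latency")
    if port.write != "no" and port.write_latency is None:
        problems.append(f"{port.name}: write port without write-latency")
    return problems
```

With the alternative syntax, the two "without latency" cases could not be written down in the first place, so only the capability check would remain.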

Backward compatibility

To make this new syntax backward compatible, we would have to:

Make the old reader => r1 a shorthand for the new syntax.
Allow read-latency and write-latency in global scope, and specify the validity and semantics of ports that have no read / write latency specified, or two incompatible values specified.
Make ports with different capabilities have different signal names.
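As a rough illustration of the first two items, the old declarations could be expanded like this. This is a Python sketch under assumptions: `expand_legacy` is a hypothetical helper, the memory-wide latencies are passed in explicitly, and old-style writers are mapped to `with-mask` because they always carry a mask field. It does not address the third item, the differing signal names.

```python
def expand_legacy(kind, name, read_latency, write_latency):
    """Expand an old-style `reader`/`writer`/`readwriter` declaration into
    the per-port settings proposed here (hypothetical helper)."""
    if kind == "reader":
        return {"name": name, "read": "yes", "write": "no",
                "read-latency": read_latency}
    if kind == "writer":
        return {"name": name, "read": "no", "write": "with-mask",
                "write-latency": write_latency}
    if kind == "readwriter":
        return {"name": name, "read": "yes", "write": "with-mask",
                "read-latency": read_latency,
                "write-latency": write_latency}
    raise ValueError(f"unknown legacy port kind: {kind!r}")
```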

sequencer · 2023-04-17T22:56:52Z

We may need an additional custom port for MBIST/DFT. This port would be ignored in behavioral simulation, but would exist in the post-synthesis flow.

CircuitCoder · 2023-04-23T07:26:05Z

@sequencer Is it sufficient to have an extra-type in port instantiations, which would result in a corresponding signal being generated in the port?

mem m1:
  port r1:
    extra-type => {meow:SInt<16>, meowmeow:SInt<16>}
  // ...

m1.r1.extra

darthscsi · 2023-04-25T03:07:56Z

spec.md

+    read => no
+    write => with-mask
+    write-latency => 1
+  custom-port => {a:UInt<4>, flip b:UInt<2>}


Shouldn't this match syntax more? Something like:
port custom: {a:UInt<4>, flip b:UInt<2>}

Indeed. Maybe we should use another keyword (e.g. custom-port) to avoid syntax ambiguity when parsing:

custom-port custom_a => {a:UInt<4>, flip b:UInt<2>}

The motivation behind the custom-port => syntax is that there should be at most one custom port for a single memory. But this may cause implicit naming collisions with other ports, so let's also include the port name here.

Using => keeps this in line with other single-value configuration entries. See below.

Updated in e6311a7 to use custom-port custom => {a:UInt<4>, flip b:UInt<2>}

Why should there be at most one custom port?

With the custom-port custom => ... syntax, this limitation is lifted, and there can now be multiple custom ports. However, the limitation might still be desirable.

If people require multiple custom ports, they can always aggregate them into a single one. Limiting the number of custom ports may simplify compilers.

darthscsi · 2023-04-25T03:08:23Z

spec.md

-    corresponding element within the memory holds the new value.
-
-6.  A read-under-write flag indicating the behavior when a memory location is
+3.  A variable number of named ports, each having following parameters:


"each having the following parameters in this order:"

Fixed in e6311a7.

darthscsi · 2023-04-25T03:10:41Z

spec.md

  read-under-write => undefined
+  port r1:


It's weird to mix => syntax and : syntax.

The motivation for using : is to distinguish an indented block from a single configuration value. Using a different delimiter might make writing the parser a little bit easier?

darthscsi · 2023-04-25T03:12:47Z

spec.md

+  flip rdata: T,
+  wdata: T,
+  wmask: M
+}
 ```

 where `M`{.firrtl} is the mask type calculated from the element type


Does the mask actually help anywhere? It seems it is a feature which really only makes sense assuming a certain compilation strategy (lowering aggregate memories to multiple memories, in which the mask becomes the enable flag).

IMO, this is an abstraction over different possible hardware memory implementations (some SRAMs allow bit or byte masks). Aggregate memory lowering might be part of memory synthesis, which one might want to delay until the backend flow.

Also, this might be good for behavioral simulation performance?
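For readers unfamiliar with the mask type, here is a small Python sketch of the idea: the mask mirrors the element type with every ground field replaced by a one-bit flag, and a write updates only the fields whose flag is set. Bundle types are modeled as nested dicts and vectors are left out. Both function names are hypothetical, and bit- or byte-granular SRAM masks are not modeled.

```python
def mask_type(t):
    """Replace every ground type in a nested-dict bundle type with UInt<1>."""
    if isinstance(t, dict):
        return {field: mask_type(sub) for field, sub in t.items()}
    return "UInt<1>"

def masked_write(old, new, mask):
    """Return the stored element after a write: fields with a set mask bit
    take the new value, all other fields keep the old one."""
    if isinstance(old, dict):
        return {f: masked_write(old[f], new[f], mask[f]) for f in old}
    return new if mask else old
```

For the `{real:SInt<16>, imag:SInt<16>}` element used earlier, a mask of `{real: 1, imag: 0}` would update only `real`.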

darthscsi · 2023-04-25T03:14:02Z

spec.md

-port or a readwrite port that is used to initiate a write operation on a given
-clock edge, where `en`{.firrtl} is set and, for a readwriter, `wmode`{.firrtl}
-is set. An "active read port" is a read port or a readwrite port that is used to
+For the purpose of defining such collisions, an "active write port" is a port


This paragraph made more sense for a single clock. When each port can have a different clock, what is "a given clock edge"?

That's a great point.

This description mostly came from the original read-under-write specification. It also has a separate paragraph after this one specific to independently clocked ports, which has been kept.

After a closer look at the original specification, it seems that it has a very imprecise definition of the semantics of a read or write. The first part of the original specification seems to imply that all ports share a single clock (a clock that's globally agreed on). Also writes to memories are atomic w.r.t. a globally-agreed memory content, and reads are also atomic.

There is also the problem of overlapping writes. The original spec completely omitted this possibility, since it modeled writes as happening atomically after write-latency.

If we would like to formally define the semantics of ALL possible overlapping transactions, it might be more convenient and precise to use the language of observability, similar to those used in defining memory orderings.

If we are happy with atomic writes, then we can just change mentions of "cycle" to "edge w.r.t. the port's clock", and everything will be fine ™️.

If we settle on the language of observability for read / write transactions, the original specification seems to intend to define the following semantics:

If the writer and the reader have the same clock, and there's an overlap timewise (w.r.t. wall time) between the write transaction and the read transaction:

If the read-under-write flag is old, and the read transaction's start is before the entire write transaction (w.r.t. wall time), the read SHOULD NOT observe the write. ¹

If the read-under-write flag is new, and the read transaction's end is after the entire write transaction (w.r.t. wall time), the read SHOULD observe the write.

Otherwise, the result of the read is entirely undefined. This is stricter than saying that whether the read observes the write is undefined: it may read part of the write, or some other random bits out of nowhere.

If the writer and the reader have different clocks, and there's an overlap timewise (w.r.t. wall time) between the write transaction and the read transaction, the result of the read is entirely undefined.

Would this be a more precise definition of the read under write policy? If we decide to use this definition, we may keep a shorter version of the original paragraph as a more intuitive explanation of the aforementioned formal semantics.
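The rules above can be written down as a small decision function. This Python sketch is only an illustration of the proposed wording, not spec text. `read_result` is a hypothetical name, transactions are modeled as `(start, end)` wall-time intervals with `start < end`, and two intervals that merely touch at an endpoint are assumed not to overlap.

```python
def read_result(ruw, same_clock, read, write):
    """Classify what a read may observe given one write transaction.

    Returns "old", "new", "undefined", or "no-collision".
    """
    r_start, r_end = read
    w_start, w_end = write
    # Assumption: intervals that only touch at an endpoint do not overlap.
    if not (r_start < w_end and w_start < r_end):
        return "no-collision"
    if not same_clock:
        return "undefined"
    if ruw == "old" and r_start < w_start:
        return "old"   # read started before the whole write
    if ruw == "new" and r_end > w_end:
        return "new"   # read ended after the whole write
    return "undefined"
```

For example, with the `old` flag and a shared clock, a read spanning `(0, 2)` against a write spanning `(1, 3)` yields `"old"`, while a read spanning `(1, 3)` against a write spanning `(0, 2)` yields `"undefined"`, matching the footnoted change from the original spec.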

Footnotes

¹ This description differs from the original spec in that if a read starts inside an active write, the original spec says the read SHOULD NOT observe the write (the write happens atomically after write-latency). Here we change it to undefined.

I'm not well versed in what real memories prefer to do and how they deal with some of these issues: things like "what happens when you clock gate in the middle of a write window?" and "is there an internal clock all transactions are referenced to (e.g. 'wall-clock' time above)?". (Hopefully) obviously, FIRRTL semantics will pick a reasonable subset of all possible behaviors with an eye towards 90% use-case behavior.

Incidentally, talking to some FPGA folks, they classically build their own bypass logic around simpler memories to get the read-under-write behavior they want.

@sequencer What are your thoughts on these?

So basically there is already an annotation designed for the custom port which makes MBIST work. However, the annotation exports it directly to the top level. This may satisfy an IP vendor, but for those who use Chisel to design the full chip, it introduces additional problems.
We all agree that MBIST is necessary for ASIC designers. However, it's always provided by vendors, so a custom port on each memory, wired to the BIST controller, is also necessary.
The main issue is that we cannot model it in FIRRTL, since each memory IP vendor's MBIST design might be different. Thus I think it's necessary to provide a custom port attribute in FIRRTL, but forbid it from generating a behavioral model.
Sorry for the late reply...

sequencer · 2023-04-25T05:34:12Z

Another thing to mention: it's not possible to emulate the behavior of different DFT/MBIST implementations. I think it's necessary to forbid the compiler from lowering a memory to a behavioral model when a custom port is configured.

CircuitCoder · 2023-04-25T11:31:43Z

@sequencer Then it's more convenient to define the behavior of custom ports during behavioral simulations to be completely undefined.

Updated in e6311a7

CircuitCoder added 2 commits April 13, 2023 19:42

WIP: Structural memory

db8f286

Syntax fixes

9f1ba2f

CircuitCoder marked this pull request as ready for review April 21, 2023 07:31

CircuitCoder changed the title ~~Fine-grind parameters and capabilities for memories~~ [major] Fine-grind parameters and capabilities for memories Apr 23, 2023

CircuitCoder added 3 commits April 24, 2023 03:32

Add changes to revision history

98c815e

Addressed custom ports in memories

72007fe

Merge remote-tracking branch 'upstream/main'

c5b6867

darthscsi reviewed Apr 25, 2023


Various fixes regarding the new memory statment

e6311a7
