diff --git a/src/c-st-ext.adoc b/src/c-st-ext.adoc
index 97aca5fae..6bac419be 100644
--- a/src/c-st-ext.adoc
+++ b/src/c-st-ext.adoc
@@ -49,7 +49,7 @@ and/or D) is also implemented. In addition, RV32C includes a compressed
 jump and link instruction to compress short-range subroutine calls, where
 the same opcode is used to compress ADDIW for RV64C and RV128C.
 
-[TIP]
+[NOTE]
 ====
 Double-precision loads and stores are a significant fraction of static
 and dynamic instructions, hence the motivation to include them in the
@@ -100,7 +100,7 @@ instructions in one C instruction.
 It is important to note that the C extension is not designed to be a
 stand-alone ISA, and is meant to be used alongside a base ISA.
 
-[TIP]
+[NOTE]
 ====
 Variable-length instruction sets have long been used to improve code density.
 For example, the IBM Stretch cite:[stretch], developed in the late 1950s, had
@@ -526,7 +526,7 @@ latexmath:[$\textit{rs1}{\neq}\texttt{x0}$]; the code point with
 latexmath:[$\textit{rs1}{=}\texttt{x0}$] corresponds to the C.EBREAK
 instruction.
 
-[TIP]
+[NOTE]
 ====
 Strictly speaking, C.JALR does not expand exactly to a base RVI
 instruction as the value added to the PC to form the link address is 2
@@ -707,7 +707,7 @@ C.MV copies the value in register _rs2_ into register _rd_. C.MV expands
 into `add rd, x0, rs2`. C.MV is only valid when `rs2≠x0` the code points with
 `rs2=x0` correspond to the C.JR instruction. The code points with `rs2≠x0` and `rd=x0` are HINTs.
 
-[TIP]
+[NOTE]
 ====
 _C.MV expands to a different instruction than the canonical MV
 pseudoinstruction, which instead uses ADDI. Implementations that handle
diff --git a/src/counters.adoc b/src/counters.adoc
index f4a34af49..7f036b0d8 100644
--- a/src/counters.adoc
+++ b/src/counters.adoc
@@ -14,7 +14,7 @@ counters (CYCLE, TIME, and INSTRET), which have dedicated functions
 (cycle count, real-time clock, and instructions retired, respectively).
 The Zicntr extension depends on the Zicsr extension.
 
-[TIP]
+[NOTE]
 ====
 We recommend provision of these basic counters in implementations as
 they are essential for basic performance analysis, adaptive and dynamic
@@ -35,7 +35,7 @@ the full 64-bit CSRs directly.
 In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read
 the full 64 bits of the `cycle`, `time`, and `instret` counters.
 
-[TIP]
+[NOTE]
 ====
 The counter pseudoinstructions are mapped to the read-only
 `csrrs rd, counter, x0` canonical form, but the other read-only CSR
@@ -47,7 +47,7 @@ For base ISAs with XLEN=32, the Zicntr extension enables the three
 RDTIME, and RDINSTRET pseudoinstructions provide the lower 32 bits,
 and the RDCYCLEH, RDTIMEH, and RDINSTRETH pseudoinstructions provide the
 upper 32 bits of the respective counters.
-[TIP]
+[NOTE]
 ====
 We required the counters be 64 bits wide, even when XLEN=32, as
 otherwise it is very difficult for software to determine if values have
@@ -67,7 +67,7 @@ overflow in practice.
 The rate at which the cycle counter advances will depend on the implementation and operating environment.
 The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing.
 
-[TIP]
+[NOTE]
 ====
 RDCYCLE is intended to return the number of cycles executed by the
 processor core, not the hart. Precisely defining what is a "core" is
@@ -128,7 +128,7 @@ should be constant within a small error bound. The environment should
 provide a means to determine the accuracy of the clock (i.e., the
 maximum relative error between the nominal and actual real-time clock periods).
 
-[TIP]
+[NOTE]
 ====
 On some simple platforms, cycle count might represent a valid
 implementation of RDTIME, in which case RDTIME and RDCYCLE may return
@@ -141,7 +141,7 @@ bound should be set based on the requirements of the platform.
 The real-time clocks of all harts must be synchronized to within one
 tick of the real-time clock.
 
-[TIP]
+[NOTE]
 ====
 As with other architectural mandates, it suffices to appear "as if"
 harts are synchronized to within one tick of the real-time clock, i.e.,
@@ -154,7 +154,7 @@ hart from some arbitrary start point in the past. RDINSTRETH is only
 present when XLEN=32 and reads bits 63-32 of the same instruction
 counter. The underlying 64-bit counter should never overflow in practice.
 
-[TIP]
+[NOTE]
 ====
 Instructions that cause synchronous exceptions, including ECALL and
 EBREAK, are not considered to retire and hence do not increment the
@@ -180,7 +180,7 @@ hardware performance counters, `hpmcounter3-hpmcounter31`. When
 XLEN=32, the upper 32 bits of these performance counters are accessible via additional CSRs `hpmcounter3h-
 hpmcounter31h`. The Zihpm extension depends on the Zicsr extension.
 
-[TIP]
+[NOTE]
 ====
 In some applications, it is important to be able to read multiple
 counters at the same instant in time. When run under a multitasking
@@ -202,7 +202,7 @@ exception or may return a constant value.
 The execution environment should provide a means to determine the number and width of the implemented counters,
 and an interface to configure the events to be counted by each counter.
 
-[TIP]
+[NOTE]
 ====
 For execution environments implemented on RISC-V privileged platforms,
 the privileged architecture manual describes privileged CSRs controlling
diff --git a/src/d-st-ext.adoc b/src/d-st-ext.adoc
index 7c5eb4c1c..e843ac1bc 100644
--- a/src/d-st-ext.adoc
+++ b/src/d-st-ext.adoc
@@ -58,7 +58,7 @@ so, the _n_ least-significant bits of the input are used as the input
 value, otherwise the input value is treated as an _n_-bit canonical
 NaN.
 
-[TIP]
+[NOTE]
 ====
 Earlier versions of this document did not define the behavior of feeding
 the results of narrower or wider operands into an operation, except to
@@ -184,7 +184,7 @@ include::images/wavedrom/d-xwwx.adoc[]
 [[fmvxddx]]
 //.Double-precision float move to _rd_
 
-[TIP]
+[NOTE]
 ====
 Early versions of the RISC-V ISA had additional instructions to allow
 RV32 systems to transfer between the upper and lower portions of a
diff --git a/src/f-st-ext.adoc b/src/f-st-ext.adoc
index 96d5b44b6..739ecf2c3 100644
--- a/src/f-st-ext.adoc
+++ b/src/f-st-ext.adoc
@@ -21,7 +21,7 @@ instructions operate on values in the floating-point register file.
 Floating-point load and store instructions transfer floating-point values between registers and memory.
 Instructions to transfer values to and from the integer register file are also provided.
 
-[TIP]
+[NOTE]
 ====
 We considered a unified register file for both integer and
 floating-point values as this simplifies software register allocation
@@ -189,7 +189,7 @@ quiet bit. For single-precision floating-point, this corresponds to the pattern
 (((NaN, generation)))
 (((NaN, propagation)))
 
-[TIP]
+[NOTE]
 ====
 We considered propagating NaN payloads, as is recommended by the
 standard, but this decision would have increased hardware cost.
@@ -432,7 +432,7 @@ include::images/wavedrom/spfloat-mv.adoc[]
 [[spfloat-mv]]
 //.SP floating point move
 
-[TIP]
+[NOTE]
 ====
 The base floating-point ISA was defined so as to allow implementations to employ an internal recoding of the floating-point format in registers to simplify handling of
 subnormal values and possibly to reduce functional unit latency. To this end, the F extension avoids
diff --git a/src/intro.adoc b/src/intro.adoc
index 6fc871b0e..6390f601e 100644
--- a/src/intro.adoc
+++ b/src/intro.adoc
@@ -33,7 +33,7 @@ efficiency.
 * An ISA that simplifies experiments with new privileged architecture
 designs.
 
-[TIP]
+[NOTE]
 ====
 Commentary on our design decisions is formatted as in this paragraph.
 This non-normative text can be skipped if the reader is only interested
@@ -64,7 +64,7 @@ volume provides the design of the first ("classic") privileged
 architecture. The manuals use IEC 80000-13:2008 conventions, with a byte
 of 8 bits.
 
-[TIP]
+[NOTE]
 ====
 In the unprivileged ISA design, we tried to remove any dependence on
 particular microarchitectural features, such as cache line size, or on
@@ -144,7 +144,7 @@ environments for guest operating systems.
 harts on an underlying x86 system, and which can provide either a
 user-level or a supervisor-level execution environment.
 
-[TIP]
+[NOTE]
 ====
 A bare hardware platform can be considered to define an EEI, where the
 accessible harts, memory, and other devices populate the environment,
@@ -176,7 +176,7 @@ constitute forward progress:
 * Any other event defined by an extension to constitute forward
 progress.
 
-[TIP]
+[NOTE]
 ====
 The term hart was introduced in the work on Lithe cite:[lithe-pan-hotpar09] and cite:[lithe-pan-pldi10] to provide a term to
 represent an abstract execution resource as opposed to a software thread
@@ -230,7 +230,7 @@ base integer instruction set supporting a flat 128-bit address space
 representation for signed integer values.
 
-[TIP]
+[NOTE]
 ====
 Although 64-bit address spaces are a requirement for larger systems, we
 believe 32-bit address spaces will remain adequate for many embedded and
@@ -382,7 +382,7 @@ harts may be entirely the same, or entirely different, or may be partly
 different but sharing some subset of resources, mapped into the same or
 different address ranges.
 
-[TIP]
+[NOTE]
 ====
 For a purely "bare metal" environment, all harts may see an identical
 address space, accessed entirely by physical addresses. However, when
@@ -552,7 +552,7 @@ instructions. These instructions are considered to be of minimal length:
 bits. The encoding with bits [ILEN-1:0] all ones is also illegal; this
 instruction is considered to be ILEN bits long.
 
-[TIP]
+[NOTE]
 ====
 We consider it a feature that any length of instruction containing all
 zero bits is not legal, as this quickly traps erroneous jumps into
@@ -587,7 +587,7 @@ instruction specification.
 (((bi-endian)))
 (((endian, bi-)))
 
-[TIP]
+[NOTE]
 ====
 We originally chose little-endian byte ordering for the RISC-V memory
 system because little-endian systems are currently dominant commercially
diff --git a/src/m-st-ext.adoc b/src/m-st-ext.adoc
index fc08be2de..1e09b7ac4 100644
--- a/src/m-st-ext.adoc
+++ b/src/m-st-ext.adoc
@@ -5,7 +5,7 @@ This chapter describes the standard integer multiplication and division
 instruction extension, which is named "M" and contains instructions
 that multiply or divide values held in two integer registers.
 
-[TIP]
+[NOTE]
 ====
 We separate integer multiply and divide out from the base to simplify
 low-end implementations, or for applications where integer multiply and
@@ -113,7 +113,7 @@ latexmath:[$-1$] |latexmath:[$2^{L}-1$] +
 //|Overflow (signed only) |latexmath:[$-2^{L-1}$] |latexmath:[$-1$] |– |– |latexmath:[$-2^{L-1}$] |0
 //|===
 
-[TIP]
+[NOTE]
 ====
 We considered raising exceptions on integer divide by zero, with these
 exceptions causing a trap in most execution environments. However, this
diff --git a/src/rv128.adoc b/src/rv128.adoc
index 62af10948..9098dcb00 100644
--- a/src/rv128.adoc
+++ b/src/rv128.adoc
@@ -11,7 +11,7 @@ flat 128-bit address space. The variant is a straightforward
 extrapolation of the existing RV32I and RV64I designs.
 (((RV128, design)))
 
-[TIP]
+[NOTE]
 ====
 The primary reason to extend integer register width is to support larger
 address spaces. It is not clear when a flat address space larger than 64
diff --git a/src/rv32.adoc b/src/rv32.adoc
index 9714df4fa..fac99092a 100644
--- a/src/rv32.adoc
+++ b/src/rv32.adoc
@@ -3,7 +3,7 @@
 
 This chapter describes the RV32I base integer instruction set.
 
-[TIP]
+[NOTE]
 ====
 RV32I was designed to be sufficient to form a compiler target and to
 support modern operating system environments. The ISA was also designed
@@ -258,7 +258,7 @@ destination is register _rd_ for both register-immediate and
 register-register instructions. No integer computational instructions
 cause arithmetic exceptions.
 
-[TIP]
+[NOTE]
 ====
 We did not include special instruction-set support for overflow checks
 on integer arithmetic operations in the base instruction set, as many
@@ -581,7 +581,7 @@ a conditional branch instruction with an always-true condition. RISC-V
 jumps are also PC-relative and support a much wider offset range than
 branches, and will not pollute conditional-branch prediction tables.
 
-[TIP]
+[NOTE]
 ====
 The conditional branches were designed to include arithmetic comparison
 operations between two registers (as also done in PA-RISC, Xtensa, and
@@ -666,7 +666,7 @@ even though the load value is discarded.
 The EEI will define whether the memory system is little-endian or
 big-endian. In RISC-V, endianness is byte-address invariant.
 
-[TIP]
+[NOTE]
 ====
 In a system for which endianness is byte-address invariant, the
 following property holds: if a byte is stored to memory at some address
@@ -731,7 +731,7 @@ by address misalignment result in a contained trap (allowing software
 running inside the execution environment to handle the trap) or a fatal
 trap (terminating execution).
 
-[TIP]
+[NOTE]
 ====
 Misaligned accesses are occasionally required when porting legacy code,
 and help performance on applications when using any form of packed-SIMD
@@ -853,7 +853,7 @@ Base implementations shall treat all such reserved configurations as
 normal fences with _fm_=0000, and standard software shall use only
 non-reserved configurations.
 
-[TIP]
+[NOTE]
 ====
 We chose a relaxed memory model to allow high performance from simple
 machine implementations and from likely future coprocessor or
@@ -875,7 +875,7 @@ described in <>, and the base unprivileged instructions are
 described in the following section.
 
-[TIP]
+[NOTE]
 ====
 The SYSTEM instructions are defined to allow simpler implementations to
 always trap to a single software trap handler. More sophisticated
@@ -906,7 +906,7 @@ to reflect that they can be used more generally than to call a
 supervisor-level operating system or debugger.
 ====
 
-[TIP]
+[NOTE]
 ====
 EBREAK was primarily designed to be used by a debugger to cause
 execution to stop and fall back into the debugger. EBREAK is also used
@@ -986,7 +986,7 @@ HINT space is reserved for standard HINTs. The remainder of the HINT
 space is designated for custom HINTs: no standard HINTs will ever be
 defined in this subspace.
 
-[TIP]
+[NOTE]
 ====
 We anticipate standard hints to eventually include memory-system spatial
 and temporal locality hints, branch prediction hints, thread-scheduling
diff --git a/src/rv32e.adoc b/src/rv32e.adoc
index c30b598f8..35c996fe3 100644
--- a/src/rv32e.adoc
+++ b/src/rv32e.adoc
@@ -22,7 +22,7 @@ RV64I are also compatible with RV32E and RV64E, respectively.
 RV32E and RV64E reduce the integer register count to 16 general-purpose
 registers, (`x0-x15`), where `x0` is a dedicated zero register.
 
-[TIP]
+[NOTE]
 ====
 We have found that in the small RV32I core implementations, the upper 16
 registers consume around one quarter of the total area of the core
diff --git a/src/scalar-crypto.adoc b/src/scalar-crypto.adoc
index 63064dce5..d670a178c 100644
--- a/src/scalar-crypto.adoc
+++ b/src/scalar-crypto.adoc
@@ -2313,7 +2313,7 @@ are each represented by two 32-bit registers.
 This instruction must _always_ be implemented such that its execution
 latency does not depend on the data being operated on.
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The entire Sigma0 transform for SHA2-512 may be computed on RV32
@@ -2387,7 +2387,7 @@ are each represented by two 32-bit registers.
 This instruction must _always_ be implemented such that its execution
 latency does not depend on the data being operated on.
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The entire Sigma0 transform for SHA2-512 may be computed on RV32
@@ -2461,7 +2461,7 @@ are each represented by two 32-bit registers.
 This instruction must _always_ be implemented such that its execution
 latency does not depend on the data being operated on.
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The entire Sigma1 transform for SHA2-512 may be computed on RV32
@@ -2535,7 +2535,7 @@ are each represented by two 32-bit registers.
 This instruction must _always_ be implemented such that its execution
 latency does not depend on the data being operated on.
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The entire Sigma1 transform for SHA2-512 may be computed on RV32
@@ -2608,7 +2608,7 @@ is represented by two 32-bit registers.
 This instruction must _always_ be implemented such that its execution
 latency does not depend on the data being operated on.
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The entire Sum0 transform for SHA2-512 may be computed on RV32
@@ -2682,7 +2682,7 @@ is represented by two 32-bit registers.
 This instruction must _always_ be implemented such that its execution
 latency does not depend on the data being operated on.
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The entire Sum1 transform for SHA2-512 may be computed on RV32
diff --git a/src/supervisor.adoc b/src/supervisor.adoc
index d79d7335f..d6d52b1f6 100644
--- a/src/supervisor.adoc
+++ b/src/supervisor.adoc
@@ -1183,7 +1183,7 @@ without the need to execute an SFENCE.VMA instruction. Changing
 immediately, without the need to execute an SFENCE.VMA instruction.
 Likewise, changes to `satp`.ASID take effect immediately.
 
-[TIP]
+[NOTE]
 ====
 The following common situations typically require executing an
 SFENCE.VMA instruction:
diff --git a/src/vector-crypto.adoc b/src/vector-crypto.adoc
index 695a46a08..326bb1301 100644
--- a/src/vector-crypto.adoc
+++ b/src/vector-crypto.adoc
@@ -3190,7 +3190,7 @@ next state.
 // output is the new values of _a, b, e_ and _f_ after performing 2 rounds of the hash
 // computation. The new values, _c_, _d_, _g_, and _h_, are equal to the input values for _a_, _b_,
 // _e_, _f_ respectively.
-// [TIP]
+// [NOTE]
 // .Note to software developers
 // ====
 // The MessageSchedplus constant input to this instruction is generated by Software
@@ -3198,7 +3198,7 @@ next state.
 // round constant as defined in the NIST specification (see <>).
 // ====
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The NIST standard (see <>) requires the final hash to be in big-endian byte ordering
@@ -3217,7 +3217,7 @@ Having a high and low version of this instruction typically improves performance
 interleaving independent hashing operations (i.e., when hashing several files at once).
 ====
 
-// [TIP]
+// [NOTE]
 // .Note to software developers
 // ====
 // These instructions take in two SEW words _W1_ and _W0_ which are the next two words of the message
@@ -3378,7 +3378,7 @@ Eleven of the last 16 `SEW`-sized message-schedule words from `vd` (oldest), `vs
 and `vs1` (most recent) are processed to produce the next 4 message-schedule words.
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The first 16 SEW-sized words of the message schedule come from the _message block_
@@ -3389,7 +3389,7 @@ All of the subsequent message schedule words are produced by this instruction an
 therefore do not require an endian swap.
 ====
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 Software is required to pack the words into element groups
@@ -3419,7 +3419,7 @@ lower indices indicating older words.
 // {W~11~, W~10~, W~9~, W~4~} +
 // {W~15~, W~14~, W~13~, W~12~}`
 
-[TIP]
+[NOTE]
 .Note to software developers
 ====
 The {W~11~, W~10~, W~9~, W~4~} element group can easily be formed by using a vector
diff --git a/src/zfa.adoc b/src/zfa.adoc
index 942aeef6f..20223d8db 100644
--- a/src/zfa.adoc
+++ b/src/zfa.adoc
@@ -57,13 +57,13 @@ like FMV.W.X, but with _rs2_=1.
 |31 |_Canonical NaN_ |`0` |`11111111` |`100...000`
 |===
 
-[TIP]
+[NOTE]
 ====
 The preferred assembly syntax for entries 1, 30, and 31 is `min`, `inf`, and `nan`, respectively.
 For entries 0 through 29 (including entry 1), the assembler will accept decimal constants in C-like syntax.
 ====
 
-[TIP]
+[NOTE]
 ====
 The set of 32 constants was chosen by examining floating-point
 libraries, including the C standard math library, and to optimize
@@ -170,7 +170,7 @@ FCVT.W.D with the same input operand.
 This instruction is only provided if the D extension is implemented. It is encoded like FCVT.W.D, but with the
 rs2 field set to 8 and the _rm_ field set to 1 (RTZ). Other _rm_ values are _reserved_.
 
-[TIP]
+[NOTE]
 ====
 The assembly syntax requires the RTZ rounding mode to be explicitly specified,
 i.e., `fcvtmod.w.d rd, rs1, rtz`.
diff --git a/src/zifencei.adoc b/src/zifencei.adoc
index 666effb93..2047346eb 100644
--- a/src/zifencei.adoc
+++ b/src/zifencei.adoc
@@ -17,7 +17,7 @@ snooping/invalidation overhead by writing translated instructions to
 memory regions that are known not to reside in the I-cache.
 ====
 '''
-[TIP]
+[NOTE]
 ====
 The FENCE.I instruction was designed to support a wide variety of
 implementations. A simple implementation can flush the local instruction
diff --git a/src/zihintntl.adoc b/src/zihintntl.adoc
index 8e225cb52..7ddbb4b9d 100644
--- a/src/zihintntl.adoc
+++ b/src/zihintntl.adoc
@@ -178,7 +178,7 @@ preferentially take the interrupt before the NTL, rather than between
 the NTL and the memory access.
 ====
 '''
-[TIP]
+[NOTE]
 ====
 Since the NTL instructions are encoded as ADDs, they can be used within
 LR/SC loops without voiding the forward-progress guarantee. But, since