Universal Memory Interface (UMI)

1. Introduction

1.1 Design Philosophy

1.2 Architecture

The Universal Memory Interface (UMI) is a transaction-based standard for accessing memory through request-response message exchange patterns. UMI includes five distinct abstraction layers:

UMI

1.3 Key Features

1.4 Key Terms


2. Protocol UMI (PUMI) Layer

UMI transaction payloads are treated as a series of opaque bytes and can carry arbitrary data, including higher level protocols. The maximum data size available for communication protocol data and headers is 32,768 bytes. The following table illustrates recommended bit packing for a number of common communication standards.

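| Protocol | Payload (UMI DATA) | Header (UMI DATA) | UMI Addresses + Command |
|----------|--------------------|-------------------|-------------------------|
| Ethernet | 64B - 1,518B       | 14B               | 20B                     |
| CXL-68   | 64B                | 2B                | 20B                     |
| CXL-256  | 254B               | 2B                | 20B                     |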

3. Transaction UMI (TUMI) Layer

3.1 Theory of Operation

UMI transactions are request-response memory exchanges between Hosts and Devices. Hosts send memory access requests to devices and get responses back. The figure below illustrates the relationship between hosts, devices, and the interconnect network.

UMI

A basic UMI read/write transaction transfers LEN+1 words of data, each 2^SIZE bytes wide, between a device and a host.

Summary:

Hosts:

Devices:

Constraints:

3.2 Message Format

3.2.1 Message Fields

| Term   | Meaning                                       |
|--------|-----------------------------------------------|
| CMD    | Command (type + options)                      |
| DA     | Destination address of message                |
| SA     | Source address (where to return a response)   |
| DATA   | Data payload                                  |
| OPCODE | Command opcode                                |
| SIZE   | Word size                                     |
| LEN    | Word transfers per message                    |
| QOS    | Quality of service required                   |
| PROT   | Protection mode                               |
| EX     | Exclusive access indicator                    |
| EOF    | End of frame indicator                        |
| EOM    | End of message indicator                      |
| U      | User defined message bit                      |
| R      | Reserved message bit                          |
| ERR    | Error code                                    |
| HOSTID | Host ID                                       |
| DEVID  | Device ID                                     |
| MSB    | Most significant bit                          |

3.2.2 Message Byte Order

Request and response messages are packed together in the following order:

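|                  | MSB-1:160 | 159:96 | 95:32 | 31:0 |
|------------------|-----------|--------|-------|------|
| 64b architecture | DATA      | SA     | DA    | CMD  |
| 32b architecture | DATA      | DATA   | SA,DA | CMD  |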
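The 64b architecture row of the table above can be sketched in Python as plain integer packing. This is an illustrative model only: the function names `pack_message` and `unpack_message` are hypothetical and not part of the UMI specification, and the DATA field is treated as an arbitrary-width integer placed above bit 160.

```python
CMD_BITS = 32
ADDR_BITS = 64  # 64b architecture: DA and SA are each 64 bits wide

def pack_message(cmd: int, da: int, sa: int, data: int = 0) -> int:
    """Pack CMD into [31:0], DA into [95:32], SA into [159:96], DATA above bit 160."""
    if not 0 <= cmd < (1 << CMD_BITS):
        raise ValueError("CMD must fit in 32 bits")
    if not (0 <= da < (1 << ADDR_BITS) and 0 <= sa < (1 << ADDR_BITS)):
        raise ValueError("DA and SA must fit in 64 bits")
    if data < 0:
        raise ValueError("DATA must be non-negative")
    return (data << 160) | (sa << 96) | (da << 32) | cmd

def unpack_message(msg: int):
    """Inverse of pack_message: returns (cmd, da, sa, data)."""
    addr_mask = (1 << ADDR_BITS) - 1
    cmd = msg & ((1 << CMD_BITS) - 1)
    da = (msg >> 32) & addr_mask
    sa = (msg >> 96) & addr_mask
    data = msg >> 160
    return cmd, da, sa, data
```

For example, `unpack_message(pack_message(0x3, 0x1000, 0x2000, 0xAB))` returns the original four fields.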

3.2.3 Message Types

The table below documents all UMI message types. CMD[4:0] is the UMI opcode defining the type of message being sent. CMD[31:5] are used for message specific options. Complete functional descriptions of each message can be found in the Message Description Section.

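| Message      | DATA | SA | DA | 31:27  | 26:25 | 24:22      | 21:20 | 19:16 | 15:8  | 7:5  | 4:0   |
|--------------|------|----|----|--------|-------|------------|-------|-------|-------|------|-------|
| INVALID      |      |    |    | --     | --    | --         | --    | --    | --    | 0x0  | 0,0x0 |
| REQ_RD       |      | Y  | Y  | HOSTID | U     | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0x1 |
| REQ_WR       | Y    | Y  | Y  | HOSTID | U     | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0x3 |
| REQ_WRPOSTED | Y    | Y  | Y  | HOSTID | U     | 0,EOF,EOM  | PROT  | QOS   | LEN   | SIZE | R,0x5 |
| REQ_RDMA     |      | Y  | Y  | HOSTID | U     | 0,EOF,EOM  | PROT  | QOS   | LEN   | SIZE | R,0x7 |
| REQ_ATOMIC   | Y    | Y  | Y  | HOSTID | U     | 0,EOF,EOM  | PROT  | QOS   | ATYPE | SIZE | R,0x9 |
| REQ_USER0    | Y    | Y  | Y  | HOSTID | U     | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0xB |
| REQ_FUTURE0  | Y    | Y  | Y  | HOSTID | U     | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0xD |
| REQ_ERROR    |      | Y  | Y  | HOSTID | U     | U          | U     | U     | U     | 0x0  | R,0xF |
| REQ_LINK     |      |    |    | U      | U     | U          | U     | U     | U     | 0x1  | R,0xF |
| RESP_RD      | Y    |    | Y  | HOSTID | ERR   | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0x2 |
| RESP_WR      |      |    | Y  | HOSTID | ERR   | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0x4 |
| RESP_USER0   |      |    | Y  | HOSTID | ERR   | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0x6 |
| RESP_USER1   | Y    |    | Y  | HOSTID | ERR   | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0x8 |
| RESP_FUTURE0 |      |    | Y  | HOSTID | ERR   | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0xA |
| RESP_FUTURE1 | Y    |    | Y  | HOSTID | ERR   | EX,EOF,EOM | PROT  | QOS   | LEN   | SIZE | R,0xC |
| RESP_LINK    |      |    |    | U      | U     | U          | U     | U     | U     | 0x0  | R,0xE |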
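As an illustration of the CMD layout in the table above, the following Python sketch packs and unpacks the standard fields using the bit ranges from the header row. The helper names are hypothetical. Two points are assumptions rather than statements from the table: the individual positions of EX, EOF and EOM inside CMD[24:22] are taken to follow the listed order (EX=24, EOF=23, EOM=22), and CMD[26:25] is exposed under a single name because it carries U in requests and ERR in responses.

```python
# (name, least significant bit, width) for each CMD field.
CMD_FIELDS = [
    ("opcode", 0, 5),
    ("size", 5, 3),
    ("len", 8, 8),          # ATYPE occupies the same bits in REQ_ATOMIC
    ("qos", 16, 4),
    ("prot", 20, 2),
    ("eom", 22, 1),         # assumed bit position within CMD[24:22]
    ("eof", 23, 1),         # assumed bit position within CMD[24:22]
    ("ex", 24, 1),          # assumed bit position within CMD[24:22]
    ("err_or_user", 25, 2), # U in requests, ERR in responses
    ("hostid", 27, 5),
]

def pack_cmd(**fields: int) -> int:
    """Build a 32-bit CMD word; fields that are not given default to zero."""
    known = {name for name, _, _ in CMD_FIELDS}
    unknown = set(fields) - known
    if unknown:
        raise ValueError(f"unknown CMD fields: {sorted(unknown)}")
    cmd = 0
    for name, lsb, width in CMD_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        cmd |= value << lsb
    return cmd

def unpack_cmd(cmd: int) -> dict:
    """Split a 32-bit CMD word into a dictionary of field values."""
    return {name: (cmd >> lsb) & ((1 << width) - 1)
            for name, lsb, width in CMD_FIELDS}
```

For example, `pack_cmd(opcode=0x3, size=3, len=7, hostid=2)` describes a REQ_WR of eight 8-byte words from host 2.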

3.3 Message Fields

3.3.1 Source Address and Destination Address (SA[63:0], DA[63:0])

The destination address (DA) specifies the target address of a request or response message. For requests, the DA field is the full device address to access. For responses, the DA field returned is a copy of the requester SA field. The SA field can be a full address (32/64 bits) or a partial routing address and a set of optional UMI signal layer controls needed to drive the interconnect network.

Responses do not have the SA field. At the SUMI level, while the SA bus is always present, its value is undefined in response packets. Implementations must not depend on the value of the SA bus in response packets.

The table below shows the bit mapping for SA field.

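| SA       | 63:56 | 55:48 | 47:40 | 39:32 | 31:24 | 23:16 | 15:8 | 7:0 |
|----------|-------|-------|-------|-------|-------|-------|------|-----|
| 64b mode | R     | R     | R     | U     | U     | U     | U    | U   |
| 32b mode | --    | --    | --    | --    | R     | U     | U    | U   |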

3.3.2 Transaction Word Size (SIZE[2:0])

The SIZE field defines the number of bytes in a transaction word. Devices are not required to support all SIZE options. Hosts must only send messages with a SIZE supported by the target device.

| SIZE[2:0] | Bytes per word |
|-----------|----------------|
| 0b000     | 1              |
| 0b001     | 2              |
| 0b010     | 4              |
| 0b011     | 8              |
| 0b100     | 16             |
| 0b101     | 32             |
| 0b110     | 64             |
| 0b111     | 128            |

3.3.3 Transaction Length (LEN[7:0])

The LEN field defines the number of words of size 2^SIZE bytes transferred by a transaction. The number of transfers is equal to LEN + 1, equating to a range of 1-256 transfers per transaction. The current address of transfer number 'i' in a transaction is defined by:

ADDR_i = START_ADDR + (i-1) * 2^SIZE.
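The address formula above, together with the LEN + 1 transfer count, can be checked with a few lines of Python. The function names are hypothetical helpers chosen for this sketch; transfers are numbered from i = 1 as in the formula.

```python
def transfer_addresses(start_addr: int, size: int, length: int) -> list:
    """Address of each of the LEN+1 transfers, numbered i = 1 .. LEN+1."""
    word_bytes = 1 << size  # 2^SIZE bytes per word
    return [start_addr + (i - 1) * word_bytes for i in range(1, length + 2)]

def total_bytes(size: int, length: int) -> int:
    """Total payload of a transaction: (2^SIZE) * (LEN+1) bytes."""
    return (1 << size) * (length + 1)
```

With SIZE=2 and LEN=3 starting at 0x100, the four transfers land at 0x100, 0x104, 0x108 and 0x10C. The largest transaction (SIZE=7, LEN=255) carries 32,768 bytes.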

3.3.4 Protection Mode (PROT[1:0])

The PROT field indicates the protected access level of the transaction, enabling controlled access to memory.

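| PROT[Bit] | Value | Function            |
|-----------|-------|---------------------|
| [0]       | 0     | Unprivileged access |
| [0]       | 1     | Privileged access   |
| [1]       | 0     | Secure access       |
| [1]       | 1     | Non-secure access   |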

3.3.5 Quality of Service (QOS[3:0])

The QOS field controls the quality of service required from the interconnect network. The interpretation of the QOS bits is interconnect network specific.

3.3.6 End of Message (EOM)

The EOM bit is reserved for the UMI signal layer and is used to track the transfer of the last word in a message.

3.3.7 End of Frame (EOF)

The EOF bit can be used to indicate the last message in a sequence of related UMI transactions. Use of the EOF bit at an endpoint is optional and implementation specific.

3.3.8 Exclusive Access (EX)

The EX field is used to indicate exclusive access to an address. The function is used to enable atomic load-store exchanges. The sequence of operations is:

  1. Host sends a REQ_RD to address A (with EX=1) with SA B
  2. Host sends a REQ_WR to address A (with EX=1) with SA B
  3. Device:
    1. If address A has NOT been modified by another host (i.e., by a write with a different SA) since the last exclusive read, the device performs the write to address A and returns ERR = 0b01 in RESP_WR to the host.
    2. If address A has been modified by another host since the last exclusive read, the device returns ERR = 0b00 in RESP_WR to the host and does not perform the write to address A.
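The exclusive access sequence above can be modeled with a small behavioral sketch. The class `ExclusiveDevice` and its methods are hypothetical, memory is modeled as a dictionary of words, and two details are assumptions not stated in the text: reservations are tracked per address as a set of SAs, and a successful exclusive write consumes the writer's own reservation.

```python
ERR_OK = 0b00    # also returned when an exclusive write is not performed
ERR_EXOK = 0b01  # successful exclusive access

class ExclusiveDevice:
    def __init__(self):
        self.mem = {}           # address -> stored word
        self.reservations = {}  # address -> set of SAs holding a reservation

    def read(self, addr: int, sa: int, ex: bool = False) -> int:
        """REQ_RD; with EX=1 the requester's SA is recorded for this address."""
        if ex:
            self.reservations.setdefault(addr, set()).add(sa)
        return self.mem.get(addr, 0)

    def write(self, addr: int, sa: int, value: int, ex: bool = False) -> int:
        """REQ_WR; returns the ERR code carried by the RESP_WR."""
        holders = self.reservations.setdefault(addr, set())
        if ex and sa not in holders:
            return ERR_OK  # modified by another host: write is not performed
        self.mem[addr] = value
        # A performed write invalidates the reservations of all other hosts.
        holders.intersection_update({sa})
        if ex:
            holders.discard(sa)  # assumption: the reservation is consumed
            return ERR_EXOK
        return ERR_OK
```

A host that reads with EX=1 and then writes with EX=1 receives EXOK unless another SA wrote to the same address in between.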

3.3.9 Error Code (ERR[1:0])

The ERR field indicates the error status of a response (RESP_WR, RESP_RD) transaction.

| ERR[1:0] | Meaning                            |
|----------|------------------------------------|
| 0b00     | OK (no error)                      |
| 0b01     | EXOK (successful exclusive access) |
| 0b10     | DEVERR (device error)              |
| 0b11     | NETERR (network error)             |

DEVERR trigger examples:

NETERR trigger examples:

3.3.10 Atomic Transaction Type (ATYPE[7:0])

The ATYPE field indicates the type of the atomic transaction.

| ATYPE[7:0] | Meaning     |
|------------|-------------|
| 0x00       | Atomic add  |
| 0x01       | Atomic and  |
| 0x02       | Atomic or   |
| 0x03       | Atomic xor  |
| 0x04       | Atomic max  |
| 0x05       | Atomic min  |
| 0x06       | Atomic maxu |
| 0x07       | Atomic minu |
| 0x08       | Atomic swap |

3.3.11 Host ID (HOSTID[4:0])

The HOSTID field indicates the ID of the host making a transaction request. All transactions with the same ID value must remain in order.

3.3.12 User Field (U)

Message bits designated with a U are available for use by application and signal layer implementations. Any undefined user bits shall be set to zero.

3.3.13 Reserved Field (R)

Message bits designated with an R are reserved for future UMI enhancements and shall be set to zero.

3.4 Message Descriptions

3.4.1 INVALID

INVALID indicates an invalid message. A receiver can choose to ignore the message or to take corrective action.

3.4.2 REQ_RD

REQ_RD reads (2^SIZE)*(LEN+1) bytes from device address(DA). The device initiates a RESP_RD message to return data to the host source address (SA).

If at some point in the network REQ_RD is determined to be unroutable (for example, at a network boundary), RESP_RD should be sent back to the SA of the request with ERR=NETERR with no data (DATA=0 at the SUMI level, empty array at the TUMI level). All other fields in RESP_RD (SIZE, LEN, etc.) should match those in the request.

If REQ_RD cannot be executed by a device for any reason (including an unsupported SIZE), RESP_RD should be sent back to the SA of the request with ERR=DEVERR and no data; all other fields (SIZE, LEN, etc.) should match those in the request.

3.4.3 REQ_WR

REQ_WR writes (2^SIZE)*(LEN+1) bytes to destination address(DA). The device then initiates a RESP_WR acknowledgment message to the host source address (SA).

If REQ_WR cannot be transmitted past a certain point in the network due to a narrowing in the data bus width, RESP_WR should be sent back to the SA of the request with ERR=NETERR; all other fields (SIZE, LEN, etc.) should match those in the request. The same behavior applies when REQ_WR is unroutable.

If REQ_WR cannot be executed by a device for any reason (including an unsupported SIZE), RESP_WR should be sent back to the SA of the request with ERR=DEVERR; all other fields (SIZE, LEN, etc.) should match those in the request.

3.4.4 REQ_WRPOSTED

REQ_WRPOSTED performs a unidirectional posted-write of (2^SIZE)*(LEN+1) bytes to destination address (DA). There is no response message sent by the device back to the host.

If the destination address is reachable and SIZE is supported at the destination and the entire path leading to it, the REQ_WRPOSTED message is guaranteed to complete, otherwise it may fail silently. This means that REQ_WRPOSTED may be dropped silently if it cannot pass through part of the network due to data bus narrowing, if the transaction is determined to be unroutable at some point along its path (e.g., at a network boundary), or if the request is unsupported by a device.

3.4.5 REQ_RDMA

REQ_RDMA reads (2^SIZE)*(LEN+1) bytes of data from a primary device destination address (DA) along with a source address (SA). The primary device then initiates a REQ_WRPOSTED message to write (2^SIZE)*(LEN+1) data bytes to the address (SA) in a secondary device. REQ_RDMA requires the complete SA field for addressing and does not support pass through information for the UMI signal layer.

REQ_RDMA may be dropped silently if it is determined to be unroutable, or if the request is unsupported by the primary device.

3.4.6 REQ_ATOMIC{ADD,AND,OR,XOR,MAX,MIN,MAXU,MINU,SWAP}

REQ_ATOMIC initiates an atomic read-modify-write memory operation of size (2^SIZE) at destination address (DA). The REQ_ATOMIC sequence involves:

  1. Host sending data (DATA), destination address (DA), and source address (SA) to the device
  2. Device reading the data at address DA
  3. Device applying a binary operator {ADD,AND,OR,XOR,MAX,MIN,MAXU,MINU,SWAP} between DATA and the original device data
  4. Device writing the result back to device address DA
  5. Device returning the original device data to host address SA with a RESP_RD message
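The device side of the sequence above (read, apply operator, write back, return the original value) can be sketched as follows. The function `atomic_op` is a hypothetical name, the ATYPE codes come from the ATYPE table in section 3.3, memory is modeled as a dictionary holding one word per address, and the signed MAX/MIN variants are assumed to use two's complement interpretation of the 2^SIZE byte word.

```python
def _signed(value: int, bits: int) -> int:
    """Interpret an unsigned word as a two's complement signed integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def atomic_op(mem: dict, addr: int, atype: int, data: int, size: int) -> int:
    """Apply the atomic operation and return the original device data."""
    bits = 8 << size                 # word width in bits: 8 * 2^SIZE
    mask = (1 << bits) - 1
    old = mem.get(addr, 0) & mask    # step 2: read the original data
    data &= mask
    ops = {                          # step 3: operator selected by ATYPE
        0x00: lambda: (old + data) & mask,                          # add
        0x01: lambda: old & data,                                   # and
        0x02: lambda: old | data,                                   # or
        0x03: lambda: old ^ data,                                   # xor
        0x04: lambda: max(old, data, key=lambda v: _signed(v, bits)),  # max
        0x05: lambda: min(old, data, key=lambda v: _signed(v, bits)),  # min
        0x06: lambda: max(old, data),                               # maxu
        0x07: lambda: min(old, data),                               # minu
        0x08: lambda: data,                                         # swap
    }
    if atype not in ops:
        raise ValueError(f"unsupported ATYPE 0x{atype:02x}")
    mem[addr] = ops[atype]()         # step 4: write the result back
    return old                       # step 5: value carried by RESP_RD
```

With a one-byte word holding 0x80, a signed max against 0x01 stores 0x01 (since 0x80 is -128), while the unsigned variant keeps 0x80.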

If REQ_ATOMIC cannot be transmitted past a certain point in the network due to a narrowing in the data bus width, RESP_RD should be sent back to the SA of the request with ERR=NETERR and no data; all other fields (SIZE, LEN, etc.) should match those in the request. The same behavior applies when REQ_ATOMIC is unroutable.

If REQ_ATOMIC cannot be executed by a device for any reason (including an unsupported SIZE), RESP_RD should be sent back to the SA of the request with ERR=DEVERR and no data; all other fields (SIZE, LEN, etc.) should match those in the request.

3.4.7 REQ_ERROR

REQ_ERROR sends a unidirectional message to a device (ERR) to indicate that an error has occurred. The device can choose to ignore the message or to take action. There is no response message sent back to the host from the device.

3.4.8 REQ_LINK

REQ_LINK is a reserved CMD-only message for link layer non-memory mapped actions such as credit updates, time stamps, and framing. CMD[31:8] are all available as user specified control bits. The message is local to the signal (physical) layer, does not include routing information, and does not elicit a response from the receiver.

3.4.9 REQ_USER

REQ_USER message types are reserved for non-standardized custom UMI messages.

3.4.10 REQ_FUTURE

REQ_FUTURE message types are reserved for future UMI feature enhancements.

3.4.11 RESP_RD

RESP_RD returns (2^SIZE)*(LEN+1) bytes of data to the host source address (SA) specified by the REQ_RD message.

If RESP_RD cannot be transmitted past a certain point in the network due to a narrowing in the data bus width, then the transaction should be modified so that ERR=NETERR, and the DATA field should be dropped (DATA=0 at the SUMI level, empty array at the TUMI level). All other fields (SIZE, LEN, etc.) should be unmodified.

RESP_RD may be dropped silently in the network if it is determined to be unroutable.

3.4.12 RESP_WR

RESP_WR returns an acknowledgment to the original source address (SA) specified by the REQ_WR transaction. The message does not include any DATA.

RESP_WR may be dropped silently in the network if it is determined to be unroutable.

3.4.13 RESP_LINK

RESP_LINK is a reserved CMD-only transaction for link layer non-memory mapped actions such as credit updates, time stamps, and framing. CMD[31:8] are all available as user specified control bits. The transaction is local to the signal (physical) layer and does not include routing information.

3.4.14 RESP_USER

RESP_USER message types are reserved for non-standardized custom UMI messages.

3.4.15 RESP_FUTURE

RESP_FUTURE message types are reserved for future UMI feature enhancements.


4. Signal UMI Layer (SUMI)

4.1 Theory of Operation

The UMI signal layer (SUMI) defines the mapping of UMI transactions to a point-to-point, latency insensitive, parallel, synchronous interface with a valid ready handshake protocol.

UMI

The SUMI signaling layer defines a subset of TUMI information to be transmitted as an atomic packet. The following table documents the legal set of SUMI packet parameters.

| Field | Width (bits)            |
|-------|-------------------------|
| CMD   | 32                      |
| DA    | 32, 64                  |
| SA    | 32, 64                  |
| DATA  | 64, 128, 256, 512, 1024 |

The following example illustrates a complete request-response transaction between a host and a device.

UMIX7

UMI messages can be split into multiple atomic SUMI packets as long as message ordering and byte ordering is preserved. A SUMI packet is a complete routable mini-message comprised of a CMD, DA, SA, and DATA field, with DA and SA fields updated to reflect the correct byte addresses of the DATA payload. The end of message (EOM) bit indicates the arrival of the last packet in a message.

The following examples illustrate splitting of UMI read and write messages into shorter SUMI packets.

TUMI read example:

Potential SUMI packet sequence:

TUMI write example:

Potential SUMI packet sequence:

Note that SA and DA increment in the sequence of transactions resulting from a split request. In a split response, only DA increments in the resulting transactions, because responses don't have the SA field. Please be aware of this incrementing behavior when storing user information in SA or DA, since incrementing could modify that information. Formally, bit n in an address is safe from modification if the original outbound transaction satisfies:

A[n-1:0] + (2^SIZE)*(LEN+1) < 2^n

If A[n-1:0]=0, this reduces to the requirement that the number of bytes in the transaction is less than 2^n. As a simple example, consider A[1:0]=0b00, SIZE=0. Bit A[2] is safe from modification if LEN=0, 1, or 2 but not if LEN=3. If A[1:0] is instead 0b10, bit A[2] is only safe when LEN=0.
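The safety condition above is easy to evaluate in code. The sketch below uses a hypothetical helper name, `bit_is_safe`, and simply checks the inequality A[n-1:0] + (2^SIZE)*(LEN+1) < 2^n for a given address bit n.

```python
def bit_is_safe(addr: int, n: int, size: int, length: int) -> bool:
    """True if address bit n cannot be changed by split-induced incrementing."""
    low_bits = addr & ((1 << n) - 1)           # A[n-1:0]
    num_bytes = (1 << size) * (length + 1)     # (2^SIZE)*(LEN+1)
    return low_bits + num_bytes < (1 << n)
```

Reproducing the example from the text with SIZE=0 and n=2: starting from A[1:0]=0b00 the check passes for LEN=0, 1, 2 and fails for LEN=3, and starting from A[1:0]=0b10 it passes only for LEN=0.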

4.1.1 Splitting Rules

Generalizing from the example above, this section describes the formal rules for splitting a SUMI packet.

Definitions:

  1. The number of split outputs is denoted N.
  2. A field of the ith split output is referred to as FIELD_out[i], with 0<=i<=N-1.
  3. The notation FIELD_out[p:q] means the values FIELD_out[p] through (inclusive) FIELD_out[q].
  4. The notation FIELD_in means the value of FIELD in the SUMI packet being split.

Rules:

  1. Splitting is allowed only for REQ_RD, REQ_WR, REQ_WRPOSTED, REQ_RDMA, RESP_RD, RESP_WR, when EX=0.
  2. Copy HOSTID, ERR, EOF, PROT, QOS, SIZE, OPCODE, and any USER or RESERVED fields into each split output.
  3. LEN_out[i] may be different for each split output as long as sum(LEN_out[0:N-1])+N == LEN_in+1.
  4. DA_out[i] := DA_out[i-1] + (2^SIZE)*(LEN_out[i-1]+1), 1<=i<=(N-1). DA_out[0] := DA_in.
  5. SA_out[i] := SA_out[i-1] + (2^SIZE)*(LEN_out[i-1]+1), 1<=i<=(N-1). SA_out[0] := SA_in. Applies only to split requests, because responses do not have the SA field.
  6. EOM_out[i] := EOM_in & (i == (N-1)).
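A minimal Python sketch of rules 3 through 6 follows. The `Packet` class and `split_packet` function are hypothetical names, only the fields that change during a split are modeled (the fields copied by rule 2 are omitted), and the opcode and EX checks of rule 1 are assumed to have been done by the caller.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    da: int
    sa: int
    size: int
    length: int  # the LEN field
    eom: int

def split_packet(pkt: Packet, out_lens: list, is_request: bool = True) -> list:
    """Split pkt into len(out_lens) packets with the given LEN_out values."""
    n = len(out_lens)
    if n == 0 or sum(out_lens) + n != pkt.length + 1:   # rule 3
        raise ValueError("LEN_out values do not cover the input packet")
    outputs = []
    da, sa = pkt.da, pkt.sa
    for i, out_len in enumerate(out_lens):
        eom = pkt.eom & int(i == n - 1)                 # rule 6
        outputs.append(replace(pkt, da=da, sa=sa, length=out_len, eom=eom))
        step = (1 << pkt.size) * (out_len + 1)
        da += step                                      # rule 4
        if is_request:
            sa += step                                  # rule 5 (requests only)
    return outputs
```

For instance, a request with SIZE=3 and LEN=3 (four 8-byte words) split with `out_lens=[1, 1]` yields two packets whose DA and SA differ by 16 bytes, with EOM set only on the second.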

4.1.2 Merging Rules

Merging, the inverse of splitting, is also permitted for related SUMI packets. This may be done to improve packet transmission performance by reducing network bandwidth required. This may also improve host or device performance: for example, a device may be able to deal with related requests more efficiently if they have been merged together into a single SUMI packet. Similarly, a host may be able to process merged responses more effectively. This section describes the formal rules for merging SUMI packets.

Definitions:

  1. The number of merge inputs is denoted N.
  2. A field of the ith merge input is referred to as FIELD_in[i], with 0<=i<=N-1.
  3. The notation FIELD_in[p:q] means the values FIELD_in[p] through (inclusive) FIELD_in[q]
  4. The notation FIELD_out means the value of FIELD in the output of a SUMI packet merge.

Rules:

  1. Merging is allowed only for REQ_RD, REQ_WR, REQ_WRPOSTED, REQ_RDMA, RESP_RD, RESP_WR, when EX=0.
  2. HOSTID, ERR, EOF, PROT, QOS, SIZE, OPCODE, and any USER or RESERVED fields must match in all merge inputs. These values are copied into the merge output.
  3. EOM_in[i] must be 0 for 0<=i<=(N-2), that is, it must be zero for all but the last merge input. EOM_in[N-1] may be either 0 or 1.
  4. DA_in[i] must be equal to DA_in[i-1] + (2^SIZE)*(LEN_in[i-1]+1), 1<=i<=(N-1).
  5. DA_out := DA_in[0].
  6. SA_in[i] must be equal to SA_in[i-1] + (2^SIZE)*(LEN_in[i-1]+1), 1<=i<=(N-1). Applies only to merged requests.
  7. SA_out := SA_in[0]. Applies only to merged requests.
  8. LEN_out := sum(LEN_in[0:N-1])+N-1.
  9. EOM_out := EOM_in[N-1].
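The merge rules above can be sketched in the same style. `Packet` and `merge_packets` are hypothetical names, only DA, SA, SIZE, LEN and EOM are modeled, so the remaining equality checks of rule 2 and the opcode and EX checks of rule 1 are assumed to be handled elsewhere.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    da: int
    sa: int
    size: int
    length: int  # the LEN field
    eom: int

def merge_packets(inputs: list, is_request: bool = True) -> Packet:
    """Merge consecutive packets into one, or raise if a rule is violated."""
    if not inputs:
        raise ValueError("nothing to merge")
    first = inputs[0]
    total_len = first.length
    for prev, cur in zip(inputs, inputs[1:]):
        step = (1 << prev.size) * (prev.length + 1)
        if cur.size != first.size:                     # rule 2 (SIZE only)
            raise ValueError("SIZE mismatch")
        if prev.eom != 0:                              # rule 3
            raise ValueError("EOM set before the last merge input")
        if cur.da != prev.da + step:                   # rule 4
            raise ValueError("DA is not contiguous")
        if is_request and cur.sa != prev.sa + step:    # rule 6
            raise ValueError("SA is not contiguous")
        total_len += cur.length + 1                    # rule 8
    # Rules 5, 7 and 9: DA and SA come from the first input, EOM from the last.
    return replace(first, length=total_len, eom=inputs[-1].eom)
```

Merging two contiguous packets with LEN=1 each produces a single packet with LEN=3, the inverse of splitting a LEN=3 packet in half.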

4.2 Handshake Protocol

SUMI adheres to the following ready/valid handshake protocol:

UMI

  1. A transaction occurs on every rising clock edge in which READY and VALID are both asserted.
  2. Once VALID is asserted, it must not be de-asserted until a transaction completes.
  3. READY, on the other hand, may be de-asserted before a transaction completes.
  4. The assertion of VALID must not depend on the assertion of READY. In other words, it is not legal for the VALID assertion to wait for the READY assertion.
  5. However, it is legal for the READY assertion to be dependent on the VALID assertion (as long as this dependence is not combinational).
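Rules 1 and 2 above can be checked against a recorded waveform with a short Python sketch. The function names are hypothetical, and each list holds the value of the signal sampled at successive rising clock edges.

```python
def count_transfers(valid: list, ready: list) -> int:
    """Rule 1: a transaction occurs on each edge where VALID and READY are both high."""
    return sum(1 for v, r in zip(valid, ready) if v and r)

def valid_is_legal(valid: list, ready: list) -> bool:
    """Rule 2: once asserted, VALID must stay high until a transaction completes."""
    for i in range(len(valid) - 1):
        stalled = valid[i] and not ready[i]   # VALID high, no transfer yet
        if stalled and not valid[i + 1]:
            return False                      # VALID dropped while waiting
    return True
```

A waveform where VALID rises, waits two cycles and then sees READY is legal and yields one transaction, while a waveform where VALID falls before READY arrives is flagged as illegal.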

The following examples help illustrate the handshake protocol.

LEGAL: VALID asserted before READY

UMIX1

LEGAL: READY asserted before VALID

UMIX2

LEGAL: READY and VALID asserted simultaneously

UMIX3

LEGAL: READY toggles with no effect

UMIX4

LEGAL: VALID asserted for multiple cycles (multiple transactions)

UMIX6

ILLEGAL: VALID de-asserted without waiting for READY

UMIX5

4.3 Verilog Standard Interfaces

4.3.1 Host Interface

output          uhost_req_valid;
input           uhost_req_ready;
output [CW-1:0] uhost_req_cmd;
output [AW-1:0] uhost_req_dstaddr;
output [AW-1:0] uhost_req_srcaddr;
output [DW-1:0] uhost_req_data;

input           uhost_resp_valid;
output          uhost_resp_ready;
input [CW-1:0]  uhost_resp_cmd;
input [AW-1:0]  uhost_resp_dstaddr;
input [AW-1:0]  uhost_resp_srcaddr;
input [DW-1:0]  uhost_resp_data;

4.3.2 Device Interface

input           udev_req_valid;
output          udev_req_ready;
input [CW-1:0]  udev_req_cmd;
input [AW-1:0]  udev_req_dstaddr;
input [AW-1:0]  udev_req_srcaddr;
input [DW-1:0]  udev_req_data;

output          udev_resp_valid;
input           udev_resp_ready;
output [CW-1:0] udev_resp_cmd;
output [AW-1:0] udev_resp_dstaddr;
output [AW-1:0] udev_resp_srcaddr;
output [DW-1:0] udev_resp_data;

5. UMI Link Layer (LUMI)

The UMI link layer (LUMI) interface converts the parallel SUMI interface into a packetized, framed interface. LUMI packets are sent by transmitting cmd, dstaddr, srcaddr and data over the same lines.

5.1 Signals

The following table provides the LUMI interface signals presented from a device side perspective. All signals are single ended and unidirectional. All unidirectional signals must be deterministically driven at all times.

| SIGNAL        | DRIVER | DESCRIPTION                                  |
|---------------|--------|----------------------------------------------|
| nreset        | host   | Asynchronous active low reset                |
| clk           | host   | LUMI clock                                   |
| rxctrl[3:0]   | host   | RX link control signals (e.g., valid, ...)   |
| rxstatus[3:0] | device | RX link status signals (optional)            |
| rxdata[N-1:0] | host   | RX link data signals                         |
| txctrl[3:0]   | device | TX link control signals (e.g., valid, ...)   |
| txstatus[3:0] | host   | TX link status signals (optional)            |
| txdata[N-1:0] | device | TX link data signals                         |

LUMI supports data widths of 8, 16, 32, 64 and 128 bits.

The following diagram shows how a host and a device are connected over LUMI.

Host-Device Connection Diagram. Note that the RX of the device is connected
to the TX of the host (and vice versa).

5.2 Signal Description

nreset

Asynchronous active low reset. To prevent power up and initialization issues the device 'nreset' pin must be sampled by a synchronizer with asynchronous assert and synchronous deassert logic. REF

clk

Data link clock driven by host.

txctrl[0]/rxctrl[0]

Valid signal for the Rx (host -> device) or Tx (device -> host) packet. A HIGH value indicates valid data, and data is transferred on every cycle in which valid is high. Unlike the SUMI layer, LUMI does not require a ready signal in order to transmit data. The interface uses credit flow control as described in section 5.4 below. This signal is mandatory in all implementations.

txctrl[1]/rxctrl[1]

Optional signal indicating burst traffic. When high this signal indicates that the current packet is continuous to the previous one and therefore does not carry the header. It can only be asserted when the packet is continuous to the previous one and has the same SUMI header.

txctrl[2]/rxctrl[2]

Optional forward error correction (fec) signal to handle soft errors in rxdata.

txctrl[3]/rxctrl[3]

Optional redundancy "aux" signal to handle manufacturing errors or persistent in-field errors on one of the rxdata pins.

txstatus[3:0]/rxstatus[3:0]

Optional status indications.

txdata[N-1:0]/rxdata[N-1:0]

LUMI egress/ingress data bus, active high. Supports 8b, 16b, and 64b modes. The data width is identical between the host and device and needs to be negotiated before the link can be used.

5.3 Packet format

The LUMI standard requires the host to fully support UMI protocol.

LUMI packet format follows the UMI one and serializes the UMI cmd, dstaddr, srcaddr and data fields into one serial bit stream.

| [511:0] | [63:0]  | [63:0]  | [31:0] |
|---------|---------|---------|--------|
| data    | srcaddr | dstaddr | cmd    |

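LUMI packets are transmitted over the Tx/Rx pins with reduced interface size and are sent LSB first. The following example shows packet transmission over a 64b interface (C = cmd, A = dstaddr, S = srcaddr, D = data):

| Cycle | 63:32      | 31:0       |
|-------|------------|------------|
| 1     | A[31:0]    | C[31:0]    |
| 2     | S[31:0]    | A[63:32]   |
| 3     | D[31:0]    | S[63:32]   |
| 4     | D[95:64]   | D[63:32]   |
| ...   |            |            |
| 11    | NA         | D[511:480] |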
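The cycle table above follows from concatenating cmd, dstaddr, srcaddr and data and slicing the result LSB first. The Python sketch below reproduces that slicing; `serialize_lumi` is a hypothetical name, the field widths (32, 64, 64 and 512 bits) are taken from the packet format table in this section, and the unused upper bits of the last cycle are filled with zeros as an assumption.

```python
def serialize_lumi(cmd: int, dstaddr: int, srcaddr: int, data: int,
                   width: int = 64) -> list:
    """Return the per-cycle words of one LUMI packet, first cycle first."""
    total_bits = 32 + 64 + 64 + 512
    # cmd occupies the lowest bits, then dstaddr, srcaddr and data.
    stream = cmd | (dstaddr << 32) | (srcaddr << 96) | (data << 160)
    cycles = -(-total_bits // width)          # ceiling division
    mask = (1 << width) - 1
    return [(stream >> (i * width)) & mask for i in range(cycles)]
```

On a 64-bit interface this produces 11 words: the first carries dstaddr[31:0] above cmd, and the last carries data[511:480] in its lower half.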

The following features are implemented in order to optimize the link efficiency:

5.4 Flow control

LUMI uses credit-based flow control. Credit init/update messages are sent over the link using LUMI link-layer commands and are controlled by the receiver side. The transmitter side of each link is responsible for not exceeding published credits. If the transmitter does exceed published credits, subsequent behavior of the receiver is undefined. Credit update messages are command-only in order to reduce overhead.

Credit init/update messages will be sent using link-layer UMI command:

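| Message       | [31:16] data | [15:12] addr                            | [11:8] LNK CMD    | [7:0] UMI CMD  |
|---------------|--------------|-----------------------------------------|-------------------|----------------|
| Invalid       | NA           | NA                                      | 0x0 invalid       | link layer CMD |
| credit init   | #credit      | 0x0 - req credit<br/>0x1 - resp credit  | 0x1 credit init   | link layer CMD |
| credit update | #credit      | 0x0 - req credit<br/>0x1 - resp credit  | 0x2 credit update | link layer CMD |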

Credits are in LUMI data width units. One credit represents a single data cycle with valid high.

5.5 Credit/link initialization

After reset both sides of the link wake up in non-active state and can only accept credit-init transactions. Once a credit init message is received the transmitter may start sending packets up to the provided credit.
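The initialization and credit accounting described above can be modeled on the transmitter side with a small state machine. The class `CreditedTransmitter` is hypothetical. Two behaviors are assumptions, since the text does not specify them: a credit update adds to the remaining credit count, and a packet is only started when enough credits are available for all of its data cycles.

```python
class CreditedTransmitter:
    def __init__(self):
        self.active = False  # after reset the link is in the non-active state
        self.credits = 0     # one credit equals one data cycle with valid high

    def on_credit_init(self, count: int) -> None:
        """A credit init message activates the link with `count` credits."""
        self.credits = count
        self.active = True

    def on_credit_update(self, count: int) -> None:
        """Assumed semantics: an update adds `count` credits."""
        if not self.active:
            raise RuntimeError("credit update received before credit init")
        self.credits += count

    def try_send(self, data_cycles: int) -> bool:
        """Send a packet only if the published credits are not exceeded."""
        if not self.active or data_cycles > self.credits:
            return False
        self.credits -= data_cycles
        return True
```

A transmitter that has received 11 credits can send one full packet on a 64-bit link and must then wait for a credit update before sending again.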

5.6 Physical layer mapping

UMI link layer can be transported over several physical layer options. The following options are supported and their mapping outlined below:

Appendix A: UMI Transaction Translation

A.1 RISC-V

UMI transactions map naturally to RISC-V load store instructions. Extra information fields not provided by the RISC-V ISA (such as QOS and PROT) would need to be hard-coded or driven from CSRs.

| RISC-V Instruction    | DATA | SA       | DA  | CMD         |
|-----------------------|------|----------|-----|-------------|
| LD RD, offset(RS1)    | --   | addr(RD) | RS1 | REQ_RD      |
| SD RD, offset(RS1)    | RD   | addr(RD) | RS1 | REQ_WR      |
| AMOADD.D rd,rs2,(rs1) | RD   | addr(RD) | RS1 | REQ_ATOMADD |

Here addr(RD) refers to the ID or source address associated with the RD register in a RISC-V CPU. In a bus based architecture, this would generally be the host-id of the CPU.

A.2 TileLink

A.2.1 TileLink Overview

TileLink [1] is a chip-scale interconnect standard providing multiple masters (hosts) with coherent memory-mapped access to memory and other slaves (devices).

Summary:

A.2.2 TileLink <-> UMI Mapping

This section outlines the recommended mapping between UMI transaction and the TileLink messages. Here, we only explore mapping TL/UH TileLink modes with UMI 64bit addressing and UMI bit mask support up to 128 bits.

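| Symbol | Meaning                  | TileLink Name       |
|--------|--------------------------|---------------------|
| C      | Data is corrupt          | {a,b,c,d,e}_corrupt |
| BMASK  | Mask (2^SIZE)/8 (strobe) | {a,b,c,d,e}_mask    |
| HOSTID | Source ID                | {a,b,c,d,e}_source  |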

The following table shows the mapping between TileLink and UMI transactions, with TL-UL and TL-UH TileLink support. TL-C conformance is left for future development.

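| TileLink Message | UMI Transaction | CMD[26:25] |
|------------------|-----------------|------------|
| Get              | REQ_RD          | 0b00       |
| AccessAckData    | RESP_WR         | --         |
| PutFullData      | REQ_WR          | 0bC0       |
| PutPartialData   | REQ_WR          | 0bC0       |
| AccessAck        | RESP_WR         | --         |
| ArithmeticData   | REQ_ATOMIC      | 0b00       |
| LogicalData      | REQ_ATOMIC      | 0bC0       |
| Intent           | REQ_USER0       | 0b00       |
| HintAck          | RESP_USER0      | --         |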

TileLink has a single 'size' field; a value of N indicates a transfer of 2^N bytes per message. This is in contrast to UMI, which has two fields: a SIZE field to indicate the word size and a LEN field to indicate the number of words to be transferred. The number of bytes transferred by a UMI transaction is (2^SIZE)*(LEN+1).

The pseudo code below demonstrates one way of translating from the TileLink size and the UMI SIZE/LEN fields.

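// SIZE is log2(bytes per word); LEN is (number of words - 1).
if (tilelink_size < 8) {
   SIZE = tilelink_size;                   // one word of 2^tilelink_size bytes
   LEN  = 0;
} else {
   SIZE = 7;                               // 128-byte words (largest SIZE)
   LEN  = (1 << (tilelink_size - 7)) - 1;  // 2^(tilelink_size-7) words
}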
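The same translation can be written as a runnable Python function and checked against the byte count (2^SIZE)*(LEN+1). The function name `tilelink_to_umi` is hypothetical, and the upper bound of 15 on the TileLink size is derived here from the 8-bit LEN field rather than stated in the text.

```python
def tilelink_to_umi(tilelink_size: int) -> tuple:
    """Map a TileLink size (log2 of the byte count) to UMI (SIZE, LEN)."""
    if not 0 <= tilelink_size <= 15:
        # 2^15 bytes is the most that SIZE=7 and LEN=255 can describe.
        raise ValueError("transfer does not fit in one UMI transaction")
    if tilelink_size < 8:
        return tilelink_size, 0                # a single word
    return 7, (1 << (tilelink_size - 7)) - 1   # multiple 128-byte words

def umi_bytes(size: int, length: int) -> int:
    """Bytes moved by a UMI transaction: (2^SIZE)*(LEN+1)."""
    return (1 << size) * (length + 1)
```

For every supported TileLink size, `umi_bytes(*tilelink_to_umi(s))` equals `2**s`, so the two encodings describe the same number of bytes.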

The TileLink master id and masking signals are mapped to the UMI SA field as shown in the table below.

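| SA       | 63:56 | 55:48 | 47:40 | 39:32 | 31:24 | 23:16 | 15:8  | 7:0   |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|
| 64b mode | R     | R     | R     | U     | U     | U     | BMASK | BMASK |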

The TileLink atomic operations encoded in the param field map to the UMI ATYPE field as follows.

| TileLink param | UMI ATYPE  |
|----------------|------------|
| MIN (0)        | ATOMICMIN  |
| MAX (1)        | ATOMICMAX  |
| MINU (2)       | ATOMICMINU |
| MAXU (3)       | ATOMICMAXU |
| XOR (0)        | ATOMICXOR  |
| OR (1)         | ATOMICOR   |
| AND (2)        | ATOMICAND  |
| SWAP (3)       | ATOMICSWAP |

A.3 AXI4

A.3.1 AXI4 Overview

AXI is a transaction-based memory access protocol with five independent channels:

Constraints:

A.3.2 AXI4 <-> UMI Mapping

The table below maps AXI terminology to UMI terminology.

| AXI         | UMI         |
|-------------|-------------|
| Manager     | Host        |
| Subordinate | Device      |
| Transaction | Transaction |

The table below shows the mapping between the five AXI channels and UMI messages.

| AXI Channel    | UMI Message |
|----------------|-------------|
| Write request  | REQ_WR      |
| Write data     | REQ_WR      |
| Write response | RESP_WR     |
| Read request   | REQ_RD      |
| Read data      | RESP_RD     |

The AXI LEN, SIZE, ADDR, DATA, QOS, PROT[1:0], HOSTID, LOCK fields map directly to equivalent UMI CMD fields. See the table below for the mapping of other AXI signals to the SA fields:

| SA       | 63:56 | 55:48 | 47:40 | 39:32 | 31:24    | 23:16         | 15:8 | 7:0  |
|----------|-------|-------|-------|-------|----------|---------------|------|------|
| 64b mode | R     | R     | R     | U     | U,REGION | U,CACHE,BURST | STRB | STRB |
| 32b mode | --    | --    | --    | --    | R        | U,CACHE,BURST | STRB | STRB |

Restrictions:

A.4 AXI Stream

A.4.1 AXI Stream Overview

AXI-Stream is a point-to-point protocol, connecting a single Transmitter and a single Receiver.

A.4.2 AXI Stream <-> UMI Mapping

The mapping between AXI Stream and UMI is shown in the following tables.

| AXIS    | UMI signal |
|---------|------------|
| tvalid  | valid      |
| tready  | ready      |
| tdata   | DATA       |
| tlast   | EOF        |
| tid     | HOSTID     |
| tuser   | SA         |
| tkeep   | SA         |
| tstrb   | SA         |
| twakeup | SA         |

| SA       | 63:56 | 55:48     | 47:40 | 39:32 | 31:24 | 23:16 | 15:8  | 7:0   |
|----------|-------|-----------|-------|-------|-------|-------|-------|-------|
| 64b mode | U     | U,TWAKEUP | TUSER | TDEST | TKEEP | TKEEP | TSTRB | TSTRB |
| 32b mode | --    | --        | --    | --    | TKEEP | TKEEP | TSTRB | TSTRB |

Restrictions:

Appendix B: LUMI mapping to physical layer

The following examples are provided as reference for mapping LUMI over BoW, AIB and UCIe.

B.1 Bunch of Wires mapping

LUMI over BoW will use BoW physical layer only. BoW physical layer does not have any framing to the data and therefore requires sending LUMI valid signal over a data lane. The signal mapping is the following:

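| BoW signal | CLINK signal   | Description                         |
|------------|----------------|-------------------------------------|
| TX Data    | txdata + txvld | Data to transmit over BoW           |
| RX Data    | rxdata + rxvld | Data received over BoW              |
| Core clk   | clk[0]         | CLINK clock to be used as BoW clock |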

Other, optional, signals like FEC and AUX will not be used by LUMI.

B.2 AIB mapping

AIB uses a simple, no framing data structure. When transporting LUMI over AIB the LUMI interface will connect to the AIB MAC interface. The signal mapping for AIB MAC is the following:

| AIB signal   | CLINK signal | Description                         |
|--------------|--------------|-------------------------------------|
| data_out     | txdata       | Data to transmit over AIB           |
| data_in      | rxdata       | Data received over AIB              |
| m_ns_fwd_clk | clk[0]       | CLINK clock to be used as AIB clock |
| m_fw_fwd_clk | ------       | CLINK does not use Rx clock         |
| ns_mac_rdy   | txctrl[0]    | Valid signal for TX data            |
| fs_mac_rdy   | rxctrl[0]    | Valid signal for RX data            |

Other optional AIB Plus signals are not required for LUMI-AIB connection and will not be used.

B.3 UCIe mapping

LUMI over UCIe will use UCIe Raw Die-to-Die interface (RDI). The signal mapping for RDI is the following:

| UCIe signal    | CLINK signal | Description                          |
|----------------|--------------|--------------------------------------|
| lclk           | clk[0]       | clock                                |
| lp_irdy        | txctrl[0]    | data ready signal - same as valid    |
| lp_valid       | txctrl[0]    | data valid indication                |
| lp_data        | txdata       | data to be transmitted               |
| lp_retimer_crd | ------       | Not used (for retimer only)          |
| pl_trdy        | ------       | Not used (FC handled at CLINK level) |
| pl_valid       | rxctrl[0]    | data valid from phy                  |
| pl_data        | rxdata       | data from phy                        |
| pl_retimer_crd | ------       | Not used (for retimer only)          |

UCIe also requires implementing other phy control logic to maintain the link. The following signals will be handled by the UCIe<->CLINK bridge and not exposed to the CLINK. They should be handled and set before the link is declared active.


References

[1] TileLink Specification (version 1.7)

[2] AMBA4 AXI Protocol Specification (22 February 2013, Version E)

[3] AMBA4 AXI Stream Protocol Specification (09 April 2021, Version A)

[4] AMBA4 APB Protocol Specification (13 April 2010, Version C)

License

Apache License 2.0

Contributing

UMI is an open-source project and welcomes contributions. To find out how to contribute to the project, see our Contributing Guidelines.

Issues / Bugs

We use GitHub Issues for tracking requests and bugs.