u[Dark]RISC -- micro-DarkRISC

This is an early 16-bit RISC processor designed years before DarkRISCV. Although never intended for real use (it was derived from a series of experimental cores for a university presentation), it is very simple and easy to understand.

Features

Designed back in 2015 (years before DarkRISCV), it includes:

History & Motivation

The beginning: 16-bit VLIW DSPs on FPGAs, with highly optimized ALUs built around DSP blocks (18x18 multipliers with 48-bit accumulators), so that multiple MAC operations could run in parallel with other operations (load, branch, etc.). However, conventional code was hard to port to VLIW DSPs, so a more general-purpose processor was needed...

Many different concepts were explored: accumulator-oriented, register-bank-oriented, parallel data/address, VLIW, SIMD, MIMD, vector, 16/24/32/48-bit, etc. Most designs were identified just as “core” or “dsp”, but this specific concept was named uRISC for an external presentation at a university. Because there are already too many processors called uRISC, it was renamed uDarkRISC (micro-DarkRISC).

Designed as an evaluation processor, it was never meant to be used in real products and was never tested with complex applications. However, its more conventional approach was used as the basis for DarkRISCV, a high-performance processor that implements RV32I/E and is 100% compatible with the GCC compiler! For more information about DarkRISCV, please check: https://github.com/darklife/darkriscv/tree/master

Instruction Decode

The 16-bit instruction word is read from the synchronous BRAM and divided into 3 or 4 fields, depending on the instruction type:

Alternatively, the field 7:0 can be used to load a sign-extended 8-bit constant.
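The sign extension of the 7:0 field can be illustrated with a small software model. This is only a sketch in Python, not the Verilog of the core; the function names `sign_extend_8` and `to_16bit` are hypothetical and exist just for this example.

```python
def sign_extend_8(instruction: int) -> int:
    """Return bits 7:0 of a 16-bit instruction word as a signed integer."""
    imm8 = instruction & 0xFF  # keep only field 7:0
    # Bit 7 is the sign bit of the 8-bit constant (two's complement).
    return imm8 - 0x100 if imm8 & 0x80 else imm8


def to_16bit(value: int) -> int:
    """Wrap a Python integer to a 16-bit register value."""
    return value & 0xFFFF


# The upper byte of the instruction word is ignored here;
# 0x80 in the low byte becomes 0xFF80 in a 16-bit register.
print(hex(to_16bit(sign_extend_8(0x1280))))
```

In hardware, the same effect is normally obtained by replicating bit 7 into bits 15:8, which costs no logic at all.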

Instruction Execution

According to the Destination Register, Source Register, Option Data or Immediate Data, all 16 possible instructions are computed in parallel and put in a 16x16-bit array:

Note that there are no conditional branches other than the LOP instruction.

As the instructions are read from the ROM, the opcode is used to index the above array, so the result is written directly to the register bank. The PC is computed at the same time: it can be 0 (RES==0), DREG (RET instruction), PC+IMM (BSR, BRA or LOP instructions) or just PC+1.
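The next-PC selection described above can be sketched as a small Python model. It is not taken from the core's source: the function name `next_pc`, the use of mnemonic strings instead of real opcode encodings, the 16-bit PC width and the `lop_taken` flag (standing in for the LOP loop condition, which is not detailed here) are all assumptions made for illustration.

```python
def next_pc(pc: int, res: int, opcode: str, dreg: int, imm: int,
            lop_taken: bool = True) -> int:
    """Select the next program counter value (illustrative model only)."""
    mask = 0xFFFF  # assumption: the PC is 16 bits wide
    if res == 0:
        return 0                      # reset active: restart from address 0
    if opcode == "RET":
        return dreg & mask            # return: PC comes from DREG
    if opcode in ("BSR", "BRA"):
        return (pc + imm) & mask      # unconditional relative branch
    if opcode == "LOP" and lop_taken:
        return (pc + imm) & mask      # loop branch, only when taken (assumed)
    return (pc + 1) & mask            # default: next sequential instruction


print(next_pc(pc=10, res=1, opcode="BRA", dreg=0, imm=-3))
```

All candidates (0, DREG, PC+IMM, PC+1) are cheap to compute, so in hardware they can be evaluated in parallel and selected by a multiplexer, in the same spirit as the instruction result array.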

Finally, the LOD and STO instructions activate the RD and WR signals for load/store. DATA is driven with DREG on a store and tri-stated on a load, and SREG is used as the pointer in both cases (so there are no addressing modes).

Pipeline Detail

The micro-DarkRISC is a high-performance processor that needs a lot of bandwidth on the instruction bus: running at 75MHz with 16-bit instructions, it requires 150MB/s continuously! So the pipeline is optimized to stay filled at all times, even in the case of a branch: there is no pipeline flush, so the pre-fetched instruction is executed and the core sustains IPC = 1 all the time.
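The 150MB/s figure follows from one 16-bit (2-byte) instruction fetched per clock at 75MHz. A minimal Python sketch of that arithmetic, with the hypothetical helper `instruction_bandwidth` introduced only for this example:

```python
def instruction_bandwidth(clock_hz: float, instr_bits: int = 16,
                          ipc: float = 1.0) -> float:
    """Return the instruction-bus bandwidth in bytes per second."""
    bytes_per_instruction = instr_bits / 8
    return clock_hz * ipc * bytes_per_instruction


# 75 MHz x 1 instruction/clock x 2 bytes/instruction, shown in MB/s
print(instruction_bandwidth(75e6) / 1e6)
```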

Since load/store instructions are not used that often, there is no optimization for them. The impact of this decision depends on the application: for applications that work only with internal registers, there is no impact. In most cases, however, a small LUTRAM may be enough and, again, there is no impact.

With BRAM, however, wait-states may be required, but the core does not provide a HALT signal, so there is no way to insert them. Instead, it is possible to simulate wait-states by issuing successive load/store instructions to the same address, so the memory is accessed multiple times until it is ready.

Instruction Set

The directly supported instructions are:

In addition, there is an Advanced Instruction Set (aka “pseudo-instructions”).

The known pseudo-instructions are:

Since the design predates my contact with RISC-V technology, some important concepts are simply missing and the set is mostly 68k-oriented: ADDQ, SUBQ, BSR, BRA, etc. Even LOP is basically a RISC version of DBcc, which is pretty useful for DSP applications, since LOP supports delayed branch. Although there is no accumulator, the theory behind a large accumulator is to compute successive MUL + ADD operations, in a way that the values are shifted left. On this core, instead, the MUL result is shifted right so it can be accumulated in a separate step, resulting in values that fit in 16 bits, with all LSBs lost.
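The "shift right, then accumulate" idea can be sketched in Python as follows. This is only an illustration of the principle, not the behavior of the actual MUL instruction: the shift amount of 16 bits, the signed interpretation of the operands and the helper names `to_signed16`, `mul_shift` and `mac` are assumptions.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mul_shift(a: int, b: int, shift: int = 16) -> int:
    """Multiply two 16-bit values and keep only the upper part.

    The full product needs up to 32 bits; shifting right discards the
    LSBs so that the result fits in a 16-bit register again.
    The shift amount of 16 is an assumption for this sketch.
    """
    product = to_signed16(a) * to_signed16(b)
    return to_signed16(product >> shift)


def mac(acc: int, a: int, b: int, shift: int = 16) -> int:
    """Accumulate in a separate step, as a MUL followed by an ADD."""
    return to_signed16(acc + mul_shift(a, b, shift))


# 0x4000 * 0x4000 = 2**28; after the 16-bit right shift only 2**12 remains
print(mac(100, 0x4000, 0x4000))
```

The price of this approach is visible with small operands: `mul_shift(1, 1)` gives 0, because the whole product lives in the discarded LSBs.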

Development System

Limited to an AWK-based assembler that generates a Verilog file with the ROM description, plus waveform inspection of the simulation... I like to show the AWK-based assembler because I developed lots and lots of assemblers and converters during my time working at Siemens, both for well-known archs (such as converting 68000 asm to ColdFire asm) and for obscure, fully proprietary archs (such as multi-way VLIW DSPs), as well as all kinds of code/data-to-Verilog converters.

Conclusion

Although the uDarkRISC was frozen in time and replaced by DarkRISCV, there are always people trying to find a simple processor to study, which means that there are no bad processors, just processors that are better suited to different applications. So, although DarkRISCV is good because it can actually run complex code generated by GCC, the uDarkRISC may be far better for beginners who are just trying to understand how to design a simple processor.