PS1’s 2023 FPGA 101

Welcome to our class website. The class happened on 2023/05/27 and went pretty well.

We still have a couple of things to fix on this website and the various tools, based on the feedback from the class:

Timing explanations, what happens and when
Ring buffer, some were confused, we probably need an animation or a drawing
Setting up the lessons, write a file? where?
Environments to activate, why an environment
Different MAC addresses and IPs: add a step that tells participants exactly what to do. Don’t put the same MAC address twice on the same network
ECPDAP not working with --load (kHz vs Hz in programmer.py of LiteX) - PR sent https://github.com/enjoy-digital/litex/pull/1699

You will find all the code, and the source of the website at: https://github.com/bjonnh/fpga_class_psone/


You’ll probably want to start with the Introduction.

The Candy jar contains more detailed pages about the subjects discussed in the class.

The presentation is HERE.

Introduction

Aims of the class

Introduce the basics of FPGA programming using Verilog on an Open Source and Free toolchain. The boards used will be the readily available Colorlight i5, which uses the Lattice ECP5 FPGA. We will end with examples that use LiteX, which can be thought of as “IP cores” (but with more Python) and allows things like HDMI, Ethernet, and even entire CPU cores to be synthesized in the FPGA.

You will not be an expert in FPGAs, but you should be able to create simple programs and build more complex things over time. This is not a class about learning how to use a computer or learning the deep intricacies of Verilog and LUTs. This is a practical class where you get your hands dirty trying to make your board do cool things. And we will sprinkle in a little bit of theory so you get a sense of what is happening.

Organization

The class will take place on May 27th, 2023 at PS1 and will start promptly at 11 AM. You can come a bit earlier if you think you need more time to set up.

Attention

You need to come with a computer that is already set up with all the tools. See the Requirements.

This is the program for the day (it will obviously depend greatly on what you want to do, how fast you progress, etc.):

11:00 AM: Presentation of the participants and the class
11:30 AM: What are FPGAs
12:00 PM: First lessons
1:00 PM: BYO food break
1:30 PM: Last cycle of the lessons
3:00 PM: Hack your own stuff
5:00 PM: Show your stuff

So we will have a short formal introduction, but we will jump quickly into practical work directly on your boards, and you will learn things as you go.

The second part of the day will be about hacking things together with the electronic components we have in the electronics area, or the ones you or others brought.

And the last part will be about you showing the group what you’ve achieved.

Disclaimer

We are not responsible for any damage, anywhere and on anything. You are connecting electronic devices to your computer and this comes with risks that you need to evaluate for yourself. Make sure you have backups and that you don’t use a computer that you can’t replace.

You are also going to install quite a lot of software from various places, so make sure you are using appropriate protections against the threats you fear the most.

Requirements and setup

You need a computer with Wi-Fi and an available USB port, and you need to be an admin on that computer.

You will have to install things a couple of days BEFORE the class so we can help you solve issues.

Computer setup

You need a few things installed on your computer BEFORE THE CLASS:

Git
OSS-CAD and Yosys (instructions for Linux, macOS and Windows are on the next pages)
A text editor you love and trust and are comfortable with
Either download the zip of the course files from GitHub or run git clone https://github.com/bjonnh/fpga_class_psone/ (re-download it, or update it with git pull, on the day of the class)

You need to be able to connect to a Wi-Fi network and install software from the internet (so no parental lock or other insanities).

Don’t try this for the first time on the day of the class: you will be unhappy, and so will we.

Boards

You will receive three boards (the first two are stacked together):

A Colorlight i5, the FPGA board itself, in a SODIMM form factor
A breakout/carrier board which holds the SODIMM and in turn provides HDMI, a USB programmer and tons of IOs on PMOD-like connectors
A double ethernet PMOD that will provide connectivity (we will connect that to PS1’s network)

All these items will be given to you at the beginning of the session, or before if you want to start playing with them (don’t break them!)

Linux setup (recommended)

Computer setup

You need a few things installed on your computer BEFORE THE CLASS:

OSS-CAD
Yosys
A text editor you love and trust and are comfortable with

And you need to be able to set up your own networks (so no locked-down work computers) and install software from the internet (so no parental lock or other insanities).

Don’t try this for the first time on the day of the class: you will be unhappy, and so will we.

Scripted installation on Debian/Ubuntu

wget -N https://raw.githubusercontent.com/bjonnh/fpga_class_psone/main/setup_linux_admin.sh
chmod +x ./setup_linux_admin.sh
sudo ./setup_linux_admin.sh
# You will have to log out and log back in to get the new group permissions.
wget -N https://raw.githubusercontent.com/bjonnh/fpga_class_psone/main/setup_linux.sh
chmod +x ./setup_linux.sh
./setup_linux.sh

(Obviously you need wget and curl installed, and your user needs to be able to use sudo.)

Fast installation on other distros

You can follow most of the commands from: https://raw.githubusercontent.com/bjonnh/fpga_class_psone/main/setup_linux.sh

Just replace apt/apt-get with your distribution’s package manager.

Linux details (useful only if you have issues or another distro)

Attention

You need to have cURL, git and other build tools installed, and you need to be a member of the dialout group to be able to access the programmer serial port (the command below also adds you to plugdev). On Ubuntu, you can do all of this with:

sudo apt-get install build-essential curl git libhidapi-hidraw0
sudo usermod -a -G dialout,plugdev $USER
sudo curl -o /etc/udev/rules.d/99-openfpgaloader.rules https://raw.githubusercontent.com/trabucayre/openFPGALoader/master/99-openfpgaloader.rules
sudo udevadm control --reload-rules && sudo udevadm trigger # force udev to take new rule
# You will have to log out and log back in

Go to https://github.com/YosysHQ/oss-cad-suite-build/releases/tag/2023-05-05 and pick the archive for your platform. The commands below download it for you, into $DOWNLOADS_PATH rather than /tmp (they fetch the 2023-05-20 x64 build).

For example, on x64, assuming you want to install in your home directory, it will decompress into ~/oss-cad-suite.

This assumes you run all of these commands in the same terminal.

# You can customize these, but then you are responsible for adapting the commands below when necessary
export DOWNLOADS_PATH=$HOME/Downloads
export INSTALL_PATH=$HOME

mkdir -p "$DOWNLOADS_PATH"
curl -L -o"$DOWNLOADS_PATH"/oss-cad-suite.tgz https://github.com/YosysHQ/oss-cad-suite-build/releases/download/2023-05-20/oss-cad-suite-linux-x64-20230520.tgz
tar -xzf "$DOWNLOADS_PATH"/oss-cad-suite.tgz --directory "$INSTALL_PATH"
cd "$INSTALL_PATH"/oss-cad-suite
source environment

Attention

Currently, the Python distributed with OSS-CAD doesn’t have the right version of pip, and it ships a migen reference that conflicts with the one we will use for LiteX. The commands below upgrade pip and remove that reference (run them from the oss-cad-suite directory).

MAKE SURE YOU RUN THE FOLLOWING COMMANDS. They work well on Ubuntu and should work on Red Hat-like distributions or Arch, but there you will have to install the RISC-V toolchain yourself.

tabbypy3 -m pip install --upgrade pip
rm -rf lib/python3.*/site-packages/migen.egg-link

It will display something like:

Requirement already satisfied: pip in /home/you/Software/fpga/oss-cad-suite/lib/python3.8/site-packages (23.1.1)
Collecting pip
  Downloading pip-23.1.2-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 30.4 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.1
    Uninstalling pip-23.1.1:
      Successfully uninstalled pip-23.1.1
Successfully installed pip-23.1.2

Install LiteX

This takes a long time, so please make sure you do it BEFORE the day of the class.

mkdir -p litex
cd litex
curl -olitex_setup.py https://raw.githubusercontent.com/enjoy-digital/litex/master/litex_setup.py
tabbypy3 litex_setup.py --init --install
sudo python3 litex_setup.py --gcc=riscv
tabbypy3 -m pip install meson ninja

Testing it

cd "$INSTALL_PATH"/oss-cad-suite
mkdir -p projects
cd projects
git clone https://github.com/bjonnh/alscope
cd alscope
tabbypy3 ./main.py --ip-address=10.0.0.42 --build

If this fails, you may not have enough memory to build FPGA designs, or something failed during the installation.

Usage

Now, anytime you need to use OSS-CAD:

export INSTALL_PATH=$HOME
source "$INSTALL_PATH"/oss-cad-suite/environment

That is all you need. Any time you want to do something based on LiteX, remember to use tabbypy3 and not python3, so that you use the Python packaged with OSS-CAD and not the one from your OS.
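To check that you are running the right interpreter, a small script like the one below can help. It is a sketch, not part of the course files: it prints which Python is running and whether the LiteX-related packages can be found. The module names it checks (migen, litex, liteeth, litescope) are the ones imported by the LiteX example later in this document. Run it with tabbypy3 after sourcing the environment.

```python
import importlib.util
import sys


def check_modules(names):
    """Return {module_name: True/False} depending on whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Which interpreter is this? With tabbypy3 it should live inside oss-cad-suite.
    print("Python:", sys.executable)
    for name, found in check_modules(["migen", "litex", "liteeth", "litescope"]).items():
        print(f"{name:10s} {'OK' if found else 'MISSING'}")
```

If a package shows up as MISSING, you are probably running the python3 from your OS instead of tabbypy3, or the LiteX installation step did not finish.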

MacOS setup

Attention

You need to have a few tools installed. We give instructions using brew, but if you know or use anything else, feel free to send us the commands to use. It looks like simply installing brew installs all the necessary tools.

Go to https://github.com/YosysHQ/oss-cad-suite-build/releases/tag/2023-05-05 and pick the archive for your platform. The commands below download it for you, into $DOWNLOADS_PATH rather than /tmp.

For example, on x64, assuming you want to install in your home directory, it will decompress into ~/oss-cad-suite (on Apple Silicon, such as M1, set ARCH to arm64 below). This assumes you run all of these commands in the same terminal.

# You can customize these, but then you are responsible for adapting the commands below when necessary
export DOWNLOADS_PATH=$HOME/Downloads
export INSTALL_PATH=$HOME
export ARCH=x64

mkdir -p "$DOWNLOADS_PATH"
curl -L -o"$DOWNLOADS_PATH"/oss-cad-suite.tgz https://github.com/YosysHQ/oss-cad-suite-build/releases/download/2023-05-05/oss-cad-suite-darwin-$ARCH-20230505.tgz
tar -xzf "$DOWNLOADS_PATH"/oss-cad-suite.tgz --directory "$INSTALL_PATH"
cd "$INSTALL_PATH"/oss-cad-suite
source environment

Attention

Currently, the Python distributed with OSS-CAD doesn’t have the right version of pip, and it ships a migen reference that conflicts with the one we will use for LiteX. The commands below upgrade pip and remove that reference (run them from the oss-cad-suite directory).

MAKE SURE YOU RUN THE FOLLOWING COMMANDS.

tabbypy3 -m pip install --upgrade pip
rm -rf lib/python3.*/site-packages/migen.egg-link

It will display something like:

Requirement already satisfied: pip in /home/you/Software/fpga/oss-cad-suite/lib/python3.8/site-packages (23.1.1)
Collecting pip
  Downloading pip-23.1.2-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 30.4 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.1
    Uninstalling pip-23.1.1:
      Successfully uninstalled pip-23.1.1
Successfully installed pip-23.1.2

Install LiteX

This takes a long time, so please make sure you do it BEFORE the day of the class.

mkdir -p litex
cd litex
curl -olitex_setup.py https://raw.githubusercontent.com/enjoy-digital/litex/master/litex_setup.py
tabbypy3 litex_setup.py --init --install
cd ..

Unfortunately, we don’t have instructions yet on how to set up the RISC-V toolchain.

Testing it

cd "$INSTALL_PATH"/oss-cad-suite
mkdir -p projects
cd projects
git clone https://github.com/bjonnh/alscope
cd alscope
tabbypy3 ./main.py --ip-address=10.0.0.42 --build

If this fails, you may not have enough memory to build FPGA designs, or something failed during the installation.

Usage

Now, anytime you need to use OSS-CAD:

export INSTALL_PATH=$HOME
source "$INSTALL_PATH"/oss-cad-suite/environment

Windows setup

Install on Windows

(This has only been tried on Windows 11 so far)

Windows doesn’t have everything

Some tools, such as VHDL synthesis with GHDL, only work on Linux and Mac, so on Windows you are effectively limited to Verilog or LiteX unless you use WSL2 or a VM. Getting USB JTAG devices to work can also be somewhat tricky.

You need to install Git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git (keeping all the defaults, except perhaps the editor, seems to be your best bet here).

Go to https://github.com/YosysHQ/oss-cad-suite-build/releases/tag/2023-05-05 and download the exe for Windows.

Run it. It will extract the suite into a folder, which you can then move wherever you want. Once you are happy with its location, open PowerShell, go to that directory and type:

./start.bat
python3 -m pip install --upgrade pip
del .\lib\python3.8\site-packages\migen.egg-link
mkdir litex
cd litex
curl.exe -o litex_setup.py https://raw.githubusercontent.com/enjoy-digital/litex/master/litex_setup.py
python3 litex_setup.py --init --install
cd ..
mkdir projects
cd projects
git clone https://github.com/bjonnh/alscope
cd alscope
python3 ./main.py --ip-address=10.0.0.42 --build

JTAG programmer drivers

Windows is dumb with regard to drivers. You need to use a WinUSB driver for JTAG programmers (at least that’s the case with my DirtyJtag).

Follow the instructions at: https://learn.microsoft.com/en-us/windows-hardware/drivers/usbcon/winusb-installation

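RISC-V compiler suite

For this, the easiest solution I found is to use MSYS2 and run, inside an MSYS2 shell:

pacman -S mingw-w64-x86_64-riscv64-unknown-elf-gcc ninja meson make

Then add the MSYS2 directories to your PATH in your Windows shell. In a cmd prompt:

set PATH=%PATH%;C:\msys64\mingw64\bin;C:\msys64\usr\bin

or in PowerShell:

$env:Path += ";C:\msys64\mingw64\bin;C:\msys64\usr\bin"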

However, I then couldn’t get it to build the board design properly, so you really shouldn’t be using Windows. We can still give you the files, or you can help us fix this.

VM using VirtualBox

What we highly recommend for Windows and Mac users is to install an Ubuntu virtual machine, follow the Linux instructions and share the USB device with the VM.

In the VM settings, go to Ports > USB and add the JTAG device called NXP ARM mbed (0d28:0204), or something similar.

Lessons

If you are here, you probably want to start with either the Introduction, which gives you information about the class, or Lesson 1.

01 - Blink

Blinky is the “Hello world” of FPGAs.

The code is in code/lesson_01.

This is probably the simplest code you can use on an FPGA that will generate something you can see working.

How code is organized

Verilog

We have a verilog file. Verilog is a hardware description language that allows us to describe how we want the FPGA to react to specific logic signals.

module top(input clk_i, output led_o);
   reg  led_reg;
   wire baseclk;

   clkdiv #(.DIV(200000)) slowclk (clk_i, baseclk);

   always @(posedge baseclk) begin
      led_reg <= ~led_reg;
   end

   assign led_o = led_reg;
endmodule


module clkdiv #(parameter DIV = 24'd5000)(
    input wire clk_i,
    output wire clk_o
    );

    reg [24:0] count = 25'b0;
    reg clk_o_internal = 1;
    //on this board we have a 25MHz clock

    always @(posedge clk_i) begin
        count <= count + 25'b1;
        if(count == DIV) begin
            count <= 25'b0;
            clk_o_internal <= ~clk_o_internal;
        end
    end
    assign clk_o = clk_o_internal;
endmodule

It looks scary, but let’s break it down.

Diagram: clk_i (FPGA pin) --> SOMETHING (inside module top) --> led_o (FPGA pin)

We have some magic logic that takes a clk_i, transforms it and sends that to led_o.

LPF file

Now we also need to tell the tools that will interpret this Verilog what clk_i and led_o are. That’s the role of the LPF file (Lattice Preference File): it links your human-readable names to specific pins or internal lines inside the chip.

LOCATE COMP "clk_i" SITE "P3";
IOBUF PORT "clk_i" IO_TYPE=LVCMOS33;
FREQUENCY PORT "clk_i" 25 MHZ;

LOCATE COMP "led_o" SITE "U16";
IOBUF PORT "led_o" IO_TYPE=LVCMOS25;

We are taking the pad P3 of the FPGA (site definition), saying it has a 3V3 CMOS level (LVCMOS33) and that it is a frequency type port that’s receiving a 25MHz clock.

We are also saying that U16 is a 2.5 V CMOS (LVCMOS25) GPIO. This shouldn’t really make sense, since all the FPGA I/O banks (VCCIO pins) are supposedly connected to a 3.3 V rail and should therefore be LVCMOS33 too. We copied this line from other files, and there is little reason for it to be set that way.

Digging into “SOMETHING”

We will ignore the details of clkdiv for this lesson and do something people who program FPGAs love: build abstractions and ignore the implementation details! Unfortunately, in practice you often have to dig into implementations for performance, size or power consumption reasons.

module top(input clk_i, output led_o);
   reg  led_reg;
   wire baseclk;

   clkdiv #(.DIV(200000)) slowclk (clk_i, baseclk);

   always @(posedge baseclk) begin
      led_reg <= ~led_reg;
   end

   assign led_o = led_reg;
endmodule

Let’s decipher that:

module top(input clk_i, output led_o);
endmodule

This declares a module named top (the name we pass to Yosys with -top, so it is where everything starts) with one input, clk_i, and one output, led_o. Verilog designs are divided into modules like this so blocks can be reused and reimplemented easily. You can think of them as “functions” for now (even if the reality is slightly more complex).

    reg led_reg;
    wire baseclk;

Here we meet two essential Verilog entities: reg (registers), which store information, and wire, which is how blocks are connected to each other. The output port above could also be declared as reg or wire (an input cannot be a reg); ports that are not explicitly declared are wire by default, i.e. input clk_i is the same as input wire clk_i.

    clkdiv #(.DIV(200000)) slowclk (clk_i, baseclk);

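Here we are instantiating a clkdiv block called slowclk and setting its parameter DIV to 200000. clk_i and baseclk are what we call “nets”: they are the connections to and from the module (here connected by position, in the order clkdiv declares its ports). This module divides the clock on clk_i and outputs the result on baseclk. Looking at clkdiv, its counter goes from 0 to DIV and the output toggles every time the counter reaches DIV, i.e. every DIV+1 input cycles, so a full output period takes 2 × (DIV+1) input cycles. With clk_i at 25 MHz and DIV = 200000, baseclk runs at 25 MHz / 400002 ≈ 62.5 Hz.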
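To convince yourself of the DIV+1 count, here is a small Python model of clkdiv. It is a sketch for illustration, not part of the course code: the function names clkdiv_model and clkdiv_freq are made up, and, like the Verilog, it starts with count at 0 and clk_o at 1. Each loop iteration is one rising edge of clk_i.

```python
def clkdiv_model(div, edges, clk_o=1):
    """Simulate the clkdiv module and return clk_o after each rising edge of clk_i."""
    count = 0
    trace = []
    for _ in range(edges):
        # In the Verilog, "count <= 0" inside the if overrides "count <= count + 1",
        # and both read the value count had *before* the edge.
        if count == div:
            count = 0
            clk_o ^= 1  # clk_o_internal <= ~clk_o_internal
        else:
            count += 1
        trace.append(clk_o)
    return trace


def clkdiv_freq(f_in, div):
    """Output frequency of clkdiv: clk_o toggles every div + 1 input cycles."""
    return f_in / (2 * (div + 1))


if __name__ == "__main__":
    # With div=3, clk_o toggles every 4 edges, so one full output period is 8 edges.
    print(clkdiv_model(div=3, edges=16))
    print(f"{clkdiv_freq(25e6, 200000):.2f} Hz")  # about 62.5 Hz
```

Using a tiny DIV like 3 keeps the trace short enough to read; the same logic applies to DIV = 200000.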

   always @(posedge baseclk) begin
      led_reg <= ~led_reg;
   end

Now we are getting into the reality of FPGAs. Things work in a synchronous way, but not as in a computer, where everything is driven by a single clock. In an FPGA you can have dozens or more clocks in your different modules, a single module can use multiple clocks, etc. This is the most difficult thing to grasp, and solving clock issues is probably one of the major difficulties when working with FPGAs, because of latencies, potential clock inaccuracies (we will talk about that when using LiteX) and other issues. Here we say that, on the positive edge of baseclk, we want to assign to led_reg the inverse of its value (so if it was 0 it becomes 1, etc.). Since led_reg flips once per baseclk period, a full on/off cycle of the LED takes two baseclk periods: the LED blinks at half the baseclk frequency.
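The arithmetic for the blink rate can be written down the same way. This is a sketch under the assumptions of this lesson (25 MHz input clock, the clkdiv counter that toggles every DIV+1 cycles, and an LED that toggles once per baseclk period); the helper names are hypothetical. It also checks that a DIV value fits in clkdiv’s 25-bit counter (reg [24:0] count).

```python
F_IN = 25_000_000  # 25 MHz oscillator feeding clk_i on this board
COUNTER_BITS = 25  # clkdiv declares reg [24:0] count


def led_blink_freq(div, f_in=F_IN):
    """LED on/off frequency: baseclk is f_in / (2 * (div + 1)), and led_reg toggles once per baseclk period."""
    baseclk = f_in / (2 * (div + 1))
    return baseclk / 2


def div_for_led_freq(f_led, f_in=F_IN):
    """Inverse of led_blink_freq, rounded to the nearest integer DIV."""
    div = round(f_in / (4 * f_led)) - 1
    if not 0 <= div < 2 ** COUNTER_BITS:
        raise ValueError(f"DIV={div} does not fit in a {COUNTER_BITS}-bit counter")
    return div


if __name__ == "__main__":
    print(f"DIV=200000 -> LED blinks at {led_blink_freq(200000):.2f} Hz")
```

div_for_led_freq(1) gives you a value to check your answer to the exercise at the end of this lesson.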

There are various kinds of assignments in Verilog; see Candy jar - Assignments for more details.

We will not get into the details here: just use non-blocking assignments (<=) inside your sequential always blocks (those triggered by posedge or negedge) to avoid race conditions.
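The difference between the two kinds of assignment is easiest to see on a swap. The Python functions below only model Verilog’s semantics inside one always block (they are not how a simulator is implemented): with non-blocking assignments every right-hand side is read before any register is updated, while blocking assignments take effect immediately, one statement after the other.

```python
def nonblocking_swap(a, b):
    """Models:  always @(posedge clk) begin a <= b; b <= a; end"""
    # All right-hand sides are sampled with the values from before the clock edge...
    next_a, next_b = b, a
    # ...and all registers are updated together afterwards.
    return next_a, next_b


def blocking_swap(a, b):
    """Models:  always @(posedge clk) begin a = b; b = a; end"""
    a = b  # takes effect immediately...
    b = a  # ...so this reads the *new* a, and the swap is lost
    return a, b


if __name__ == "__main__":
    print("non-blocking:", nonblocking_swap(0, 1))
    print("blocking:    ", blocking_swap(0, 1))
```

When several always blocks use blocking assignments on shared registers, the result also depends on which block the simulator happens to run first, which is the race condition mentioned above.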

assign led_o = led_reg;

Here we say, at any time, led_o should take the value of led_reg. We are effectively connecting our register to the output of the FPGA.

Here is a representation of what we know so far:

Diagram: clk_i (FPGA pin) --> slowclk, a clkdiv with DIV=200000 --> baseclk --> on each posedge of baseclk, invert led_reg --> led_reg --> led_o (FPGA pin)

Running the thing

We need quite a few steps to go from our Verilog description to something we can send to the FPGA.

Verilog --> Synthesis (Yosys) --> optimized design --> Mapping (Yosys) --> mapped design --> Place and Route (nextpnr) --> physical layout description --> Generate flashable bitstream (ecppack) --> bitstream

These are the commands you need to run to do those steps:

# blink.json is the optimized and mapped design
yosys -p "synth_ecp5 -top top -json blink.json" blink.v
# this will generate a placed and routed file blink_out.config
nextpnr-ecp5 --json blink.json --textcfg blink_out.config --25k --package CABGA381 --lpf blink.lpf
# this will generate both an SVF file and a BIT bitstream file
ecppack --compress --svf blink.svf blink_out.config blink.bit
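If you get tired of typing these three commands, a small Python wrapper can run them in order and stop at the first error. This is a hypothetical helper, not part of the course files: it assumes your design is in NAME.v with a top module called top and your constraints are in NAME.lpf, and it names the intermediate file NAME.config (the commands above use blink_out.config, which works just as well). Run it with the oss-cad-suite environment sourced so the tools are on your PATH.

```python
import subprocess


def build_commands(name, top="top", lpf=None):
    """Return the Yosys, nextpnr and ecppack command lines for <name>.v (25k ECP5, CABGA381 package)."""
    lpf = lpf or f"{name}.lpf"
    return [
        # Synthesis + mapping -> <name>.json
        ["yosys", "-p", f"synth_ecp5 -top {top} -json {name}.json", f"{name}.v"],
        # Place and route -> <name>.config
        ["nextpnr-ecp5", "--json", f"{name}.json", "--textcfg", f"{name}.config",
         "--25k", "--package", "CABGA381", "--lpf", lpf],
        # Bitstream -> <name>.svf and <name>.bit
        ["ecppack", "--compress", "--svf", f"{name}.svf", f"{name}.config", f"{name}.bit"],
    ]


def build(name, **kwargs):
    """Run each step in order; check=True raises CalledProcessError if a tool fails."""
    for cmd in build_commands(name, **kwargs):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, call build("blink") from tabbypy3 in the lesson directory. Passing a list of arguments to subprocess.run avoids any shell quoting issues with the Yosys -p string.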

An SVF (Serial Vector Format) file is a text file that describes all the instructions that will be sent over the JTAG interface to program the chip; it is how the bitstream gets transferred into the chip through JTAG. Currently we are only loading it into the chip in a volatile way: cut the power and it is gone. There are ways to change the SVF file so it writes to the connected flash instead (if the board has one).

Sending to the board

If ecpdap is installed and working on your machine, you can use this to program the board:

ecpdap program blink.bit

Otherwise, openFPGALoader should do the trick:

sudo $HOME/oss-cad-suite/libexec/openFPGALoader -b "colorlight-i5" --freq "16000000" blink.svf

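Exercise

Make the LED blink at 1 Hz instead of its current rate (about 31 Hz: baseclk is 25 MHz / (2 × 200001) ≈ 62.5 Hz, and the LED toggles once per baseclk period).
Remove led_reg and the assign statement altogether by declaring output reg led_o and using led_o directly in the always block.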

02 - Repeat after me

Managing Inputs

We now want to talk to that FPGA. For that we have to declare an input:

LOCATE COMP "port_i" SITE "E1";
IOBUF PORT "port_i" PULLMODE=UP IO_TYPE=LVCMOS33;

Be really careful: the inputs on an FPGA are much more sensitive than on microcontrollers. Thankfully, quite often you only burn that IO or a group of IOs, so the FPGA still works, just with fewer IOs.

Do not connect coils directly (relays, motors, speakers) and make sure you are using 3.3V logic (or lower if you defined lower in your LPF file) if you are connecting to something else.

The FPGA contains pull-ups and pull-downs that you can enable at build time with the PULLMODE setting in the LPF file (PULLMODE=UP in the example above).

For more details on how the ECP5 handles the IOs see: https://www.latticesemi.com/-/media/LatticeSemi/Documents/ApplicationNotes/EH/FPGA-TN-02032-1-3-ECP5-ECP5G-sysIO-Usage-Guide.ashx?document_id=50464

Memory

In Verilog, you can create registers of arbitrary width:

	reg [SIZE-1:0] buffer;

This will create a register of SIZE bits called buffer.

To access a single bit, you use its index. A vector declared as [SIZE-1:0] uses zero-based indexing, so the least significant bit (LSB) is at index 0 and the most significant bit (MSB) is at index SIZE-1 (7 when SIZE=8). For example, buffer[0] accesses the LSB of buffer, and buffer[SIZE-1] accesses the MSB. To access a range of bits, you use a colon to specify the bit range: with SIZE=8, buffer[3:0] accesses the 4 least significant bits of buffer, and buffer[7:4] the 4 most significant bits.
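If bit indexing is new to you, Python integers make a handy scratchpad. The helper below is a hypothetical illustration that mimics a Verilog part-select value[msb:lsb] on an unsigned value; it is not a Verilog tool.

```python
def part_select(value, msb, lsb):
    """Return bits msb..lsb (inclusive) of value, like Verilog's value[msb:lsb]."""
    if msb < lsb:
        raise ValueError("use [msb:lsb] with msb >= lsb, as in a [SIZE-1:0] declaration")
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


if __name__ == "__main__":
    buffer = 0b1010_0101  # an 8-bit value, SIZE = 8
    print(bin(part_select(buffer, 0, 0)))  # LSB, buffer[0]
    print(bin(part_select(buffer, 7, 7)))  # MSB, buffer[7]
    print(bin(part_select(buffer, 3, 0)))  # low nibble, buffer[3:0]
    print(bin(part_select(buffer, 7, 4)))  # high nibble, buffer[7:4]
```

A single bit is just a one-bit part-select, which is why buffer[0] and buffer[0:0] give the same value.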

We can set the value of a register at the beginning of the life of the module:

  initial begin
    buffer = 8'b01010101;
  end

Code

follow.lpf:

LOCATE COMP "clk_i" SITE "P3";
IOBUF PORT "clk_i" IO_TYPE=LVCMOS33;
FREQUENCY PORT "clk_i" 25 MHZ;

LOCATE COMP "led_o" SITE "U16";
IOBUF PORT "led_o" IO_TYPE=LVCMOS25;

LOCATE COMP "port_i" SITE "E1";
IOBUF PORT "port_i" PULLMODE=UP IO_TYPE=LVCMOS33;

follow.v

module top(input clk_i, input port_i, output led_o);
   reg  led_reg;
   wire baseclk;

   clkdiv #(.DIV(200000)) slowclk (clk_i, baseclk);
   ring_buffer  buffer (baseclk, port_i, led_o);
endmodule


module ring_buffer (
  input wire clk,
  input wire data_in,
  output wire data_out
);
  reg [255:0] buffer;
  reg [7:0]   write_pointer;
  reg [7:0]   read_pointer;

  initial begin
    buffer = 256'b0000000000000000000000000000000000000000000011111111000000001111111100000000111111110000000011111111111111110000000011111111111111110000000011111111111111110000000011111111000000001111111100000000111111110000000000000000000000000000000000000000000000000000;
  end

  always @(posedge clk) begin
    buffer[write_pointer] <= data_in;
    write_pointer <= write_pointer + 1;
    read_pointer  <= read_pointer + 1;
  end

  assign data_out = buffer[read_pointer];

endmodule


module clkdiv #(parameter DIV = 24'd5000)(
    input wire clk_i,
    output wire clk_o
    );

    reg [24:0] count = 25'b0;
    reg clk_o_internal = 1;
    //on this board we have a 25MHz clock

    always @(posedge clk_i) begin
        count <= count + 25'b1;
        if(count == DIV) begin
            count <= 25'b0;
            clk_o_internal <= ~clk_o_internal;
        end
    end
    assign clk_o = clk_o_internal;
endmodule
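Since the ring buffer confused some people, here is a Python model of ring_buffer to play with. It is a sketch for illustration only: the class name is made up, and it assumes both pointers start at 0 (the Verilog leaves them uninitialized). Because write_pointer and read_pointer are incremented together, they stay equal, so on every clock the output is the bit that is about to be overwritten: first the preloaded pattern, then the input from size clock cycles earlier.

```python
class RingBufferModel:
    """Python model of the ring_buffer module: `size` one-bit cells, read and written at the same index."""

    def __init__(self, initial=0, size=256):
        # Bit i of `initial` becomes buffer[i], like the 256'b... literal in the Verilog initial block.
        self.size = size
        self.buffer = [(initial >> i) & 1 for i in range(size)]
        # Assumption: both pointers start at 0 (the Verilog leaves them uninitialized).
        self.write_pointer = 0
        self.read_pointer = 0

    @property
    def data_out(self):
        # assign data_out = buffer[read_pointer];  (combinational, no clock needed)
        return self.buffer[self.read_pointer]

    def clock(self, data_in):
        """One rising edge of clk: the three non-blocking assignments all use the old pointer values."""
        self.buffer[self.write_pointer] = data_in
        # The pointers wrap around like fixed-width counters (8 bits for size=256, 3 bits for size=8).
        self.write_pointer = (self.write_pointer + 1) % self.size
        self.read_pointer = (self.read_pointer + 1) % self.size


if __name__ == "__main__":
    rb = RingBufferModel(initial=0b00001111, size=8)  # a tiny 8-cell version
    inputs = [1, 0, 1, 1, 0, 0, 1, 0] + [0] * 8
    outputs = []
    for bit in inputs:
        outputs.append(rb.data_out)  # what the LED shows during this cycle
        rb.clock(bit)
    print(outputs)  # first the preloaded bits, then the inputs delayed by 8 cycles
```

In follow.v the buffer has 256 cells and is clocked by baseclk, so with DIV=200000 (baseclk ≈ 62.5 Hz) the LED repeats your input roughly 4 seconds later.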

Build

yosys -p "synth_ecp5 -top top -json follow.json" follow.v
nextpnr-ecp5 --json follow.json --textcfg follow.config --25k --package CABGA381 --lpf follow.lpf
ecppack --compress --svf follow.svf follow.config follow.bit

Upload

ecpdap program follow.bit

or, with openFPGALoader:

sudo $HOME/oss-cad-suite/libexec/openFPGALoader -b "colorlight-i5" --freq "16000000" follow.svf

Exercise

Give it a longer memory; bonus points if the length is a parameter of the module.

03 - LiteX

Using high-level languages to describe systems offers several significant advantages. They abstract away many low-level details, making the design process more intuitive and efficient. This abstraction allows designers to focus on the system’s functionality and architecture, rather than getting bogged down in the minutiae of hardware specifics. High-level languages also tend to be more expressive and readable than low-level languages, facilitating better understanding and communication among team members. Furthermore, they often come with extensive standard libraries and tools, enabling rapid prototyping and debugging. Lastly, high-level languages allow for the use of advanced software engineering practices, such as object-oriented programming and automated testing, leading to more reliable and maintainable systems (well that is the theory at least).

Migen and LiteX

Migen is a Python-based tool that aims to make digital design (including both ASIC and FPGA design) more efficient and enjoyable (yay). It was developed by the M-Labs team as an improvement over existing hardware description languages (HDLs) such as VHDL and Verilog.

Migen introduces several novel concepts and adopts many good ones from existing HDLs. It also leverages the capabilities of Python as a powerful and expressive high-level language, enabling you to write more compact, maintainable, and reusable digital designs.

Key features of Migen include:

Fragmented Hardware Description: Migen allows hardware designs to be split into fragments that can be combined and transformed.
Powerful Language Constructs: Migen provides powerful language constructs like generators and list comprehensions, which can simplify and improve the readability of complex hardware designs.
Python-based: As a Python-based tool, Migen allows you to use Python’s extensive standard library and third-party modules, making it easier to develop and test hardware designs.
Built-in Simulation: Migen includes a built-in simulation environment that allows you to test your designs without needing external tools.
FPGA Flow Management: Migen includes a build system that drives FPGA toolchains, and the MiSoC project (from which LiteX was forked) builds on Migen to provide a high-level way to design system-on-chip solutions and handle on-chip interconnects.

Learning Migen may require learning Python if you are not already familiar with it. However, for those comfortable with Python, Migen can provide a powerful and flexible toolset for digital design.

LiteX is an open-source Python library that provides a high-level, hardware-agnostic interface for developing digital designs for FPGAs. It was created by the Enjoy Digital team and is widely used in the FPGA community. LiteX simplifies the process of designing and deploying digital designs on FPGAs by abstracting away many of the low-level details that are often required when working directly with FPGA hardware.

LiteX includes various components that help in the development of FPGA projects, such as:

Core: A Python library that provides a high-level API for describing digital circuits.
SoC: An SoC (System-on-Chip) builder that allows you to create and integrate custom SoCs with various peripherals and soft processor cores.
BIOS: A minimal BIOS for LiteX SoCs, which helps in the initial configuration and testing of the hardware.
Build tools: A set of tools that assist in the generation of FPGA bitstreams, including wrappers for various FPGA synthesis and place-and-route tools (such as Yosys and nextpnr).

LiteX can be used with a wide range of FPGA devices and development boards, including the Lattice ECP5 that is on your board. The library is also compatible with several soft processor cores, such as RISC-V cores (e.g. VexRiscv) and LM32, allowing users to create custom SoCs tailored to their specific needs.

Example: An Ethernet logic analyzer in a few lines of Python

Create a file main.py:

#!/usr/bin/env python3

#
# This file is based on Colorlite (https://github.com/enjoy-digital/colorlite)
#
# Copyright (c) 2020-2022 Florent Kermarrec <florent@enjoy-digital.fr>
# Copyright (c) 2023 Jonathan Bisson <bjonnh-github@bjonnh.net>
# SPDX-License-Identifier: BSD-2-Clause

import argparse

from liteeth.phy.ecp5rgmii import LiteEthPHYRGMII
from litescope import LiteScopeAnalyzer
from litex.build.generic_platform import *
from litex.soc.cores.clock import *
from litex.soc.cores.gpio import GPIOOut
from litex.soc.cores.led import LedChaser
from litex.soc.cores.spi_flash import ECP5SPIFlash
from litex.soc.integration.builder import *
from litex.soc.integration.soc_core import *
from litex_boards.platforms import colorlight_i5
from migen import *
from migen.genlib.misc import WaitTimer

# IOs ----------------------------------------------------------------------------------------------

#_gpios = [
#    # GPIOs.
#    ("gpio", 0, Pins("j4:0"), IOStandard("LVCMOS33")),
#    ("gpio", 1, Pins("j4:1"), IOStandard("LVCMOS33")),
#]


# CRG ----------------------------------------------------------------------------------------------

class _CRG(Module):
    def __init__(self, platform, sys_clk_freq):
        self.clock_domains.cd_sys = ClockDomain()
        # # #

        # Clk / Rst.
        clk25 = platform.request("clk25")
        #rst_n = platform.request("user_btn_n", 0)

        # PLL.
        self.submodules.pll = pll = ECP5PLL()
        #self.comb += pll.reset.eq(~rst_n)
        pll.register_clkin(clk25, 25e6)
        pll.create_clkout(self.cd_sys, sys_clk_freq)


# ColorLite ----------------------------------------------------------------------------------------

class ColorLite(SoCMini):
    def __init__(self, sys_clk_freq=int(50e6), with_etherbone=True, ip_address=None, mac_address=None):
        platform = colorlight_i5.Platform(revision="7.0")

        # CRG --------------------------------------------------------------------------------------
        self.submodules.crg = _CRG(platform, sys_clk_freq)

        # SoCMini ----------------------------------------------------------------------------------
        SoCMini.__init__(self, platform, clk_freq=sys_clk_freq)

        # Etherbone --------------------------------------------------------------------------------
        if with_etherbone:
            self.submodules.ethphy = LiteEthPHYRGMII(
                clock_pads=self.platform.request("eth_clocks"),
                pads=self.platform.request("eth"),
                tx_delay=0e-9)
            self.add_etherbone(
                phy=self.ethphy,
                ip_address=ip_address,
                mac_address=mac_address,
                data_width=32,
            )

        # artificial signal
        count = Signal(8)
        #rst_n = platform.request("user_btn_n", 0)
        self.sync += count.eq(count + 1)
        analyzer_signals = [
            count,
        #    rst_n
        ]
        self.submodules.analyzer = LiteScopeAnalyzer(analyzer_signals,
            depth=1024,
            clock_domain="sys",
            samplerate=self.sys_clk_freq,
            csr_csv="analyzer.csv"
        )
        self.add_csr("analyzer")

        # GPIOs ------------------------------------------------------------------------------------
        #platform.add_extension(_gpios)

        # Power switch
        #power_sw_pads = platform.request("gpio", 0)
        #power_sw_gpio = Signal()
        #power_sw_timer = WaitTimer(2 * sys_clk_freq)  # Set Power switch high after power up for 2s.
        #self.comb += power_sw_timer.wait.eq(1)
        #self.submodules += power_sw_timer
        ##self.submodules.gpio0 = GPIOOut(power_sw_gpio)
        #self.comb += power_sw_pads.eq(power_sw_gpio | ~power_sw_timer.done)

        # Reset Switch
        #reset_sw_pads = platform.request("gpio", 1)
        #self.submodules.gpio1 = GPIOOut(reset_sw_pads)


# Build --------------------------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(description="Take control of your ColorLight FPGA board with LiteX/LiteEth :)")
    parser.add_argument("--build", action="store_true", help="Build bitstream")
    parser.add_argument("--load", action="store_true", help="Load bitstream")
    parser.add_argument("--ip-address", default="10.0.0.42",
                        help="Ethernet IP address of the board (default: 10.0.0.42).")
    parser.add_argument("--mac-address", default="0x726b895bc2e2",
                        help="Ethernet MAC address of the board (default: 0x726b895bc2e2).")
    args = parser.parse_args()

    soc = ColorLite(ip_address=args.ip_address, mac_address=int(args.mac_address, 0))
    builder = Builder(soc, output_dir="build", csr_csv="csr.csv")
    builder.build(build_name="colorlite", run=args.build)

    if args.load:
        prog = soc.platform.create_programmer()
        prog.load_bitstream(os.path.join(builder.gateware_dir, soc.build_name + ".bit"))

if __name__ == "__main__":
    main()

Build

python3 main.py --build

Upload

You should be able to load the bitstream onto the board with:

ecpdap program build/gateware/colorlite.bit

but it doesn’t always work. In that case, use openFPGALoader instead:

sudo $HOME/oss-cad-suite/libexec/openFPGALoader -b "colorlight-i5" --freq "16000000" ./build/gateware/colorlite.svf

Run

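Check that you can ping your board. The default IP address is 10.0.0.42. To use another one, rebuild with --ip-address. If several boards share a network, also give each one its own --mac-address: never put two boards with the same MAC on the same network.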

ping 10.0.0.42
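Every board on the same network needs its own MAC address. main.py parses --mac-address with int(value, 0), so a hex string like the default 0x726b895bc2e2 works. The sketch below picks a random locally administered unicast MAC. The helper random_local_mac is our own hypothetical function and is not part of LiteX.

```python
import random


def random_local_mac(rng: random.Random) -> int:
    """Return a random 48-bit unicast, locally administered MAC address as an int.

    In the first octet, bit 0x02 marks the address as locally administered
    (so it cannot clash with a vendor-assigned MAC), and bit 0x01 must be 0
    for a unicast address.
    """
    first = (rng.randrange(256) | 0x02) & 0xFE
    mac = first
    for _ in range(5):
        mac = (mac << 8) | rng.randrange(256)
    return mac


if __name__ == "__main__":
    mac = random_local_mac(random.Random())
    # main.py parses this with int(value, 0), so the 0x prefix is fine
    print(f"--mac-address 0x{mac:012x}")
```

Each person would then build with something like python3 main.py --build --mac-address 0x... --ip-address 10.0.0.43, using a different IP address per board as well. The addresses are random, so two boards are unlikely (but not guaranteed) to collide. Write down which value went to which board.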

Run the litex server

litex_server --udp --udp-ip=10.0.0.42

Start an acquisition:

litescope_cli --dump dump.sr

This triggers the logic analyzer and saves the captured signals to dump.sr. To get a CSV file instead, replace dump.sr with dump.csv (or with dump.vcd for a VCD file).

You can view the capture with sigrok-cli (for .sr files) or GTKWave (which needs a .vcd dump):

sigrok-cli -i dump.sr -O ascii
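To know what the capture should contain, we can model what the analyzer sees of the artificial count signal. This is a sketch based on the parameters in main.py: sys_clk_freq = 50 MHz, depth=1024, samplerate equal to the system clock, and an 8-bit counter incremented on every system clock. The counter value at the first sample depends on when the trigger fires, so the start argument is an assumption.

```python
SYS_CLK_FREQ = 50e6  # ColorLite default sys_clk_freq
DEPTH = 1024         # LiteScopeAnalyzer depth
WIDTH = 8            # count = Signal(8)


def expected_count(start: int = 0, depth: int = DEPTH, width: int = WIDTH) -> list:
    """Counter value at each analyzer sample (one sample per sys clock)."""
    return [(start + i) % (1 << width) for i in range(depth)]


samples = expected_count()
window_us = DEPTH / SYS_CLK_FREQ * 1e6  # how much time one capture covers
wraps = DEPTH // (1 << WIDTH)           # how many full 0..255 ramps fit in it
print(f"capture window: {window_us:.2f} us, counter wraps {wraps} times")
```

With these values, one capture covers about 20.48 µs, and the counter ramps from 0 to 255 four times. The dump should therefore show a sawtooth; any step other than +1 between samples would point to a problem.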

You can see a demo video below:

How much of the chip does it use?

In the picture of the chip, Ethernet is shown in pink, and the analyzer and its memory in blue.

Exercise

Insert the “repeat after me” code from the previous lesson so you can record its signals with the analyzer.

This one is hard and will require you to read this documentation: https://github.com/enjoy-digital/litex/wiki/Reuse-a-(System)Verilog,-VHDL,-(n)Migen,-Spinal-HDL,-Chisel-core

04 - Blink...again?

Tri-state buffers

Logic states in Verilog can take on not only the binary values one and zero: there is also a valid “third” state called high-Z (high impedance). This is a fancy way of saying disconnected completely, i.e. an open circuit.

To illustrate this, let’s take a 1-bit wire and assume it is connected to an output pin with nothing else connected to it. We can set it to any one of these 1-bit constants (they are alternatives, so use only one declaration at a time):

wire tri_test = 1'b0;	// the voltage at this pin would be 0 volts
wire tri_test = 1'b1;	// the voltage here would be 3.3 volts
wire tri_test = 1'bz;	// voltage here...is not well defined; the pin is "floating"

A circuit that can drive an output to 0, 1 or high-Z is called a tri-state buffer. These are typically used on bidirectional data buses with one or more devices attached. When a single device wants to put data out onto the bus, all the other connected devices must put their data lines into “high-Z”, or put another way, “tri-state” their outputs. If they don’t do this properly, the bus is in “contention”, which can result in funky operation and/or damage. The opposite can also happen: if nobody drives the bus, it floats and can give erroneous values unless something “pulls” it up or down to a high or low state.

Although using z in Verilog is perfectly valid, the Yosys open-source synthesis tools currently don’t support translating something like 1'bz onto a GPIO pin. You’ll get a warning that looks like this: Warning: Yosys has only limited support for tri-state logic at the moment.

However, since Verilog is only our higher-level “C code” for describing FPGA behavior, we can drop down to the equivalent of “inline assembly” to describe the same behavior. The low-level building blocks of an FPGA are called “device primitives”. Lattice has a document describing the primitives for many of its FPGA devices, and it can be found here.

We’re interested in the BB primitive which they call “CMOS Input 6mA Sink 3mA Source Sinklim Output Buffer with Tristate – BiDirectional”. They give the following schematic representation and truth table:

If we want to instantiate this primitive in our code, the sysI/O usage guide gives this Verilog example:

BB buf7 (.I(Q_out7), .T(Q_tri7), .O(buf_Data7), .B(Data[7]));

This wasn’t covered previously, but when instantiating a module (any module), you can connect ports by position, listing the signals in the same order as in the module definition. Or you can do what they did here: write the module’s port name with a dot in front of it, followed by the connected signal in parentheses. This named mapping lets you list ports in any order and leave out ports you don’t need. It also reduces the chance of making a mistake in the ordering.
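As a sketch of how BB behaves, here is a small Python model based on our reading of the primitive’s truth table; check it against the Lattice guide. T is the tristate control: T = 0 drives the pad B with I, T = 1 releases the pad (high-Z), and O always reads back whatever level is on the pad. The function name bb and the "Z" marker are our own conventions.

```python
from typing import Optional, Tuple

Z = "Z"  # our own marker for a high-impedance (floating) level


def bb(i: int, t: int, external: Optional[int] = None) -> Tuple[object, object]:
    """Model of the ECP5 BB bidirectional buffer.

    i        -- value we want to drive out (port I)
    t        -- tristate control (port T): 0 = drive, 1 = high-Z
    external -- level another device drives on the pad, or None

    Returns (B, O): the level on the pad and what the input buffer reads back.
    """
    if t == 0:
        if external is not None and external != i:
            raise ValueError("bus contention: two drivers disagree")
        pad = i
    else:
        # We let go of the pad: it follows the other driver, or floats.
        pad = external if external is not None else Z
    return pad, pad


for t in (0, 1):
    for i in (0, 1):
        print(f"T={t} I={i} -> B={bb(i, t)[0]}")
```

In blink_zed.v the T input is connected to ~enable. As a result, enable = 1 drives led onto the pin and enable = 0 leaves the pin floating.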

Finite state machines

Building a state machine is a fundamental exercise in FPGA design. All CPUs use some kind of state machine as the basis of how they operate and execute instructions. The clock signal moves the machine forward through its states. We first need to define our states and describe how we want to allow them to move between each other. We’ll make a project where we hook up a single GPIO pin (FPGA pad K5 in this example) as such:

We’ll make that pin a tri-state output and define all three states. When the pin is a one (3.3V), the LED should turn off completely because all the current flows between the pin and ground, i.e. we’re shorting across the LED. When the pin is a zero (0V), the LED is on and quite bright because current flows through the parallel combination of both resistors (~180 ohm), since they’re both connected to 0V (ground). And finally, when the pin is high-Z it is disconnected, so the LED is still on. Current then flows only through the 1k resistor, so the LED is much dimmer. So we have three states we can move between.
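To put rough numbers on the three states, here is a small Python calculation. Only the 1k resistor and the ~180 ohm combination come from the text, so this is a sketch under stated assumptions:

- the second resistor is 220 ohm, since 1k in parallel with 220 gives about 180 ohm;
- the LED has a 2.0 V forward voltage;
- the LED’s anode is at 3.3 V and its cathode sits on the node shared by both resistors;
- the pin’s own output resistance is ignored.

```python
# Assumed values: only the 1k resistor and the ~180 ohm combination come from the text.
VDD = 3.3       # pin high level and LED supply, volts
VF = 2.0        # ASSUMED LED forward voltage (depends on the LED colour)
R_FIXED = 1000  # the 1k resistor from the text, ohms
R_PIN = 220     # ASSUMED second resistor, chosen because 1k || 220 is ~180 ohm


def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def led_current_ma(pin: str) -> float:
    """Approximate LED current in mA for pin = "0", "Z" or "1".

    Assumed topology: LED anode at VDD, cathode on a node that goes to
    ground through R_FIXED and to the FPGA pin through R_PIN.
    """
    if pin == "0":  # both resistors return to 0 V: bright
        return (VDD - VF) / parallel(R_FIXED, R_PIN) * 1e3
    if pin == "Z":  # pin floating: only R_FIXED carries current: dim
        return (VDD - VF) / R_FIXED * 1e3
    if pin == "1":  # divider holds the cathode high, see led_voltage_pin_high()
        return 0.0
    raise ValueError(f"unknown pin level {pin!r}")


def led_voltage_pin_high() -> float:
    """Voltage left across the LED when the pin drives VDD (LED treated as open)."""
    cathode = VDD * R_FIXED / (R_FIXED + R_PIN)
    return VDD - cathode


for level in ("0", "Z", "1"):
    print(f"pin={level}: {led_current_ma(level):.1f} mA")
print(f"LED voltage with pin high: {led_voltage_pin_high():.2f} V")
```

With these assumptions, the bright state draws about 7.2 mA and the dim state about 1.3 mA. With the pin high, only about 0.6 V is left across the LED, well below its forward voltage, so it stays off. The bright/dim ratio is simply 1000 / 180, about 5.5, whatever the LED’s forward voltage is.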

localparam state_off = 2'b00;
localparam state_dim = 2'b01;
localparam state_brite = 2'b10;
reg [1:0] led_state = state_off;

The localparam keyword is typically a good choice for defining constant values within a module. There are other ways, but we won’t cover them now. Since there are 3 valid states, we need at least two bits to cover that many states. Therefore, our defined states and state register are all 2 bits wide.
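The “at least two bits” rule generalises to ceil(log2(N)) bits for N binary-encoded states, which is what Verilog’s $clog2(N) computes for N ≥ 2. Here is a tiny Python check of that arithmetic; state_bits is just an illustrative helper. A one-hot encoding, another common choice, would instead use one bit per state.

```python
def state_bits(n_states: int) -> int:
    """Smallest register width that can hold n_states binary-encoded states."""
    if n_states < 1:
        raise ValueError("need at least one state")
    # (n - 1).bit_length() == ceil(log2(n)) for n >= 2; a register needs >= 1 bit
    return max(1, (n_states - 1).bit_length())


for n in (2, 3, 4, 5, 8, 9):
    print(n, "states ->", state_bits(n), "bits")
```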

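We’ll clock this state machine with our clkdiv module, dividing the 25MHz input clock with DIV = 2500000. Because clkdiv counts from 0 up to DIV before toggling its output, the output toggles every 2,500,001 input cycles (~100ms), which gives a full period of ~200ms (5Hz):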

clkdiv #(.DIV(2500000)) slowclk(
	.clk_i(clk), 
	.clk_o(baseclk));
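The timing of clkdiv is easy to get wrong by one, so here is a small cycle-by-cycle Python model of it. This is our own sketch, mirroring the clkdiv Verilog in blink_zed.v below. Each half period is DIV + 1 input cycles, so a full period is 2 × (DIV + 1) cycles.

```python
def clkdiv_model(div: int, n_cycles: int) -> list:
    """Cycle-by-cycle model of the clkdiv Verilog module.

    Returns clk_o after each rising edge of clk_i, starting from count = 0
    and clk_o = 1, like the register initial values in the Verilog.
    """
    count, clk_o = 0, 1
    out = []
    for _ in range(n_cycles):
        # non-blocking semantics: the decision uses the count before the edge
        if count == div:
            count, clk_o = 0, 1 - clk_o
        else:
            count += 1
        out.append(clk_o)
    return out


def clkdiv_period_s(div: int, f_in: float = 25e6) -> float:
    """Period of clk_o: two half periods of (div + 1) input cycles each."""
    return 2 * (div + 1) / f_in


print(clkdiv_model(2, 12))
print(f"DIV=2500000 -> period {clkdiv_period_s(2500000) * 1e3:.1f} ms")
```

With a tiny DIV of 2, the output repeats every 6 input cycles (apart from the shorter first half period, caused by the initial values). For a true 100 ms period at 25 MHz, DIV would have to be 1249999.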

Lastly, we want to be able to control how long we’re in each state. To keep it simple, we’ll use the same time between each state.

localparam state_time = 5;
reg [7:0] counter = state_time;

The best way to manage our states is to use the verilog case statement. Our state register is the “expression” for the case, and our predefined state names are what’s populated for each case. It’s important to populate the default case so the machine never gets “stuck”. So we’ll just copy the code for state_off into default.

The state machine starts in the off state, then proceeds to the dim state and then the bright state, before looping back to the off state and repeating indefinitely. The enable register is what we use to put the output into the high-Z state: when enable is low, the pin floats, and it doesn’t matter what led is set to. The following diagram shows a visual representation of this very basic state machine.

The Code

At this point, let’s just see the code in its entirety:

blink_zed.v

module top(input wire clk, output wire led_pin);
	wire baseclk;
	reg led;
	reg enable;
	
	// set up state names and initialize the state
	localparam state_off = 2'b00;
	localparam state_dim = 2'b01;
	localparam state_brite = 2'b10;
	reg [1:0] led_state = state_off;
	
	// divide the 25MHz input clock: clk_o toggles every ~100ms, so its period is ~200ms (5Hz)
	clkdiv #(.DIV(2500000)) slowclk(
			.clk_i(clk), 
			.clk_o(baseclk));
	
	// set up delay counter for the state machine
	// each state lasts state_time + 1 = 6 baseclk periods (~200ms each), i.e. ~1.2s
	localparam state_time = 5;
	reg [7:0] counter = state_time;
	
	// the actual state machine
	always @(posedge baseclk) begin
	case (led_state)
		state_off: begin
			enable <= 1'b1;
			led <= 1'b1;
			counter <= counter - 1;
			if (counter == 0) begin
				counter <= state_time;
				led_state <= state_dim;
			end
		end
		state_dim: begin
			enable <= 1'b0;
			led <= 1'b1;
			counter <= counter - 1;
			if (counter == 0) begin
				counter <= state_time;
				led_state <= state_brite;
			end
		end
		state_brite: begin
			enable <= 1'b1;
			led <= 1'b0;
			counter <= counter - 1;
			if (counter == 0) begin
				counter <= state_time;
				led_state <= state_off;
			end
		end
		default: begin
			enable <= 1'b1;
			led <= 1'b1;
			counter <= counter - 1;
			if (counter == 0) begin
				counter <= state_time;
				led_state <= state_dim;
			end
		end
	endcase
	end
	
	// primitive for bi-directional buffer which allows tri-stating
	BB tristate_out (.I(led), .T(~enable), .B(led_pin));
	
endmodule


module clkdiv #(parameter DIV = 24'd5000)(
    input wire clk_i,
    output wire clk_o
    );

    reg [24:0] count = 25'b0;
    reg clk_o_internal = 1;
    //on this board we have a 25MHz clock

    always @(posedge clk_i) begin
        count <= count + 25'b1;
        if(count == DIV) begin
            count <= 25'b0;
            clk_o_internal <= ~clk_o_internal;
        end
    end
    assign clk_o = clk_o_internal;
endmodule

The only portion of the above code not fully discussed yet is the BB tristate_out instance. A couple of things to note. First, the T port works like an active-low output enable (T = 0 drives I onto the pin, T = 1 puts it in high-Z), so our enable signal is inverted with the leading tilde (~) character. Also worth mentioning: the O port is not used here, so we don’t have to connect anything to it.
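To check the sequencing and timing, here is a sketch of the same state machine in Python, with one loop iteration per rising edge of baseclk. It mirrors blink_zed.v (state encoding, state_time, counter reload, default case) but is our own model. The pin is "Z" whenever enable is 0, because the BB’s T input is ~enable.

```python
# State encoding and timing copied from blink_zed.v
OFF, DIM, BRITE = 0b00, 0b01, 0b10
STATE_TIME = 5

# (enable, led) assigned while in each state, and the state that follows it
OUTPUTS = {OFF: (1, 1), DIM: (0, 1), BRITE: (1, 0)}
NEXT = {OFF: DIM, DIM: BRITE, BRITE: OFF}


def pin_levels(n_edges: int) -> list:
    """Level of led_pin after each rising edge of baseclk: 1, 0 or "Z"."""
    state, counter = OFF, STATE_TIME
    levels = []
    for _ in range(n_edges):
        # Like the non-blocking assignments, everything on one edge is
        # computed from the values the registers had before that edge.
        enable, led = OUTPUTS.get(state, OUTPUTS[OFF])  # default case acts like off
        if counter == 0:
            counter, state = STATE_TIME, NEXT.get(state, DIM)
        else:
            counter -= 1
        levels.append(led if enable else "Z")  # BB gets T = ~enable
    return levels


print(pin_levels(18))
```

Each level lasts STATE_TIME + 1 = 6 rising edges of baseclk, because the counter goes 5, 4, 3, 2, 1, 0 before reloading. Since enable and led are registers, the pin follows the state one edge later.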

blink_zed.lpf

LOCATE COMP "clk" SITE "P3";
IOBUF PORT "clk" IO_TYPE=LVCMOS33;
FREQUENCY PORT "clk" 25 MHZ;

LOCATE COMP "led_pin" SITE "K5";
IOBUF PORT "led_pin" IO_TYPE=LVCMOS33 DRIVE=12;

A bonus tidbit here is the DRIVE attribute, which sets the drive strength of an output pin. There is much more info in the sysI/O usage guide, but the general gist is that you can set it to 4, 8, 12 or 16 (mA). In practice, this changes the ON resistance of the transistors driving the output. A higher drive strength means a lower ON resistance, so the effective series resistance of your output is lower. You can therefore source/sink more current through the pin while maintaining valid logic levels at the receiving device(s).

Build

yosys -p "synth_ecp5 -top top -json blink_zed.json" blink_zed.v
nextpnr-ecp5 --json blink_zed.json --textcfg blink_zed_out.config --25k --package CABGA381 --lpf blink_zed.lpf
ecppack --compress --svf blink_zed.svf blink_zed_out.config blink_zed.bit
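If you rebuild often, the three commands above can be wrapped in a small Python script. This is a sketch: the helper names ecp5_flow and build are ours, and the flags are exactly the ones used above. It assumes yosys, nextpnr-ecp5 and ecppack are on your PATH, for example from oss-cad-suite.

```python
import subprocess


def ecp5_flow(name: str, top: str = "top") -> list:
    """Yosys -> nextpnr-ecp5 -> ecppack commands for <name>.v and <name>.lpf."""
    return [
        ["yosys", "-p", f"synth_ecp5 -top {top} -json {name}.json", f"{name}.v"],
        ["nextpnr-ecp5", "--json", f"{name}.json", "--textcfg", f"{name}_out.config",
         "--25k", "--package", "CABGA381", "--lpf", f"{name}.lpf"],
        ["ecppack", "--compress", "--svf", f"{name}.svf", f"{name}_out.config", f"{name}.bit"],
    ]


def build(name: str) -> None:
    """Run each step in order, stopping at the first one that fails."""
    for cmd in ecp5_flow(name):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling build("blink_zed") from the project directory runs the same three steps as typing the commands by hand.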

Upload

ecpdap program blink_zed.bit

sudo $HOME/oss-cad-suite/libexec/openFPGALoader -b "colorlight-i5" --freq "16000000" blink_zed.svf

Exercise

Make the led change state faster
Make the led change state slowweeerrrr
Make the led stay on longer when bright and shorter when dim
Change the order of the states such that it’s dim->off->bright
Change the state based on a push-button input instead of clock time

05 - LiteX for real

LiteX can do a lot more than our previous example. It was created to generate configurable SoCs (systems on chip).

For this lesson to work, you will have to install a RISC-V compiler (a GCC cross-toolchain), since LiteX compiles the BIOS that runs on the soft CPU.

Create a file colorlight_i5.py:

#!/usr/bin/env python3

#
# This file is part of LiteX-Boards.
#
# Copyright (c) 2021 Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
# SPDX-License-Identifier: BSD-2-Clause

from migen import *

from litex.gen import *

from litex.build.io import DDROutput

from litex_boards.platforms import colorlight_i5

from litex.soc.cores.clock import *
from litex.soc.integration.soc_core import *
from litex.soc.integration.builder import *
from litex.soc.cores.video import VideoHDMIPHY
from litex.soc.cores.led import LedChaser

from litex.soc.interconnect.csr import *

from litedram.modules import M12L64322A # Compatible with EM638325-6H.
from litedram.phy import GENSDRPHY, HalfRateGENSDRPHY

from liteeth.phy.ecp5rgmii import LiteEthPHYRGMII

# CRG ----------------------------------------------------------------------------------------------

class _CRG(LiteXModule):
    def __init__(self, platform, sys_clk_freq, use_internal_osc=False, with_usb_pll=False, with_video_pll=False, sdram_rate="1:1"):
        self.rst    = Signal()
        self.cd_sys = ClockDomain()
        if sdram_rate == "1:2":
            self.cd_sys2x    = ClockDomain()
            self.cd_sys2x_ps = ClockDomain()
        else:
            self.cd_sys_ps = ClockDomain()

        # # #

        # Clk / Rst
        if not use_internal_osc:
            clk = platform.request("clk25")
            clk_freq = 25e6
        else:
            clk = Signal()
            div = 5
            self.specials += Instance("OSCG",
                p_DIV = div,
                o_OSC = clk
            )
            clk_freq = 310e6/div

        rst_n = platform.request("cpu_reset_n")

        # PLL
        self.pll = pll = ECP5PLL()
        self.comb += pll.reset.eq(~rst_n | self.rst)
        pll.register_clkin(clk, clk_freq)
        pll.create_clkout(self.cd_sys,    sys_clk_freq)
        if sdram_rate == "1:2":
            pll.create_clkout(self.cd_sys2x,    2*sys_clk_freq)
            pll.create_clkout(self.cd_sys2x_ps, 2*sys_clk_freq, phase=180) # Ideally 90° but needs to be increased.
        else:
            pll.create_clkout(self.cd_sys_ps, sys_clk_freq, phase=180) # Ideally 90° but needs to be increased.

        # USB PLL
        if with_usb_pll:
            self.usb_pll = usb_pll = ECP5PLL()
            self.comb += usb_pll.reset.eq(~rst_n | self.rst)
            usb_pll.register_clkin(clk, clk_freq)
            self.cd_usb_12 = ClockDomain()
            self.cd_usb_48 = ClockDomain()
            usb_pll.create_clkout(self.cd_usb_12, 12e6, margin=0)
            usb_pll.create_clkout(self.cd_usb_48, 48e6, margin=0)

        # Video PLL
        if with_video_pll:
            self.video_pll = video_pll = ECP5PLL()
            self.comb += video_pll.reset.eq(~rst_n | self.rst)
            video_pll.register_clkin(clk, clk_freq)
            self.cd_hdmi   = ClockDomain()
            self.cd_hdmi5x = ClockDomain()
            video_pll.create_clkout(self.cd_hdmi,    40e6, margin=0)
            video_pll.create_clkout(self.cd_hdmi5x, 200e6, margin=0)

        # SDRAM clock
        sdram_clk = ClockSignal("sys2x_ps" if sdram_rate == "1:2" else "sys_ps")
        self.specials += DDROutput(1, 0, platform.request("sdram_clock"), sdram_clk)

# BaseSoC ------------------------------------------------------------------------------------------

class BaseSoC(SoCCore):
    def __init__(self, board="i5", revision="7.0", toolchain="trellis", sys_clk_freq=60e6,
        with_ethernet          = False,
        with_etherbone         = False,
        local_ip               = "",
        remote_ip              = "",
        eth_phy                = 0,
        with_led_chaser        = True,
        use_internal_osc       = False,
        sdram_rate             = "1:1",
        with_video_terminal    = False,
        with_video_framebuffer = False,
        **kwargs):
        board = board.lower()
        assert board in ["i5", "i9"]
        platform = colorlight_i5.Platform(board=board, revision=revision, toolchain=toolchain)

        # CRG --------------------------------------------------------------------------------------
        with_usb_pll   = kwargs.get("uart_name", None) == "usb_acm"
        with_video_pll = with_video_terminal or with_video_framebuffer
        self.crg = _CRG(platform, sys_clk_freq,
            use_internal_osc = use_internal_osc,
            with_usb_pll     = with_usb_pll,
            with_video_pll   = with_video_pll,
            sdram_rate       = sdram_rate
        )

        # SoCCore ----------------------------------------------------------------------------------
        SoCCore.__init__(self, platform, int(sys_clk_freq), ident = "LiteX SoC on Colorlight " + board.upper(), **kwargs)

        # Leds -------------------------------------------------------------------------------------
        if with_led_chaser:
            ledn = platform.request_all("user_led_n")
            self.leds = LedChaser(pads=ledn, sys_clk_freq=sys_clk_freq)

        # SPI Flash --------------------------------------------------------------------------------
        if board == "i5":
            from litespi.modules import GD25Q16 as SpiFlashModule
        if board == "i9":
            from litespi.modules import W25Q64 as SpiFlashModule

        from litespi.opcodes import SpiNorFlashOpCodes as Codes
        self.add_spi_flash(mode="1x", module=SpiFlashModule(Codes.READ_1_1_1))

        # SDR SDRAM --------------------------------------------------------------------------------
        if not self.integrated_main_ram_size:
            sdrphy_cls = HalfRateGENSDRPHY if sdram_rate == "1:2" else GENSDRPHY
            self.sdrphy = sdrphy_cls(platform.request("sdram"))
            self.add_sdram("sdram",
                phy           = self.sdrphy,
                module        = M12L64322A(sys_clk_freq, sdram_rate),
                l2_cache_size = kwargs.get("l2_size", 8192)
            )

        # Ethernet / Etherbone ---------------------------------------------------------------------
        if with_ethernet or with_etherbone:
            self.ethphy = LiteEthPHYRGMII(
                clock_pads = self.platform.request("eth_clocks", eth_phy),
                pads       = self.platform.request("eth", eth_phy),
                tx_delay = 0)
            if with_ethernet:
                self.add_ethernet(phy=self.ethphy)
            if with_etherbone:
                self.add_etherbone(phy=self.ethphy)

        if local_ip:
            local_ip = local_ip.split(".")
            self.add_constant("LOCALIP1", int(local_ip[0]))
            self.add_constant("LOCALIP2", int(local_ip[1]))
            self.add_constant("LOCALIP3", int(local_ip[2]))
            self.add_constant("LOCALIP4", int(local_ip[3]))

        if remote_ip:
            remote_ip = remote_ip.split(".")
            self.add_constant("REMOTEIP1", int(remote_ip[0]))
            self.add_constant("REMOTEIP2", int(remote_ip[1]))
            self.add_constant("REMOTEIP3", int(remote_ip[2]))
            self.add_constant("REMOTEIP4", int(remote_ip[3]))

        # Video ------------------------------------------------------------------------------------
        if with_video_terminal or with_video_framebuffer:
            self.videophy = VideoHDMIPHY(platform.request("gpdi"), clock_domain="hdmi")
            if with_video_terminal:
                self.add_video_terminal(phy=self.videophy, timings="800x600@60Hz", clock_domain="hdmi")
            if with_video_framebuffer:
                self.add_video_framebuffer(phy=self.videophy, timings="800x600@60Hz", clock_domain="hdmi")

# Build --------------------------------------------------------------------------------------------

def main():
    from litex.build.parser import LiteXArgumentParser
    parser = LiteXArgumentParser(platform=colorlight_i5.Platform, description="LiteX SoC on Colorlight I5.")
    parser.add_target_argument("--board",            default="i5",             help="Board type (i5).")
    parser.add_target_argument("--revision",         default="7.0",            help="Board revision (7.0).")
    parser.add_target_argument("--sys-clk-freq",     default=60e6, type=float, help="System clock frequency.")
    ethopts = parser.target_group.add_mutually_exclusive_group()
    ethopts.add_argument("--with-ethernet",   action="store_true",      help="Enable Ethernet support.")
    ethopts.add_argument("--with-etherbone",  action="store_true",      help="Enable Etherbone support.")
    parser.add_target_argument("--remote-ip", default="192.168.1.100",  help="Remote IP address of TFTP server.")
    parser.add_target_argument("--local-ip",  default="192.168.1.50",   help="Local IP address.")
    sdopts = parser.target_group.add_mutually_exclusive_group()
    sdopts.add_argument("--with-spi-sdcard",  action="store_true", help="Enable SPI-mode SDCard support.")
    sdopts.add_argument("--with-sdcard",      action="store_true", help="Enable SDCard support.")
    parser.add_target_argument("--eth-phy",          default=0, type=int, help="Ethernet PHY (0 or 1).")
    parser.add_target_argument("--use-internal-osc", action="store_true", help="Use internal oscillator.")
    parser.add_target_argument("--sdram-rate",       default="1:1",       help="SDRAM Rate (1:1 Full Rate or 1:2 Half Rate).")
    viopts = parser.target_group.add_mutually_exclusive_group()
    viopts.add_argument("--with-video-terminal",    action="store_true", help="Enable Video Terminal (HDMI).")
    viopts.add_argument("--with-video-framebuffer", action="store_true", help="Enable Video Framebuffer (HDMI).")
    args = parser.parse_args()

    soc = BaseSoC(board=args.board, revision=args.revision,
        toolchain              = args.toolchain,
        sys_clk_freq           = args.sys_clk_freq,
        with_ethernet          = args.with_ethernet,
        with_etherbone         = args.with_etherbone,
        local_ip               = args.local_ip,
        remote_ip              = args.remote_ip,
        eth_phy                = args.eth_phy,
        use_internal_osc       = args.use_internal_osc,
        sdram_rate             = args.sdram_rate,
        with_video_terminal    = args.with_video_terminal,
        with_video_framebuffer = args.with_video_framebuffer,
        **parser.soc_argdict
    )
    soc.platform.add_extension(colorlight_i5._sdcard_pmod_io)
    if args.with_spi_sdcard:
        soc.add_spi_sdcard()
    if args.with_sdcard:
        soc.add_sdcard()

    builder = Builder(soc, **parser.builder_argdict)
    if args.build:
        builder.build(**parser.toolchain_argdict)

    if args.load:
        prog = soc.platform.create_programmer()
        prog.load_bitstream(builder.get_bitstream_filename(mode="sram"))

if __name__ == "__main__":
    main()
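BaseSoC turns --local-ip and --remote-ip into four constants each (LOCALIP1..4, REMOTEIP1..4) by splitting the string on dots, without checking it. If you want a typo such as 192.168.1.500 to fail at build time, a stricter variant could use Python’s standard ipaddress module. This is a hypothetical helper (ip_constants is our name), not part of LiteX:

```python
import ipaddress


def ip_constants(prefix: str, ip: str) -> dict:
    """Split a dotted IPv4 address into <prefix>1..<prefix>4 constants,
    like BaseSoC does for LOCALIP/REMOTEIP, but rejecting invalid input."""
    octets = ipaddress.IPv4Address(ip).packed  # raises ValueError if malformed
    return {f"{prefix}{i}": octet for i, octet in enumerate(octets, start=1)}


print(ip_constants("LOCALIP", "192.168.1.50"))
```

Inside BaseSoC, each block of four add_constant calls could then become: for name, value in ip_constants("LOCALIP", local_ip).items(): self.add_constant(name, value).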

Build

python3 colorlight_i5.py --ecppack-compress --build

Upload

ecpdap program build/colorlight_i5/gateware/colorlight_i5.bit

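(We could have used --load, but there is a bug in LiteX: its programmer.py mixes up kHz and Hz when calling ecpdap. A fix was sent as https://github.com/enjoy-digital/litex/pull/1699.)

That is, if your ecpdap works. Otherwise, use openFPGALoader: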

openFPGALoader -b "colorlight-i5" --freq "16000000" ./build/colorlight_i5/gateware/colorlight_i5.svf

Connect using the serial terminal

On Linux:

litex_term /dev/ttyACM0

On Windows, I do not know at all.

Once you are connected, press Enter and you should see:

litex>

Now:

litex> reboot
        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2023 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on May 19 2023 21:03:55
 BIOS CRC passed (afa6ed09)

 LiteX git sha1: 0f1ad8dc

--=============== SoC ==================--
CPU:            VexRiscv @ 60MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128.0KiB
SRAM:           8.0KiB
L2:             8.0KiB
FLASH:          2.0MiB
SDRAM:          8.0MiB 32-bit @ 60MT/s (CL-2 CWL-2)
MAIN-RAM:       8.0MiB

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB     
   Read: 0x40000000-0x40200000 2.0MiB     
Memtest OK
Memspeed at 0x40000000 (Sequential, 2.0MiB)...
  Write speed: 22.1MiB/s
   Read speed: 30.2MiB/s

Initializing GD25Q16 SPI Flash @0x00200000...
Enabling Quad mode...
First SPI Flash block erased, unable to perform freq test.
Memspeed at 0x200000 (Sequential, 4.0KiB)...
   Read speed: 1.6MiB/s
Memspeed at 0x200000 (Random, 4.0KiB)...
   Read speed: 936.6KiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
No boot medium found

--============= Console ================--

Congrats, you have a RISC-V SoC!

Resources

About the FPGA chip

The Lattice ECP5 is a reasonably powerful FPGA whose bitstream has been fully reverse-engineered by Project Trellis. This enables it to be fully supported by open-source tools, with no vendor tools required.

An in-depth overview of the boards we are using is on Tom Verbeure’s blog.

The TL;DR is: the Colorlight i5 is technically part of a commercial video wall product, but we use it as a dev board because it strikes an excellent balance of cost, availability, powerful features and practicality. The community has reverse-engineered this board and also created an “extension board”. This carrier breaks out the SO-DIMM to easier-to-use connectors and also provides a USB-JTAG programming interface. Here are some detailed specs and links:

The Colorlight i5 V7.0 itself provides the FPGA (specifically the LFE5U-25F-6BG381C in a 381-ball, 0.8mm-pitch BGA package), two gigabit Ethernet PHYs, and a generous amount of GPIO on a SO-DIMM form factor. Sold here: https://www.aliexpress.com/item/1005001686186007.html
A third-party “extension board” (it’s a breakout board) means you don’t have to solder an SO-DIMM socket yourself and have lots of PMOD-like connectors and a ready-to-go USB-JTAG Programmer: https://github.com/wuxx/Colorlight-FPGA-Projects
Schematic for the i5 Ext board: https://github.com/wuxx/Colorlight-FPGA-Projects/blob/master/schematic/i5_v6.0-extboard.pdf?raw=true
Pin mapping of the extension board: https://tomverbeure.github.io/2021/01/30/Colorlight-i5-Extension-Board-Pin-Mapping.html
Finally, another add-on to the extension board provides Ethernet: https://github.com/kazkojima/colorlight-i5-tips
The SO-DIMM socket: https://www.newark.com/amp-te-connectivity/1473149-4/connector-sodimm-socket-200pos/dp/21K5412

About the OSS FPGA eco-system

The tools for working with FPGAs in an open-source stack started around 2015. Project IceStorm enabled the use of the Lattice iCEstick (a popular ~$20 pre-pandemic development board for the iCE40 FPGA). More recently, this board has become hard to come by; however, the toolchain has been greatly developed and now supports quite a number of other FPGAs as well.

OSS CAD Suite - Bundles together many of the components necessary for the stack: https://github.com/YosysHQ/oss-cad-suite-build
Yosys - OSS FPGA Synthesis: https://yosyshq.net/yosys/
LiteX - Building hardware easily on FPGA with Python code: https://github.com/enjoy-digital/litex
VexRiscv - RISC-V CPU: https://github.com/SpinalHDL/VexRiscv
NextPNR - Place and route tool: https://github.com/YosysHQ/nextpnr
Project Trellis - ECP5 bitstream database and tools (used by nextpnr and ecppack): https://github.com/YosysHQ/prjtrellis

Terminology

FPGA - Field Programmable Gate Array. A programmable device, which consists of arbitrarily configurable logic.
IP Core - Blocks of logic used to perform specific functions
JTAG - A serial programming and testing interface used to program our FPGA boards.
LUT - Lookup table: a piece of logic that maps every combination of its input states to a specific output value. An FPGA largely consists of a programmable/configurable matrix of LUTs, and the “size” of an FPGA is often specified in LUTs.
OSS - Open Source Software
PMOD - A connector laid out using standard 0.1" headers. Usually provides a byte worth of IO and power/ground in a standard layout
Python - A general purpose high-level programming language
Verilog - A hardware description language (HDL) used to describe digital circuits, for FPGAs as well as ASICs
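To make the LUT idea concrete, a k-input LUT is just a 2^k-entry truth table stored in configuration bits, and the inputs form the address into it. The ECP5’s basic logic element is a 4-input LUT (LUT4). Here is a sketch in Python. The bit ordering (input a as the least significant address bit) is our own choice for illustration, not necessarily how the ECP5 orders its configuration bits.

```python
from typing import Callable


def make_init(func: Callable[[int, int, int, int], int]) -> int:
    """Compute the 16 configuration bits for a 4-input function."""
    init = 0
    for addr in range(16):
        a, b, c, d = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        if func(a, b, c, d):
            init |= 1 << addr
    return init


def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate the LUT: the inputs select one of the 16 stored bits."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> addr) & 1


xor4 = make_init(lambda a, b, c, d: a ^ b ^ c ^ d)
and2 = make_init(lambda a, b, c, d: a & b)  # unused inputs c, d are ignored
print(hex(xor4), hex(and2))
```

Any 4-input function, however complicated, fits in one LUT4. Synthesis tools split larger functions across several LUTs.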

Candy jar

The candy jar is where we keep pages that explain some concepts in more detail, for the curious.

Flashing and programming the ECP5

Programming

Faster upload

Programming the ECP5 with the programmer integrated on the extension board can be horribly slow.

I recommend something like https://github.com/phdussud/pico-dirtyJtag, which runs on an RP2040 and can program it at high speed. You would have to remove the pogo pins (or insulate them) and connect the RP2040 to the board’s JTAG pins instead.

Flash, or how to get persistent programs

The Colorlight i5 board also contains an SPI flash chip. This lets you store a bitstream that is loaded every time the board starts, instead of having to load it yourself. Unfortunately, the flash is write-protected when you get the board.

You can unprotect the flash with:

ecpdap flash unprotect

Then write your bitstream to the flash:

ecpdap flash write yourfile.bit

If ecpdap doesn’t work

The program that unlocks it, ecpdap, is included in oss-cad-suite. Unfortunately, at least on Linux, I had to compile my own with cargo, the Rust build system. You just have to run:

cargo install ecpdap
~/.cargo/bin/ecpdap scan

This should reply with something like:

Detected JTAG chain, closest to TDO first:
 - 0: 0x41111043 (Lattice Semi.) [IR length: 8] [LFE5U-25]
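The hex number in that output is the chip’s JTAG IDCODE. Per IEEE 1149.1, it is split into fields:

- bit 0 is always 1;
- bits 1 to 11 hold the manufacturer’s JEDEC identity;
- bits 12 to 27 hold the part number;
- bits 28 to 31 hold the version.

Here is a small decoder sketch; the names IdCode and decode_idcode are ours.

```python
from typing import NamedTuple


class IdCode(NamedTuple):
    version: int
    part: int
    manufacturer: int


def decode_idcode(idcode: int) -> IdCode:
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    if (idcode & 1) != 1:
        raise ValueError("bit 0 of an IDCODE must be 1")
    return IdCode(
        version=(idcode >> 28) & 0xF,
        part=(idcode >> 12) & 0xFFFF,
        manufacturer=(idcode >> 1) & 0x7FF,
    )


fields = decode_idcode(0x41111043)  # value reported by the ecpdap scan above
print(f"version=0x{fields.version:x} part=0x{fields.part:04x} "
      f"manufacturer=0x{fields.manufacturer:03x}")
```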

Verilog to bitstream - detailed process

This was in large part generated with GPT-4. Don’t hesitate to contact us about mistakes and errors.

It looks like GPT copied a lot from the Lattice ECP5 manual, and obviously made no attributions.

I took some stuff from the great presentation by Claire Xenia Wolf on Yosys: https://github.com/YosysHQ/yosys-manual-build/releases/download/manual/presentation.pdf

Overview

Digital circuits can be described at different levels of abstraction:

System level : Overall description of the system
High level : Mostly for humans, that’s how we code in Python, Java, Ruby…
Behavioral level : A cycle-accurate description of the hardware (Verilog, VHDL…)
Register-Transfer level : Lists of operations that allow the system to go from one state to another
Logical gate level : Single bit description of the system (can be a bunch of NANDs or more complex)
Physical gate level : Mapping of the system on a physical device that can use its specific computing units (LUTs, multipliers, dividers…)
Switch level : Transistor description

LiteX converts the high level to the behavioral level (it generates Verilog), and Yosys handles everything from the behavioral level to the physical gate level. nextpnr then places and routes all of this on the chip itself, trying to satisfy the user’s constraints of speed, performance or area. (In the FPGA world, we don’t talk about the space a “program” takes; we talk about area on the die.)

From system description to configuration

The process of converting Verilog or VHDL code to a bitstream for an FPGA involves several steps. These steps are broadly similar for different FPGAs and toolchains, but here are the details for the Lattice ECP5 FPGA and open-source tools like Yosys and Project Trellis:

Synthesis: This step converts the Verilog or VHDL code into a lower-level representation that can be more easily optimized and mapped to the FPGA’s resources. Yosys is an open-source synthesis tool that takes Verilog as input, or VHDL through the GHDL plugin. Strictly speaking, this is RTL (or logic) synthesis. High-Level Synthesis (HLS) is the term for generating RTL from languages like C or C++.
Optimization: Yosys will perform various optimizations on the code, such as constant propagation, dead code elimination, and technology mapping. This helps to reduce the complexity of the design and make it more suitable for the target FPGA.
Mapping to FPGA resources: The optimized design is then mapped to the specific resources available on the target FPGA, such as LUTs (Look-Up Tables), flip-flops, and other specialized components like DSP blocks and memory blocks. In the case of the ECP5 FPGA, this step is performed by Yosys using the “synth_ecp5” command, which maps the design to the ECP5’s resources.
Place and Route (P&R): Once the design is mapped to the FPGA’s resources, the next step is to determine the physical placement of these resources on the FPGA and the interconnect routing between them. This is a critical step, as the placement and routing can significantly impact the performance and resource utilization of the design. Nextpnr is an open-source P&R tool that can be used with the ECP5 FPGA, and it works in conjunction with Project Trellis, which provides a database of the ECP5’s architecture and bitstream format.
Bitstream generation: After the P&R process, the final step is to generate the bitstream that will configure the FPGA to implement the design. This bitstream is a binary file that contains the configuration data for the FPGA’s resources and interconnects. Nextpnr and Project Trellis work together to generate the bitstream for the ECP5 FPGA. The “nextpnr-ecp5” command takes the output from Yosys, places and routes it using the Project Trellis database, and writes a textual configuration. Project Trellis’ “ecppack” tool then turns that configuration into the binary bitstream.
Programming the FPGA (configuration): Once the bitstream is generated, it can be programmed onto the ECP5 FPGA using an appropriate programming tool. For example, the open-source tool OpenOCD can be used to program the FPGA via a JTAG interface.
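As a rough sketch, the open-source part of this flow can be driven by a small Python script that builds the command line for each tool. The script only builds and prints the commands; it does not run them. The file names (`blink.v`, `blink.lpf`, …) and the top module name are hypothetical placeholders. The device flags assume the LFE5U-25F in the 381-ball package that we believe is on the Colorlight i5, so double-check them against your board.

```python
import shlex


def ecp5_flow(top: str, device: str = "25k", package: str = "CABGA381",
              speed: int = 6) -> list[list[str]]:
    """Return the command lines for synthesis, place and route, and packing.

    Hypothetical file naming: <top>.v, <top>.lpf, <top>.json, <top>.config, <top>.bit
    """
    verilog, lpf = f"{top}.v", f"{top}.lpf"
    netlist, textcfg, bitstream = f"{top}.json", f"{top}.config", f"{top}.bit"
    return [
        # Yosys: parse, elaborate, optimize, map to ECP5 cells, write a JSON netlist
        ["yosys", "-p", f"synth_ecp5 -top {top} -json {netlist}", verilog],
        # nextpnr-ecp5: place and route using the pin constraints, write a text config
        ["nextpnr-ecp5", f"--{device}", "--package", package, "--speed", str(speed),
         "--json", netlist, "--lpf", lpf, "--textcfg", textcfg],
        # Project Trellis' ecppack: turn the text config into the binary bitstream
        ["ecppack", textcfg, bitstream],
    ]


if __name__ == "__main__":
    for cmd in ecp5_flow("blink"):
        # Only prints; pass each list to subprocess.run(cmd, check=True) to execute it
        print(shlex.join(cmd))
```

The last step, sending `blink.bit` to the board, is left to whatever programmer you use (OpenOCD, ecpdap…).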

High-level synthesis

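Strictly speaking, High-Level Synthesis (HLS) is the process of converting an untimed, high-level description of an algorithm (for example in C or C++) into a Register-Transfer Level (RTL) design, such as Verilog or VHDL, that is suitable for further optimization and mapping to an FPGA or ASIC. Verilog and VHDL designs are already written at the behavioral/RTL level. For them, a tool like Yosys mostly performs the parsing and elaboration steps below, plus turning always blocks into registers and multiplexers, before moving on to logic synthesis. Scheduling and binding are the job of dedicated HLS tools. The primary goal of HLS is to generate an optimized, technology-independent representation of the design that can be more easily mapped to the target hardware.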

HLS involves several sub-processes, including:

Parsing: The first step in HLS is parsing the input HDL code to create an internal representation of the design. This internal representation is typically an abstract syntax tree (AST) or a similar data structure that captures the structure of the HDL code, including its modules, instances, and hierarchy.
Elaboration: The next step is elaboration, which involves resolving the design hierarchy, parameterization, and instantiation of modules. The elaboration process creates a flattened representation of the design by expanding instantiated modules and resolving their interconnections. This step is important for analyzing and optimizing the design at the module level and below.
Behavioral synthesis: Behavioral synthesis focuses on converting the behavioral descriptions of the design (i.e., the algorithmic or functional specifications) into a structural representation that can be more easily optimized and mapped to hardware resources. This process typically involves converting high-level constructs such as loops, conditional statements, and arithmetic operations into a dataflow graph or a control-data flow graph (CDFG). These graphs represent the design at a lower level of abstraction and expose opportunities for optimizations such as pipelining, loop unrolling, and resource sharing.
Scheduling: Scheduling is the process of assigning operations in the design to specific time steps or clock cycles. This is a critical step in HLS because it determines the latency, throughput, and resource utilization of the design. Scheduling can be performed using various algorithms, such as ASAP (As Soon As Possible), ALAP (As Late As Possible), or more sophisticated techniques that take into account resource constraints and performance goals.
Resource allocation and binding: Once the operations have been scheduled, the next step is to allocate and bind the required hardware resources, such as functional units (adders, multipliers, etc.), registers, and memories. The resource allocation and binding process involves assigning each operation in the design to a specific functional unit and determining the mapping of variables to registers or memory. This step has a significant impact on the area and power consumption of the final implementation.
RTL generation: After the scheduling, resource allocation, and binding processes, the HLS tool generates a Register-Transfer Level (RTL) representation of the design. The RTL representation is a lower-level description of the design that can be more easily mapped to an FPGA or ASIC. It typically consists of a netlist of interconnected registers, functional units, and multiplexers, along with the associated control signals.
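To make scheduling a bit more concrete, here is a minimal sketch of ASAP and ALAP scheduling on a small dataflow graph computing `(a*b + c*d) + e*f`. It assumes every operation takes exactly one clock cycle and ignores resource limits, which real HLS tools do not. The operation names are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow graph for (a*b + c*d) + e*f:
# each operation maps to the operations whose results it consumes.
DFG = {
    "mul1": [],                # a * b
    "mul2": [],                # c * d
    "mul3": [],                # e * f
    "add1": ["mul1", "mul2"],  # mul1 + mul2
    "add2": ["add1", "mul3"],  # add1 + mul3
}


def asap(dfg):
    """Put each operation in the earliest cycle after all its inputs are ready."""
    cycle = {}
    for op in TopologicalSorter(dfg).static_order():  # inputs are visited first
        cycle[op] = max((cycle[p] + 1 for p in dfg[op]), default=0)
    return cycle


def alap(dfg, latency):
    """Put each operation in the latest cycle that still meets the latency bound."""
    users = {op: [u for u, preds in dfg.items() if op in preds] for op in dfg}
    cycle = {}
    # Reverse topological order: every user is scheduled before its inputs
    for op in reversed(list(TopologicalSorter(dfg).static_order())):
        cycle[op] = min((cycle[u] - 1 for u in users[op]), default=latency - 1)
    return cycle


if __name__ == "__main__":
    print("ASAP:", asap(DFG))
    print("ALAP:", alap(DFG, latency=3))
```

ASAP puts all three multiplications in cycle 0, which needs three multipliers at once. ALAP shows that `mul3` can wait until cycle 1. That slack (often called mobility) is what lets a scheduler get away with only two multipliers.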

The output of the HLS process is an optimized RTL design that can be further processed by downstream tools, such as logic synthesis, technology mapping, and place-and-route. The primary advantage of using HLS is that it allows designers to work at a higher level of abstraction, which can improve productivity and enable more complex designs. However, HLS tools may not always generate the most optimized implementations, and manual RTL coding or optimization may still be required for certain designs or performance-critical components.

Optimization

The optimization stage is a crucial part of the process of converting high-level hardware descriptions into a lower-level representation that can be mapped onto FPGAs or ASICs. During optimization, the design is transformed and refined to meet performance, area, and power goals, while ensuring that the functional requirements are maintained. This stage typically involves several types of optimizations, including:

Constant propagation: This optimization identifies and simplifies expressions containing constants, replacing them with their constant values. This can help reduce the complexity of the design and eliminate unnecessary logic gates.
Dead code elimination: Dead code refers to portions of the design that have no impact on the output or are never executed. Dead code elimination identifies and removes such redundant parts of the design, which helps to save area and power by eliminating unnecessary resources.
Boolean simplification: This optimization applies Boolean algebra rules to simplify the logic expressions in the design, reducing the number of gates and interconnections. This can help to minimize the area and delay of the resulting implementation.
Algebraic simplification: This technique simplifies arithmetic expressions in the design, by applying algebraic identities and arithmetic properties. For example, multiplication by a power of two can be replaced with a shift operation, which is typically more efficient in hardware.
Common subexpression elimination: This optimization identifies and eliminates redundant computations in the design by reusing the results of identical subexpressions. This can help to reduce the overall complexity of the design and save hardware resources.
Technology mapping: Technology mapping transforms the design into a representation that is compatible with the target FPGA or ASIC technology. This involves mapping the design to the specific resources available on the target device, such as Look-Up Tables (LUTs), flip-flops, and other specialized components like DSP blocks and memory blocks. Technology mapping also optimizes the design for the target technology by selecting the most suitable primitives and resource configurations.
Retiming: Retiming is an optimization technique that moves registers across combinational logic boundaries to improve performance or reduce area. This can help to balance the pipeline stages or critical paths, which can improve the overall performance of the design.
Resource sharing: This optimization technique identifies opportunities to share hardware resources among multiple operations or instances, which can help to save area and power. This is particularly important when mapping the design to an FPGA, where resources such as DSP blocks and memories are limited.
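Here is a toy version of two of these passes: constant propagation, and replacing a multiplication by a power of two with a shift. Both are written as a recursive rewrite over an expression tree. The tuple representation (`("mul", x, 8)`) is made up for the illustration and is not how Yosys represents or optimizes designs internally.

```python
def optimize(expr):
    """Recursively simplify an expression tree.

    Leaves are ints (constants) or strings (signal names); internal nodes are
    tuples (op, left, right) with op in {"add", "mul", "shl"}.
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr[0], optimize(expr[1]), optimize(expr[2])

    # Constant propagation: both operands are known, compute the value now
    if isinstance(left, int) and isinstance(right, int):
        if op == "add":
            return left + right
        if op == "mul":
            return left * right
        return left << right  # "shl"

    # Put the constant operand (if any) on the right to simplify the checks below
    if op in ("add", "mul") and isinstance(left, int):
        left, right = right, left

    if op == "add" and right == 0:
        return left                      # x + 0 -> x
    if op == "mul" and isinstance(right, int):
        if right == 0:
            return 0                     # x * 0 -> 0 (dead logic disappears)
        if right == 1:
            return left                  # x * 1 -> x
        if right > 0 and right & (right - 1) == 0:
            # Strength reduction: x * 2**k -> x << k (just wiring in hardware)
            return ("shl", left, right.bit_length() - 1)
    return (op, left, right)


if __name__ == "__main__":
    print(optimize(("add", ("mul", "x", ("add", 3, 5)), ("mul", 0, "y"))))
    # ('shl', 'x', 3)
```

The inner `3 + 5` folds to `8`, `x * 8` becomes a shift, and `0 * y` disappears entirely.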

These optimization techniques can be applied in various orders and combinations, depending on the specific design and optimization goals. Some optimizations may be more suitable for certain types of designs or target technologies, and trade-offs may need to be made between performance, area, and power.

In the context of open-source tools like Yosys, many of these optimizations are performed automatically as part of the synthesis process. However, the user can also control or fine-tune the optimizations through various command-line options or synthesis scripts. It is essential to understand and apply the appropriate optimizations to achieve the desired design goals while maintaining the functional requirements of the design.

Mapping

Mapping to FPGA resources is the process of converting the optimized design representation (typically at the Register-Transfer Level, or RTL) into a form that is compatible with the specific resources available on the target FPGA. This process involves determining how the design’s logic functions, storage elements, and other components can be efficiently implemented using the FPGA’s resources. Some of the key resources available on FPGAs include:

Look-Up Tables (LUTs): LUTs are the primary building blocks of FPGAs, used to implement combinational logic functions. They are essentially small, programmable memory elements that can generate any desired logic function of their inputs. LUTs usually have a fixed number of inputs (e.g., 4-input, 6-input) and can be cascaded or combined to implement more complex functions.
Flip-flops (FFs): Flip-flops are used to store state information in the design, providing synchronous storage elements for implementing registers, counters, and state machines. FPGAs typically have dedicated flip-flops associated with each LUT, which can be configured to store the output of the LUT or to bypass the LUT and store an external signal.
Block RAM (BRAM): BRAM is a dedicated memory resource available on FPGAs, used to implement larger memory structures such as buffers, caches, and lookup tables. BRAMs can be configured to support different widths and depths, and multiple BRAMs can be combined to create larger memory structures.
Digital Signal Processing (DSP) blocks: DSP blocks are specialized resources available on many FPGAs for implementing arithmetic functions, such as multipliers, adders, and accumulators. They are designed to provide higher performance and lower power consumption compared to equivalent implementations using LUTs and flip-flops.
Input/Output (I/O) blocks: I/O blocks are the interface between the FPGA and external signals. They can be configured to support various I/O standards, voltages, and drive strengths. I/O blocks also often include support for specialized functions, such as clock input and output, analog-to-digital conversion, and high-speed serial interfaces.

During the mapping process, the design’s components are assigned to specific FPGA resources, and the interconnections between these resources are established. This involves:

Logic synthesis: The RTL design is transformed into a gate-level netlist, which is a representation of the design in terms of gates, flip-flops, and other low-level primitives. This step may involve additional optimizations, such as constant propagation, Boolean simplification, and resource sharing.
Technology mapping: The gate-level netlist is mapped to the specific resources available on the target FPGA, such as LUTs, flip-flops, and DSP blocks. This step involves selecting the most suitable FPGA primitives and configurations to implement the design’s components while optimizing for performance, area, and power.
Packing: Packing is the process of grouping the mapped components into larger structures called “tiles” or “clusters” that correspond to the physical resources on the FPGA. For example, LUTs and flip-flops can be packed together into “slices,” which are the basic building blocks of many FPGAs. Packing can help to improve the utilization of FPGA resources and reduce the complexity of the place-and-route process.
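As an illustration of packing, here is a naive greedy sketch. It pairs each LUT with the flip-flop its output directly drives, then fills slices of 2 LUTs and 2 flip-flops, which is the shape of an ECP5 slice. The netlist format and cell names are hypothetical. Real packers, like the one in nextpnr, also have to respect shared clock, enable and reset signals, carry chains and timing.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    luts: list = field(default_factory=list)
    ffs: list = field(default_factory=list)


def pack(luts, ffs, lut_drives, lut_per_slice=2, ff_per_slice=2):
    """Greedily pack LUTs and flip-flops into slices.

    lut_drives maps a LUT name to the FF its output feeds directly (if any),
    so that pair can share a slice without going through general routing.
    """
    # LUT->FF pairs (or lone LUTs) first, then flip-flops not driven by a LUT
    driven = set(lut_drives.values())
    items = [(lut, lut_drives.get(lut)) for lut in luts]
    items += [(None, ff) for ff in ffs if ff not in driven]

    slices = []
    for lut, ff in items:
        # First slice with room for this (LUT, FF) pair, otherwise open a new one
        target = next((s for s in slices
                       if (lut is None or len(s.luts) < lut_per_slice)
                       and (ff is None or len(s.ffs) < ff_per_slice)), None)
        if target is None:
            target = Slice()
            slices.append(target)
        if lut is not None:
            target.luts.append(lut)
        if ff is not None:
            target.ffs.append(ff)
    return slices


if __name__ == "__main__":
    result = pack(["lut_a", "lut_b", "lut_c"], ["ff_a", "ff_b", "ff_x"],
                  {"lut_a": "ff_a", "lut_b": "ff_b"})
    for i, s in enumerate(result):
        print(f"slice {i}: LUTs={s.luts} FFs={s.ffs}")
```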

In the context of the Lattice ECP5 FPGA and open-source tools like Yosys, the mapping process is performed using the “synth_ecp5” command, which performs logic synthesis and technology mapping specifically for the ECP5’s resources. Packing into slices is then done by nextpnr at the start of P&R. Once the design is mapped to the ECP5’s resources, it can be passed to the next stage of the implementation process: the place-and-route (P&R) step, using tools such as nextpnr.

Place and Route (PNR)

Place and Route (P&R) is a critical stage in the FPGA or ASIC implementation process, following the mapping of the design to the target device’s resources. The primary goal of P&R is to determine the physical placement of the design’s components on the FPGA or ASIC and the interconnect routing between them. The quality of the P&R process has a significant impact on the performance, area, and power consumption of the final implementation. P&R involves two main sub-processes:

Placement: The placement step determines the physical locations of the design’s components (such as LUTs, flip-flops, and DSP blocks) on the FPGA or ASIC. The objective is to find an optimal placement that minimizes the total wirelength, reduces congestion, and meets performance (timing) constraints. The placement process can be guided by various algorithms, such as simulated annealing, genetic algorithms, or analytical techniques. During placement, the tool also takes into account the target device’s architecture, including the arrangement of resources, the available routing resources, and any other constraints, such as fixed locations for specific components (e.g., I/O pins or clock resources).
Routing: After the components have been placed, the routing step determines the interconnect paths between them, using the available routing resources (such as routing channels, switches, and wire segments) on the FPGA or ASIC. The goal of the routing process is to find a congestion-free and timing-optimized routing solution that satisfies the design’s performance requirements and minimizes power consumption. Common algorithms include simple maze routing and PathFinder. PathFinder is a negotiated-congestion algorithm that iteratively rips up and reroutes nets until no routing resource is used by more than one net.
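The simplest routing algorithm to picture is Lee’s maze router: a breadth-first search over a grid of routing resources that finds the shortest free path between two pins. Below is a minimal sketch on a hypothetical grid where `#` marks a resource already used by another net. Negotiated-congestion routers build on the same kind of search, but give shared resources an increasing cost instead of forbidding them outright.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Return the shortest list of (row, col) cells from src to dst, or None.

    grid is a list of strings; '#' cells are already used and cannot be crossed.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {src: None}          # also acts as the "visited" set
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            # Walk back from the target to rebuild the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and nxt not in came_from):
                came_from[nxt] = cell
                queue.append(nxt)
    return None                      # no free path: the net is unroutable


if __name__ == "__main__":
    grid = ["....",
            ".##.",
            "...."]
    print(lee_route(grid, (1, 0), (1, 3)))
```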

P&R tools often use a cost function or objective function to guide the optimization process and evaluate the quality of the placement and routing solutions. This function may take into account various factors, such as wirelength, congestion, timing, area, and power consumption. The P&R tool iteratively refines the placement and routing solutions to minimize the cost function, subject to the design constraints and target device’s architectural constraints.
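Here is a small sketch of such a cost function: half-perimeter wirelength (HPWL), a common wirelength estimate. It is minimized with simulated annealing, which moves or swaps cells between grid sites and sometimes accepts a worse placement while the “temperature” is high. The netlist, grid size, cooling schedule and random seed are arbitrary assumptions for illustration. nextpnr’s actual placers are far more elaborate and also account for timing.

```python
import math
import random


def hpwl(placement, nets):
    """Half-perimeter wirelength: sum of each net's bounding-box half-perimeter."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(cells, nets, width, height, steps=20000, t0=5.0, seed=0):
    """Place cells on a width x height grid (one cell per site), minimizing HPWL."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    assert len(cells) <= len(sites), "not enough sites"
    shuffled = sites[:]
    rng.shuffle(shuffled)
    placement = dict(zip(cells, shuffled))              # random initial placement
    occupant = {site: cell for cell, site in placement.items()}
    cost = hpwl(placement, nets)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3           # linear cooling schedule
        a = rng.choice(cells)
        target = rng.choice(sites)
        b = occupant.get(target)                        # cell already there, if any
        old = placement[a]
        placement[a] = target                           # tentative move or swap
        if b is not None:
            placement[b] = old
        new_cost = hpwl(placement, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                             # accept: update occupancy
            occupant.pop(old, None)
            occupant[target] = a
            if b is not None:
                occupant[old] = b
        else:                                           # reject: undo the move
            placement[a] = old
            if b is not None:
                placement[b] = target
    return placement, cost


if __name__ == "__main__":
    # Hypothetical netlist: a chain of 8 cells connected by 2-pin nets
    cells = [f"c{i}" for i in range(8)]
    nets = [(cells[i], cells[i + 1]) for i in range(7)]
    placement, cost = anneal(cells, nets, width=4, height=4)
    print("final HPWL:", cost)  # 7 is the best possible for this chain
```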

In the context of the Lattice ECP5 FPGA and open-source tools, Nextpnr is a commonly used P&R tool, which works in conjunction with Project Trellis. Project Trellis provides a database of the ECP5’s architecture and bitstream format, enabling Nextpnr to perform P&R specifically for the ECP5 FPGA. To run Nextpnr for ECP5, the “nextpnr-ecp5” command is used, which takes the mapped design output from Yosys as input and generates a placed and routed design suitable for bitstream generation.

During the P&R process, the tool generates various reports and visualizations that help designers analyze the quality of the placement and routing solutions and identify any issues or bottlenecks. These reports may include information about the utilization of FPGA resources, the distribution of wirelengths, the critical path delays, and the timing slack. Designers can use this information to guide further optimization or refinement of the design, either at the RTL level, during the mapping process, or by adjusting P&R constraints and settings.

Bitstream generation

Bitstream generation is the final stage in the FPGA implementation process, following the successful completion of the Place and Route (P&R) step. The bitstream is a binary file that contains the configuration data required to program the FPGA with the designed hardware. This data includes the settings for the FPGA’s programmable resources, such as Look-Up Tables (LUTs), flip-flops, memory blocks, and interconnect routing. The bitstream generation process involves converting the placed and routed design into a format that can be loaded onto the FPGA to configure its resources.

During bitstream generation, the following steps are typically performed:

Design translation: The placed and routed design is translated into a device-specific representation that includes the settings for the FPGA’s programmable resources. This step involves mapping the design’s components and interconnects to the specific configuration elements of the target FPGA, such as configuration memory cells, programmable switches, and routing multiplexers.
Bitstream assembly: The translated design is assembled into a bitstream, which is a sequence of binary data that represents the configuration settings for the FPGA’s resources. The bitstream is organized according to the target FPGA’s programming interface and memory organization, and may include additional data, such as error detection codes, synchronization patterns, or configuration commands.
Bitstream compression (optional): Some FPGA vendors and tools support bitstream compression, which can help to reduce the size of the bitstream and the time required to program the FPGA. Bitstream compression typically involves applying lossless compression algorithms, such as Run-Length Encoding (RLE) or Huffman coding, to the configuration data.
Bitstream encryption and authentication (optional): In order to protect intellectual property or ensure the security of the design, some FPGA vendors and tools support bitstream encryption and authentication. This involves encrypting the bitstream using a cryptographic algorithm (such as AES) and a secret key, and/or adding an authentication code (such as an HMAC) to the bitstream. The FPGA’s programming interface must support the corresponding decryption and authentication mechanisms to load the encrypted or authenticated bitstream.
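The configuration data of a small design is mostly zeros (unused resources), so even a very simple lossless scheme helps. The sketch below is a generic byte-oriented run-length encoder and decoder, shown only to illustrate the idea. It is not the compression format used by ecppack or by any Lattice device.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) byte pairs, with runs of at most 255 bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        out += bytes([value]) * count
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical configuration frame: mostly zeros with a few set bits
    frame = bytes(1000) + b"\x3c\x01" + bytes(500)
    encoded = rle_encode(frame)
    assert rle_decode(encoded) == frame      # lossless
    print(len(frame), "->", len(encoded))    # 1502 -> 16
```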

In the context of the Lattice ECP5 FPGA and open-source tools, the bitstream generation process is performed using the Project Trellis tools. After the P&R step, nextpnr-ecp5 writes a textual configuration file (with its “--textcfg” option) that describes the placed and routed design. This file is then passed to the “ecppack” tool, part of the Project Trellis suite, to generate the final bitstream. “ecppack” translates the design’s components and interconnects into the ECP5’s configuration elements and assembles the bitstream according to the ECP5’s programming interface and memory organization.

Once the bitstream is generated, it can be loaded onto the target FPGA using a programming tool or interface, such as JTAG, SPI, or an embedded configuration memory. The FPGA reads the bitstream and configures its programmable resources according to the design’s settings, thereby implementing the desired hardware functionality.

Verilog assignments

We have various kinds of assignment in Verilog:

In clocked blocks (always @(posedge clk) for example):
- Blocking assignments (led_reg = clk_i)
- Non-blocking assignments (led_reg <= clk_i)
Continuous assignments (assign led_o = clk_i)

Things have to be scheduled by whatever interprets the HDL description of your system.

This is just a simplified view, but this is more or less the order of operations in a given module:

Run all the blocking assignments
Evaluate the right-hand side (RHS) of the non-blocking assignments and schedule the updates of their left-hand side (LHS)
Update the left-hand side
Run the continuous assignments
Update inputs and outputs
…

There is one more possibility, supported on the ECP5 but not on every FPGA: giving registers an initial value (for example reg [3:0] counter = 4'd5;). FPGAs that don’t support it will start their registers at 0, or even with random values.

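Keep in mind that a non-blocking assignment does not take effect right away. All the right-hand sides for a clock edge are evaluated first, using the old values, and only then are the registers updated. Anything reading that register on the same clock edge still sees the old value, so the new value only becomes visible on the next cycle. Blocking assignments, on the other hand, take effect immediately: the following statements in the same block already see the new value.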
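To see the difference, here is a tiny Python model of one clock edge of a two-stage shift register, written both ways: `q1 <= d; q2 <= q1;` versus `q1 = d; q2 = q1;`. It is not a Verilog simulator, just the two evaluation orders described above.

```python
def clock_nonblocking(state, d):
    # always @(posedge clk) begin q1 <= d; q2 <= q1; end
    rhs_q1, rhs_q2 = d, state["q1"]       # 1. evaluate every RHS with the old values
    return {"q1": rhs_q1, "q2": rhs_q2}   # 2. then update all the LHS together


def clock_blocking(state, d):
    # always @(posedge clk) begin q1 = d; q2 = q1; end
    state = dict(state)
    state["q1"] = d                       # takes effect immediately...
    state["q2"] = state["q1"]             # ...so this already sees the new q1
    return state


def run(step, inputs):
    """Apply one clock edge per input value and record q2 after each edge."""
    state = {"q1": 0, "q2": 0}
    history = []
    for d in inputs:
        state = step(state, d)
        history.append(state["q2"])
    return history


if __name__ == "__main__":
    print("non-blocking:", run(clock_nonblocking, [1, 0, 0]))  # [0, 1, 0]
    print("blocking:    ", run(clock_blocking, [1, 0, 0]))     # [1, 0, 0]
```

With non-blocking assignments, the `1` reaches `q2` one clock edge later, which is a real two-stage shift register. With blocking assignments, `q2` is just a copy of `q1`, so you effectively get a single stage.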

Avoid mixing both kinds in the same block if you don’t need to (the result may not be portable from one kind of FPGA or toolchain to another).

ECP5 in details

Main view

This is a symbolic representation of the ECP5 chip on your i5 board (from nextpnr-ecp5 --gui)

PFU blocks

Each of those numerous squares is what is called a PFU (Programmable Functional Unit) block:

Slices

Each of these blocks contains 4 slices, and each of those slices contains 2 LUT4s and 2 registers. Slices can take multiple configurations: they can act as a small block of RAM (SRAM) or ROM, or be combined to make LUT5, LUT6, LUT7 or LUT8 functions.
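Combining LUTs relies on Shannon expansion. Any function of 5 inputs can be split into two 4-input functions, one for `e = 0` and one for `e = 1`, with a 2:1 multiplexer driven by the fifth input selecting between them. The ECP5 slices have dedicated multiplexers for this. Here is a Python sketch where LUTs are modeled as lists of output bits. Using input 0 as the least significant address bit is an assumption made for the example.

```python
from itertools import product


def truth_table(func, n):
    """Program an n-input LUT: one output bit per input combination (input 0 = LSB)."""
    return [func(*[(addr >> i) & 1 for i in range(n)]) for addr in range(2 ** n)]


def lut_read(table, inputs):
    """Read a LUT: the input bits form the address of the memory cell."""
    return table[sum(bit << i for i, bit in enumerate(inputs))]


def lut5_from_two_lut4(func5):
    """Shannon expansion on input e: f = e ? f(a, b, c, d, 1) : f(a, b, c, d, 0)."""
    lut_e0 = truth_table(lambda a, b, c, d: func5(a, b, c, d, 0), 4)  # first LUT4
    lut_e1 = truth_table(lambda a, b, c, d: func5(a, b, c, d, 1), 4)  # second LUT4

    def lut5(a, b, c, d, e):
        low = lut_read(lut_e0, (a, b, c, d))
        high = lut_read(lut_e1, (a, b, c, d))
        return high if e else low        # the 2:1 multiplexer driven by e
    return lut5


if __name__ == "__main__":
    # Example 5-input function: majority vote (1 if at least 3 inputs are 1)
    def majority5(a, b, c, d, e):
        return int(a + b + c + d + e >= 3)

    lut5 = lut5_from_two_lut4(majority5)
    assert all(lut5(*bits) == majority5(*bits) for bits in product((0, 1), repeat=5))
    print("two LUT4s + a mux reproduce all 32 cases")
```

The same trick applied again (two LUT5s and another mux) gives a LUT6, and so on.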

LUTs

Let’s see what a LUT is.

LUT is short for look-up table. A LUT is a small memory with an address decoder. A LUTn has 2^n memory cells, so a LUT4 has 16 memory cells. Its 4 input wires form the address, and each cell stores the output for one of the 16 possible input combinations. This means a LUT4 can implement any Boolean function of its 4 inputs.

To make an XOR gate, you would program a LUT2 like this:

A0	A1	Output
0	0	0
0	1	1
1	0	1
1	1	0
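Here is a hypothetical Python `LUT` class that models exactly that: a row of memory cells (stored as bits of an integer) plus an address decoder. Using A0 as the least significant address bit is an assumption for the example. The real bit ordering of an ECP5 LUT’s initialization value is handled by the tools.

```python
class LUT:
    """An n-input look-up table: 2**n one-bit memory cells plus an address decoder."""

    def __init__(self, n: int, init: int):
        # init holds the 2**n memory cells; bit k is the output for address k
        if not 0 <= init < 2 ** (2 ** n):
            raise ValueError("init does not fit in 2**n bits")
        self.n = n
        self.init = init

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        # Address decoder: input 0 (A0) is the least significant address bit
        address = sum(bit << i for i, bit in enumerate(inputs))
        return (self.init >> address) & 1  # read the selected memory cell


# The XOR table above: a 1 at addresses 1 (A0=1, A1=0) and 2 (A0=0, A1=1)
xor2 = LUT(2, 0b0110)
# Same kind of hardware, different contents: a 4-input AND only stores
# a 1 at address 15 (all inputs high)
and4 = LUT(4, 0x8000)

if __name__ == "__main__":
    for a0, a1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a0, a1, xor2(a0, a1))
    print(and4(1, 1, 1, 1), and4(1, 0, 1, 1))  # 1 0
```

Changing the function only means changing the 2^n stored bits. A LUT4 can therefore hold any of the 2^16 = 65536 possible functions of 4 inputs.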

The slices usually contain complex routing systems that allow connecting the LUTs, registers and other components of the slice together in various ways.

These are the kind of functions that a single slice can do in an ECP5:

Addition 2-bit
Subtraction 2-bit
Add/Subtract 2-bit using dynamic control
Up or Down counter 2-bit
Up/Down counter with asynchronous clear
Up/Down counter with preload (sync)
Ripple mode multiplier building block
Multiplier support
Comparator functions of A and B inputs:
- A greater-than-or-equal-to B
- A not-equal-to B
- A less-than-or-equal-to B
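In ripple mode, each slice handles 2 bits of an addition and passes its carry to the next slice, so wider adders are built by chaining slices. The sketch below models that chaining in Python with a hypothetical `slice_add2` function. It shows a ripple-carry adder in general, not the exact ECP5 carry-chain circuitry.

```python
def slice_add2(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add the low 2 bits of a and b plus a carry: returns (2-bit sum, carry out)."""
    total = (a & 0b11) + (b & 0b11) + carry_in   # at most 3 + 3 + 1 = 7
    return total & 0b11, total >> 2


def ripple_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Chain width // 2 slices, least significant bits first."""
    if width % 2:
        raise ValueError("width must be even (2 bits per slice)")
    result, carry = 0, 0
    for i in range(0, width, 2):                 # one slice per pair of bits
        s, carry = slice_add2(a >> i, b >> i, carry)
        result |= s << i
    return result, carry                         # carry out of the last slice


if __name__ == "__main__":
    print(ripple_add(0b1011, 0b0110, 4))         # (1, 1): 11 + 6 = 17 = 0b1_0001
```

An 8-bit adder would use 4 chained slices in this model, and the carry has to ripple through all of them. That is why long adders tend to appear on the critical path.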

Implementing RAM with the logic slices of an FPGA (distributed RAM) is pretty costly. 16x4 bits of SRAM take 3 slices, and DPRAM (dual-port RAM, which two entities can read or write at the same time) takes 6 slices. That’s why FPGAs often include dedicated blocks of RAM (BRAM) to reduce the area.

The ECP5 on the i5 (LFE5U-25) is one of the small ones. It has 24K LUTs and 56 blocks of 18 Kbits of system memory (fully enclosed memory blocks that can be dual-port and have their own clocks), for a total of 56 × 18 = 1008 Kbits of embedded memory. It also has 194 Kbits of distributed RAM.

It also contains 28 hardware multipliers (18 bits by 18 bits).

There are also more advanced DSP cores (sysDSP), but they are not handled yet by the OSS tools. If you are interested in helping with the reverse engineering of those, I’m sure the world of ECP5 users would love you very much.

Clocks

sysCLOCK PLLs: ECP5 FPGAs contain up to four sysCLOCK Phase-Locked Loop (PLL) modules, which can generate a wide range of output frequencies and phase relationships from a single input clock source. They provide flexible clock generation and conditioning features such as frequency synthesis, clock phase shifting, and clock duty cycle control. There are two of those in the ECP5 you have.
DDRDLLs: In addition to the sysCLOCK PLLs, ECP5 FPGAs contain Delay-Locked Loops (DDRDLLs) specifically designed for high-speed DDR memory interfaces, ensuring accurate clocking for memory operations. There are two of those in the ECP5 you have.
Global clock network: ECP5 FPGAs feature a global (primary) clock network, driven through dedicated clock buffers, that distributes low-skew, low-jitter clock signals throughout the FPGA fabric. This keeps clock skew and clock-to-data delays small.
Clock Multiplexers (MUXes): Clock MUXes are used to select between multiple clock sources for a specific logic element or resource in the FPGA. This allows designers to optimize power consumption and performance by selecting the appropriate clock source for different parts of the design.
Clock Dividers: The ECP5 FPGA also features clock dividers that allow designers to generate multiple, slower clock signals from a single high-speed clock source. This helps in reducing power consumption and optimizing performance for specific logic elements or resources.
PFU blocks: The basic building blocks of the ECP5 FPGA are the PFUs described above, containing programmable logic, registers, and distributed memory. Each slice has its own clock and clock-enable inputs, so different parts of the fabric can be clocked independently, which gives fine-grained control over clock distribution and power management.
Dynamic Clock Gating: The ECP5 FPGA supports dynamic clock gating, which allows designers to selectively enable or disable clock signals to specific regions of the FPGA fabric, reducing power consumption when parts of the design are idle.
Clock and Data Recovery (CDR) circuits: These are used in high-speed serial interfaces such as SERDES to recover clock signals embedded in data streams, ensuring proper synchronization between the transmitter and receiver. However, the ECP5 chip of the Colorlight i5 has no SERDES.
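Besides the dedicated hardware dividers, the simplest way to derive a slow signal in the fabric (for example to blink an LED) is a free-running counter. Bit k of a counter incremented on every clock edge completes one period every 2^(k+1) cycles, so it toggles at f / 2^(k+1). The sketch below computes that and checks it with a tiny simulation. The 25 MHz input clock is an assumption matching the oscillator we believe is on the Colorlight i5, so adjust it for your board.

```python
def counter_bit_frequency(f_clk_hz: float, bit: int) -> float:
    """Frequency of bit `bit` of a counter incremented on every clock edge.

    Bit k completes one full period every 2**(k + 1) clock cycles.
    """
    return f_clk_hz / 2 ** (bit + 1)


def simulate_counter_bit(bit: int, cycles: int) -> list[int]:
    """Value of the counter bit after each of the first `cycles` clock edges."""
    counter, trace = 0, []
    for _ in range(cycles):
        counter += 1
        trace.append((counter >> bit) & 1)
    return trace


if __name__ == "__main__":
    print(simulate_counter_bit(1, 8))                  # [0, 1, 1, 0, 0, 1, 1, 0]
    # Assumed 25 MHz input clock: bit 24 of the counter blinks at about 0.745 Hz
    print(round(counter_bit_frequency(25e6, 24), 3))   # 0.745
```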

ECP5 routing with blinky

This is the routing of our blink example from lesson 01. In light blue, we see the clock dividing mechanism. In orange, in the top right, is the assignment of our led register to the LED output. And in green are all the supporting elements (connecting clocks and things together, plus a lot of things I have no idea about).

If we zoom in the top right, we can see how the divider network is connected to the led register:
