DECA, a new Altera MAX10 FPGA evaluation board from Arrow/Terasic, is the most versatile low cost FPGA board I have.
I’m planning to write a series covering some projects for DECA, and graphics is the first topic.
I’m using Quartus 15.0 for Linux Update 2 for the projects.
Here are some details about the DECA.
Currently it sells for $169. Arrow has been holding a series of global workshops covering DECA since May 2015.
The small form factor FPGA board is packed with features:
- High-end MAX10 device with 50,000 LEs in a 22×22 (484-pin) package
- 4 PLLs
- Dual ADCs
- 512MB DDR3
- 64MB Flash
- 10/100 Ethernet
- USB2.0 OTG
- HDMI v1.4 Transmitter
- MIPI CSI-2
- 24bit Audio
- 2 SMA inputs
- Various sensors
  - Gesture, Proximity/Ambient light
  - Humidity and temperature
  - Power monitor
- BeagleBone I/O expansion headers
The board I have is an evaluation kit that comes with a WiFi/BLE BeagleBone Cape and an 8M camera module. In my opinion, the BeagleBone I/O expansion makes many kinds of interesting DIY projects possible.
Graphics on the MAX10 is implemented with VIP (Altera’s Video and Image Processing Suite IP cores).
The following block diagram shows the FPGA setup and software processes for graphics rendering.
There were some challenges. The toughest hardware and software ones are listed below:
- Choosing proper parameters for VIP components to generate proper video timing signals
- Low graphics rendering frame rate. At 1024×768 resolution, the HMI application renders only a couple of frames per second.
To address the video timing issue (i.e., pixels drawn to the screen were not displayed properly), I tried different parameters for the IP cores. This design uses a DDR3 controller with a 64-bit Avalon interface data width. The Frame Reader IP (VFR), which converts user graphics/image/video data stored in external memory into a video stream understood by VIP, should have the same master port data width as the DDR3 controller. According to the VIP user guide, the VFR’s control register “Frame Words” should be “The number of words (reads from the master port) to read from memory for the frame”.
To calculate “Frame Words”, I used the VFR pixel count as the dividend and 2 (64/32) as the divisor. This configuration worked for one resolution but not for the others. I tried different VFR master port widths (32 and 128), with the same result. After days of struggle, I finally settled on a parameter combination that works with all of the 640×480, 800×480 and 1024×768 resolutions.
| Component | Parameter | Value |
|---|---|---|
| DDR3 Controller | Avalon Interface Data Width | 64 |
| Frame Reader | Master Port Width | 256 |
| Color Plane Sequencer | Halve Control Packet Width | disabled |
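The dividend/divisor rule for “Frame Words” described above can be sketched as a small helper. This is only an illustration of the arithmetic, assuming 32-bit pixels packed back to back with no line padding. The name `frame_words` and its parameters are hypothetical and are not part of the VIP driver or any Altera API, and the post does not say which divisor the final 256-bit configuration uses.

```c
#include <assert.h>
#include <stdint.h>

/* Number of master-port reads ("Frame Words") needed to fetch one frame.
 * Assumption: pixels are packed back to back with no padding per line. */
static uint32_t frame_words(uint32_t width, uint32_t height,
                            uint32_t bits_per_pixel, uint32_t master_port_bits)
{
    assert(bits_per_pixel > 0 && master_port_bits >= bits_per_pixel);
    assert(master_port_bits % bits_per_pixel == 0);

    uint32_t pixels_per_word = master_port_bits / bits_per_pixel;
    uint32_t pixels = width * height;

    /* Round up so a partially filled last word is still read. */
    return (pixels + pixels_per_word - 1) / pixels_per_word;
}
```

With a 64-bit port and 32-bit pixels, a 640×480 frame works out to 153,600 words (307,200 pixels divided by 2). The same formula with a 256-bit port gives 38,400 words.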
Other key things I learned from a hardware perspective:
- With the current configuration, the design can have at most two VFRs; otherwise Quartus Analysis & Synthesis (quartus_map) crashes. I believe the cause is the limited hardware resources of the parts used on DECA.
- To get correct DDR3 pin assignments and interconnects, run deca_vip_mem_if_ddr3_emif_p0_pin_assignments.tcl at least once, after Analysis & Synthesis but before the Fitter (Place & Route). The reminder is buried in the synthesis messages.
The graphics software demonstration runs in a uCOS II environment and has 3 tasks. The main task runs the HMI (Human-Machine Interface), which displays a 640×480 automotive glass cluster.
The second rendering task draws two vertical and horizontal bars on an 800×480 screen. The third task moves the main HMI screen.
When the HMI screen is at 1024×768 resolution, the rendering frame rate is in the range of 1 to 2 fps. If the resolution is reduced to 640×480, the frame rate improves but is still slow. Further investigation indicates that the performance issue lies within the HMI code itself and is not necessarily due to VIP or the graphics implementation.
The HMI code is based on Altia’s proprietary framework.
In theory, the graphics hardware has enough bandwidth to drive a 640×480 32bit/pixel screen approaching 30 fps.
The following Nios II Eclipse screenshot shows the assembly and C code of a pixel filling routine.
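For readers without the screenshot, a pixel fill with an outer `i` loop over lines and an inner `j` loop over pixels might look like the sketch below. It is not the demo’s actual code: `fill_rect`, its parameters, and the 32-bit pixel format are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a width x height rectangle of 32-bit pixels with one color.
 * stride is the number of pixels per frame buffer line. */
static void fill_rect(uint32_t *fb, size_t stride,
                      size_t width, size_t height, uint32_t color)
{
    assert(fb != NULL);
    assert(width <= stride);

    for (size_t i = 0; i < height; i++) {      /* outer (i) loop: lines  */
        uint32_t *line = fb + i * stride;
        for (size_t j = 0; j < width; j++) {   /* inner (j) loop: pixels */
            line[j] = color;
        }
    }
}
```

Every pixel costs one store plus the loop bookkeeping, which is why the per-pixel instruction count dominates the estimate.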
The Nios II Gen 2 softcore is clocked by a 100MHz PLL. Assume the best case: no CPU pipeline stalls, no RTOS context switches, and one clock cycle per instruction.
The inner loop (j loop) takes:
2 + ( 2 + ( 5 + 6 ) * inner_loop_count ) cycles.
The outer loop (i loop) takes:
( 2 + ( inner_loop_cycles + 7 + 6 ) * outer_loop_count ) cycles
Therefore, updating a 640×480 pixel screen needs roughly 3.39 million cycles, which takes about 34 ms, or close to 30 fps. This is a simplified case; the actual frame rate depends on the rendered content and will of course be much lower.
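The two loop formulas can be evaluated directly. The sketch below simply transcribes them as written, under the same best-case assumption of one cycle per instruction; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inner (j) loop cost, transcribed from the formula above. */
static uint64_t inner_loop_cycles(uint64_t inner_count)
{
    return 2 + (2 + (5 + 6) * inner_count);
}

/* Outer (i) loop cost for a full width x height fill. */
static uint64_t frame_cycles(uint64_t width, uint64_t height)
{
    return 2 + (inner_loop_cycles(width) + 7 + 6) * height;
}

/* Best-case frames per second for a given CPU clock in Hz. */
static double best_case_fps(uint64_t width, uint64_t height, uint64_t cpu_hz)
{
    uint64_t cycles = frame_cycles(width, height);
    assert(cycles > 0);
    return (double)cpu_hz / (double)cycles;
}
```

For 640×480 at 100 MHz this model gives roughly 3.39 million cycles per frame, or about 29.5 fps. For 1024×768 it gives about 8.66 million cycles, or about 11.5 fps, before any real rendering work is counted.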
VFR supports two frame buffers, which is very useful for double buffering. While the front buffer is displayed, you can build the scene in the back buffer and swap the buffers once the back buffer is complete. Although this won’t improve rendering performance, the technique effectively eliminates video flickering. The demo code for the drawing task uses this approach.
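The draw-then-swap pattern can be sketched in a hardware-independent way. This is not the demo code: the `double_buffer_t` type and the `select` hook are hypothetical, with the hook standing in for whatever register write tells the VFR which of its two frames to display.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook that tells the display hardware which buffer to scan out.
 * On real hardware this would program the VFR; it may be NULL on a host. */
typedef void (*select_frame_fn)(unsigned frame_index);

typedef struct {
    uint32_t *buffers[2];     /* two frame buffers in external memory */
    unsigned front;           /* index of the buffer being displayed  */
    select_frame_fn select;   /* hypothetical hardware hook           */
} double_buffer_t;

/* Buffer that is safe to draw into while the front buffer is displayed. */
static uint32_t *back_buffer(const double_buffer_t *db)
{
    assert(db != NULL);
    return db->buffers[db->front ^ 1u];
}

/* Make the finished back buffer visible and return the new back buffer. */
static uint32_t *swap_buffers(double_buffer_t *db)
{
    assert(db != NULL);
    db->front ^= 1u;
    if (db->select != NULL) {
        db->select(db->front);
    }
    return back_buffer(db);
}
```

A drawing task would render into `back_buffer(&db)`, call `swap_buffers(&db)` when the scene is complete, and then continue drawing into the pointer it returns. On a cached CPU the data cache would also need flushing before the swap, which this sketch leaves out.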
The hardware design, software demo code and HMI library, plus prebuilt FPGA bitstream and application binary are available at:
Note: I don’t have a commercial VIP license, so the prebuilt bitstream is time-limited.
You may need to turn the HDMI monitor off and then on again for it to pick up the video signal (at least with the Sceptre monitor I’m using).
This video clip shows the complete process of configuring the FPGA, downloading and running the graphics demo from the command line, and debugging the code from Eclipse for Nios.