Configure Nordic Softdevice to Support Custom BLE Profile with Multiple Services and Characteristics

August 8, 2016

My recent BLE project involves developing a custom peripheral profile for a 9 DOF sensor fusion embedded system with multiple sensors, such as accelerometer, gyroscope, magnetometer, and altimeter, etc. The system employs a nRF52 BLE module to transfer raw sensor data, plus calculated sensor fusion data, to a BLE central device, namely an iOS/Android mobile device, or a desktop computer with BLE capability, such as OSX or Windows 10 PC.

Softdevice is Nordic’s BLE stack implementation. The nRF52 softdevice is named s132. The project uses the recent nRF5 SDK version 11.0.0 released in March 2016 for development.
This version of SDK comes with version 2.0.0 softdevice s132.

By default, s132 supports one custom service and reserves memory for some characteristics. When you dig into the softdevice’s header files, you can find that reserved memory size is 0x580. My experiments tell me that amount of memory can support over one dozen of characteristics. That’s good in many situations, but not enough for my project, which requires 3 custom services and 42 characteristics, including many with notification.

Fortunately the softdevice is architectured in such way that it can be configured in runtime to meet application developer’s need.
There are couple of example projects in nRF5 SDK use multiple custom services, but there is none about expanding the reserved memory to support many more characteristics.
Hopefully my experience mentioned here is of useful for people needs more characteristics, and save some of their time to avoid troubles I had.

The screenshot below shows all services and characteristics of the custom BLE profile designed in BDS. BDS (Bluetooth Developer Studio) is Bluetooth SIG‘s design, development and testing tool. Many vendors, including Nordic and TI, provide BDS plugins. A vendor’s plugin can be used to generate certain code for its platform, saving tremendous amount of time for developers. For example, with this shown design, Nordic version 1.2.4 plugin generates 6,393 lines of code in multiple source files. Imagine how long it may take coding all of this manually. I will cover BDS in more details later in a separate blog.



Here’re the steps I used to configure the softdevice to meet the project requirement.

  1. Configure the count of custom services
    According to Nordic programming convention, softdevice configuration is done in the following function:

        static void ble_stack_init(void)                                    

    To specify the number of supported custom services is easy. The following code snippet sets up 3 services:

        ble_enable_params_t ble_enable_params;
        ble_enable_params.common_enable_params.vs_uuid_count = 3;           


  2. Configure the attribute table size
    Set up attribute table size is trickier, and I don’t think Nordic provided any such example in nRF5 SDK.
    In my instance, I added 0x700 more bytes of memory to support 42 characteristics. The reason is the default memory size of 0x580 supports 17 characteristics, and it supports 9 more characteristics after the size is adjusted to 0x800.

        /* 0x580 added by 0x700 */
        ble_enable_params.gatts_enable_params.attr_tab_size = 0x0c80;        

    Here’s the complete function:

    /**@brief Function for the SoftDevice initialization.
     * @details It initializes the SoftDevice and the BLE event interrupt.
    static void ble_stack_init(void)
        uint32_t err_code;
        nrf_clock_lf_cfg_t clock_lf_cfg = NRF_CLOCK_LFCLKSRC;
        // Initialize SoftDevice.
        SOFTDEVICE_HANDLER_INIT(&clock_lf_cfg, NULL);
        ble_enable_params_t ble_enable_params;
        err_code = softdevice_enable_get_default_config(CENTRAL_LINK_COUNT,
         * Supports 3 custom services
        ble_enable_params.common_enable_params.vs_uuid_count = 3;
        /* 0x580 added by 0x700 */
        ble_enable_params.gatts_enable_params.attr_tab_size = 0x0c80;
        //Check the ram settings against the used number of links
        // Enable BLE stack.
        err_code = softdevice_enable(&ble_enable_params);
        // Subscribe for BLE events.
        err_code = softdevice_ble_evt_handler_set(ble_evt_dispatch);


  3. Configure the user application memory location and size
    The last key step is to update the linker script to adjust the offset and size of the memory available to the BLE peripheral application.
    I’m using GNU/Eclipse toolchain, the idea is the same when MDK or IAR is used.
    Here’s the default MEMORY section’s RAM attribute values of the linker script:

      RAM (rwx) :  ORIGIN = 0x20002080, LENGTH = 0xdf80               

    Here’s the modified values. Basically it moves forward the memory offset by the memory amount added (0x700), and reduce the available memory size by the same amount to make sure there is no overlap between softdevice and application memory regions.

      RAM (rwx) :  ORIGIN = 0x20002780, LENGTH = 0xd880                

    Note the up end of the RAM region is 0x20010000 (0x20002080+0xdf80, or 0x20002780+0xd880). Consider RAM origin starts at 0x20000000 for the softdevice, 0x10000 (64KB) is exactly the size of nRF52 RAM.

    Here’s the complete linker script:

    /* Linker script to configure memory regions. */
    GROUP(-lgcc -lc -lnosys)
      FLASH (rx) : ORIGIN = 0x1c000, LENGTH = 0x64000
      /* 0xc80 = 0x580 + 0x700 */
      RAM (rwx) :  ORIGIN = 0x20002780, LENGTH = 0xd880                
      .fs_data :
        PROVIDE(__start_fs_data = .);
        PROVIDE(__stop_fs_data = .);
      } > RAM
    } INSERT AFTER .data;
    INCLUDE "nrf5x_common.ld"

Nordic just released s132 softdevice v3.0.0 in August 2016, I will migrate soon and find out more details about the release.

Using Motion Detection To Wake Up Embedded System – A Practical Example

February 2, 2016

Put the embedded system into sleep while not in use is a must have for any battery powered applications.The tricky part is how to wake up the system if it has no peripherals, such as switch, to interact with. Using motion detection to wake up the system is a natural choice.
The example demonstrated here uses a digital accelerometer via I2C interface for motion detection. The accelerometer is configured to generate interrupts after motion is detected. The motion interrupt then wakes up the system from the deep sleep mode.
The following hardware, software and system configurations are used in the project. The custom KL03Z board is designed by Embedded Masters, other parts not used in the demo are not mentioned here. EFM32 Giant Gecko board and Simplicity Studio is optional and only used for Power/Energy Profiler.

Hardware Description
Freescale Kinetis Custom Board KL03Z MCU (Cortex-M0+ 48MHz, 32KB flash, 2 KB SRAM)
MMA8652 3-Axis 12-bit Digital Accelerometer
Silicon Labs EFM32 Giant Gecko (STK3700) Run Power/Energy Profiler
Segger JLink Debugger Program and/or Debug Kinetics MCU based hardware
Software Description
Freescale KDS 3.0 Freescale Kinetis MCU Eclipse IDE
Freescale KSDK 1.3 Kinetis SDK
Silicon Labs Simplicity Studio v3 Run Power/Energy Profiler
System Configuration Description
LPO Internal low power oscillator(1000Hz) works in very low power/leakage modes
LPTMR Low power timer with prescaler 3 to accommodate timer period up to 17.6 minutes
Clocking by LPO
I2C Baud rate 100Hz
MMA8652 Configured in Free-fall/Motion detection mode
Enable motion detection interrupt 2
Route motion detection interrupt 2 pin to PTB0 per KL03Z datasheet
PMC (Power Management Controller) VLPR (Very Low Power Run) mode while the system is up
VLPS (Very Low Power Stop) mode while in deep sleep
LLWU (Low Leakage Wakeup Unit) Route LLWU wake-up source pin P4 to MMA interrupt 2 pin
RGB LEDs Active-low

The key is to configure MMA8652 motion detection interrupt 2 and LLWU P4 to share the same pin such that the motion detection becomes the wake-up source. The following picture shows the KL03Z data sheet about how pins are assigned.

Pin Assignment

This is the application code blinks red and green LEDs 5 times before entering the sleep mode, and blinks blue LED once after waking up. Wakeup source is a 5 second timer and/or motion detection.

Application Code
 *  @brief Enter deep sleep mode, and use timer/motion detection to wakeup
void enter_sleep()
    /*  Step 0 - Init MMA */
    mma8652_init( &i2c_devices, 0 );
    /* Step 1 - enable LPO timer to make it another wake up source (5s) */
    timer_set_period( 5*1000, true );

    /* Step 2 - Enable motion detection interrupt */
    enable_motion_interrupt( true );

    /* Step 3 - Switch to deep sleep mode VLPS */
    /* Using motion detection/timer as the wakeup sources */
    pmc_sleep( true, true );

    /* Step 4 - Exiting sleep mode */
    /* timer is reenabled by pmc_sleep call */
    /* disable motion detection interrupt */
    enable_motion_interrupt( false );

* @brief Main routine
void main (void)
    /* initialize peripherals */

        for(int i=0; i<5; i++ )


The following screenshot is taken from Simplicity Studio/Energy Profiler running session. It clearly indicates the system behavior and power consumption.
Power Profiler

This video clip shows the complete running, including power profiling process.

After much efforts, include working with Freescale FAE and applying some tricks listed in chip’s errata, the system consumes about 10uA while in sleep in VLPS mode.

Altera MAX10 DECA Board Fun Projects – Part 1 Graphics

September 22, 2015

DECA, a new Altera MAX10 FPGA evaluation board from Arrow/Terasic, is the most versatile low cost FPGA board I have.
I’m planning to write a series covering some projects for DECA, and the graphics is the first topic.
I’m using Quartus 15.0 for Linux Update 2 for the projects.

Here’s some details about the DECA.
Currently it sells for $169. Arrow holds a series of global workshops covering DECA since May 2015.
The small form factor FPGA board packs with features:

  • High end MAX10 device with 50,000 LEs, and pin count 22×22
  • 4 PLLs
  • Dual ADCs
  • 512MB DDR3
  • 64MB Flash
  • 10/100 Ethernet
  • USB2.0 OTG
  • SDHC
  • HDMI v1.4 Transmitter
  • MIPI CSI-2
  • 24bit Audio
  • 2 SMA inputs
      Various sensors

    • Gesture, Proximity/Ambient light
    • Humidity and temperature
    • Power monitor
    • Accelerometer
  • BeagleBone I/O expansion headers

The board I have is an evaluation kit comes with a wireless WiFi/BLE BeagleBone Cape and a 8M camera module. In my opinion, BeagleBone I/O expansion makes many kinds of interesting DIY projects possible.

The graphics for MAX10 is implemented in VIP (Altera Video and Image Processing Suite IPs).
The following block diagram shows the FPGA setup and software processes for graphics rendering.
DECA VIP Graphics
There are some challenges, the tough hardware and software ones are listed below:

  • Choosing proper parameters for VIP components to generate proper video timing signals
  • Low graphics rendering frame rate. In 1024×768 resolution, HMI application’s rendering performance is in the range of couple frames per second.

Hardware Challenges

To address the video timing issue (i.e., pixels drawn to the screen are not properly displayed), different parameters are tried to configure IP cores. This design uses DDR3 controller with 64 bit Avalon interface data width. The Frame Reader IP (VFR), which converts user graphics/image/video data stored in external memory to a video stream understood by VIP, should have the same master port data width as the DDR3 controller. According to the VIP user guide, VFR’s control register “Frame Words” should be “The number of words (reads from the master port) to read from memory for the frame”.
VFR Registers
To calculate the “Frame Words”, VFR pixel counts is used as the dividend and 2 (64/32) as the divisor. This configuration worked for one resolution, but not in other resolutions. Different VFR’s master port widths were tried (32 and 128), but had the same result. After days of struggles, I finally settled down the proper parameter combination working with all 640×480, 800×480 and 1024×768 resolutions.

IP Parameter Value
DDR3 Controller Avalon Interface Data Width 64
Frame Reader Master Port Width 256
Color Plane Sequencer Halve Control Packet Width disabled

Other key things I learned from hardware perspective:

  • With the current configuration, it can only have up to two VFRs in the design, or Quartus Analysis & Synthesis, namely quarts_map, will crash. I believe the reason is the hardware resource limitation from the parts used in DECA
  • To get correct DDR3 pin assignments and interconnects, run deca_vip_mem_if_ddr3_emif_p0_pin_assignments.tcl at least once, after Analysis & Synthesis but before Fitter (Place & Route) compilation. The reminding notice is buried in synthesis compiling messages.

Software Challenges

The graphics software demonstration is done in uCOS II environment, it has 3 tasks. The main task performs HMI (Human-Machine Interface) displaying 640×480 automotive glass cluster.

HMI GlassCluster

The other rendering task draws two vertical and horizontal bars in 800×480 screen. The third task moves the main HMI screen.
When the HMI screen is in 1024×768 resolution, the rendering frame rate is in the range of 1 to 2 fps. If the resolution is reduced to 640×480, fps improves but is still slow. Further investigation indicates the performance issue lies within HMI code itself, not necessary due to VIP or graphics implementation.
HMI code used is based on Altia’s proprietary framework.
In theory, the graphics hardware has enough bandwidth to drive a 640×480 32bit/pixel screen approaching 30 fps.
The following NIOS II Eclipse screenshot shows assembly and C code snippet of a pixel filling routine.
Pixel Filling
The Nios II Gen 2 softcore is clocked by a 100MHz PLL. Assuming in a best scenario, there is no CPU pipeline stall, no RTOS context switch, and one clock cycle per instruction.
The inner loop (j loop) takes:
2 + ( 2 + ( 5 + 6 ) * inner_loop_count ) cycles.
The outer loop (i loop) takes:
( 2 + ( inner_loop_cycles + 7 + 6 ) * outer_loop_count ) cycles
Therefore to update 640×480 pixels screen, cycle total needed is 3,388,320, or it takes about 33ms or in 30 fps. This is a simplified case, the actual frame rate depends on rendering content and will be of course much lower.
VFR supports two frame buffers, and the feature is very useful for double buffering. While displaying the front buffer, you can build scene in the back buffer and swap the buffers when the back buffer building is done. Although it won’t improve rendering performance, the technique effectively eliminates video flickering. The demo code for the drawing task uses the approach.

The hardware design, software demo code and HMI library, plus prebuilt FPGA bitstream and application binary are available at:
Note: I don’t have commercial VIP license, the prebuilt bitstream is time-limited.
You may need turn off and then on the HDMI monitor for the monitor to pick up the video signal (at least for the Sceptre monitor I’m using).

This video clip shows the complete process to configure FPGA, download and run graphics demo from command line, and debug the code from Eclipse for Nios.

The graphics design is now at Altera Design Store as
DECA Graphics Design Example

Graphics on Cyclone V SoCKIT FPGA

August 17, 2015

While exploring project ideas for DECA, a new Altera MAX10 FPGA board from Arrow/Terasic (I will cover it in a later series), I’m curious about FPGA graphics capability. I checked it out with a Cyclone V SoCKIT I had since last year.

I found Terasic SoCKIT VIP (Altera Video and Image Processing Suite) reference design from RocketBoards. It does a really good job to demonstrate video features of VIP. I modified the FPGA design to address more on graphics, by removing Logo generator I don’t need, and adding a frame reader of 800×480 pixels and 32 bit per pixel to support graphics, and reusing the first frame reader of 640×480 pixels for graphics too.

The following picture shows the FPGA hardware setup, basically VIP, and software process for graphics rendering.
FPGA Hardware Setup and Software Process

The rendering process consists of two threads handing graphics; and each thread corresponds a VIP Frame Reader, as indicated on the right side of FPGA setup of above picture.

  • The first 640×480 pixel graphics loops through a sequence of PNG files to show a snowfall. Except for transferring decoded PNG data directly into the frame buffer, it doesn’t do much else. The main parent process also adjusts this graphics’ position such that it bounces around within the screen.
  • The second 800×480 pixel graphics renders a Automotive HMI design. The graphics is using XFree86 frame buffer pipeline and proprietary engine.

The following video click shows the process showing the graphics: turn on the SoCKIT board; bring up ethernet so software application can be downloaded to FPAG and debugged via ARM DS-5 Altera Edition; configure the FPGA; and finally run the graphics application.

This exercise lays a solid foundation for my next tackling to get graphics working on Altera MAX 10 DECA board.

SC13 Review – Big Data and Exascale

December 23, 2013

SC13 SC13, The International Conference for High Performance Computing, Networking, Storage and Analysis 2013, was over last month. This year’s conference is SC’s 25th one held in Denver, Colorado, during Nov 16-22 2013. There were over 360 exhibitors, and more than 10,000 attendees coming from around the world.

As a first time SC attendee, I was overwhelmed and I’m still absorbing what I saw, heard and learned over this busy and tired week.

I attended Technical Program and Workshops sessions. There were many topics I’m interested at, here’s what my schedule looks like, and I made it most of the sessions.

My Schedule

My Schedule

The First day – Sunday 11/17
My first day Nov 17, Sunday was a little confused day. I went to the very first plenary session, but was turned away from other later sessions. I thought my pass would entitle me to other “HPC Educators” events, but it didn’t. I had to pay extra $100 for a 3 days’ Workshop access, on top of $550 Trellis-Logic paid for me for the regular Technical Program. I’m glad I did though, able to join Sunday, Monday and Friday’s workshops, in my opinion, they are the most informative and thorough sessions.
The plenary, “Making Parallelism Easy: A 25 Year Odyssey”, was presented by Kunle Olukotun, professor of EE & CS at Stanford University. His talk traced back his involvement in HPC and parallel programming since 1986. In 2000, Profession Olukotun founded Afara Websystems, the company was acquired by Sun Microsystems in 2002. Afara’s multicore SPARC-based processors became the foundation of Sun’s UltraSPARC T1.
In the afternoon, I attended Broader Engagement sessions “Graphics and Visualization Technologies”.
In “Visualizing in the Information Age Gaining Insight Against Insurmountable Odds”, Kelly Gaither, director of visualization and senior research scientist of Texas Advanced Computing Center, talked about why visualization provides such a powerful medium for analyzing, understanding and communicating the vast amounts of data being generated every day.

Rasmus Tamstorf, senior research scientist at Walt Disney Animation Studios, talked in “Using Supercomputers to Create Magic : HPC in Animation” about the unique challenges animation provides due to the diversity of workloads. He showed us some clips of the new animation movie “Frozen”. The movie has about 122,400 frames, and the slowest frames need 5 day, 12 hours and 13 minutes to render in a 932 Gb/s bandwidth, 10 TB RAM cache HPC.

The Second Day – Monday 11/18
The second day on Nov 18 Monday was fun. I went off waiting list to have a tour to visit the new NCAR-Wyoming Supercomputing Center.
NWSC started operation in Fall 2012. It was awarded LEED Gold certification for its exceptionally sustainable design and construction by U.S. Green Building Council.


The inaugural computing resource at NWSC is Yellowstone, a 1.5-petaflops IBM iDataPlex cluster with 72,576 processor cores (2.6-GHz Intel Xeon E5-2670), 4,536 nodes (IBM dx360 M4, 2 sockets, 16 cores), and 145.2 TB system memory (32 GB DDR3-1600 per node). It is integrated with a 10.9 PB , 90 Gbps bandwidth GPFS file system and data storage system. Interconnect uses InfiniBand 31.7 TBps bidirectional bisection bandwidth.

The following picture shows Yellowstone’s capacity compared to other NSF HPC.

Performance Portfolio

HPC Portfolio

Here’re some pictures I took from the facility.


Cluster Floor

Cooling Pipes

Mechanical Floor Cooling Pipes – 45º Bend

After coming back from NWSC, I went to the 6th Workshop on HPC Finance. I was eager to know more about HFT (high-frequency trading), in particular in terms of low latency and high bandwidth FPGA which I’m interested at the most, but unfortunately there was none. The sessions I attended were mostly about AT (algorithm trading) and ATS (automated trading systems).

The party, Exhibits Gala Opening Reception, started at 7:00 PM. I had drinks and grabbed some freebies; got home after 10:00 PM.

The Third Day Tuesday 11/19
This was the formal opening day of the conference. The keynote speaker Genevieve Bell is an Intel fellow in charge of Intel’s Interaction and Experience Research. Her elegant Australian accent and inspiring speech set up the tone of the conference: Big Data.

Genevieve Bell's Keynote

Dr. Genevieve Bell’s Keynote

In the keynote speech titled “The Secret Life of Data”, Dr. Bell, as a cultural anthropologist, presented an interesting history of Big Data has been with us for millennia, in forms such as census information collected over more than a thousand years ago, from anthropological perspective, and how Big Data benefits human society ever since. William I (1028 – 1087), usually known as William the Conqueror, ordered the compilation of the Domesday Book, a survey listing all the landholders in England along with their holdings, in 1086, one year before his death. One of the main purpose of the survey was to determine who held what and what taxes had been liable; the judgement of the Domesday assessors was final – whatever the book said about who held the material wealth or what it was worth was the law, and there was no appeal.
Domesday names a total of 13,418 places. The importance of Domesday Book for understanding the period in which it was written is difficult to overstate. Anyone who uses it “can have nothing but admiration for what is the oldest ‘public record’ in England and probably the most remarkable statistical document in the history of Europe. The most recent legal case referenced the Domesday Book was in 20th century.

I visited Tianhe-2 and SCCAS (Supercomputing Center of Computer Network Information Center, Chinese Academy of Sciences) booths.
Tianhe-2 retained its position as the world’s No. 1 system with a performance of 33.86 petaflops/s on the Linpack benchmark, according to the 42nd edition of the twice-yearly TOP500 list of the world’s most powerful supercomputers, announced yesterday at SC13.
SCCAS includes 14 supercomputer centers across China, and has more than 3Pflops aggregated computing capacity, and more than 15 PB shared storage.

The latest TOP500 list indicates HP and IBM together account for 72% of systems. The EU committed $1.6 billion to exascale research, and to build an ARM-based system by 2020 at Barcelona Supercomputer Center. China is expected to produce two 100-petaflop systems, one built entirely from China-made chips and interconnects, by as early as 2015. “Chinese are two years ahead of the US”, according to an IDC analyst. But Chinese “are not ahead in terms of software, they are not ahead in terms of applications”, according to Jack Dongarra, the distinguished professor from University of Tennessee and ORNL, who received 2013 Ken Kennedy Award at SC13. Japan will build an exascale system by 2020.

Big Data, along with Exascale, was everywhere on the exhibit floor and in the meeting rooms.

The Fourth Day Wednesday 11/20
The fourth day started with an invited talk “Data, Computation, and the Fate of the Universe”, by Nobel Laureate Saul Perlmutter of UC Berkeley and LBNL. The talk reaches into the past to explain how integrating big data — and careful analysis — led to the discovery of the acceleration of the universe’s expansion.

I visited some reconfigurable computing vendors, Altera, Pico Computing, Alpha Data, Nallatech and BittWare etc. I think the the most important progresses are new HMC (Hybrid Memory Cube) technology from Micron, and OpenCL support on both Altera and Xilinx devices.
HMC is a new RAM technology uses 3D packaging of multiple memory dies. It has more data banks than DRAM of the same size. The memory controller is integrated into memory package as a separate logic die. The technology addresses memory bandwidth problem the conventional memory technologies are facing. With performance levels that break through the memory wall, Hybrid Memory Cube represents the key to extending network system performance to push through the challenges of new 100G and 400G infrastructure growth. HMC will also enable exascale CPU system performance growth for next generation HPC systems.

GUPS (Giga Updates Per Second) is one of the toughest performance measurements for memory systems because it demonstrates how fast the memory can respond to random access. The DDR3 to HMC GUPS comparison identifies the performance gap between the two technologies. A single DDR3 channel can achieve about 0.022 GUPS, a single HMC link while running at 10 Gb/s averages about 0.213 GUPS.
To put this in perspective, to fully exercise the HMC device, 4 FPGA devices are required keep up with the performance of the HMC (just under 0.9 GUPS). To get this same performance using independent DDR3-1600 would require nearly 40 channels and 10 processors.



HMCC, the HMC Consortium, is backed by Samsung, Micron, ARM, HP, Microsoft, Altera, and Xilinx.

Altera has a quit large booth, and a lot of demonstrations. Understandable, the main focus of Altera presence is HPC, and OpenCL in particular. C-to-gates and MATLAB-to-gates are promised by vendors for decades, I think this time it is real. I’m excited in this big software to hardware transition period.
Alter offers many presentations each day, either by Altera and several of its partners, including Acceleware (training), Nallatech (hardware and services), BittWare (hardware and services).

Altera Presentation

Altera Presentation

Acceleraware Presentation

Acceleraware Presentation

Nallatech Presentation

Nallatech Presentation

Altera also announced a new training course, Optimizing OpenCL for Alter FPGAs available in early 2014, along with Parallel Computing with OpenCL Workshop which was offered since late 2012 and I was sent to by Trellis-Logic.

Xilinx is missing from the conference and exhibits. Does it still care about HPC?
Xilinx had a press release on Nov 18, 2013, Xilinx and Its Ecosystem Showcase Smarter Data Center Solutions Leveraging Vivado Design Suite and New OpenCL Flow at Supercomputing Conference 2013, so it seems it still cares but the press release was just rushed out for SC13 timing.
I was very disappointed to be turned away by Xilinx’s partners, including Alpha Data. I asked what can be shown for OpenCL running on Xilinx devices. I was told it would be ready next year, and let me visit OpenCL booth if I need more information. Hey, I was there wanting some more details, but unfortunately there was basically none, except knowing Xilinx’s working on it.

The Fifth Day Thursday 11/21
The plenary talk from Professor Alok Chouhdary from Northwestern University, titled as “Big Data + Big Compute = An Extreme Scale Marriage for Smarter Science?”, addresses the fundamental question “what are the challenges and opportunities for extreme scale systems to be an effective platform” for not only traditional simulations, but their suitability for data-intensive and data driven computing to accelerate time to insights.

There are many sessions covering OpenCL 2.0 announced in July, including Tutorials and Exhibitor Forums. I attended couple of them such as “OpenCL 2.0: Unlocking the Power of Your Heterogeneous Platform”. OpenCL 2.0 defines an enhanced execution model, adds shared virtual memory and dynamic parallelism. It supports a subset of memory model and synchronization of the current standards C11 and C++ 11.

OpenCL 2.0

OpenCL 2.0

OpenCL 2.0

OpenCL 2.0

In one session, I was surprised to hear the presenter was attacking nVidia. But it is understandable given the facts that nVidia dropped OpenCL support from CUDA SDK and it likes to go alone.

There are a lot of topics about OpenACC and OpenMP. The most important ones are Software for HPC I Exhibitor Forum “Announcing OpenMP API Version 4.0”, by Michael Wong, OpenMP current CEO, and BOF (Birds of a Feather) session “OpenMP Goes Heterogeneous With OpenMP 4.0”.
The new 4.0 API supports heterogeneous hardware accelerator, enhanced tasking model with groupings, thread affinity to support binding and to improve performance on non-uniform memory architectures, and SIMD support, etc. You can find more details and specifications from Here.

OpenMP 4.0

OpenMP 4.0

As for OpenACC, some opinions insist it is a nVidia show. There are some heated exchanges against to have proprietary closed source in the open standard. Check this out if you’re interested.

Most people would agree zSpace‘s 3D demo was the coolest one at SC13, if you saw it.

The immersive 3D interactive computing platform may bring engineering revolution in many fronts, “shaping the future of human-computer interaction and revolutionizing the way people learn, play, and create.”, as CEO of zSpace said when zSpace was nominated in the Computer Hardware & Components category as CES Innovation 2014 Design and Engineering Award.

The exhibit floor closed in the afternoon. I skipped the evening party, “Technical Program Conference Reception”, held in Denver Museum of Nature and Science. It was a cold and snowy night.

The Sixth Day Friday 11/22
The last conference day had only panels and workshops. I attended “1st International Workshop on Software Engineering for High Performance Computational Science and Engineering, then “From the Exascaleto the Sensor-Scale” panel.

    I participated in the workshop sessions:
  • High-Performance Design Patterns in Modern Fortran
  • Extract UML Class Diagrams from Object-Oriented Fortran: ForUML
  • A Pilot Study: Design Patterns in Parallel Program Development

Kathy Yelick, 2013-2014 Athena Lecture award winner, was among panelists for the panel.

    And here’re some topics covered:
  • Big Computing: From the Exa-Scale to the Sensor-Scale
  • Large Scale Sensor-Actuator Computing
  • Cyberinfrastructure: Setting the Stage for Seonsor-Scale, Exa-Scale and Dynamic Data

Final Words
I’m really glad to see and share how High Performance Computing has become integral to every aspect of modern life. From bioengineering to economics, consumer products to medical marvels, advanced science to public safety to entertainment and daily life, HPC is everywhere.

OSC (Ohio Supercomputer Center) hopes to open an app store for HPC software tool with AweSim by the second quarter of 2014. AweSim is a web-based HPC platform to help SMEs (small and medium sized enterprises) to solve big modeling and simulation problems. It would cost about $200 to $500 designing a manufacturing part, to run the simulation and package the results in a report.

SC14 will be held in New Orleans from Nov 16 to 21, 2014. See you there if I can make it.

New FPGA Kickstarter Project

December 12, 2013

My friend Mike Jones and his pals just launched a kickstarted project today, 12/11/2013:
LOGi FPGA Development Board for Raspberry Pi – Beaglebone

The community oriented LOGi FPGA boards are powered by cost-effective Xilinx Spartan-6 LX9 devices.
There are two FPGA boards, LOGi-Pi and LOGi-Bone, plus some expansion add-ons, such as LOGi-EDU with joystick and many different peripherals, and LOGi-Cam with 640×480 camera module.

The LOGi-Pi can be used as a shield with Raspberry Pi. It has 4 PMODs.



The LOGi-Bone can be used as a cape with Beaglebone White or Blank. It has 2 PMODs.



You can use Xilinx’s free version of tools to do FPGA designs. The good news is, you don’t need JTAG programmer to unload your designs to FPGA. C-based loaders are provided. It is fast and easy. I’m glad to say I did contribute in this part of the project.

Go get them before they are all gone. Early adopter can get it only in $69 for one LOGi-Pi or LOGi-Bone. You’ll still need your own Pi or Bone.

A Supercomputer for only $100 – The Kickstart project Parallella

October 26, 2012

Check out Kickstarter project Parallella

Parallella, a personal supercomputer, is using Zynq 7010 FPGA SoC as the host, and Epiphany 16- or 64-core Microprocessor as the accelerator.
Adapteva, owner of the Epiphany technology, already has a SDK, and OpenCL SDK (beta). Parallella would be very useful for embedded vision, SDR, HPC and many other computation intensive projects.

The following diagram shows Parallella specification.

The Kickstarter goal is $750,000, and it ends on 2012-10-27. At the time of writing this post (2012-10-25, 23:33 MDT), there are 3,325 backers, and $578,542 pledged.
I pledged $99, and if you’re interested, please do the same. There are less than 2 days left to make it a success.

I got ZED board

September 5, 2012

I got the ZED board I ordered 3 months ago today.

So far so good, hope I can find time doing some interesting projects with ZED and KC705.


The following command is used to turn on LED LD0:
echo 1 > /sys/class/gpio/gpio61/value

Vivado is available

July 26, 2012

If you are a follower of XilinxInc, you may find
Vivado 2012.2, and ISE 14.2, are available today at Xilinx Downloads link.

I hope I have time to report my experience, in particular for HLS, with the new design tool

Peripheral Modules from Digilent and Maxim

July 12, 2012

For those who own or will own Avnet LX9 MicroBoard, Digilent boards, ZedBoard, and Xilinx ZC702, or any other FPGA boards with Pmod ports, Pmod with analog and mixed-signal functions is a must have to pursue interesting and fun projects.

Pmod™ is the trademark of Digilent. Pmods are 6 or 12 pin small I/O boards to extend FPGA boards.
Digilent sells many peripheral modules, almost 50 ones in total, price ranging from $9.99, such as slide switches, LEDs, and RCA audio jacks, to $59.99, a 802.11b WiFi. They includes input/output, sensors/actuators, DAC/ADC, and connectors, etc.

Maxim announced a collection of 15 Pmods, called Maxim Analog Essentials Collection, early this month.
The $89.95 collections has the following Pmods:
Octal 12-Bit DAC
16-Bit ADC
16-Bit High-Performance DAC
RS-485 Half-Duplex Transceiver
8-Channel Relay Driver
All-Silicon Clock Oscillator
600V Isolator for SPI/UART
Proximity and Ambient Light Sensor
±5ppm I2C Real-Time Clock
Digital Thermometer/Thermostat
Type K Thermocouple-to-Digital Converter
Dual Nonvolatile Digital Pot
Programmable Current Limiter and ADC
16-Port GPIO and LED Driver
±15kV True RS-232 Transceiver

MAXPMBAE – Maxim Analog Essentials Collection of Peripheral Modules

Here‘re some more details, and Product Brochure.

EDN has a product review Maxim’s collection of peripheral modules for FPGAs, it covers Maxim’s offering, and some useful information about Pmod.

I will order one for sure, it is well worth it, considering single Pmod could cost around $20.
I hope and believe more reasonably priced peripheral modules, with diverse functionalities, will show up.
FMC is more powerful but expensive, it may be out of reach for many.

If you have any ideas for a fun FPGA project and don’t mind to share, please do.