SC13 Review – Big Data and Exascale

December 23, 2013

SC13, the International Conference for High Performance Computing, Networking, Storage and Analysis, wrapped up last month. This year’s conference, the 25th in the SC series, was held in Denver, Colorado, on Nov 16-22, 2013. There were over 360 exhibitors and more than 10,000 attendees from around the world.

As a first-time SC attendee, I was overwhelmed, and I’m still absorbing what I saw, heard and learned over this busy and tiring week.

I attended Technical Program and Workshop sessions. There were many topics I’m interested in; here’s what my schedule looked like, and I made it to most of the sessions.

My Schedule

The First Day – Sunday 11/17
My first day, Sunday Nov 17, was a little confusing. I went to the very first plenary session, but was turned away from later sessions. I thought my pass would entitle me to the other “HPC Educators” events, but it didn’t. I had to pay an extra $100 for three days of Workshop access, on top of the $550 Trellis-Logic paid for my regular Technical Program pass. I’m glad I did, though: it let me join the Sunday, Monday and Friday workshops, which in my opinion were the most informative and thorough sessions.
The plenary, “Making Parallelism Easy: A 25 Year Odyssey”, was presented by Kunle Olukotun, professor of EE & CS at Stanford University. His talk traced his involvement in HPC and parallel programming back to 1986. In 2000, Professor Olukotun founded Afara Websystems, which was acquired by Sun Microsystems in 2002. Afara’s multicore SPARC-based processors became the foundation of Sun’s UltraSPARC T1.
In the afternoon, I attended the Broader Engagement session “Graphics and Visualization Technologies”.
In “Visualizing in the Information Age Gaining Insight Against Insurmountable Odds”, Kelly Gaither, director of visualization and senior research scientist at the Texas Advanced Computing Center, talked about why visualization provides such a powerful medium for analyzing, understanding and communicating the vast amounts of data being generated every day.

Rasmus Tamstorf, senior research scientist at Walt Disney Animation Studios, talked in “Using Supercomputers to Create Magic : HPC in Animation” about the unique challenges animation poses due to the diversity of its workloads. He showed us some clips of the new animated movie “Frozen”. The movie has about 122,400 frames, and the slowest frames need 5 days, 12 hours and 13 minutes to render on an HPC system with 932 Gb/s of bandwidth and a 10 TB RAM cache.

The Second Day – Monday 11/18
The second day, Monday Nov 18, was fun. I got off the waiting list for a tour of the new NCAR-Wyoming Supercomputing Center (NWSC).
NWSC started operation in fall 2012. It was awarded LEED Gold certification by the U.S. Green Building Council for its exceptionally sustainable design and construction.


The inaugural computing resource at NWSC is Yellowstone, a 1.5-petaflops IBM iDataPlex cluster with 72,576 processor cores (2.6-GHz Intel Xeon E5-2670), 4,536 nodes (IBM dx360 M4, 2 sockets, 16 cores), and 145.2 TB of system memory (32 GB DDR3-1600 per node). It is integrated with a 10.9 PB, 90 Gbps GPFS file system and data storage system. The InfiniBand interconnect provides 31.7 TBps of bidirectional bisection bandwidth.
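
The headline figures can be roughly cross-checked from the per-node numbers. The following is only a back-of-the-envelope sketch: the value of 8 double-precision floating-point operations per core per cycle is my assumption (typical for AVX on this processor generation) and is not a figure given at the tour.

```python
# Cross-check Yellowstone's headline numbers from the per-node figures quoted above.
NODES = 4536
CORES_PER_NODE = 16        # 2 sockets per node, 16 cores per node in total
CLOCK_GHZ = 2.6
MEM_PER_NODE_GB = 32

# Assumption (not from the post): 8 double-precision FLOPs per core per cycle.
FLOPS_PER_CYCLE = 8

total_cores = NODES * CORES_PER_NODE
total_mem_tb = NODES * MEM_PER_NODE_GB / 1000                     # GB -> TB (decimal)
peak_pflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e6     # GFLOPS -> PFLOPS

print(f"cores:  {total_cores:,}")
print(f"memory: {total_mem_tb:.1f} TB")
print(f"peak:   {peak_pflops:.2f} PFLOPS")
```

Under that assumption the core count, memory size and peak performance all line up with the quoted 72,576 cores, 145.2 TB and 1.5 petaflops.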

The following picture shows Yellowstone’s capacity compared to other NSF HPC systems.

Performance Portfolio

HPC Portfolio

Here are some pictures I took at the facility.


Cluster Floor

Cooling Pipes

Mechanical Floor Cooling Pipes – 45º Bend

After coming back from NWSC, I went to the 6th Workshop on HPC Finance. I was eager to learn more about HFT (high-frequency trading), in particular the low-latency, high-bandwidth FPGA work that interests me the most, but unfortunately there was none. The sessions I attended were mostly about AT (algorithmic trading) and ATS (automated trading systems).

The party, the Exhibits Gala Opening Reception, started at 7:00 PM. I had drinks, grabbed some freebies, and got home after 10:00 PM.

The Third Day – Tuesday 11/19
This was the formal opening day of the conference. The keynote speaker, Genevieve Bell, is an Intel Fellow in charge of Intel’s Interaction and Experience Research. Her elegant Australian accent and inspiring speech set the tone of the conference: Big Data.

Dr. Genevieve Bell’s Keynote

In the keynote speech titled “The Secret Life of Data”, Dr. Bell, a cultural anthropologist, presented an interesting history of Big Data from an anthropological perspective: it has been with us for millennia, in forms such as census information collected more than a thousand years ago, and it has benefited human society ever since. William I (1028 – 1087), usually known as William the Conqueror, ordered the compilation of the Domesday Book, a survey listing all the landholders in England along with their holdings, in 1086, one year before his death. One of the main purposes of the survey was to determine who held what and what taxes were due. The judgement of the Domesday assessors was final: whatever the book said about who held the material wealth or what it was worth was the law, and there was no appeal.
Domesday names a total of 13,418 places. The importance of the Domesday Book for understanding the period in which it was written is difficult to overstate. Anyone who uses it “can have nothing but admiration for what is the oldest ‘public record’ in England and probably the most remarkable statistical document in the history of Europe.” The most recent legal case to reference the Domesday Book was in the 20th century.

I visited the Tianhe-2 and SCCAS (Supercomputing Center of Computer Network Information Center, Chinese Academy of Sciences) booths.
Tianhe-2 retained its position as the world’s No. 1 system with a performance of 33.86 petaflop/s on the Linpack benchmark, according to the 42nd edition of the twice-yearly TOP500 list of the world’s most powerful supercomputers, announced at SC13 the day before.
SCCAS includes 14 supercomputer centers across China, with more than 3 Pflops of aggregate computing capacity and more than 15 PB of shared storage.

The latest TOP500 list indicates that HP and IBM together account for 72% of systems. The EU has committed $1.6 billion to exascale research and plans to build an ARM-based system at the Barcelona Supercomputing Center by 2020. China is expected to produce two 100-petaflop systems, one built entirely from China-made chips and interconnects, as early as 2015. “Chinese are two years ahead of the US”, according to an IDC analyst. But the Chinese “are not ahead in terms of software, they are not ahead in terms of applications”, according to Jack Dongarra, distinguished professor at the University of Tennessee and ORNL, who received the 2013 Ken Kennedy Award at SC13. Japan plans to build an exascale system by 2020.

Big Data, along with Exascale, was everywhere on the exhibit floor and in the meeting rooms.

The Fourth Day – Wednesday 11/20
The fourth day started with an invited talk “Data, Computation, and the Fate of the Universe”, by Nobel Laureate Saul Perlmutter of UC Berkeley and LBNL. The talk reaches into the past to explain how integrating big data — and careful analysis — led to the discovery of the acceleration of the universe’s expansion.

I visited some reconfigurable computing vendors, including Altera, Pico Computing, Alpha Data, Nallatech and BittWare. I think the most important developments are the new HMC (Hybrid Memory Cube) technology from Micron, and OpenCL support on both Altera and Xilinx devices.
HMC is a new RAM technology that uses 3D packaging of multiple memory dies. It has more data banks than conventional DRAM of the same size. The memory controller is integrated into the memory package as a separate logic die. The technology addresses the memory bandwidth problem that conventional memory technologies face. With performance levels that break through the memory wall, Hybrid Memory Cube represents the key to extending network system performance to push through the challenges of new 100G and 400G infrastructure growth. HMC will also enable exascale CPU system performance growth for next generation HPC systems.

GUPS (Giga Updates Per Second) is one of the toughest performance measurements for memory systems because it shows how fast the memory can respond to random accesses. Comparing DDR3 and HMC on GUPS shows the performance gap between the two technologies: a single DDR3 channel achieves about 0.022 GUPS, while a single HMC link running at 10 Gb/s averages about 0.213 GUPS.
To put this in perspective, fully exercising the HMC device requires 4 FPGA devices to keep up with its performance (just under 0.9 GUPS). Getting the same performance from independent DDR3-1600 channels would require nearly 40 channels and 10 processors.
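
The arithmetic behind those numbers can be sketched in a few lines. This is only my reading of the figures: I am assuming each FPGA drives one HMC link, that throughput scales linearly, and that each processor hosts four DDR3 channels. None of those three assumptions was stated at the booth.

```python
import math

# Figures quoted above
DDR3_CHANNEL_GUPS = 0.022   # one DDR3 channel
HMC_LINK_GUPS = 0.213       # one HMC link running at 10 Gb/s
FPGAS_PER_HMC = 4           # FPGAs needed to keep one HMC device busy

# Assumptions (not from the post): one HMC link per FPGA, linear scaling,
# and four DDR3 channels per processor.
CHANNELS_PER_PROCESSOR = 4

hmc_device_gups = FPGAS_PER_HMC * HMC_LINK_GUPS
link_vs_channel = HMC_LINK_GUPS / DDR3_CHANNEL_GUPS
ddr3_channels = math.ceil(hmc_device_gups / DDR3_CHANNEL_GUPS)
processors = math.ceil(ddr3_channels / CHANNELS_PER_PROCESSOR)

print(f"HMC device:              {hmc_device_gups:.3f} GUPS")
print(f"HMC link / DDR3 channel: {link_vs_channel:.1f}x")
print(f"DDR3 channels to match:  {ddr3_channels}")
print(f"Processors to host them: {processors}")
```

With these assumptions the sketch gives 0.852 GUPS for the device, 39 DDR3 channels and 10 processors, which is in line with “just under 0.9 GUPS” and “nearly 40 channels and 10 processors”.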



HMCC, the HMC Consortium, is backed by Samsung, Micron, ARM, HP, Microsoft, Altera, and Xilinx.

Altera had quite a large booth and a lot of demonstrations. Understandably, the main focus of Altera’s presence was HPC, and OpenCL in particular. C-to-gates and MATLAB-to-gates have been promised by vendors for decades; I think this time it is real. I’m excited to be in this big software-to-hardware transition period.
Altera offered many presentations each day, given by Altera itself and several of its partners, including Acceleware (training), Nallatech (hardware and services) and BittWare (hardware and services).

Altera Presentation

Acceleware Presentation

Nallatech Presentation

Altera also announced a new training course, Optimizing OpenCL for Altera FPGAs, available in early 2014, alongside the Parallel Computing with OpenCL Workshop, which has been offered since late 2012 and which Trellis-Logic sent me to.

Xilinx was missing from the conference and exhibits. Does it still care about HPC?
Xilinx issued a press release on Nov 18, 2013, “Xilinx and Its Ecosystem Showcase Smarter Data Center Solutions Leveraging Vivado Design Suite and New OpenCL Flow at Supercomputing Conference 2013”, so it seems it still cares, but the press release looked rushed out to meet the SC13 timing.
I was very disappointed to be turned away by Xilinx’s partners, including Alpha Data. I asked what could be shown of OpenCL running on Xilinx devices. I was told it would be ready next year, and to visit the OpenCL booth if I needed more information. Hey, I was there wanting more details, but unfortunately there were basically none, beyond knowing that Xilinx is working on it.

The Fifth Day – Thursday 11/21
The plenary talk by Professor Alok Choudhary of Northwestern University, titled “Big Data + Big Compute = An Extreme Scale Marriage for Smarter Science?”, addressed the fundamental question “what are the challenges and opportunities for extreme scale systems to be an effective platform” not only for traditional simulations, but also for data-intensive and data-driven computing to accelerate time to insight.

There were many sessions covering OpenCL 2.0, announced in July, including Tutorials and Exhibitor Forums. I attended a couple of them, such as “OpenCL 2.0: Unlocking the Power of Your Heterogeneous Platform”. OpenCL 2.0 defines an enhanced execution model and adds shared virtual memory and dynamic parallelism. It supports a subset of the memory model and synchronization of the current C11 and C++11 standards.

OpenCL 2.0

In one session, I was surprised to hear the presenter attacking NVIDIA. But it is understandable, given that NVIDIA dropped OpenCL support from the CUDA SDK and likes to go it alone.

There were a lot of sessions on OpenACC and OpenMP. The most important ones were the Software for HPC I Exhibitor Forum talk “Announcing OpenMP API Version 4.0”, by Michael Wong, the current OpenMP CEO, and the BOF (Birds of a Feather) session “OpenMP Goes Heterogeneous With OpenMP 4.0”.
The new 4.0 API adds support for heterogeneous hardware accelerators, an enhanced tasking model with groupings, thread affinity to support binding and improve performance on non-uniform memory architectures, SIMD support, and more. You can find more details and the specification here.

OpenMP 4.0

As for OpenACC, some insist it is an NVIDIA show. There were some heated exchanges over having proprietary, closed-source components in an open standard. Check this out if you’re interested.

Most people who saw it would agree that zSpace’s 3D demo was the coolest one at SC13.

The immersive 3D interactive computing platform may bring an engineering revolution on many fronts, “shaping the future of human-computer interaction and revolutionizing the way people learn, play, and create”, as the CEO of zSpace said when zSpace was nominated for the CES Innovation 2014 Design and Engineering Award in the Computer Hardware & Components category.

The exhibit floor closed in the afternoon. I skipped the evening party, the “Technical Program Conference Reception”, held at the Denver Museum of Nature and Science. It was a cold and snowy night.

The Sixth Day – Friday 11/22
The last conference day had only panels and workshops. I attended the “1st International Workshop on Software Engineering for High Performance Computational Science and Engineering”, then the “From the Exascale to the Sensor-Scale” panel.

I participated in these workshop sessions:
  • High-Performance Design Patterns in Modern Fortran
  • Extract UML Class Diagrams from Object-Oriented Fortran: ForUML
  • A Pilot Study: Design Patterns in Parallel Program Development

Kathy Yelick, the 2013-2014 Athena Lecturer award winner, was among the panelists.

And here are some of the topics covered:
  • Big Computing: From the Exa-Scale to the Sensor-Scale
  • Large Scale Sensor-Actuator Computing
  • Cyberinfrastructure: Setting the Stage for Sensor-Scale, Exa-Scale and Dynamic Data

Final Words
I’m really glad to see and share how High Performance Computing has become integral to every aspect of modern life. From bioengineering to economics, consumer products to medical marvels, advanced science to public safety to entertainment and daily life, HPC is everywhere.

OSC (Ohio Supercomputer Center) hopes to open an app store for HPC software tools with AweSim by the second quarter of 2014. AweSim is a web-based HPC platform that helps SMEs (small and medium-sized enterprises) solve big modeling and simulation problems. For someone designing a manufactured part, it would cost about $200 to $500 to run the simulation and package the results in a report.

SC14 will be held in New Orleans from Nov 16 to 21, 2014. See you there if I can make it.

New FPGA Kickstarter Project

December 12, 2013

My friend Mike Jones and his pals just launched a Kickstarter project today, 12/11/2013:
LOGi FPGA Development Board for Raspberry Pi – Beaglebone

The community-oriented LOGi FPGA boards are powered by cost-effective Xilinx Spartan-6 LX9 devices.
There are two FPGA boards, LOGi-Pi and LOGi-Bone, plus some expansion add-ons, such as LOGi-EDU with a joystick and many different peripherals, and LOGi-Cam with a 640×480 camera module.

The LOGi-Pi can be used as a shield with Raspberry Pi. It has 4 PMODs.



The LOGi-Bone can be used as a cape with Beaglebone White or Black. It has 2 PMODs.



You can use the free version of Xilinx’s tools to do FPGA designs. The good news is that you don’t need a JTAG programmer to load your designs onto the FPGA. C-based loaders are provided. It is fast and easy. I’m glad to say I contributed to this part of the project.

Go get them before they are all gone. Early adopters can get one LOGi-Pi or LOGi-Bone for only $69. You’ll still need your own Pi or Bone.

A Supercomputer for only $100 – The Kickstarter Project Parallella

October 26, 2012

Check out the Kickstarter project Parallella.

Parallella, a personal supercomputer, uses a Zynq 7010 FPGA SoC as the host and an Epiphany 16- or 64-core microprocessor as the accelerator.
Adapteva, owner of the Epiphany technology, already has an SDK and an OpenCL SDK (beta). Parallella would be very useful for embedded vision, SDR, HPC and many other computation-intensive projects.

The following diagram shows the Parallella specification.

The Kickstarter goal is $750,000, and it ends on 2012-10-27. At the time of writing this post (2012-10-25, 23:33 MDT), there are 3,325 backers, and $578,542 pledged.
I pledged $99, and if you’re interested, please do the same. There are less than 2 days left to make it a success.

I got the ZED board

September 5, 2012

Today I got the ZED board I ordered 3 months ago.

So far so good; I hope I can find time to do some interesting projects with the ZED board and the KC705.


The following command is used to turn on LED LD0:
echo 1 > /sys/class/gpio/gpio61/value
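
The one-liner above assumes gpio61 is already exported and configured as an output. For scripting the same thing, here is a rough Python sketch of the legacy Linux sysfs GPIO interface. The `set_gpio` helper and its `root` parameter are hypothetical names I made up for illustration, and the export and direction steps are assumptions for a board where the line has not been set up yet.

```python
from pathlib import Path


def set_gpio(number: int, value: int, root: str = "/sys/class/gpio") -> None:
    """Write 0 or 1 to a GPIO line via the legacy Linux sysfs interface.

    `root` is a hypothetical parameter, added so the sketch can be pointed
    at a scratch directory instead of the real sysfs tree.
    """
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")

    base = Path(root)
    gpio_dir = base / f"gpio{number}"

    # Assumption: if the line is not exported yet, ask the kernel to expose it.
    if not gpio_dir.exists():
        (base / "export").write_text(str(number))

    # Assumption: the line must be an output before its value is driven.
    (gpio_dir / "direction").write_text("out")

    # Equivalent of: echo <value> > /sys/class/gpio/gpio<number>/value
    (gpio_dir / "value").write_text(str(value))


# Example (needs root on the board): set_gpio(61, 1) should turn LED LD0 on.
```

Calling `set_gpio(61, 0)` would be the matching way to turn the LED off again.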

Vivado is available

July 26, 2012

If you follow XilinxInc, you may have seen that Vivado 2012.2 and ISE 14.2 are available today at the Xilinx Downloads link.

I hope I have time to report on my experience with the new design tool, HLS in particular.

Peripheral Modules from Digilent and Maxim

July 12, 2012

For those who own or will own an Avnet LX9 MicroBoard, a Digilent board, a ZedBoard, a Xilinx ZC702, or any other FPGA board with Pmod ports, Pmods with analog and mixed-signal functions are a must-have for pursuing interesting and fun projects.

Pmod™ is a trademark of Digilent. Pmods are small 6- or 12-pin I/O boards that extend FPGA boards.
Digilent sells almost 50 peripheral modules, with prices ranging from $9.99 for items such as slide switches, LEDs, and RCA audio jacks, to $59.99 for an 802.11b WiFi module. They include input/output, sensors/actuators, DAC/ADC, connectors, and more.

Maxim announced a collection of 15 Pmods, called the Maxim Analog Essentials Collection, early this month.
The $89.95 collection has the following Pmods:
Octal 12-Bit DAC
16-Bit ADC
16-Bit High-Performance DAC
RS-485 Half-Duplex Transceiver
8-Channel Relay Driver
All-Silicon Clock Oscillator
600V Isolator for SPI/UART
Proximity and Ambient Light Sensor
±5ppm I2C Real-Time Clock
Digital Thermometer/Thermostat
Type K Thermocouple-to-Digital Converter
Dual Nonvolatile Digital Pot
Programmable Current Limiter and ADC
16-Port GPIO and LED Driver
±15kV True RS-232 Transceiver

MAXPMBAE – Maxim Analog Essentials Collection of Peripheral Modules

Here are some more details, and the Product Brochure.

EDN has a product review, “Maxim’s collection of peripheral modules for FPGAs”, which covers Maxim’s offering and some useful information about Pmods.

I will order one for sure; it is well worth it, considering a single Pmod could cost around $20.
I hope and believe more reasonably priced peripheral modules, with diverse functionality, will show up.
FMC is more powerful but expensive; it may be out of reach for many.

If you have any ideas for a fun FPGA project and don’t mind sharing, please do.


You can order ZedBoard now

June 1, 2012

The ZedBoard web site is up with real content, and you can order the board starting today; it is scheduled to ship in July.
Two versions of ZedBoard are offered (not sure if there is any hardware difference):

  • Avnet commercial version $395
  • Digilent academic edition $299 (university affiliation is required)


Waiting is over…

May 24, 2012

I got the shiny new KC705 devkit today, which I ordered in early March.
I had mixed feelings, excited and a little bit disappointed.
Excited because I could pursue my interesting projects with the new gear.
Disappointed due to the facts that:

I will focus on the KC705 after getting other projects off my hands, probably in several weeks.


The KC705 is the larger board at the bottom. I put the smaller SP605 in the middle for comparison.

Thanks for Reading!

May 23, 2012

Friends, the blog crossed 10,000 hits today.

It’s been a little over 2 years since my first post. Thanks for reading; I will try my best to keep posting my FPGA-related activities here.

10,000 Hits!

Vivado HLS in Action

May 18, 2012

Check out this clip for Vivado HLS 2012.2 in action:

Notice how easy it is to apply different HLS directives to the same C code to improve the frame rate from 2 fps to 81 fps.
You can see the explosive potential of FPGAs in software engineers’ hands.

