Archive for December, 2013

SC13 Review – Big Data and Exascale

December 23, 2013

SC13, the International Conference for High Performance Computing, Networking, Storage and Analysis 2013, wrapped up last month. This year’s conference, the 25th in the SC series, was held in Denver, Colorado, during Nov 16-22, 2013. There were over 360 exhibitors and more than 10,000 attendees from around the world.

As a first-time SC attendee, I was overwhelmed, and I’m still absorbing what I saw, heard and learned over that busy, tiring week.

I attended Technical Program and Workshop sessions. There were many topics I’m interested in; here’s what my schedule looked like, and I made it to most of the sessions.

My Schedule

The First Day – Sunday 11/17
My first day, Sunday, Nov 17, was a little confusing. I went to the very first plenary session, but was turned away from later sessions. I thought my pass would entitle me to the other “HPC Educators” events, but it didn’t. I had to pay an extra $100 for three-day Workshop access, on top of the $550 Trellis-Logic paid for my regular Technical Program registration. I’m glad I did, though: it let me join the Sunday, Monday and Friday workshops, which in my opinion were the most informative and thorough sessions.
The plenary, “Making Parallelism Easy: A 25 Year Odyssey”, was presented by Kunle Olukotun, professor of EE & CS at Stanford University. His talk traced his involvement in HPC and parallel programming back to 1986. In 2000, Professor Olukotun founded Afara Websystems, which was acquired by Sun Microsystems in 2002. Afara’s multicore SPARC-based processors became the foundation of Sun’s UltraSPARC T1.
In the afternoon, I attended the Broader Engagement session “Graphics and Visualization Technologies”.
In “Visualizing in the Information Age: Gaining Insight Against Insurmountable Odds”, Kelly Gaither, director of visualization and senior research scientist at the Texas Advanced Computing Center, talked about why visualization provides such a powerful medium for analyzing, understanding and communicating the vast amounts of data being generated every day.

Rasmus Tamstorf, senior research scientist at Walt Disney Animation Studios, talked in “Using Supercomputers to Create Magic: HPC in Animation” about the unique challenges animation poses due to the diversity of its workloads. He showed us some clips from the new animated movie “Frozen”. The movie has about 122,400 frames, and the slowest frames took 5 days, 12 hours and 13 minutes to render on an HPC system with 932 Gb/s of bandwidth and a 10 TB RAM cache.

The Second Day – Monday 11/18
The second day, Monday, Nov 18, was fun. I got off the waiting list for a tour of the new NCAR-Wyoming Supercomputing Center (NWSC).
NWSC started operation in fall 2012. The U.S. Green Building Council awarded it LEED Gold certification for its exceptionally sustainable design and construction.


The inaugural computing resource at NWSC is Yellowstone, a 1.5-petaflops IBM iDataPlex cluster with 72,576 processor cores (2.6-GHz Intel Xeon E5-2670), 4,536 nodes (IBM dx360 M4, 2 sockets, 16 cores), and 145.2 TB of system memory (32 GB DDR3-1600 per node). It is integrated with a 10.9 PB, 90 Gbps GPFS file system and data storage system. The interconnect is InfiniBand, with 31.7 TBps of bidirectional bisection bandwidth.
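As a quick sanity check, the headline numbers above can be reproduced from the per-node figures. The sketch below is only back-of-the-envelope arithmetic; the 8 double-precision floating-point operations per cycle per core is my own assumption (typical for AVX on this processor generation) and was not part of the tour material.

```python
# Per-node figures quoted above for Yellowstone.
NODES = 4536
CORES_PER_NODE = 16
CLOCK_GHZ = 2.6
MEM_PER_NODE_GB = 32

# Assumption (not from the tour): each core retires 8 double-precision
# floating-point operations per cycle using AVX.
FLOPS_PER_CYCLE = 8

# Total core count across the cluster.
cores = NODES * CORES_PER_NODE

# Total memory in TB, using decimal units (1 TB = 1000 GB).
memory_tb = NODES * MEM_PER_NODE_GB / 1000

# Theoretical peak: cores * GHz * flops/cycle gives GFLOPS; 1e6 GFLOPS = 1 PFLOPS.
peak_pflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e6

print(f"cores:  {cores:,}")
print(f"memory: {memory_tb:.1f} TB")
print(f"peak:   {peak_pflops:.2f} PFLOPS")
```

Under that assumption, the core count, memory size and roughly 1.5-petaflops peak all line up with the quoted specifications.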

The following picture shows Yellowstone’s capacity compared to other NSF HPC systems.

Performance Portfolio

HPC Portfolio

Here are some pictures I took at the facility.


Cluster Floor

Cooling Pipes

Mechanical Floor Cooling Pipes – 45º Bend

After coming back from NWSC, I went to the 6th Workshop on HPC Finance. I was eager to learn more about HFT (high-frequency trading), in particular the low-latency, high-bandwidth FPGA side that interests me the most, but unfortunately there was nothing on it. The sessions I attended were mostly about AT (algorithmic trading) and ATS (automated trading systems).

The party, Exhibits Gala Opening Reception, started at 7:00 PM. I had drinks and grabbed some freebies; got home after 10:00 PM.

The Third Day – Tuesday 11/19
This was the formal opening day of the conference. The keynote speaker, Genevieve Bell, is an Intel Fellow in charge of Intel’s Interaction and Experience Research. Her elegant Australian accent and inspiring speech set the tone of the conference: Big Data.

Dr. Genevieve Bell’s Keynote

In the keynote speech, titled “The Secret Life of Data”, Dr. Bell, a cultural anthropologist, presented an interesting history from an anthropological perspective: Big Data has been with us for millennia, in forms such as census information collected more than a thousand years ago, and it has benefited human society ever since. William I (1028 – 1087), usually known as William the Conqueror, ordered the compilation of the Domesday Book, a survey listing all the landholders in England along with their holdings, in 1086, one year before his death. One of the main purposes of the survey was to determine who held what and what taxes were owed. The judgement of the Domesday assessors was final: whatever the book said about who held the material wealth or what it was worth was the law, and there was no appeal.
Domesday names a total of 13,418 places. The importance of the Domesday Book for understanding the period in which it was written is difficult to overstate. Anyone who uses it “can have nothing but admiration for what is the oldest ‘public record’ in England and probably the most remarkable statistical document in the history of Europe.” The most recent legal case that referenced the Domesday Book was in the 20th century.

I visited the Tianhe-2 and SCCAS (Supercomputing Center of Computer Network Information Center, Chinese Academy of Sciences) booths.
Tianhe-2 retained its position as the world’s No. 1 system with a performance of 33.86 petaflop/s on the Linpack benchmark, according to the 42nd edition of the twice-yearly TOP500 list of the world’s most powerful supercomputers, announced at SC13 the day before.
SCCAS includes 14 supercomputing centers across China, with more than 3 Pflops of aggregate computing capacity and more than 15 PB of shared storage.

The latest TOP500 list indicates that HP and IBM together account for 72% of systems. The EU has committed $1.6 billion to exascale research and to building an ARM-based system by 2020 at the Barcelona Supercomputing Center. China is expected to produce two 100-petaflop systems, one built entirely from China-made chips and interconnects, as early as 2015. “Chinese are two years ahead of the US”, according to an IDC analyst. But the Chinese “are not ahead in terms of software, they are not ahead in terms of applications”, according to Jack Dongarra, a distinguished professor at the University of Tennessee and ORNL, who received the 2013 Ken Kennedy Award at SC13. Japan will build an exascale system by 2020.

Big Data, along with Exascale, was everywhere on the exhibit floor and in the meeting rooms.

The Fourth Day – Wednesday 11/20
The fourth day started with an invited talk, “Data, Computation, and the Fate of the Universe”, by Nobel Laureate Saul Perlmutter of UC Berkeley and LBNL. The talk reached into the past to explain how integrating big data and careful analysis led to the discovery of the acceleration of the universe’s expansion.

I visited some reconfigurable computing vendors, including Altera, Pico Computing, Alpha Data, Nallatech and BittWare. I think the most important developments are the new HMC (Hybrid Memory Cube) technology from Micron, and OpenCL support on both Altera and Xilinx devices.
HMC is a new RAM technology that uses 3D packaging of multiple memory dies. It has more data banks than conventional DRAM of the same size. The memory controller is integrated into the memory package as a separate logic die. The technology addresses the memory bandwidth problem that conventional memory technologies face. With performance levels that break through the memory wall, Hybrid Memory Cube represents the key to extending network system performance to push through the challenges of new 100G and 400G infrastructure growth. HMC will also enable exascale CPU system performance growth for next-generation HPC systems.

GUPS (Giga Updates Per Second) is one of the toughest performance measurements for memory systems because it demonstrates how fast the memory can respond to random access. The DDR3 to HMC GUPS comparison shows the performance gap between the two technologies. A single DDR3 channel can achieve about 0.022 GUPS, while a single HMC link running at 10 Gb/s averages about 0.213 GUPS.
To put this in perspective, fully exercising the HMC device requires 4 FPGA devices to keep up with its performance (just under 0.9 GUPS). Getting the same performance from independent DDR3-1600 would require nearly 40 channels and 10 processors.
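To make the comparison concrete, here is a small sketch that works through the quoted figures and shows what a GUPS-style update loop looks like. The toy loop is my own illustration, not the benchmark used at the booth: pure Python is orders of magnitude slower than hardware, so only the random access pattern matters here. The "one HMC link per FPGA" reading of the 4-FPGA setup is also my assumption.

```python
import random
import time

# Figures quoted above (GUPS = giga-updates per second).
DDR3_CHANNEL_GUPS = 0.022
HMC_LINK_GUPS = 0.213
HMC_DEVICE_GUPS = 0.9  # "just under 0.9" for a fully exercised device


def channels_needed(target_gups, per_channel_gups):
    """Independent channels needed to reach target_gups in aggregate."""
    return target_gups / per_channel_gups


def measure_gups(table_bits=16, updates=200_000, seed=1):
    """Toy random-update loop in the spirit of a GUPS test (hypothetical helper).

    Each update XORs a pseudo-random value into a pseudo-random table slot,
    so consecutive accesses have no locality for a cache to exploit.
    """
    size = 1 << table_bits
    table = list(range(size))
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(updates):
        value = rng.getrandbits(64)
        table[value & (size - 1)] ^= value
    elapsed = time.perf_counter() - start
    return updates / elapsed / 1e9


# One HMC link versus one DDR3 channel: a bit under 10x.
link_vs_channel = HMC_LINK_GUPS / DDR3_CHANNEL_GUPS

# Assumption: each of the 4 FPGAs drives one HMC link.
four_links = 4 * HMC_LINK_GUPS

# DDR3 channels needed to match 0.9 GUPS; slightly fewer for "just under 0.9".
ddr3_channels = channels_needed(HMC_DEVICE_GUPS, DDR3_CHANNEL_GUPS)

print(f"HMC link vs DDR3 channel: {link_vs_channel:.1f}x")
print(f"4 HMC links: {four_links:.3f} GUPS")
print(f"DDR3 channels for 0.9 GUPS: {ddr3_channels:.1f}")
print(f"toy Python loop: {measure_gups():.6f} GUPS")
```

The ratios come out at roughly 10x per link and about 40 DDR3 channels per HMC device, which matches the vendor's framing.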



HMCC, the HMC Consortium, is backed by Samsung, Micron, ARM, HP, Microsoft, Altera, and Xilinx.

Altera had quite a large booth and a lot of demonstrations. Understandably, the main focus of Altera’s presence was HPC, and OpenCL in particular. C-to-gates and MATLAB-to-gates have been promised by vendors for decades; I think this time it is real. I’m excited to be in this big software-to-hardware transition period.
Altera offered many presentations each day, given either by Altera or by one of its partners, including Acceleware (training), Nallatech (hardware and services) and BittWare (hardware and services).

Altera Presentation

Acceleware Presentation

Nallatech Presentation

Altera also announced a new training course, Optimizing OpenCL for Altera FPGAs, available in early 2014, alongside the Parallel Computing with OpenCL Workshop, which has been offered since late 2012 and which Trellis-Logic sent me to.

Xilinx was missing from the conference and exhibits. Does it still care about HPC?
Xilinx issued a press release on Nov 18, 2013, “Xilinx and Its Ecosystem Showcase Smarter Data Center Solutions Leveraging Vivado Design Suite and New OpenCL Flow at Supercomputing Conference 2013”, so it seems it still cares, but the press release looked rushed out to coincide with SC13.
I was very disappointed to be turned away by Xilinx’s partners, including Alpha Data. I asked what they could show of OpenCL running on Xilinx devices. I was told it would be ready next year, and to visit the OpenCL booth if I needed more information. Hey, I was there wanting more details, but unfortunately there were basically none, beyond learning that Xilinx is working on it.

The Fifth Day – Thursday 11/21
The plenary talk by Professor Alok Choudhary of Northwestern University, titled “Big Data + Big Compute = An Extreme Scale Marriage for Smarter Science?”, addressed the fundamental question of “what are the challenges and opportunities for extreme scale systems to be an effective platform” not only for traditional simulations, but also for data-intensive and data-driven computing, to accelerate time to insight.

There were many sessions covering OpenCL 2.0, announced in July, including Tutorials and Exhibitor Forums. I attended a couple of them, such as “OpenCL 2.0: Unlocking the Power of Your Heterogeneous Platform”. OpenCL 2.0 defines an enhanced execution model and adds shared virtual memory and dynamic parallelism. It supports a subset of the memory model and synchronization of the current C11 and C++11 standards.

OpenCL 2.0

In one session, I was surprised to hear the presenter attacking NVIDIA. But it is understandable, given that NVIDIA dropped OpenCL support from the CUDA SDK and likes to go it alone.

There were a lot of sessions about OpenACC and OpenMP. The most important ones were the Software for HPC I Exhibitor Forum talk “Announcing OpenMP API Version 4.0” by Michael Wong, the current OpenMP CEO, and the BOF (Birds of a Feather) session “OpenMP Goes Heterogeneous With OpenMP 4.0”.
The new 4.0 API supports heterogeneous hardware accelerators, an enhanced tasking model with task groups, thread affinity to support binding and improve performance on non-uniform memory architectures, and SIMD constructs, among other things. You can find more details and the specification here.

OpenMP 4.0

As for OpenACC, some insist it is an NVIDIA show. There were some heated exchanges over having proprietary closed-source components in an open standard. Check this out if you’re interested.

Most people who saw it would agree that zSpace’s 3D demo was the coolest one at SC13.

The immersive 3D interactive computing platform may bring an engineering revolution on many fronts, “shaping the future of human-computer interaction and revolutionizing the way people learn, play, and create”, as the CEO of zSpace said when zSpace was nominated for a CES Innovation 2014 Design and Engineering Award in the Computer Hardware & Components category.

The exhibit floor closed in the afternoon. I skipped the evening party, the “Technical Program Conference Reception”, held at the Denver Museum of Nature and Science. It was a cold and snowy night.

The Sixth Day – Friday 11/22
The last conference day had only panels and workshops. I attended the “1st International Workshop on Software Engineering for High Performance Computational Science and Engineering”, then the “From the Exascale to the Sensor-Scale” panel.

I participated in these workshop sessions:
  • High-Performance Design Patterns in Modern Fortran
  • Extract UML Class Diagrams from Object-Oriented Fortran: ForUML
  • A Pilot Study: Design Patterns in Parallel Program Development

Kathy Yelick, the 2013-2014 Athena Lecturer Award winner, was among the panelists.

And here are some of the topics covered:
  • Big Computing: From the Exa-Scale to the Sensor-Scale
  • Large Scale Sensor-Actuator Computing
  • Cyberinfrastructure: Setting the Stage for Sensor-Scale, Exa-Scale and Dynamic Data

Final Words
I’m really glad to see and share how High Performance Computing has become integral to every aspect of modern life. From bioengineering to economics, consumer products to medical marvels, advanced science to public safety to entertainment and daily life, HPC is everywhere.

OSC (Ohio Supercomputer Center) hopes to open an app store for HPC software tools with AweSim by the second quarter of 2014. AweSim is a web-based HPC platform that helps SMEs (small and medium-sized enterprises) solve big modeling and simulation problems. For someone designing a manufacturing part, it would cost about $200 to $500 to run the simulation and package the results in a report.

SC14 will be held in New Orleans from Nov 16 to 21, 2014. See you there if I can make it.

New FPGA Kickstarter Project

December 12, 2013

My friend Mike Jones and his pals just launched a Kickstarter project today, 12/11/2013:
LOGi FPGA Development Board for Raspberry Pi – Beaglebone

The community-oriented LOGi FPGA boards are powered by cost-effective Xilinx Spartan-6 LX9 devices.
There are two FPGA boards, LOGi-Pi and LOGi-Bone, plus some expansion add-ons, such as LOGi-EDU, with a joystick and many different peripherals, and LOGi-Cam, with a 640×480 camera module.

The LOGi-Pi can be used as a shield with a Raspberry Pi. It has 4 PMODs.



The LOGi-Bone can be used as a cape with a Beaglebone White or Black. It has 2 PMODs.



You can use the free version of Xilinx’s tools to do FPGA designs. The good news is, you don’t need a JTAG programmer to load your designs onto the FPGA. C-based loaders are provided. It is fast and easy. I’m glad to say I contributed to this part of the project.

Go get them before they are all gone. Early adopters can get one LOGi-Pi or LOGi-Bone for only $69. You’ll still need your own Pi or Bone.