Original Link: https://www.anandtech.com/show/10680/arm-research-summit-2016-keynote-live-blog
ARM Research Summit 2016 Keynote Live Blog
by Ian Cutress on September 15, 2016 3:28 AM EST
03:41AM EDT - ARM's first Research Summit is happening today at Churchill College, Cambridge. We have near-front row seats and are expecting some details on future HPC plans today.
03:42AM EDT - Initial comments from the pulpit: the SoftBank acquisition means business as usual
03:42AM EDT - Also, 'Brexit means Brexit', but ARM is multinational and is still being awarded EU funding for projects.
03:44AM EDT - Eric van Hensbergen first up
03:45AM EDT - He was first involved with the 2002 Earth Simulator, 35.6 TF, 640 nodes
03:45AM EDT - Also involved in Roadrunner, first PF machine in 2008, then joined ARM in 2012 to lead the Exascale program
03:46AM EDT - He's the Director of HPC
03:47AM EDT - Back in 2012, a 64-core ARM chip was a 'monster project', from Phytium. It was realised in 2015 with Mars, using 64 'Xiaomi' cores
03:49AM EDT - Mont-Blanc project, seeing what could be done with current ARM cores with the Barcelona Supercomputing Centre. Based on Exynos (A15) with Mali-T604
03:51AM EDT - Discussing big.LITTLE with HPC
03:51AM EDT - In some instances, lots of small cores are better for an SoC than big chunky ones.
03:53AM EDT - Simulated Department of Energy workloads for HPC, with microarchitecture analysis via PCA
03:53AM EDT - Some workloads are core, L1, L2, L3 or DRAM dependent - vital to see what core designs make sense
03:54AM EDT - Lots of workloads were cache sensitive rather than core sensitive, and the graph shows this
03:55AM EDT - The Scalable Vector Extension (SVE, the new announcement) was a result of input from this testing and partners like Cray and Fujitsu
03:56AM EDT - Neon wasn't enough, it was more DSP focused. Hence SVE was created
03:56AM EDT - We saw a lot of workloads were pushing out vector lengths, so rather than redesign the uArch every 2 years, make an agnostic design
03:57AM EDT - SVE is an optional part of the licensable ARMv8-A architecture
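For readers wondering what an 'agnostic design' means in practice, here is a minimal sketch in plain Python, not real SVE instructions. It simulates a loop that never hard-codes the vector length and masks off unused lanes at the tail. The function name `vla_add`, the `vector_length` parameter, and the lane-masking approach are illustrative assumptions, not details from the keynote.

```python
def vla_add(a, b, vector_length):
    """Element-wise add that never assumes a specific vector length.

    vector_length is a stand-in for however many lanes the hardware
    happens to provide (a hypothetical parameter for this sketch).
    """
    assert len(a) == len(b)
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        # Predicate: True for lanes that still map to valid elements.
        # On the final iteration some lanes may be switched off.
        active = [i + lane < n for lane in range(vector_length)]
        for lane, on in enumerate(active):
            if on:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += vector_length
    return out


data_a = list(range(10))
data_b = [10 * x for x in range(10)]
# The same code runs unchanged for different simulated vector lengths.
results = [vla_add(data_a, data_b, vl) for vl in (2, 4, 16)]
```

The point of the sketch is that `results` holds the same answer for every simulated length, so the code would not need rewriting when the vector width changes.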
03:58AM EDT - Aside from SVE, bottlenecks in memory and cache are frustrating. Working with Sandia National Labs and DoE to address this issue through new technologies
03:59AM EDT - Lenovo servers with Cavium deployed in the UK were the first HPC ARM cores being shipped. 1152 64-bit ARM cores in 6U
04:00AM EDT - Design centre in Manchester (UK) to focus on tools/libraries and runtimes for ARM HPC support in commercial
04:04AM EDT - Porting OpenHPC packages for ARM
04:05AM EDT - ARM is a silver member of OpenHPC
04:06AM EDT - Currently at 131/166 packages ported
04:06AM EDT - Member of many international standards bodies - HSA, JEDEC, OpenSHMEM, CCIX, OpenCompute, OpenMP, HMC Consortium
04:06AM EDT - 'Some of the competition consolidate the aspects of computing into their portfolio - ARM is about encouraging diversity and competition'
04:07AM EDT - 'We want to test real world workloads so we can tune our general purpose architectures towards what people are facing'
04:08AM EDT - 'You want Exascale to be widely used and widely applicable to the widest range of applications'
04:08AM EDT - 'Data analytics, for design and application, is an important area we focus on'
04:08AM EDT - A number of China partners are focusing heavily on HPC. Six strains were being followed, now down to three to be developed in 2017
04:08AM EDT - Also embedded HPC, next gen processors for space
04:09AM EDT - Dedicated HPC Tools for ARM are all online. Building the community over time, user group meetings etc
04:14AM EDT - Steve Furber, ICL Professor of Computer Engineering, University of Manchester
04:14AM EDT - SpiNNaker project has been discussed for 20 years, developed over the last 10
04:14AM EDT - The issue is the ability to simulate a brain
04:14AM EDT - Part of the EU Human Brain Project
04:15AM EDT - 'Simulating a brain can help a lot of common problems in many areas'
04:16AM EDT - Current estimates to run a real-time human brain model require a post Exascale system
04:17AM EDT - Obviously that takes tens of megawatts, and the human brain uses 20W or so
04:17AM EDT - Conversely, a brain can't simulate a computer either.
04:18AM EDT - Brains are comparatively slow, doing things on the order of a millisecond, not nanoseconds
04:18AM EDT - The most efficient cores tend to be the ones that do the least work
04:18AM EDT - Brain is very good for fault tolerance
04:21AM EDT - The Human Brain Project is an EU Flagship project with a headline budget of 1b euro over 10 years, or 100m/year across 120 institutions
04:21AM EDT - Questions about Brexit are unknown, for now
04:22AM EDT - If you're wondering what brains have to do with ARM HPC, it's a big roadmap goal in computing in general, so ARM wants to play a role
04:22AM EDT - Hence why it's a big part of this Keynote
04:23AM EDT - 'Turn the high performance computer into something you can interact with because it has a 'brain''
04:24AM EDT - With a million ARM SoCs, the network topology is complex and scalability is an issue
04:25AM EDT - A mouse is 1/1000 of a human brain, so that's an intermediate target
04:26AM EDT - Apply computer-based mouse brain to a mouse robot, and if the result likes cheese, it's a ticked box
04:27AM EDT - 'You decouple the topology of the network from the topology of what you need.'
04:28AM EDT - 'Time models itself and it all runs async. Processors have to cope'
04:28AM EDT - The SpiNNaker chip uses a core and LPDDR memory from Micron in one 2.5D package
04:29AM EDT - This package method saves 15% power
04:30AM EDT - 18 ARM968 cores in a chip, at 130nm
04:30AM EDT - ARM968, because it's cheap. DRAM is 1mm from the processor
04:31AM EDT - The memory is scratchpad, not caches
04:32AM EDT - The packet switch router is the key data transfer point
04:32AM EDT - The router is the key innovation in SpiNNaker for realtime neural networks
04:34AM EDT - The router shows processor spikes and where the spikes have to travel to. It uses a 3-state content-addressable associative table
04:34AM EDT - 1 incoming packet could become 24 outgoing packets, depending on table configuration
04:35AM EDT - The goal is to dispatch the packet ASAP, which the hardware achieves
04:36AM EDT - SpiNNaker will be used by non-ARM gurus, so SDKs and a coding hierarchy are provided
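As a rough illustration of the table lookup described above, here is a minimal Python sketch of a masked-match routing table with multicast fan-out. It reads '3-state' as each key bit being 0, 1, or don't-care, which is an assumption. The `Entry` fields, the 24-bit route word, and the choice to drop unmatched packets are also hypothetical simplifications, not the actual SpiNNaker hardware behaviour.

```python
from typing import List, NamedTuple

# Assumption: one route bit per possible destination, matching the
# "1 incoming packet could become 24 outgoing packets" remark.
NUM_OUTPUTS = 24


class Entry(NamedTuple):
    key: int    # bit pattern to match
    mask: int   # 1 = this bit must match, 0 = don't care
    route: int  # bit i set means "copy the packet to output i"


def lookup(table: List[Entry], packet_key: int) -> List[int]:
    """Return the outputs that receive a copy of the packet.

    The first matching entry wins. Unmatched packets go nowhere in
    this sketch, which is a simplification.
    """
    for entry in table:
        if (packet_key & entry.mask) == (entry.key & entry.mask):
            return [p for p in range(NUM_OUTPUTS) if (entry.route >> p) & 1]
    return []


table = [
    # Any key whose top byte is 0x12 fans out to outputs 0 and 2.
    Entry(key=0x1200, mask=0xFF00, route=0b101),
    # Catch-all entry: every bit is don't-care, copy to all 24 outputs.
    Entry(key=0x0000, mask=0x0000, route=(1 << NUM_OUTPUTS) - 1),
]

two_way = lookup(table, 0x12AB)
broadcast = lookup(table, 0x9999)
```

Here `two_way` lists two outputs and `broadcast` lists all 24, showing how one table configuration decides the fan-out independently of the physical wiring.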
04:38AM EDT - Obviously this is a university project, so budgets are low. Hence ARM968, rather than say A53
04:39AM EDT - 20k core machine requires 2kW
04:39AM EDT - 100k cores at 10kW, managed to 5kW, idle much lower
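The power figures above can be put on a per-core basis with some quick arithmetic, sketched here in Python. The helper name `watts_per_core` is hypothetical, and reading 'managed to 5kW' as the power-managed figure for the same 100k-core machine is an assumption.

```python
def watts_per_core(total_watts: float, cores: int) -> float:
    """Average power per core, ignoring any shared overheads."""
    return total_watts / cores


# Figures as quoted in the talk.
small_machine = watts_per_core(2_000, 20_000)      # 20k cores at 2kW
large_peak = watts_per_core(10_000, 100_000)       # 100k cores at 10kW
large_managed = watts_per_core(5_000, 100_000)     # same machine at 5kW

for label, value in [("20k-core", small_machine),
                     ("100k-core peak", large_peak),
                     ("100k-core managed", large_managed)]:
    print(f"{label}: {value * 1000:.0f} mW per core")
```

Both machine sizes come out at roughly 100 mW per core at the quoted peak, with the managed figure at about half that.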
04:41AM EDT - Human Brain Project platform uses 500,000 cores, 6 cabinets, and any academic can use it.
04:44AM EDT - SpiNNaker uses 10 nanojoules per spiked connection. The human brain uses 10 femtojoules, BlueGene is about 1 joule
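To put those three energy figures side by side, here is a short Python sketch that works out the gaps in orders of magnitude. The constant names are made up for this example; the values are the ones quoted in the talk.

```python
import math

# Energy per spiked connection, in joules, as quoted in the talk.
SPINNAKER_J = 10e-9    # 10 nanojoules
BRAIN_J = 10e-15       # 10 femtojoules
BLUEGENE_J = 1.0       # about 1 joule


def orders_of_magnitude(larger: float, smaller: float) -> int:
    """How many powers of ten separate two energy figures."""
    return round(math.log10(larger / smaller))


spinnaker_vs_brain = orders_of_magnitude(SPINNAKER_J, BRAIN_J)
bluegene_vs_spinnaker = orders_of_magnitude(BLUEGENE_J, SPINNAKER_J)
bluegene_vs_brain = orders_of_magnitude(BLUEGENE_J, BRAIN_J)
```

On these numbers SpiNNaker sits about six orders of magnitude above the brain and about eight below BlueGene.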
04:45AM EDT - Prof Furber developed a Sudoku solver in 36400 neurons, solves any sudoku in 10 seconds. Example project
04:47AM EDT - Third part of the Keynote, dealing with cache and memory design for HPC for the 2020-2030 timeframe
04:50AM EDT - Need to rethink memory, lots of chip space is dedicated to memory
04:51AM EDT - This talk is fast paced with lots of slide detail. I may resort to a lot of images here
05:07AM EDT - Basically, Rowhammer is an issue. How to solve
05:29AM EDT - Keynote is over, upload is 10KB/s so just waiting for the last pictures to upload...
06:10AM EDT - Uploaded! There are two days of talks to go to for the summit. You can follow at #armsummit