Hot Chips 2020 Live Blog: IBM z15, a 5.2 GHz Mainframe CPU (11:00am PT)

Original Link: https://www.anandtech.com/show/15987/hot-chips-2020-live-blog-ibm-z15-a-52-ghz-mainframe-cpu-1100am-pt

Hot Chips 2020 Live Blog: IBM z15, a 5.2 GHz Mainframe CPU (11:00am PT)

by Dr. Ian Cutress on August 17, 2020 2:00 PM EST

20 Comments

02:04PM EDT - Final talk of this session is IBM z15. This is big iron mainframe stuff. Prepare to be shocked about these chips, and wonder why you don't have them.

02:04PM EDT - Mainframes are still releveant - 220+ billion lines of COBOL still in deployment today. 70% of all business transactions still use COBOL

02:05PM EDT - Programs built in 1964 on IBM mainframes still work today

02:05PM EDT - 70k searches on google per second, vs. 1.3 million transactions per second on mainframes

02:06PM EDT - Deep pipeline high frequency z-series

02:07PM EDT - z13 introduced SMT, z14 introduced pervasive encryption

02:07PM EDT - These CPUs are ground up built by IBM and not used by anyone else

02:07PM EDT - Two bits of silicon - Storage Controller with 960 MB L4 cache, four Complte Chips

02:08PM EDT - 5 drawers are fully connected through the SC chips

02:08PM EDT - Large 14nm SOI designs

02:08PM EDT - 700mm2+ each

02:08PM EDT - ~700mm2

02:08PM EDT - Each SC chip has 12 cores. 8 MB L2 cache

02:08PM EDT - Max config supports 240 cores. 190 cores are customer available, others are for management or recovery

02:09PM EDT - 60 PCIe 4 x16 connections

02:09PM EDT - 40 TB RAIM memory supported

02:09PM EDT - Two CP chips create a single logical cluster

02:10PM EDT - CP chip

02:10PM EDT - 12 cores, 5.2 GHz

02:10PM EDT - 9.1B transistors

02:10PM EDT - 128 KB L1-I cache, 128 KB L1-D cache

02:10PM EDT - L2 4MB cache private

02:10PM EDT - 256 MB L3 cache shared

02:11PM EDT - Secure exectuion - 38 new instructions of vector performance

02:11PM EDT - speed up of common instructions in a smart way

02:11PM EDT - on-chip accelerators such as gzip, elliptic curve crypto, on-core sort/merge

02:11PM EDT - Here are the comparisons to z14

02:11PM EDT - 14% ST perf over z14

02:12PM EDT - Deep pipeline, CISC architecture

02:12PM EDT - branch is async

02:12PM EDT - two copies of almost everything shown

02:12PM EDT - recovering unit for when errors are detected - processor rolls back

02:13PM EDT - This allows for transient recovery from hardware errors

02:13PM EDT - Known good state can be transferred to a new core if non-transient error happens

02:13PM EDT - The goal of these cores is to be recoverable, even when blasted with high-energy proton beams

02:15PM EDT - NXU is syncrhonous and runs in real time

02:16PM EDT - Two main ways for compression - IBM uses both depending on the size to get the best results

02:17PM EDT - Elliptic curve cryptography acceleration unit in each core, along with enhanced modulo unit which it relies on

02:17PM EDT - 'MA unit' has its own instruction set and 'core'

02:17PM EDT - sign and verify is implemented as firmware and hardware

02:17PM EDT - Acts as a blueprint for future accelerators

02:17PM EDT - Attached to back-end of the pipeline

02:18PM EDT - All execution is in order and non-speculative

02:18PM EDT - No pipeline pain points

02:18PM EDT - Results are passed to the core

02:18PM EDT - physically these accelerators could be placed far away from core logic as needed

02:18PM EDT - dozens to hundreds of modulo ops with a couple of doubleword instructions

02:19PM EDT - Core is called millicode ?

02:19PM EDT - Here are the internal instruction set

02:19PM EDT - speed ups vs external attached PCIe accelerator on z14

02:19PM EDT - Secure Execution for z15 with vertical isolation

02:20PM EDT - specialized mode in the CPU, IO, and memory subsystem

02:20PM EDT - ultravisor sits between the hypervisor and OS

02:21PM EDT - integrity hash and input/output counts to stop malicious guests

02:22PM EDT - controlled code environment

02:23PM EDT - 5.2 GHz water cooled

02:23PM EDT - 4 CP and 1 SC chip per drawer

02:23PM EDT - 14% ST and 25% cap vs z14

02:24PM EDT - Q&A time

02:25PM EDT - Q: Pipeline depth? A: It's long! Long front end and back end and extends with the recovery

02:25PM EDT - Around 30

02:25PM EDT - Q: L1 / L2 load to use latency? A: L1 4-cycle loop, 8-cycle for L1 miss/L2 hit

02:26PM EDT - Q: For secure execution, what is the isolation from firmware? A: Validated trusted firmware / ultravisor. That's an integral part of our secure guest security

02:27PM EDT - Q: Async branch prediction? What happens if it's behind i-fetch A: It's lossless, and i-fetch is in-order. After a pipeline restart, if i-fetch is ahead, the pipeline will react and throw away as required. There are syncs - there's a hard sync at dispatch, so no predictions are dropped

02:27PM EDT - Q: Core IPC vs Power10? A: ask power!

02:28PM EDT - AES-256 for page encryption, integrity hash is SHA-512

02:29PM EDT - Q: 5.2 GHz? How? A: Deep pipelining and focus on gate design. A lot of work. Deep pipelining is table steaks, but a lot fo other things are needed

02:30PM EDT - Q: Is power/eff an important focus? A: It does consume less power than z14 that's similar configured. From a chip perspective, the focus wasn't to reduce overall power - the focus was on performance and throughput. That was done to put two more cores in and double caches - we burned the power budget to add in more performance. This is the sort of product this is. We stuff more hardware and acceleration.

02:30PM EDT - That's a wrap for the first session. Come back in 30 minutes for the next session, where we start on AMD's 4000-series Renoir

02:31PM EDT - .

Hot Chips 2020 Live Blog: IBM z15, a 5.2 GHz Mainframe CPU (11:00am PT)

Log in

Don't have an account? Sign up now