Original Link: https://www.anandtech.com/show/16905/hot-chips-2021-live-blog-new-tech-infineon-edgeq-samsung
Hot Chips 2021 Live Blog: New Tech (Infineon, EdgeQ, Samsung)
by Dr. Ian Cutress on August 23, 2021 8:20 PM EST
08:22PM EDT - Welcome to Hot Chips! This is the annual conference all about the latest, greatest, and upcoming big silicon that gets us all excited. Stay tuned during Monday and Tuesday for our regular AnandTech Live Blogs.
08:22PM EDT - Going to start here in about 10 minutes
08:30PM EDT - Should be just about to start
08:32PM EDT - First up is Infineon
08:32PM EDT - Next gen automotive challenges
08:33PM EDT - Let's go climb a mountain
08:33PM EDT - Literally drive up a mountain!
08:34PM EDT - Evolving technologies - Battery, Sensing, AI
08:35PM EDT - Adaptable architectures with high availability without any legacy impact
08:35PM EDT - Machine Learning - workload specific compute
08:35PM EDT - fast security accelerators for authentication
08:36PM EDT - E-architecture evolution
08:36PM EDT - Connectivity - logical attacks, spoofing - any connection out is an attack vector
08:36PM EDT - Need fail-safe system
08:38PM EDT - Moving towards future architectures with an ethernet backbone and a central computer
08:38PM EDT - Also helps reduce cost
08:38PM EDT - Infineon Aurix and Tricore architecture
08:38PM EDT - Designed around two decades ago - Tricore
08:38PM EDT - Aurix in production since 2015, Tricore since 1995
08:39PM EDT - Adding modern features as time goes on
08:39PM EDT - 500 MHz in latest gen
08:39PM EDT - new accelerators - parallel processing, enhanced DSPs
08:40PM EDT - ASIL D safety, security standards
08:40PM EDT - Hardware isolation at the core level, 8 VMs per core and Hypervisor
08:40PM EDT - Fine granular access protection, DMA protection
08:41PM EDT - 2 x 5 Gbit ethernet, accelerated MACsec support, hardware acceleration for encryption
08:41PM EDT - two PCIe 3.0 x1 lanes
08:42PM EDT - Full CPU architecture layout
08:42PM EDT - six cores at 500 MHz
08:42PM EDT - Debug and Trace
08:42PM EDT - SIMD Vector DSP and Scalar core
08:42PM EDT - ARC EV71FS Parallel Processing Unit
08:43PM EDT - Software stack
08:44PM EDT - Security - Security cluster
08:57PM EDT - Supports automotive encryption, intrusion detection, physical or digital
08:57PM EDT - Sorry, Internet cut out for 10 minutes, ISP went borked
08:58PM EDT - Just in the Q&A section now of this talk. Going to cut losses, and just wait for the next talk in 2 minutes
09:02PM EDT - Second talk is EdgeQ - Open RISC-V 5G Radio Access Networks
09:03PM EDT - One of the emerging companies
09:04PM EDT - First software programmable SoC for AI and 5G
09:04PM EDT - 5G basestation on a single chip
09:04PM EDT - 50+ SoCs launched, 2 billion modems shipped, $100b revenue generated
09:05PM EDT - Was in stealth until end of last year
09:05PM EDT - Next Generation RAN
09:06PM EDT - Banding for 5G is important
09:06PM EDT - Progression of 5G RAN over time
09:07PM EDT - OpenRAN using Off-the-shelf hardware
09:07PM EDT - Migration to a cloud native model
09:08PM EDT - Central Unit, Distributed Unit, Radio Unit
09:08PM EDT - Signal processing
09:09PM EDT - Requires scheduling of users
09:09PM EDT - Multiple RUs to one central unit
09:10PM EDT - DU is a hybrid architecture - mixed special hardware or general hardware
09:10PM EDT - What's needed is the open interfaces between each section
09:11PM EDT - 5G programmable baseband DSP
09:12PM EDT - one EdgeQ is in the Radio Unit
09:12PM EDT - Distributed Unit has multiple EdgeQ chips for signal processing
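A rough way to picture the Central Unit / Distributed Unit / Radio Unit split described above, as a Python sketch. The class names, the CU to DU to RU nesting, and the chip count used for the Distributed Unit are illustrative assumptions; the talk only said one EdgeQ chip sits in the Radio Unit and that the Distributed Unit uses several.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RadioUnit:
    name: str
    edgeq_chips: int = 1  # from the talk: one EdgeQ chip in the Radio Unit


@dataclass
class DistributedUnit:
    name: str
    edgeq_chips: int  # "multiple" per the talk; the exact count is an assumption
    radio_units: List[RadioUnit] = field(default_factory=list)


@dataclass
class CentralUnit:
    name: str
    distributed_units: List[DistributedUnit] = field(default_factory=list)

    def radio_units(self) -> List[RadioUnit]:
        """Flatten every Radio Unit reachable from this Central Unit."""
        return [ru for du in self.distributed_units for ru in du.radio_units]

    def total_edgeq_chips(self) -> int:
        """Sum the chips in all Distributed Units and their Radio Units."""
        du_chips = sum(du.edgeq_chips for du in self.distributed_units)
        ru_chips = sum(ru.edgeq_chips for ru in self.radio_units())
        return du_chips + ru_chips


def build_example() -> CentralUnit:
    """Hypothetical deployment: one CU, one DU with 2 chips, three RUs."""
    rus = [RadioUnit(f"ru{i}") for i in range(3)]
    du = DistributedUnit("du0", edgeq_chips=2, radio_units=rus)
    return CentralUnit("cu0", distributed_units=[du])


cu = build_example()
print(len(cu.radio_units()), cu.total_edgeq_chips())
```

The point of the open interfaces mentioned earlier is that each box in this tree could come from a different vendor.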
09:13PM EDT - Developing a converged SoC
09:13PM EDT - Need a programmable DSP engine
09:14PM EDT - RISC-V with 50+ custom instructions
09:14PM EDT - eight-core Arm Neoverse CPU subsystem
09:14PM EDT - Accelerators, IO subsystem, PCIe, USB, Ethernet
09:14PM EDT - GNU Tool Chain
09:14PM EDT - Massively parallel
09:17PM EDT - Supports multiple configurations and is software upgradeable
09:17PM EDT - beamforming, other intense operations
09:18PM EDT - gang up to 4 chips for up to 40 Gbps
09:18PM EDT - Life of a packet within a chip
09:20PM EDT - 'Profound disruption in 5G and ORAN'
09:20PM EDT - Sampling Now
09:21PM EDT - Q&A time
09:22PM EDT - Q: Process Node - A: Not disclosing publicly, but TSMC FinFET
09:22PM EDT - Q: Neoverse cores? A: E1, at 2 GHz
09:23PM EDT - Q: TDP range? A: Not disclosing. Base station unprecedented. Power is low. Very competitive for this implementation. Maybe in the teens
09:23PM EDT - Q: RISC-V and Arm, what's the RISC-V base? A: Licensed IP from Andes, but functionality is custom
09:30PM EDT - Samsung time
09:30PM EDT - HBM2-PIM
09:32PM EDT - Been working with vendors on PIM for a while
09:33PM EDT - What is PIM - rather than move data to the CPU or accelerator for basic operations, do it right in memory
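To make the data-movement argument concrete, here is a toy Python model of an elementwise FP16 add done on the host versus in memory. It only counts bytes crossing the memory bus. The function names, the command size, and the number of elements handled per command are made-up assumptions for illustration, not Samsung figures.

```python
import math

BYTES_PER_FP16 = 2


def host_side_bytes_moved(n_elements: int) -> int:
    """Host computes a + b: read a, read b, write the result over the bus."""
    return 3 * n_elements * BYTES_PER_FP16


def pim_side_bytes_moved(
    n_elements: int,
    elements_per_command: int = 16,  # assumed width of one in-memory operation
    command_bytes: int = 8,          # assumed size of one command on the bus
) -> int:
    """Memory computes a + b: only commands cross the bus, operands stay put."""
    commands = math.ceil(n_elements / elements_per_command)
    return commands * command_bytes


n = 1_000_000
host = host_side_bytes_moved(n)
pim = pim_side_bytes_moved(n)
print(f"host: {host} bytes, PIM: {pim} bytes")
```

Under these assumed parameters the in-memory path moves far fewer bytes, which is the whole pitch: the operands never leave the DRAM.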
09:33PM EDT - PIM proof of concept is difficult, only Samsung so far
09:34PM EDT - Designed to be inserted into current solutions
09:34PM EDT - Expanding the pyramid of storage
09:35PM EDT - Aquabolt-XL, system level 1st gen PIM memory based on HBM2 Aquabolt
09:36PM EDT - Memory bound workloads, such as AI
09:37PM EDT - or perhaps crypto?
09:37PM EDT - 2x system performance at 70% energy
09:38PM EDT - PIM unit has 3 units
09:38PM EDT - FP16 SIMD, controller, and register files
09:39PM EDT - No additional timing impact on memory
09:39PM EDT - Still Samsung specific, working with JEDEC for proper spec
09:40PM EDT - Works by using current signalling techniques with no overhead
09:41PM EDT - Use PIM library replacements for AI and recompile
09:41PM EDT - Python, BLAS, GEMM
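The "library replacement" idea can be sketched in Python as a GEMM entry point whose backend is swapped underneath the caller. Everything here is hypothetical: `register_pim_gemm` and the dispatch hook are invented names for illustration and are not part of Samsung's software stack, and the host path is a plain-Python stand-in for a real BLAS call.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def host_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Reference GEMM (C = A x B) standing in for a host BLAS routine."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Hypothetical hook: a PIM-aware library would install its kernel here.
_pim_gemm: Optional[Callable[[Matrix, Matrix], Matrix]] = None


def register_pim_gemm(kernel: Optional[Callable[[Matrix, Matrix], Matrix]]) -> None:
    """Install (or clear, with None) the accelerated GEMM backend."""
    global _pim_gemm
    _pim_gemm = kernel


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Call sites keep using this name; only the backend changes."""
    if _pim_gemm is not None:
        return _pim_gemm(a, b)
    return host_gemm(a, b)


print(gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Application code calling `gemm` stays the same either way, which matches the "swap the library and recompile" message from the slide.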
09:42PM EDT - PIM execution blocks
09:42PM EDT - HBM2 8Hi stack has 4 PIM + 4 HBM dies
09:42PM EDT - Compute bandwidth is 1.23 TB/s and 4.92 TB/s off-chip + on-chip
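Quick arithmetic on those two figures, as a Python snippet. It assumes the numbers are listed in the same order as the labels, so 1.23 TB/s is off-chip and 4.92 TB/s is on-chip; the slide wording as captured here does not make that explicit.

```python
# Figures quoted in the talk; the off-chip/on-chip assignment is an assumption.
off_chip_tb_s = 1.23
on_chip_tb_s = 4.92

ratio = on_chip_tb_s / off_chip_tb_s
print(f"on-chip / off-chip bandwidth: {ratio:.1f}x")
```

Under that reading, the compute units inside the stack see roughly four times the bandwidth the host gets over the external interface.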
09:43PM EDT - Synthetic benchmark testing
09:44PM EDT - Best performance gain on batch 1
09:44PM EDT - +5.4% power compared to regular HBM
09:45PM EDT - Is that iso-capacity?
09:46PM EDT - Evaluation with reduced power overall
09:46PM EDT - Reduced overall system power and execution time
09:46PM EDT - Natural Language Processing
09:47PM EDT - Xilinx model with HBM2-PIM, coming September ?
09:47PM EDT - U280+PIM test results
09:48PM EDT - Neural Networks
09:48PM EDT - 3.4x perf/watt
09:49PM EDT - Can also be applied to LPDDR5, such as LPDDR5X-6400
09:49PM EDT - based on simulation results
09:50PM EDT - Camera use cases
09:51PM EDT - DIMM level PIM
09:51PM EDT - DDR4/DDR5 compatible
09:51PM EDT - Requires a buffer
09:52PM EDT - AXDIMM buffer
09:52PM EDT - Evaluation system
09:53PM EDT - Those add-in boards look fun
09:53PM EDT - can I have one
09:53PM EDT - PoC on a Broadwell server
09:54PM EDT - GDDR6 and HBM3 in the future
09:55PM EDT - HBM3 will have FP16 and FP32, currently only INT8 and INT16
09:55PM EDT - Trying to introduce JEDEC standard with HBM3 by initial spec at end of year
09:55PM EDT - Q&A time
09:56PM EDT - Q: How does PIM manage coherence with host? A: The memory region used for offload will not be cached, but those applications have low data reusability
09:58PM EDT - Q: Does software need to know HBM-PIM is there? A: Yes, need to recompile
10:01PM EDT - Q: Is the +5.4% power iso-capacity? A: Not answered, evaded
10:02PM EDT - That's all for today! Come back tomorrow!