Original Link: https://www.anandtech.com/show/6263/intel-haswell-architecture-disclosure-live-blog
Intel Haswell Architecture Disclosure: Live Blog
by Anand Lal Shimpi on September 11, 2012 1:27 PM EST
Posted in: CPUs, IT Computing, Intel, Haswell, Trade Shows, IDF 2012
02:28PM - And we're done! We'll be working on a deeper Haswell architecture piece over the next couple of days.
02:27PM - Intel isn't disclosing exact details on what aspects of voltage regulation have been integrated
02:27PM - But we'll see a lot of the fine grained control from client Haswell platforms in servers as well
02:27PM - Not going into detail on the Haswell server product today
02:27PM - Haswell will include far more power gates on the platform level
02:26PM - Haswell integrates some but not all of the voltage regulation so Intel can do more fine grained control of the pieces inside the die
02:25PM - Sidenote: it's always hilarious to see how many Intel OEMs and competitors end up in these tech insight sessions
02:25PM - TSX support coming in Linux and Windows
02:23PM - Time for Q&A
02:22PM - Piazza on Haswell GPU: "this is certainly not the end"
02:21PM - Nearing the end - Summary time
02:21PM - In the past there were only two concurrent engines: codec and imaging/scale/composite. Now you can do more in parallel as long as there's enough bandwidth to sustain it
02:20PM - Now there are three concurrent video engines: codec, imaging and scale/composition
02:19PM - Hardware image stabilization is new in Haswell
02:18PM - Moved some video processing stuff off the EU array into a dedicated video quality engine
02:15PM - 4Kx2K video acceleration is supported
02:15PM - Usages: video serving, multi-party video conferencing
02:14PM - Introducing hardware based SVC codec, can encode once and playback at multiple resolutions
02:13PM - Higher encode quality, faster Quick Sync with GT3
02:13PM - Now talking about Haswell video processing
02:11PM - GT3 seems to double everything
02:11PM - Half a terabyte of internal bandwidth between compute and cache
02:10PM - Doubled the performance of most of the fixed function units for normal rendering on the GT3 part
02:09PM - Added a resource streamer at the front end. It offloads some driver work to the GPU, which does that work on behalf of the driver instead of the CPU and helps the CPU go to sleep
02:08PM - Independent voltage/frequency domains for CPU, ring and GPU now?
02:08PM - CPUs can run at low voltage/low frequency, but the GPU can now pull the ring up to feed the engines without pulling up the CPU voltage/frequency
02:08PM - Haswell totally decouples the ring from the CPU
02:08PM - There's now a GT3 part
02:07PM - Haswell GPU architecture is similar to IVB, Broadwell will likely be different
02:05PM - Tom Piazza is on the stage
02:04PM - Now on to graphics innovations
02:04PM - One hour session tomorrow on TSX, hmm I hope it doesn't conflict with another major event...
02:03PM - Hardware can then attempt to extract parallelism with concurrent memory accesses
02:03PM - TSX allows the developer to give hints about concurrent accesses
02:03PM - But what if you have two threads accessing the same table but are updating completely independent things?
02:02PM - Normally when you have many cores working on the same data structure, you have one thread handle updates and lock the structure for everything else
02:01PM - Now talking about Intel Transactional Synchronization Extensions (TSX)
02:01PM - This will benefit AVX2 code as well as legacy code
02:01PM - Also doubled bandwidth at L2 cache, went from 1 read of the L2 every other clock cycle to a read every clock cycle
01:59PM - This is for the L1 data cache
01:59PM - Can also do a write to the cache: 2 reads + 1 write at 256 bits wide
01:59PM - Can now do a 256-bit load (an AVX load) with a single read of the cache - and on two ports
01:58PM - Same sizes L1/L2 caches as SNB/IVB
01:58PM - Whenever we double the FLOPS like we did here, you need to double the capability to feed those units
01:57PM - A bunch of new vector and scalar instructions
01:56PM - 4x the peak FP throughput of Nehalem
01:56PM - Since Haswell can do 2 FMAs every cycle per core
01:55PM - AVX2 doubles peak FP throughput of Haswell
01:54PM - Ooh: even deeper dive on Haswell microarchitecture later today
01:54PM - L2 TLB is bigger
01:53PM - We now have the ability to do two FP multiply-adds every cycle
01:53PM - Added another integer ALU, can now execute 2 branches per cycle, another store address port, can do 2 loads and a store every cycle
01:53PM - Haswell adds ports 6 and 7, up to 8 ops every cycle
01:53PM - Nehalem/SNB could execute 6 ops every cycle, ports 0 - 5
01:52PM - Improved branch prediction
01:52PM - Increasing size of buffers internally, giving us larger OoO window
01:52PM - Now it's time to talk about Haswell CPU microarchitecture
01:51PM - A lot of focus on improving overall platform power, not just the CPU/SoC
01:50PM - Haswell adds more low power IO: I2C, SDIO, I2S, UART
01:50PM - Panel self refresh is supported (if the image doesn't change, display just keeps displaying the same image, rest of the platform goes to sleep)
01:49PM - Worked on increasing efficiency of voltage regulators
01:49PM - To meet the power goals Intel worked with OEMs to give power budgets for main components in the rest of the system
01:48PM - This is how you achieve the 20x platform idle power improvement
01:48PM - We can work with our friends at the process manufacturing side, adapt the process to give us a recipe to fit the processor/die perfectly
01:48PM - Even deeper C-states, can transition between C-states up to 25% faster
01:48PM - Power delivery system is much more fine grained in delivering power to only the pieces that need to be on
01:47PM - That link is optimized for the lowest energy per transfer possible
01:47PM - The link between the CPU and the chipset has been optimized for power, depending on which Haswell part you get
01:47PM - Finer grained voltage/frequency control
01:47PM - Haswell extends the turbo range a little bit
01:46PM - Haswell platform is almost always in this new S0ix active idle state with instant resume
01:45PM - It sounds like Haswell remains in S0 but can quickly transition to active idle, allowing you to get the best of both worlds
01:45PM - "Transparent to well written software"
01:45PM - The hardware does this automatically, continuously and at a fine grain
01:45PM - Transition times are a lot shorter between high and low power states
01:45PM - This is where we get improvements in platform idle, and battery life
01:44PM - OS thinks the SoC is active, but you get idle power characteristics and can transition between active and idle very quickly
01:44PM - Added completely new set of idle states: S0ix
01:43PM - At the same level of system responsiveness, the system power has come down - transition times to lower power states are quicker now as well
01:43PM - In Haswell, we have worked on making active power and power efficiency much better
01:42PM - And you transition between the two; active state was in watts, idle states go into hundreds of milliwatts
01:42PM - IVB had two major power states: S0 (awake) and S3/S4 (sleep)
01:41PM - When you get into those power levels (8W), you can get into very attractive tablets, and also think about going fanless
01:41PM - We can also have the same graphics performance at half the power
01:41PM - At the same power level, Haswell achieves twice the graphics performance [over IVB]
01:40PM - Now talking about Haswell Power Management
01:40PM - "Haswell adds agility"
01:39PM - Active power: from tablet to desktop
01:39PM - Design points in the past still exist, but adding lower power design points that we never had before
01:38PM - Haswell Modularity: 2 - 4 cores, GT1 - GT3 graphics
01:38PM - The same power enhancements you need to get into tablets actually benefit many core server designs as well
01:35PM - Haswell will go from tablets to servers and everything in between
01:35PM - Today's disclosure will focus on what's new
01:35PM - Haswell Design Philosophy: retain prior SNB/IVB microarchitecture features, Hyper Threading, Turbo Boost, Ring Interconnect
01:34PM - Span of Haswell family is larger than previous architectures
01:34PM - Haswell is a tock: the second 22nm CPU, but with significant change at the platform and architectural level
01:32PM - We're going to get a high level architecture disclosure as well as some indication of what we'll see in client deployments of Haswell
01:31PM - Ronak Singhal, one of the Haswell architects, is talking now
01:30PM - Seats are filling up, we're waiting for the session to begin
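A quick back-of-the-envelope check on the FP throughput entries above (01:55PM - 01:56PM) and the L1 data cache entry (01:59PM). This is only a sketch of the arithmetic, not anything Intel presented. The Haswell row uses the "2 FMAs every cycle per core" figure from the session; the Nehalem and Sandy Bridge rows assume one multiply unit plus one add unit per core at 128-bit and 256-bit width respectively, which is an assumption and was not stated in the talk. The function names and the table layout are made up for illustration.

```python
# Peak FLOPs per cycle per core = SIMD lanes * FP units * FLOPs per lane per op.
# An FMA counts as 2 FLOPs per lane (one multiply + one add).

ELEMENT_BITS = {"single": 32, "double": 64}

# Haswell: "2 FMAs every cycle per core" (from the session), 256-bit vectors.
# Nehalem / Sandy Bridge: ASSUMED 1 multiply unit + 1 add unit per core.
CORES = {
    "Nehalem": {"vector_bits": 128, "units": 2, "flops_per_op": 1},
    "Sandy Bridge": {"vector_bits": 256, "units": 2, "flops_per_op": 1},
    "Haswell": {"vector_bits": 256, "units": 2, "flops_per_op": 2},
}


def peak_flops_per_cycle(vector_bits, element_bits, units, flops_per_op):
    """Peak floating point operations per cycle for one core."""
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_op


def table(precision):
    """Map core name -> peak FLOPs/cycle for 'single' or 'double' precision."""
    bits = ELEMENT_BITS[precision]
    return {
        name: peak_flops_per_cycle(element_bits=bits, **cfg)
        for name, cfg in CORES.items()
    }


def l1d_bytes_per_cycle(reads, writes, width_bits):
    """Peak L1 data cache traffic per cycle, in bytes."""
    return (reads + writes) * width_bits // 8


if __name__ == "__main__":
    for precision in ELEMENT_BITS:
        for name, flops in table(precision).items():
            print(f"{name:13s} {precision:6s} {flops:3d} FLOPs/cycle/core")
    # "2 reads + 1 write at 256bits wide" from the 01:59PM entry
    print("Haswell L1D:", l1d_bytes_per_cycle(2, 1, 256), "bytes/cycle")
```

Under those assumptions the numbers line up with the session: Haswell comes out at 32 single precision FLOPs per cycle per core against 8 for Nehalem (the 4x claim) and double the Sandy Bridge figure, and 2 reads + 1 write at 256 bits works out to 96 bytes per cycle at the L1 data cache.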