
  • anonomouse - Thursday, December 6, 2018 - link

    Isn't Fujitsu's A64FX chip much, much bigger than this, also on 7nm? As well as AMD's Vega 20? Unless 'biggest chip made on TSMC's newest 7nm process' is supposed to mean 'biggest mobile chip', in which case maybe, +/- whatever the size of A12X actually is.
  • name99 - Friday, December 7, 2018 - link

    Has a size for the A64FX been given?
    It's "only" 8.7B transistors, so less than the 10B or so of the A12X.
    Of course I assume those are high performance transistors, packed into high performance cells, and there is a lot more PHY analog area, so I'm sure it's physically substantially larger than an A12X.

    But it's interesting, IMHO, just how similar some of the specs are --- one for iPad, one for HPC, but both about the same number of transistors.
  • levizx - Friday, December 7, 2018 - link

    What do you mean, much bigger? The A64FX has only 8.8b transistors; if the Kirin 980's 6.9b/74.13 mm² is anything to go by, that's less than 100 mm². And on top of that, Post-K won't come online until at least 2021, so the processor probably won't be in production until 2020.

    As for Vega 20, that's probably because it's on a different process: it only shrunk about 25% from 14LPP, so there's no way it's built on N7SOC; the density is barely better than 12FFx/10FF.
  • iwod - Thursday, December 6, 2018 - link

    At roughly 110mm², there is still room for a larger die in the next iteration. I wonder how much Qualcomm charges for it? Once Windows is on ARM64, there are no more excuses for Apple to keep the Mac on x86.
  • sharath.naik - Saturday, December 29, 2018 - link

    16 GB RAM? No operating system or program is going to take such a platform seriously, outside of cases where limited power drives the decision. Laptops don't have that problem; case in point, the LG Gram 14. How is ARM going to claim an advantage over it?
  • defferoo - Thursday, December 6, 2018 - link

    isn't the A12X Bionic bigger at 122mm^2?
  • Ryan Smith - Friday, December 7, 2018 - link

    You are correct sir!
  • serendip - Thursday, December 6, 2018 - link

    All this work on custom chips could also mean a new lease on life for Qualcomm's server division. SoCs for laptops and servers have much higher power budgets than mobile ones.
  • syxbit - Thursday, December 6, 2018 - link

    Just a guess, but Qualcomm has been really lazy and risk-averse recently.
    I bet this is just a slightly tweaked A76 with higher clocks and more cache.
    If that's the case, it won't compete with the A12.
    Their GPUs, however, have been good (but they acquired that from AMD).
  • skavi - Friday, December 7, 2018 - link

    did you even read the article? This thing is fairly monstrous, at least on the GPU side where Adreno has always excelled.
  • ph00ny - Friday, December 7, 2018 - link

    Oddly enough, in the leaked benchmarks (AnTuTu), it was slightly better than the Mali G76 setup used in the Exynos. Of course, we don't know how it will perform on the device, but the fact that they were that close was a rarity.
  • syxbit - Friday, December 7, 2018 - link

    >did you even read the article?
    Yes.

    >This thing is fairly monstrous, at least on the GPU side where Adreno has always excelled.
    Yes. That's what I said.
  • skavi - Friday, December 7, 2018 - link

    If a 2019 "gaming phone" doesn't have one of these inside, I will be disappointed. This is a chance for that market segment to actually differentiate itself (RGB lights aside).
  • TheJian - Friday, December 7, 2018 - link

    About time, but still way short. Make it 75-100W for a desktop and I'll bite, providing it accepts NV or AMD discrete cards ALSO :) I'm still waiting for NV to make a 75-100W desktop ARM core with discrete support when desired. Bring it!
  • Santoval - Friday, December 7, 2018 - link

    That is a server-CPU level of TDP, and server ARM CPUs tend to have a multitude of small cores. I don't think an 8-core ARM CPU will ever go - or need to go - above a 35W TDP, 45W tops. Now, as you increase the core count (not to server levels, but maybe to "workstation" levels) to 10-12 or up to 16, assuming the clocks don't fall drastically to compensate, the TDP could go to between 60 and 75W.
    Not with ARM's reference designs though; new, wider designs are required for high clocks and high TDPs. Apple is already halfway there, though they still design CPUs for mobile & tablet TDPs.
  • Alexvrb - Saturday, December 8, 2018 - link

    Short for you, but just about right for tablets, hybrids, and entry-level laptops. That's what they're targeting, and it's within reach without a complete redesign for high power.

    Maybe if they succeed, we'll see 15W laptop chips in a couple years.
  • yeeeeman - Friday, December 7, 2018 - link

    So let me see. The SD835 (and the SD850) were close in performance to a Pentium Y 4.5W chip (a dual core with very low clock speeds), and in some cases slower than that, closer to a Celeron.
    Even if this chip is 2x faster, it will still only be at i3 levels, with worse performance in many situations since many apps will run in emulation mode.
    I think their effort is nice, but until they drop prices, $1K for a crippled laptop is WAY too expensive; they don't stand a chance. Also, the always-connected stuff is a moot point in my opinion. They are selling a laptop, not a phone, so there is no point in staying always connected.
    Battery life will get closer to Intel levels now that they've increased performance. Remember that with the SD835 they could brag about amazing battery life, but that was a very slow chip, so it was to be expected.
    I think it is nice to have another competitor in this space, but they need to do better and work their asses off on the SW side of things to convince developers to recompile their apps to native ARM 64-bit. Only then will they stand a chance in this space.
  • Wilco1 - Friday, December 7, 2018 - link

    Given that the Kirin 980 is already 2 times faster than the SD835, it looks like this will be more like 2.5 times faster due to higher frequencies, a 128-bit memory interface, and more cache. Besides being faster, it should be far more power efficient than Intel's mobile chips in 2019/2020 - remember, this is 7nm and uses more efficient LPDDR4X. The Chrome and Firefox announcements show the recompilation is already happening, and a promising chip like this can only accelerate that.
  • sing_electric - Friday, December 7, 2018 - link

    The pricing of those devices is always the rub: you have to accept lower performance (and therefore, to some degree, productivity) at the same price to get longer battery life. At Chromebook prices they'd be very competitive, but you're paying Ultrabook prices for a device that's in the same performance range as an ultra-budget laptop.

    At the same time, I don't think consumers are really the target market here: instead, it's probably enterprise customers who want people in the field to be always connected and who care about security & reliability. For them, not being able to run every app is a feature, not a bug. Plus, chances are their programs are either fairly new and updated (so they can be easily recompiled for ARM) or ancient and not (in which case emulated Win32 performance is probably good enough).
  • gaadikey - Friday, December 7, 2018 - link

    My friend Girish M Mani from Qualcomm India, along with 5 of his friends, designed this chip. I really appreciate the hard work the guys have put in to turn this into reality.
  • halcyon - Friday, December 7, 2018 - link

    Could this find its way into an Android/Chromebook tablet (not a convertible, not a 2-in-1, not a fan-cooled luggable laptop, not just an ARM-Windows device)?

    I'd love a proper Android/Chromebook tablet with a laminated 120Hz display and a pressure-sensitive pen running this chip.

    Don't want to do iOS/iPad anymore; already tried it, and the OS tries to gimp everything that I'm accustomed to doing.

    And ARM-based Windows tablets that weigh 800g+ are not what I'm really looking for.

    Here's hoping this ends up in more than just development boards and ARM-Windows 2-in-1's.
  • jjj - Friday, December 7, 2018 - link

    First things first: get info on what cores they use. Just because they use the same 495 designation for all of them doesn't mean they are all big cores.
    Anyway, a bigger die and modem, add licenses, front end and so on, and we get $1000 machines with limited software functionality, so good luck with that.
    If they had an SD675 with no modem for PC, that would be something that would not suck too much.
    This, though, is something they can't sell, and by the time the software is sufficient on PC, the PC is dead. Dumb strategy, huge waste of resources. If you want to prepare for glasses on the software side, in case M$ does well there, then at least make a PC SoC you can sell; don't sabotage your own goals with solutions that are not commercially viable due to the absurd price.
  • peevee - Monday, December 10, 2018 - link

    Do you KNOW the price of the chip? It is probably below $100 - not a major part of the cost of a laptop.
  • sing_electric - Friday, December 7, 2018 - link

    One of the arguments I've heard is that Apple's chips can outperform Qualcomm's because QC has to hit a specific price target on its SoC to keep OEMs happy - Apple doesn't, and can spread costs around more, hence Apple can focus on performance rather than cost (and related issues like die size, etc.).

    I guess QC's efforts in this area will be the first test of that, since, based on system pricing, the real competitor is no longer other ARM chips but something like an i5 laptop (which has additional costs for modems, etc. that are baked into QC's SoC). If QC can't compete and win in THIS segment, it strongly suggests that Apple is out-engineering them, rather than engineering for different objectives.
  • Raqia - Friday, December 7, 2018 - link

    If you mean straight-line CPU performance and efficiency, Apple is surely ahead of even Intel now. Apple knew it had more die space to play with, as it doesn't have its own modem IP to put on die, so it used that space for a much beefier CPU. They crammed a desktop-class design into a mobile device, but there is a trade-off: despite its superior efficiency, Apple forgot to throttle properly, and the resulting current draw at heavy loads has been detrimental to battery life:

    https://bit.ly/2ryPO61

    This is great for first-month bench-marketing in reviews, but not so much for users who weren't informed that this superior performance would only last through about a year of regular use.
    Apple likes to tout that its SoC is unique to the iPhone, but to me, making a part used by many OEMs and software platforms means tolerances are better and bugs are more thoroughly reported and addressed.

    The CPU is mostly responsible for controlling program flow; more compute-intensive operations are more efficiently performed in coprocessors, where die space is better spent in mobile. Memory-related CPU scores are probably more relevant than ones that stress the ALUs, and it sounds like Qualcomm's customizations to ARM's designs have been made precisely there; they show up in its excellent system-level performance despite lower scores in focused "engine revving" benchmarks.

    Qualcomm has also been consistently superior in performance and efficiency when it comes to the GPU, DSP, and modem (not even present on an AX die), but these don't show up in typical benchmark suites, which continue to hew close to legacy '90s and '00s gaming-PC-style benchmarks focused mostly on the CPU and GPU. (I don't blame anyone for this, because those blocks are much less accessible than the CPU or GPU, where there are cross-platform benchmarks or recompilable content.)
  • skavi - Friday, December 7, 2018 - link

    Exactly why I'm so excited for the numbers on this chip. Though we have to keep in mind that QC is using significant die space for the modem, and is (self) limited to the cores that ARM designs.
  • dustwalker13 - Saturday, December 8, 2018 - link

    I really hope the next Surface Go will be built on this basis. It is already my perfect device for travel; only the battery life and connectivity could be better.

    The Go on ARM and the Pros on AMD with Zen 2, to get proper graphics support, would be my dream setup.

    I know Intel is probably downright paying MS to use their Core series at this point, but using an adapted APU in the Xbox and not in tablets is just a bad design decision, especially when AMD is so aggressive with their custom CPU/APU solutions, has way better graphics performance, and would tailor an APU for the Surface lineup in a second.
  • MutualCore - Saturday, December 8, 2018 - link

    The issue always remains - what is the software you will be able to run? Windows 10 on ARM is a 100% unknown at this point.
  • Alexvrb - Saturday, December 8, 2018 - link

    You can run pretty much anything that you would run on a similar TDP Intel chip. That's the point of the hybrid emulation. But to get the best performance, it has to be recompiled. They're pushing devs in that direction, so yeah I'd also like to see the next-gen Surface Go use this chip or a successor.

    I also agree with him on the Pro models. Zen 2 APUs would be a huge boost where it needs it the most - the GPU!
  • MutualCore - Saturday, December 8, 2018 - link

    The 8CX is a nuclear MIRV aimed straight at Intel Core i3/i5.
  • Hulk - Monday, December 10, 2018 - link

    Are there any educated estimates of how this CPU will perform when emulating Windows applications, compared to an Intel processor? For example, "it should perform in Windows like an Intel xxxx..."
  • peevee - Monday, December 10, 2018 - link

    It certainly has pre-AVX2 vector performance, as ARM NEON is 128-bit.
    Code size is about 2x that of x64, so effectively you get half the real L1 I-cache.

    The modern x64 instruction set, especially with AVX-512 but even before that, is so huge that the emulator either spends half an hour optimizing the code (at least on the first run of any decently sized application) or produces highly unoptimized code, like Debug/-O0 builds, which perform roughly 10x slower than -O3 builds.
  • Wilco1 - Monday, December 10, 2018 - link

    Vector performance depends on the vector width, number of vector units and latency of the units. Cortex-A76 has 2 128-bit FMA units with very low latencies, particularly for chained FMAs which are significantly faster than on any x86 CPU.
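
    As an illustration of a chained-FMA kernel (a minimal sketch of mine, assuming an AArch64 toolchain and the standard arm_neon.h intrinsics, not anything from the article):

        #include <arm_neon.h>

        /* Dot-product kernel: each iteration's FMA result feeds the next one
           (a "chained" FMA), so the FMA latency dominates throughput.
           Assumes n is a multiple of 4. */
        float dot(const float *x, const float *y, int n) {
            float32x4_t acc = vdupq_n_f32(0.0f);
            for (int i = 0; i < n; i += 4) {
                float32x4_t a = vld1q_f32(x + i);
                float32x4_t b = vld1q_f32(y + i);
                acc = vfmaq_f32(acc, a, b);   /* acc += a * b */
            }
            return vaddvq_f32(acc);           /* horizontal sum of the 4 lanes */
        }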

    AArch64 codesize is smaller than x64. x64 averages ~4.5 bytes per instruction nowadays.

    The size of the instruction set is completely irrelevant to emulation speed. JIT compilation is a solved problem - both startup and runtime overhead is small, typically less than 2x, but this can likely be reduced further.
  • peevee - Tuesday, December 11, 2018 - link

    "Vector performance depends on the vector width, number of vector units and latency of the units. Cortex-A76 has 2 128-bit FMA units "

    And yet NEON registers are 128-bit, while AVX2 is 256 bits and AVX-512, obviously, 512.

    "The size of the instruction set is completely irrelevant to emulation speed."

    What is your proof? If you need 3 4-byte instructions to emulate one 5-byte instruction with memory/stack arguments, it affects the efficiency of the I-cache at the very minimum. You need 4 instructions just to load a 64-bit constant into a register, for God's sake!
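
    To make that concrete (a hypothetical snippet of mine; exact codegen varies by compiler and flags):

        #include <stdint.h>

        /* On AArch64, an arbitrary 64-bit constant is typically materialized
           with four instructions (one movz plus three movk, 16 bits each),
           while x64 can encode it in a single mov-immediate instruction. */
        uint64_t big_constant(void) {
            return 0x123456789ABCDEF0ULL;
        }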

    BTW, I don't defend the x64/AVX-512 instruction set; it is a total disaster at this point and needs to be killed ASAP. But A64 has its disadvantages too, including the fixed instruction length, which surely saved a few transistors back in the early '80s when it mattered, but now costs more in I-cache and buses than it saves in the decoder, and such savings don't matter compared to the tens of millions of transistors needed to implement v8.3 (incl. NEON) even at the lowest performance level.

    " JIT compilation is a solved problem - both startup and runtime overhead is small, typically less than 2x"

    It is one or the other - faster compilation means less optimization, and vice versa.
    "AArch64 codesize is smaller than x64. x64 averages ~4.5 bytes per instruction nowadays."

    But A64 needs several instructions per single Intel instruction, as it has no memory operations except load & store.
  • Wilco1 - Tuesday, December 11, 2018 - link

    "And yet NEON registers are 128-bit, while AVX2 is 256 bits and AVX-512, obviously, 512."

    The point is that wider is not automatically better. Fast, low-latency vector operations win against a wide but slow unit.

    ""The size of the instruction set is completely irrelevant to emulation speed."

    What is your proof? If you need 3 4-byte instructions to emulate one 5-byte instruction with memory/stack arguments, it affects the efficiency of the I-cache at the very minimum. You need 4 instructions just to load a 64-bit constant into a register, for God's sake!"

    The proof is obvious: all those complex CISC instructions are never used. In fact, AArch64 requires *fewer* instructions than x64 for typical tasks. You can try this yourself: disassemble some large binary and count how few complex instructions compilers actually generate. (See the sketch below.)
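
    For instance, a rough harness of my own (assumes GNU binutils objdump on a Unix-like system; pick any binary you like):

        #include <ctype.h>
        #include <stdio.h>
        #include <string.h>

        /* Average encoded bytes per instruction in objdump -d output.
           Instruction lines carry the encoded bytes between the first and
           second tab; continuation lines of very long instructions lack the
           second tab and are skipped, so this slightly undercounts. */
        int main(void) {
            FILE *p = popen("objdump -d /bin/ls", "r");
            if (!p)
                return 1;
            char line[512];
            long insns = 0, bytes = 0;
            while (fgets(line, sizeof line, p)) {
                char *t1 = strchr(line, '\t');
                char *t2 = t1 ? strchr(t1 + 1, '\t') : NULL;
                if (!t2)
                    continue;                 /* not an instruction line */
                insns++;
                for (char *c = t1 + 1; c + 1 < t2; c += 3)
                    if (isxdigit((unsigned char)c[0]) &&
                        isxdigit((unsigned char)c[1]))
                        bytes++;              /* one "xx " byte pair */
            }
            pclose(p);
            if (insns)
                printf("%.2f bytes/instruction over %ld instructions\n",
                       (double)bytes / insns, insns);
            return 0;
        }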

    "" JIT compilation is a solved problem - both startup and runtime overhead is small, typically less than 2x"

    It is one or the other - faster compilation means less optimization, and vice versa."

    Again, it's a solved problem. Ever used a browser, or a Java or .NET application? Only functions that actually execute need to be compiled, and you typically start with a quick translation and optimize a function if it is executed often (obviously in the background on a different core), as the toy sketch below illustrates. So no, there is effectively no tradeoff.
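
    A toy illustration of mine (tier0/tier1 are made-up stand-ins for the quick translation and the optimized recompile; a real JIT compiles tier 1 in the background):

        #include <stdio.h>

        typedef int (*fn_t)(int);

        static int tier0(int x) { return x * 2; }  /* quick, unoptimized translation */
        static int tier1(int x) { return x << 1; } /* optimized replacement */

        static fn_t current = tier0;
        static long calls = 0;

        /* Promote the function to the optimized tier once it proves hot. */
        static int dispatch(int x) {
            if (++calls == 1000)
                current = tier1;
            return current(x);
        }

        int main(void) {
            long sum = 0;
            for (int i = 0; i < 2000; i++)
                sum += dispatch(i);
            printf("%ld\n", sum);
            return 0;
        }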

    ""AArch64 codesize is smaller than x64. x64 averages ~4.5 bytes per instruction nowadays."

    But A64 needs several instructions per single Intel instruction, as it has no memory operations except load & store."

    That's a common misconception. Again, disassemble some code and see for yourself: those CISC memory operations are hardly ever used.
  • peevee - Monday, December 10, 2018 - link

    So what are the differences from the 855? Same 4xA76 + 4xA55. Same Hexagon 690. Same X24 modem...

    8xPCIe? Adreno 680 vs 640 (why?). Spectra 390 vs 380 (what is the difference?)
  • Wilco1 - Monday, December 10, 2018 - link

    8-channel DRAM (likely 50-70GB/s), 10MB rather than 6MB of cache, higher frequencies, a 4+4 rather than 1+3+4 core configuration, and a larger TDP.
  • peevee - Tuesday, December 11, 2018 - link

    You can specify higher frequencies (leading to higher TDP) on the same chip.
  • Wilco1 - Tuesday, December 11, 2018 - link

    Yes, but that doesn't get you the 20% gain from the better memory system. Running Windows with big applications (especially emulated ones) is more demanding than Android.
  • David1234 - Monday, January 7, 2019 - link

    Double the entire chip processor just by allowing a light wave frequency to be on or off when sending or receiving data, it all can be combined very easily if companies don't want big changes, 100 percent or almost unlimited GPU and CPU all in one tiny package.
