27 Comments
Tomatotech - Monday, August 17, 2020 - link
SMT4 is interesting: a 60% speedup for 5% extra area. Makes me think SMT8 might be worth exploring in the next generation, even if it only gets a 30% speedup from 10% more area.

It's been a while since I studied this, but saying it has a 90MB L3 cache might be regarded as misleading. If you try stuffing 50MB of code and data into cache, it's going to fall over quite spectacularly. Each 4-core tile has only 6MB of cache, so you have to keep your code and data to 6MB or less if you want it to fit (or fiddle around with splitting it between tiles). Better to say it has 15x6MB of L3 (still damn good) and leave it at that?
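If you want to sanity-check that trade-off, the arithmetic is trivial (a toy sketch; the SMT4 numbers are from the article, the SMT8 ones are my hypothetical):

```c
#include <stdio.h>

/* Back-of-envelope throughput-per-area for the SMT trade-off.
   SMT4 figures (+60% throughput for +5% core area) are from the
   article; the SMT8 figures are the hypothetical from this comment. */
int main(void) {
    double smt4 = 1.60 / 1.05;  /* throughput per unit area vs. no SMT4  */
    double smt8 = 1.30 / 1.10;  /* hypothetical further gain over SMT4   */
    printf("SMT4: %.2fx throughput per unit of core area\n", smt4);
    printf("SMT8 on top of that: %.2fx more\n", smt8);
    printf("Cumulative vs. no SMT: %.2fx perf for %.2fx area\n",
           1.60 * 1.30, 1.05 * 1.10);
    return 0;
}
```

Even at the pessimistic hypothetical, SMT8 would still buy more throughput per mm^2 than it costs, which is why the question seems worth asking.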
saratoga4 - Monday, August 17, 2020 - link
>Each 4-core tile has only 6MB cache

According to the slides linked above, the L3 cache lines have no affinity for any specific core, so it really is a single 90MB L3, similar to how Intel's ring bus parts work.
Tomatotech - Monday, August 17, 2020 - link
You have to be careful about the marketing wording. What you said is what they want you to think: that any core can use any cache equally well. But a close reading indicates it could mean just 'any of the 4 cores in a single tile can use any of the 2x3MB caches associated with that tile'. Hence my use of 6MB.

Even worse, a single core might not be able to use all of both on-tile caches at the same time, so the real maximum cache for a core might be as low as 3MB. This chip is designed to be a multi-thread monster with up to 240 threads per die, each thread accessing its own part of the cache, not to have a tiny number of threads using up all the cache. That's a different layout.
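If you want to see where a given machine "falls over", the classic probe is a random pointer-chase over growing buffers: the latency per hop jumps at each cache boundary. A rough single-threaded sketch (buffer sizes and hop counts are arbitrary, and the RNG is crude but good enough here):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Random pointer-chase over buffers of increasing size. The ns/hop
   figure jumps once the working set spills out of a cache level; that
   is the "falls over quite spectacularly" point. Single-threaded, so
   on the chip discussed here it would only ever see one tile's slice. */
static double ns_per_hop(size_t n, long hops) {
    size_t *next = malloc(n * sizeof *next);
    for (size_t i = 0; i < n; i++) next[i] = i;
    /* Sattolo's algorithm: one big random cycle, defeats the prefetcher */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    size_t p = 0;
    clock_t t0 = clock();
    for (long h = 0; h < hops; h++) p = next[p];
    double ns = (double)(clock() - t0) / CLOCKS_PER_SEC / hops * 1e9;
    if (p == (size_t)-1) puts("");  /* keep p live so the loop survives -O2 */
    free(next);
    return ns;
}

int main(void) {
    for (size_t kb = 512; kb <= 64 * 1024; kb *= 2)
        printf("%6zu KB working set: %.2f ns/hop\n", kb,
               ns_per_hop(kb * 1024 / sizeof(size_t), 20 * 1000 * 1000L));
    return 0;
}
```

On a part with a 6MB-per-tile slice you'd expect the jump somewhere between the 4MB and 8MB rows, regardless of what the headline "90MB" number says.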
saratoga4 - Monday, August 17, 2020 - link
>but a close reading indicates it could mean just 'any of the 4 cores in a single tile can use any of the 2x3MB caches associated with that tile'.

The slides say that individual tiles are striped, so I don't think your reading is correct.
Krysto - Tuesday, August 18, 2020 - link
Power10 is already doing SMT8.

GreenReaper - Wednesday, August 19, 2020 - link
It talks about the increase in area, but not power usage. Moreover, you risk effectively reducing single-threaded speed unless you upgrade every other bit (and probably even then). If you limit the threads further, it starts to look more and more like a GPU.

I think we need more evidence that people can use SMT4 effectively, as they eventually did with quad-core and beyond, before it is increased further.
senttoschool - Tuesday, August 18, 2020 - link
Both AMD and Intel are in trouble. The x86 market is shrinking fast.

AWS is dead set on transitioning to its own ARM server chips so it can reduce cost, build chips for its own needs, and offer unique features. Google Cloud and Microsoft Azure will probably follow shortly in order to compete.

With Apple's transition to ARM, the x86 market shrank by roughly 10% overnight. In addition, macOS on ARM will spur a renewed push for developers to optimize for ARM (Windows or Mac) on the laptop/desktop. This means Windows on ARM will probably become an extremely viable option in the near future.

The x86 market isn't dead, but it's shrinking. AMD and Intel aren't just competing against each other; they're also competing with Apple, Qualcomm, Marvell, ARM, Nuvia, Ampere, Samsung, Huawei, Alibaba, etc.
Quantumz0d - Tuesday, August 18, 2020 - link
LOL.

ARM is always custom. The Mac is the part Apple itself cares least about, since it's under 10% of Apple's revenue; Apple made the move so that instead of paying Intel (and getting caught up in the cheap-VRM news) they benefit from their new hybrid macOS, which is an abomination for desktop UX. The Apple transition shrank x86 market share overnight? Hah. As Apple themselves said, it's a two-year transition, so the x86 products are still there. Do you think AMD and Intel, both DC-centric companies, will sit and lay eggs by then? The massive x86 support in the Linux space is not there for ARM, so transitioning is not possible at this point in time, and x86 is already RISC underneath, so your claim that Intel and AMD are in trouble isn't true. They do have to keep innovating: Intel is going big.LITTLE for the mobile space, and AMD has patented the same and will follow. And with the massive Windows ecosystem, this ARM push already failed with Windows RT; x86-to-ARM translation is not free, you pay a performance penalty. Apple pays top companies like Adobe to build their software like first-party IP. It's not a major market.

AnandTech's SPEC scores from those graphs are meaningless. The work done by Snapdragon processors is equivalent to what the A-series do; the OS feeling snappy thanks to tight integration and closed binaries is not a measure, real-life work performance is.

Of all those companies, none make datacenter processors except Marvell, and Apple is a purely consumer-centric company, so what are you blabbing about here? The datacenter market is owned by x86, not ARM, and Qualcomm left after pouring billions into its prized custom Centriq ARM IP, which Cloudflare was heralding, just like Cloudflare is now doing again with Altra. AWS is going Graviton because in-house silicon is Amazon's usual way to save money, and it's not going to beat AMD, the king of the DC with the EPYC 7742. Ice Lake SP is also coming, and Intel has probably already locked vendors in against moving to AMD, let alone ARM.

ARM is, again, fully custom. Qualcomm is the only one that properly respects open source, via CAF and its handling of binaries and blobs in the Android space; Samsung is a failure and Huawei is utter trash in that department. Nuvia is vaporware until there's a real product, and that's aimed only at the DC market, not smartphones or ultraportables, which is where the money is to start up quickly. On Windows, ARM is utter trash, and the same goes for Linux except for low-level workloads. Name one ARM-based machine that can run anything like an x86 Wintel machine?
HVAC - Tuesday, August 18, 2020 - link
Utter ... trash ...

Greys - Wednesday, August 26, 2020 - link
I agree with you!

Spunjji - Wednesday, August 19, 2020 - link
Good to know your opinions on the future of the CPU market are just as balanced, nuanced and well-informed as your political ramblings...Quantumz0d - Wednesday, August 19, 2020 - link
Better go back to your twitter and reeesetera and put more pronouns.

And you don't even have any argument; you are stuck on that political comment. And you will be stuck there forever.
Gomez Addams - Wednesday, August 19, 2020 - link
Your diatribe entirely missed the point of why people are moving to ARM-based processors for servers and other purposes. It can be summarized in two words: power consumption. Server farms can save a lot of money using ARM processors compared to the equivalent horsepower from just about any other processor available. They are not moving to them for any performance advantage.

Wilco1 - Thursday, August 20, 2020 - link
Lower power also means less cooling, fewer power supplies and higher density. Additionally, Arm servers need less silicon area to get the same performance, so the upfront cost of the server chips is lower too (you avoid paying the extortionate prices many x86 server chips command).
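To put toy numbers on that (every figure below is an illustrative assumption, not a measurement of any real chip):

```c
#include <stdio.h>

/* Toy per-socket energy-cost comparison over a 3-year life.
   All inputs are illustrative assumptions, not measured figures. */
int main(void) {
    double x86_watts = 225.0, arm_watts = 110.0; /* assumed socket power */
    double pue = 1.5;           /* datacenter overhead (cooling etc.)    */
    double usd_per_kwh = 0.10;  /* assumed electricity price             */
    double hours = 3.0 * 365 * 24;

    double x86_cost = x86_watts / 1000 * hours * pue * usd_per_kwh;
    double arm_cost = arm_watts / 1000 * hours * pue * usd_per_kwh;
    printf("3-year energy+cooling: x86 $%.0f vs Arm $%.0f per socket\n",
           x86_cost, arm_cost);
    printf("Saving per socket: $%.0f (before any chip-price difference)\n",
           x86_cost - arm_cost);
    return 0;
}
```

Multiply that per-socket saving by tens of thousands of sockets and the hyperscalers' interest is easy to understand, even at performance parity.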
eek2121 - Tuesday, August 18, 2020 - link

The x86 market is not shrinking. This server offers no benefits over a modern AMD or Intel server.

name99 - Tuesday, August 18, 2020 - link
Two years ago you could reasonably have said "there is no plausible ARM server". A year ago you could legitimately have said "sure, there are ARM servers (TX2, Graviton), but they suck".
This year the best you can say is "they offer no benefit over a modern AMD or Intel server" (actually already not true if you're buying compute from AWS).
You want to bet against this trajectory?
Next year? This was the year of matching x86 along important dimensions. Next year will be the year of exceeding x86 along important dimensions. Not ALL dimensions; that might take until 2022 or so (there's still a reasonable amount of foundational work to do by all parties, like implementing SVE/SVE2), but, as I said, the trajectory is clear.
Spunjji - Wednesday, August 19, 2020 - link
I have to agree with this assessment. People keep counting ARM designs out because they've taken a long time to ramp up to this level, but every year they get closer to being a notable force in the market, and every year the naysayers find another, smaller reason why they'll never be successful.

The simple truth is that ARM designs don't even have to beat x86 to take a slice of the market. They just have to offer *something*, be it cost benefits, lower idle power, improved security, or even just being an in-house design (a la Amazon).
Spunjji - Wednesday, August 19, 2020 - link
Your last statement doesn't follow from - or lead to - the first one.Spunjji - Wednesday, August 19, 2020 - link
Shrinking? Sure, eventually. Fast? Not so sure.

AWS transitioning makes sense for their own use, but they'll more than likely need to continue offering x86 for customers. Same goes for Google and Microsoft. Hard to predict how that will shake out at this juncture.
Apple aren't even close to 10% of the total x86 market, either - they're between 7.5% and 8% of the global *PC* market, which obviously doesn't include the server / datacentre market. That's still going to be a bit of a dent for Intel when the transition completes, but it's not nearly as bad for x86 on the whole as you're implying.
Competition is heating up, though. That's a good thing.
Gomez Addams - Tuesday, August 18, 2020 - link
This illustrates why I think Nvidia wants to own Arm. They have already stated they are porting CUDA to the ARM instruction set, and I think this is because they want to make a processor suited for HPC, and it will be ARM-based. Here's why: think of how their GPUs are organized. They use streaming multiprocessors with a whole bunch of little cores. These days they have 64 cores per SM, so that is essentially 64-way SMT. The thing is, these cores are very, very simple, with many limitations.

I think they want to use ARM-based cores with something like 16-way SMT. If they use AMD's multi-chip approach, they could make an MCM with a thousand ARM cores in one package. There would be no CPU-GPU pairing as we often see today; one MCM could run the whole show. This would entirely eliminate the secondary data transfers to a co-processor and make for an incredibly fast supercomputer with relatively low power consumption. I think this architecture would be a huge improvement over what they have.

McCartney - Tuesday, August 18, 2020 - link
I just want to give Quantumz0d credit for having the courage to express what is clearly the case. I'm tired of ARM trash being propounded by losers stuck in the stock market, trying to pump it whenever they can.

I foresaw this years ago when I was propositioned with "entering the market" and doing an IPO. I steadfastly refused, since I looked at it as a "credit aggregation scheme" in which the quality of any monetary "cashout" would be directly dependent on those buying in. And as far as I can see, that's not the best way to secure my future.

Eight years later, my fears have been realised. The "market" has destroyed the enthusiast industry; the latter has been enslaved by the former. All we hear from today's "enthusiast" is how great ARM processors are, with these foolish expectations that x86 binaries can somehow be transitioned to ARM seamlessly. There is a lack of appreciation among today's enthusiasts for both sides of the coin (learning the software side and the hardware side), and it is reflected in the lack of diverse offerings from the manufacturers.

In the words of one of my favourite people in the embedded space, Ralf Baechle (a huge contributor to MIPS), it was never foreseen that ARM would even go multicore (https://www.tldp.org/HOWTO/SMP-HOWTO-3.html). In fact, it's hard enough to make a good multicore embedded processor (the SH4[A] is/was amazing), and stacking more cores introduces bigger challenges when you compare the physical restrictions of the embedded segment against the desktop.

Now, on to what you're saying, Gomez (and originally the reason I wanted to post): I agree. For my line of work, an x86 or a good MIPS (Kfc, not just Kc) is an absolute necessity. I need larger shared memory, and my work (10^5-dimension matrices that involve eigendecompositions; rough sizing sketch below) is not able to use "high core, low cache+memory" designs such as Nvidia's (which has an API in MATLAB) or ARM's.

I agree with you entirely. It would be very interesting from a GPU design standpoint if Nvidia absorbed ARM. I would love to see what their 'shader units' would look like after getting more direction from ARM cores.
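Here's the rough sizing sketch mentioned above, assuming a dense double-precision symmetric eigenproblem (my assumption about the workload); it shows why no amount of cache helps at that scale:

```c
#include <stdio.h>

/* Rough footprint for a dense n x n double-precision eigenproblem,
   n = 1e5. Dense symmetric solvers (e.g. LAPACK-style) also want the
   eigenvector matrix and O(n^2) workspace on top of the input. */
int main(void) {
    double n = 1e5;
    double gb = n * n * 8.0 / 1e9;  /* one dense matrix of doubles */
    printf("input matrix:  %.0f GB\n", gb);
    printf("eigenvectors:  %.0f GB\n", gb);
    printf("total, before workspace: >= %.0f GB\n", 2 * gb);
    return 0;
}
```

That's 80 GB per matrix, roughly 160 GB before workspace, so the job lives entirely in main memory bandwidth and large shared-memory capacity, not in cache.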
Spunjji - Wednesday, August 19, 2020 - link
Courage? For posting an ill-informed, barely-grammatical rant that didn't come close to a rational argument? Okay... 🤪

Based on the rambling off-topic content of your post, it's hard to tell whether you're a z0d sockpuppet or just equally delusional.
mkanada - Wednesday, August 19, 2020 - link
With Apple going to ARM, a lot of desktop software will be ported to this architecture. So, in the next 5 years, I hope to see ARM workstations with powerful GPUs competing head-to-head with x86-based computers.

Gomez Addams - Friday, August 21, 2020 - link
Windows already runs on ARM, and Visual Studio can target ARM code generation; all it takes is a recompilation. There are already ARM-powered GPUs available now; this site reviewed one recently. This is only the start.
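As a sketch of the "just recompile" point, assuming an MSVC/CMake toolchain with the ARM64 build tools installed (the build commands in the comment are one way to do it, not the only way):

```c
/* hello_arch.c: nothing ARM-specific is needed; the same source builds
   for x64 or ARM64. With a CMake project and VS 2019's ARM64 tools
   installed, the cross-build is roughly:
       cmake -G "Visual Studio 16 2019" -A ARM64 -S . -B build-arm64
       cmake --build build-arm64 --config Release */
#include <stdio.h>

int main(void) {
#if defined(_M_ARM64)
    puts("built for Windows on ARM64");
#elif defined(_M_X64)
    puts("built for x64");
#else
    puts("built for some other target");
#endif
    return 0;
}
```

The caveat, as noted above, is that only native code this clean recompiles for free; anything with x64 assembly, intrinsics, or binary-only dependencies still needs real porting work or emulation.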
"8-wife fetch unit"Now I'm intrigued.
Industry_veteran - Saturday, August 29, 2020 - link
Just 10 days after this announcement, Marvell management seems to have realized there is no market for general-purpose, server-grade ARM! They pulled the rug out from under this team.

It is funny, because just stuffing more cores into an SoC doesn't win new customers in the server market. For hyperscale customers, the name of the game is performance per watt.

This Marvell team should have known this for a long time, yet they kept making these superficial announcements about how many cores they can stuff into an SoC. Only less experienced people fall for that; the hyperscale customers know better.