26 Comments
p1esk - Tuesday, April 9, 2019 - link
This should be called "a paper launch of a paper launch". To provide a perspective, Intel announced Nervana NNP-I accelerator back in 2017, saying "first silicon will be shipped by the end of the year". Still not out.p1esk - Tuesday, April 9, 2019 - link
https://www.anandtech.com/show/11942/intel-shippin...
The Hardcard - Tuesday, April 9, 2019 - link
I imagine one of the challenges is that this is a really fast-moving target. In 2017 this was being done in FP32 and shifting to 16 bits, but chips for that are already obsolete: now it's training in 8 bits and inference in 4 bits. Releasing a new silicon family that can't do that might be a waste of investment, even if you have already finalized the design.
The good thing is, it sounds like those are going to be the limits at least for a while.
p1esk - Tuesday, April 9, 2019 - link
The problem is you can't just build a chip that only supports 8-bit for training and 4-bit for inference. There will always be models/tasks which either need higher precision or work well with even lower precision. DL hardware must remain flexible (with support for at least 16 bits). Note that the latest Nvidia chips support precision all the way down to INT1, with linear scaling in performance.
The Hardcard - Wednesday, April 10, 2019 - link
Oh, I agree. I would expect 32-bit math to still be there. My point is that the difficulty of selling products that can't do 4-bit calculations might explain why some previously announced products have been delayed.
There are probably other factors - instruction set, memory models, and communication protocols - that also need to keep up with this quickly advancing field.
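To make the precision talk above concrete, here is a minimal, purely illustrative sketch of what "8-bit inference" means in software, using PyTorch's post-training dynamic quantization; the toy model and layer sizes are assumptions for the example, and none of this implies what Qualcomm's silicon will actually run:

```python
# Illustrative sketch only: post-training quantization of a toy model so
# that inference uses 8-bit integer weights instead of FP32.
import torch
import torch.nn as nn

# Stand-in "trained" model (weights would normally come from real training).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Convert the Linear layers' weights to INT8 for inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference is called the same way; the matmuls now use 8-bit weights.
x = torch.randn(1, 128)
with torch.no_grad():
    print(quantized(x))
```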
p1esk - Wednesday, April 10, 2019 - link
Also, developing an inference-only chip for the market 2-4 years from now is risky, because as you said, the ML field is a fast-moving target. What if online learning becomes popular/effective (models that continuously learn from real-time data, e.g. [1])? And maybe you need 16-bit precision to make it work (maybe not, but what if), and these chips just don't support it.
p1esk - Wednesday, April 10, 2019 - link
https://arxiv.org/abs/1903.08671
rahvin - Tuesday, April 9, 2019 - link
Intel at least has AI silicon already available from their Nervana purchase. This is Qualcomm's first dedicated silicon, and it's occurring after the activist investor got them to drop the server chip because it cost too much to develop. Unless they show real silicon I'd take any promises of future silicon with a truckload of salt.
GreenReaper - Tuesday, April 9, 2019 - link
It's perhaps more accurate to say that Qualcomm wants *everyone else* to buy into the idea of datacenter inference accelerators. Something tells me revenue won't quite meet up with projections!
beginner99 - Wednesday, April 10, 2019 - link
Fully agree. Since inference doesn't take much calculation power at all, I even fail to see the need for these. Are there apps that run models in the cloud? Then why do we get AI hardware on phones? And stuff like MobileNet?
jayfang - Wednesday, April 24, 2019 - link
My thought on this is: inference is best done where the data is.
E.g. you probably do not want to ship an up-to-date product catalogue network to every user's mobile device to make a recommendation. A search network may be large.
levizx - Tuesday, April 9, 2019 - link
CentriQ has already given Qualcomm a bad rep - they just gave up without even trying. I doubt anyone would buy into their ecosystem.
rahvin - Tuesday, April 9, 2019 - link
Give Qualcomm some credit: they did put their full weight behind CentriQ and even had the first-run tapeout done when the activist investor made them drop it. Before that happened I would have put high odds on CentriQ coming out and being the first real ARM server chip, but with the activist investor involved I wouldn't be betting on anything they do until the activist sells their position. Management is totally handcuffed while the hedge fund is involved, and they won't be spending any money on developing new products beyond white papers.
tuxRoller - Tuesday, April 9, 2019 - link
TPUv2 and up perform both training and inference.
webdoctors - Wednesday, April 10, 2019 - link
Why no comparison to other datacenter inferencing cards like Google's or Nvidia's? Why use the Snapdragon phone SoCs?
If I manufacture bikes and decide to make a car, am I going to compare it to other cars or to the previous bikes I sold? Who would find this useful?
Spunjji - Wednesday, April 10, 2019 - link
Nice analogy there. I think the most generous possible answer is that they all do somewhat different things, making straight comparisons difficult - to maintain your analogy, car vs. truck vs. amphibious APC.
In reality it's probably because these are all paper projections and they have *no idea* how their product will actually compete, whether that be in outright performance, performance-per-watt, or performance-per-dollar.
The Hardcard - Wednesday, April 10, 2019 - link
Both comparisons are valid, since there is also a debate as to where inferencing will take place. Apple, for instance, is aggressively promoting that it be done on device, and all its new phone SoCs have accelerators.
There will obviously be a huge number of use cases, but it will always be a question for each developer what can be done and where: on device, on an edge node, or in the datacenter.
Rοb - Saturday, April 27, 2019 - link
I agree, if this was coming to the Snapdragon 865 (and my next phone) I'd say great!
If this is supposed to be coming to a server farm it would be ho-hum, as for my desktop I'm not going to stick a half dozen of these cards in it. Sadly, now is the time to wait and see: if it's not earning you money today, next year will bring another >10x jump; suffer with the current tech and skip the incremental improvements.
Yojimbo - Wednesday, April 10, 2019 - link
Everybody is coming out with an AI accelerator with claims of one-of-a-kind, fantastic features. Tesla, Bitmain, Qualcomm, Google, Facebook, Wave, Graphcore, Nervana, etc. It must be really easy to do.
deil - Wednesday, April 10, 2019 - link
50x faster than a phone. How does it hold up against EPYC?
tuxRoller - Saturday, April 13, 2019 - link
Massively more efficient per TOP.
anivarti - Wednesday, April 10, 2019 - link
Late to the game. Good luck to QC; by the time they figure out which floating-point math is good for these accelerators, the algorithms will have completely changed...
ballsystemlord - Wednesday, April 10, 2019 - link
@ryan
What platforms will the SW stack run on?
Will it be monolithic binaries, or will they provide source code?
Ryan Smith - Thursday, April 11, 2019 - link
This is all for datacenter use. So Linux is a given. Beyond that, QC hasn't shared any further details.
abufrejoval - Thursday, April 11, 2019 - link
The best place to be for any AI is as close as possible to the brain it attempts to serve or extend. That means the hand or pocket, until they implant them.
So for Qualcomm it's simply a matter of survival: either you're there on the money-making side of AI (inference), or you'll be run over by those who understood that better.
And inference happens at every step from the front-most device to the datacenter core: sharing that market with anyone else means you're an easy target for economic warfare.
Everybody understands training is a different ballgame, but also that there is significant (market) power in being able to transform trained models to something efficient for inference at mobile and edge power envelopes. So sooner or later things might get a little ugly between those who build the costly training platforms and those who just want to make money on inference.
But for now Qualcomm is making a play it rightfully judges as crucial, while Huawei IMHO deserves an honorable mention for understanding that earlier.
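The "transform trained models into something efficient for inference" step mentioned above is usually an export/quantize/compile pipeline. A minimal, purely illustrative sketch, assuming PyTorch and the vendor-neutral ONNX interchange format; the toy model is made up, and Qualcomm has not detailed its own toolchain here:

```python
# Illustrative sketch: export a trained PyTorch model to ONNX so that a
# separate inference runtime (CPU, GPU, or a dedicated accelerator) can
# optimize and run it.
import torch
import torch.nn as nn

# Stand-in "trained" model for the example.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(8 * 30 * 30, 10))
model.eval()

dummy_input = torch.randn(1, 3, 32, 32)  # example input shape
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)
# From here a deployment toolchain would typically quantize/compile
# model.onnx for whatever hardware actually serves the inference requests.
```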
Magnus101 - Monday, April 15, 2019 - link
Sorry for my ignorance, but what the h-ll is an "AI Inference Accelerator"?
What does it do?
What type of AI are we talking about, and what can we see as consumers as an outcome of it being used?
A graphics accelerator accelerates 3D graphics, making 3D animation faster, better looking and more fluid.
An "AI inference accelerator" accelerates... what? And makes exactly what better that we can experience?
Reading the whole article gave me absolutely no background whatsoever...
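For context on the question above: "inference" just means running an already-trained neural network forward on new input (classifying a photo, transcribing speech, ranking recommendations), and an inference accelerator is silicon specialized to do that forward pass faster and at lower power than a general-purpose CPU. A minimal, purely illustrative sketch in PyTorch, with a stand-in model that has nothing to do with Qualcomm's product:

```python
# Illustrative sketch: "inference" is the forward pass of a trained network
# on new data. This is the operation an inference accelerator speeds up.
import torch
import torch.nn as nn

# Pretend this small classifier was already trained elsewhere.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

image = torch.randn(1, 784)   # e.g. a flattened 28x28 image
with torch.no_grad():         # no gradients needed: inference only
    scores = model(image)
    prediction = scores.argmax(dim=1)
print(prediction.item())
```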