  • nathanddrews - Friday, May 20, 2016 - link

    So is it basically a really fast, but really dumb CPU?
  • HollyDOL - Friday, May 20, 2016 - link

    It sounds a bit like an offspring of a Pentium III and a Bitcoin ASIC. But I could be completely wrong. Until Google provides more info, it's pure guesswork...
  • nathanddrews - Friday, May 20, 2016 - link

    https://cloudplatform.googleblog.com/2016/05/Googl...
  • Krysto - Friday, May 20, 2016 - link

    I would think the architecture is a lot simpler than a Pentium 3. It should be a lot closer to a Bitcoin ASIC.
  • JoshHo - Friday, May 20, 2016 - link

    That doesn't make a lot of sense. The Pentium III is a CPU core with out-of-order execution and speculative execution, while Bitcoin ASICs are basically just dedicated silicon for SHA-256.

    Machine learning really needs a huge amount of SIMD to enable faster matrix multiplication, plus some general-purpose instructions so you don't have to involve the CPU just to evaluate whether you need to break out of a loop. Static scheduling also means that the compiler decides how to schedule operations rather than relying on hardware to do so dynamically.
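    To illustrate why the matrix multiply dominates, here is a minimal sketch in Python/NumPy of a fully connected layer (the sizes and names are purely illustrative, not anything Google has published): almost all of the arithmetic is a single matmul, which is exactly the kind of work that wide SIMD units or dedicated matrix hardware accelerate.

        import numpy as np

        def dense_layer(x, weights, bias):
            # x: (batch, in_features), weights: (in_features, out_features)
            y = x @ weights + bias        # the dominant cost: one big matrix multiply
            return np.maximum(y, 0.0)     # cheap elementwise ReLU

        x = np.random.randn(64, 1024)     # a batch of 64 input vectors
        w = np.random.randn(1024, 1024)   # layer weights
        b = np.zeros(1024)
        print(dense_layer(x, w, b).shape)  # (64, 1024)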
  • HollyDOL - Monday, May 23, 2016 - link

    Sorry for the confusion, the comparison was meant loosely, not as an absolute one. I was not referring to the internal architecture or anything like that... just the SIMD in the P3 and the single-purpose nature of a Bitcoin ASIC.
  • ddriver - Friday, May 20, 2016 - link

    It is basically a special/dedicated-purpose processor, much better at doing that one thing for the power and transistor budget. A lot of fuss about nothing, which has become standard practice nowadays.

    "Deep learning" has very shallow hardware requirements; a traditional processor architecture is far more capable, and in this particular task that extra capability would be wasted. It is basically like driving a semi truck to the corner store to get a pack of cigarettes.
  • name99 - Friday, May 20, 2016 - link

    The "fuss" is because it's one more step in the slow decline of Intel.
    Every large computational task that gets moved to custom silicon is a slice of the data center/server business for which Intel is no longer obligatory; and that data center/server business is the only place Intel makes money these days.

    The Intel fans insist this doesn't matter, and will tell you how nothing can replace a fast low-latency CPU for certain types of server and business tasks. Which is all true, but it misses the point.
    Once upon a time Intel was a "computation" company, whose business grew as computation grew. Then they fscked up their mobile strategy (something I've discussed in great detail elsewhere) and they shrank from a computation company to a "high performance computation" company --- and stayed basically flat as the greatest expansion of computation in human history happened in mobile.
    Now, as certain types of computation move first to GPUs, then to even more specialized chips, they are shrinking from a "high performance computation" company to a "low-latency CPUs" company, and once again business will stay flat, even as the revenue that goes into GPUs and these specialized chips explodes.
    And, soon enough (starting probably next year, really taking off around 2020) we'll see the ARM server CPUs targeting high frequencies and high IPC, and Intel will shrink further, to the "x86 CPUs" category, where their only real selling point is the ability to run dusty decks --- basically the same place IBM zSeries is today.

    That is why the TPU matters: because it's the third step (after mobile, and after GPUs) in the shrinking relevance of Intel to the entire spectrum of computation.
  • name99 - Friday, May 20, 2016 - link

    Hah. Great minds think alike!
    I hadn't read that Wired article ( http://www.wired.com/2016/05/googles-making-chips-... )
    when I wrote the above, but yeah, they mostly see things the same way I do...
  • ddriver - Friday, May 20, 2016 - link

    Saying that in any context other than a joke is just sad...

    How does this hurt Intel? Why should Intel be worried or even care? It doesn't compete with Intel's products, and couldn't even if it wanted to. There is nothing preventing Intel from manufacturing special-purpose processors if it wanted to. As I already mentioned, Google only developed this because they need it to comb through the personal information they've mined over the years. Intel doesn't need that; Google does, so they make a special-purpose chip they can't buy, because barely anyone else needs it. Google will not, and could not, compete with Intel even if it wanted to; it doesn't have Intel's experience, resources and know-how. The thing is that Intel won't go into anything that doesn't promise high profit margins - that's why their mobile device business ain't going too well - Intel is not very enthusiastic about low-margin markets.

    Don't be such a tool, buying into sensationalist article titles. Those chips will serve one purpose alone - assisting Google in capitalizing on people's data. They have exabytes of data collected, and they need the most efficient tools to process it.
  • easp - Saturday, May 21, 2016 - link

    How does this hurt Intel? By not helping Intel. By running stuff that was previously run on Intel CPUs. By running more and more of that stuff. By being another growing semiconductor market in which Intel doesn't have any competitive advantage. Meanwhile, to stay competitive in their core markets, Intel needs to invest in new fabs, those new fabs turn out 2x as many transistors as the old fabs, and Intel really doesn't have a market for all those transistors any more.
  • Michael Bay - Sunday, May 22, 2016 - link

    You of all people should know well that name99 is a completely crazy pro-Apple and contra-Intel cultist.
  • stephenbrooks - Friday, May 20, 2016 - link

    I read your comment about the "slow decline of Intel" immediately after this other article: http://anandtech.com/show/10324/price-check-q2-may...

    Well, you did say it was slow. Guess I've just got to keep waiting... for that slow decline...
  • jjj - Friday, May 20, 2016 - link

    Apparently Google confirmed that it's 8-bit, and it might well be 8-bit integer math.
  • ShieTar - Friday, May 20, 2016 - link

    Which makes sense for something focusing on image recognition and similar tasks. The CogniMem CM1K also uses 8-bit:

    http://www.cognimem.com/products/chips-and-modules...
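    For a feel of what 8-bit integer inference typically looks like, here is a generic sketch of linear quantization in Python/NumPy (an illustration of the common technique, not Google's published scheme): float weights and activations are scaled into int8, the products are accumulated in a wider integer type, and the result is rescaled.

        import numpy as np

        def quantize(x):
            # Map floats into the int8 range using a simple per-tensor scale.
            scale = np.abs(x).max() / 127.0
            q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
            return q, scale

        w = np.random.randn(256, 256).astype(np.float32)   # weights
        a = np.random.randn(256).astype(np.float32)        # activations

        qw, sw = quantize(w)
        qa, sa = quantize(a)

        # int8 x int8 products, accumulated in int32, then rescaled to float
        acc = qw.astype(np.int32) @ qa.astype(np.int32)
        approx = acc * (sw * sa)

        print(np.max(np.abs(approx - w @ a)))  # small quantization error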
  • stephenbrooks - Friday, May 20, 2016 - link

    8-bit = string data? Makes sense for Google the web company. Maybe it has acceleration for processing UTF-8 strings?
  • xdrol - Saturday, May 21, 2016 - link

    I'd rather go with images. A 16M-color image is only 3x 8-bit channels.
  • ShieTar - Sunday, May 22, 2016 - link

    Though it is very likely that you can use very similar processing for both images and texts. The information is perceived differently by the human brain, but for tasks like pattern recognition it should make no difference if you search for a face in an image or for a phrase in a text.
  • easp - Saturday, May 21, 2016 - link

    Look up TensorFlow. Neural nets. Not UTF-8 strings. Jeeze.
  • Jon Tseng - Friday, May 20, 2016 - link

    I'd assume, given the form factor and the apparent passive cooling, that it's not a massive chip/performance monster along the lines of an NVIDIA GPU. Although I would concede an ASIC is more power efficient, which does help things.
  • ddriver - Friday, May 20, 2016 - link

    It is not apparent that the cooling will be passive. What's apparent is that this was designed for 1U rackmount server chassis, which do not leave enough height to mount fans on top of heatsinks; instead, air is blown through the entire chassis.

    The card's heatsink is designed to be cooled by that airflow, and judging by the power circuit components and the heatsink itself, it is around 150 watts.
  • sciwizam - Friday, May 20, 2016 - link

    According to Urs Holzle, Google's head of datacenters,

    "..he said that Google would be releasing a paper describing the benefits of its chip and that Google will continue to design new chips that handle machine learning in other ways. Eventually, it seems, this will push GPUs out of the equation. “They’re already going away a little,” Hölzle says. “The GPU is too general for machine learning. It wasn’t actually built for that.”

    http://www.wired.com/2016/05/googles-making-chips-...
  • Qwertilot - Friday, May 20, 2016 - link

    It'll be fun to see where it all goes :)

    That big Pascal thing does go a fair way towards this of course, and this is actually a potentially big enough market that you could imagine them ultimately doing more or less dedicated chips for it. Probably other people too.
  • surt - Friday, May 20, 2016 - link

    Seems pretty clear that Google now has a superintelligent AI in operation. Humorous watching all the puppets dance for it, but also scary of course. Will be interesting to see whether it maintains a beneficent stance once it has sufficient robotic independence in the physical world.
  • ddriver - Friday, May 20, 2016 - link

    Like everything else that has come out of technology, it will be employed to turn humans into more efficient milking cattle. Google has collected all sorts of data for decades; now they want the hardware to efficiently make something out of it. They intend to make so much money on it that their advertising business will look like a joke next to it.
  • ddriver - Friday, May 20, 2016 - link

    "Seems pretty clear that Google now has a superintelligent AI in operation"

    I highly doubt they have that. What they have is exabytes of people's personal, private and public data, and the intent to comb through it for anything anyone is willing to pay for. The big winners will be Google, governments, banks and corporations, and their gains will come from the only possible source - the general population.
  • Murloc - Friday, May 20, 2016 - link

    which is enjoying internet services, and the economic and personal benefits that come with them, like no generation before, and this is the price for it all.
    And they willingly submit to it.
    They can use a dumbphone and DuckDuckGo and run their own e-mail server or whatever if they have a problem.
  • ddriver - Saturday, May 21, 2016 - link

    which is saving you the effort of basic thought, to enable and ease the complete loss of that ability. Judging by your comment you are already there; at this point you need them to tell you what to do and what to believe :)
  • ddriver - Saturday, May 21, 2016 - link

    You will really be enjoying it when some mediocre AI worth $5 makes your job obsolete and renders you entirely useless. You will have such a blast enjoying internet services and the economic and personal benefits while you dig through the trash to survive. Who knows, maybe they will launch a "dumpster digger" service to inform the likes of you of the location and contents of various dumpsters.
  • Murloc - Friday, May 20, 2016 - link

    it's as easy as pulling a lever to turn it off.
  • ddriver - Saturday, May 21, 2016 - link

    you really have no idea, do you
  • WolfpackN64 - Friday, May 20, 2016 - link

    If it's really VLIW-like, is it basically a smaller and more numerous cousin of the Elbrus 2K chips?
  • JoshHo - Friday, May 20, 2016 - link

    It could be, but unlike the Elbrus VLIW CPUs it wouldn't even be attempting to run general-purpose code, so it can drop any concessions made to handle unpredictable branches, non-deterministic memory latency, virtual memory, syscalls, or anything else that a CPU would be expected to deal with.

    When you can get a normal CPU to handle those tasks, you can afford to focus on the essentials for maximizing performance. I doubt the TPU has a TLB or much in the way of advanced prefetch mechanisms or similar things seen in CPUs or GPUs to handle unpredictable code.
  • Heavensrevenge - Friday, May 20, 2016 - link

    It's most likely the "Lanai" in-house cpu-ish thing they patched LLVM for a few months ago https://www.phoronix.com/scan.php?page=news_item&a...
  • name99 - Friday, May 20, 2016 - link

    Lanai is almost certainly something very different; a processor specialized for running network switching, routing, and suchlike.
    Very different architecture specialized for a very different task --- but just as relevant to my larger point above about the twilight of Intel.
  • kojenku - Friday, May 20, 2016 - link

    First, this chip is an ASIC. It performs a fixed set of functions and is therefore not modifiable. Second, since it is an ASIC, its precision is limited (it does not need to adjust the angle of a rocket's nozzle), and Google mentioned this at the event. Third, since its precision is limited, it does not require lots of transistors and is therefore not very power hungry. This has actually been studied extensively by academic researchers for a long time, and lots of research papers and experiments are available around the internet (stochastic computing, imprecise computation, etc.). Fourth, since it is imprecise and power efficient, regular PC chip designs will gradually move in similar directions. What this means is that we will have PC CPUs producing imprecise numeric representations (1.0 vs. 0.99999999999999999999999999999912345).
  • extide - Monday, May 23, 2016 - link

    Well, you know, all chips are ASICs... I mean technically even an FPGA is an ASIC, but that one IS modifiable. GPUs are ASICs, CPUs are ASICs. Being an ASIC has NOTHING to do with its precision. They went with 8-bit as a design decision because it was all they needed. 8-bit does mean the SIMD units will be smaller, and thus they can pack more of them in for the same transistor budget.

    No, regular CPUs will not start to have less precision. Regular CPUs (like x86 or ARM) have to conform to the ISA they are designed for, and that ISA specifies those things; you have to follow the spec exactly or the chip will not be able to run code compiled for that ISA.

    ASIC - Application Specific Integrated Circuit
    SIMD - Single Instruction Multiple Data
    ISA - Instruction Set Architecture
    FPGA - Field Programmable Gate Array
  • Jaybus - Saturday, May 21, 2016 - link

    The naming of the device could be a clue. A tensor is a geometric object describing a relationship between vectors, scalars, or other tensors. For example, a linear map represented by an NxM matrix that transforms an M-dimensional vector into an N-dimensional vector is a 2nd-order tensor, and a vector is itself a 1st-order tensor. A neuron in an artificial neural network could be seen as a dynamic tensor that transforms M synaptic inputs into N synaptic outputs. My guess is that the TPU allows for massively parallel processing of neurons in an artificial neural network. The simple mapping of M scalar inputs to N scalar outputs does not require a complex and power-consuming ISA, but rather benefits from having very many extremely simple compute units, far simpler than ARM cores and even simpler than general-purpose DSPs: simply a (dynamic) linear map implemented in hardware.
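    In symbols (a generic formulation of a neural network layer, not anything Google has disclosed about the TPU), such a layer is

        \[
          y = \sigma(W x + b), \qquad W \in \mathbb{R}^{N \times M}, \quad x \in \mathbb{R}^{M}, \quad b, y \in \mathbb{R}^{N},
        \]

    where \sigma is a cheap elementwise nonlinearity, so evaluating the layer is dominated by the matrix-vector product W x - exactly the kind of fixed-function linear map that very simple hardware units can implement.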
  • easp - Saturday, May 21, 2016 - link

    Congratulations, you are so far the only commenter who, without even bothering to google "Google Tensor", reasoned out that this is likely targeted at running neural nets.

    Googling supports your reasoning. TensorFlow is Google's open-source machine learning library. It was developed for running neural nets, but, they assert, it specializes in "numerical computation using data flow graphs" and is applicable to domains beyond neural networks and machine learning.
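    As a concrete illustration of the "data flow graph" idea, here is a minimal TensorFlow example using the 1.x-era graph API (purely a sketch; it says nothing about what the TPU itself runs): the graph is built first and only computed when the session executes it, which is what makes it natural to offload onto accelerators.

        import tensorflow as tf  # TensorFlow 1.x-style graph API

        # Build a tiny data flow graph: a 1x2 by 2x1 matrix multiply.
        a = tf.constant([[1.0, 2.0]])
        b = tf.constant([[3.0], [4.0]])
        c = tf.matmul(a, b)

        # Nothing has been computed yet; the session runs the graph on
        # whatever device is available (CPU, GPU, or TPU in Google's case).
        with tf.Session() as sess:
            print(sess.run(c))  # [[ 11.]]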
