43 Comments
philosofa - Friday, December 2, 2011 - link
Perhaps it's been down-adjusted to only reflect the transistors (in their automatedly designed topology) that actually do anything - that would probably be the most consistent explanation.
SunLord - Friday, December 2, 2011 - link
More than likely the PR people were confused with the 2 billion transistors in the 16 core server parts and have since been fired for it.
MonkeyPaw - Friday, December 2, 2011 - link
Not sure about being fired, but my guess is they just provided some bad info on the press release.
Maybe the 2B was actually for Trinity with its IGP included?
B3an - Friday, December 2, 2011 - link
If AMD can't do something as simple as tell reviewers the correct transistor count of their CPU then it's no wonder they can't make a CPU that actually performs well. So now, not only is the CPU an epic fail, but so is the PR department.
MonkeyPaw - Friday, December 2, 2011 - link
Relax. Whether it had 2B transistors or it had 2, a press release is nothing to freak out about. And by the way, AMD CPUs are not bad. They are highly reliable and quite fast. What people so often forget is how hard it is to build something this complex and have it function at 99.99999999% reliability. 3.8M transistors per mm^2? People just fail to realize exactly how good we have it, and just how aggressive Intel is with their products now (thanks to AMD's existence).
You can get a quad core CPU for $50 today, yet that ain't good enough.
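As a quick sanity check on the density figure quoted above, the sketch below divides a transistor count by the die size. The 1.2 billion, 2 billion and 315 mm² values are the ones cited elsewhere in this thread; the function name is only illustrative.

```python
def density_m_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / die_area_mm2 / 1e6

# Figures quoted in the thread: 315 mm^2 die, 1.2B corrected vs 2B original count.
corrected = density_m_per_mm2(1.2e9, 315)
original = density_m_per_mm2(2.0e9, 315)

print(f"corrected: {corrected:.2f} M/mm^2, original: {original:.2f} M/mm^2")
```

With the corrected count the result lands at roughly the 3.8M per mm^2 mentioned in the comment.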
Arnulf - Saturday, December 3, 2011 - link
"You can get a quad core CPU for $50 today, yet that ain't good enough."
You can ?
CeriseCogburn - Monday, June 11, 2012 - link
AMD calls it a quadcore, and calls the price $50, but it's only a dual core and the cost is substantially more, but you know how hard it is to get these cpu things correct.... especially for amd fans.
SunLord - Friday, December 2, 2011 - link
AMD let go most of its marketing people and its PR company around the launch of the FX series
CeriseCogburn - Monday, June 11, 2012 - link
So the fair and just honest people left, and the gouge their neighbors eyes out dishonest stomp on anyone lying people stayed.
Great.
tipoo - Friday, December 2, 2011 - link
I wonder how much money the new design method is saving them, as it certainly isn't as good at producing high performance per watt as the old method of manually designing each transistor. I also wonder if they will go back to the old method of hand tweaking with Bulldozer's successor; maybe they just wanted to get something out the door for now and then refine it later, like Phenom I to Phenom II.
jensend - Friday, December 2, 2011 - link
You wrote ~2B for bulldozer in the above table.
When a media source publishes a correction that doesn't correct anything, do you call it an incorrection?
Anand Lal Shimpi - Friday, December 2, 2011 - link
ha! fixed :)
jjj - Friday, December 2, 2011 - link
Next time you MUST count them yourself !
DaFox - Friday, December 2, 2011 - link
I'd enjoy that.
Paul Tarnowski - Friday, December 2, 2011 - link
I've been insisting on personally inspecting the lithography of any CPU I buy in a store, just like I do with anything else, like MB and graphics card. I figure if I push for it long enough, I might be able to get them to bring in an electron microscope.
What? It could happen!
gerryka - Friday, December 2, 2011 - link
Maybe they just wanted to cover up the somewhat higher power consumption at launch with the fact that, of course it has to consume a bit more power, look, it's got nearly twice the transistor count of a Gulftown chip.
tipoo - Friday, December 2, 2011 - link
Doubt it, saying we doubled the transistor count while barely edging out our own old part most of the time is hardly good for PR. This was either a confusion between the desktop and server/16 core chip, or a difference in how they count transistors because of the new automatic design method which produces far more useless transistors than manual creation would.
CeriseCogburn - Monday, June 11, 2012 - link
No it "proved" their octocore was a "real" octocore and not a crippled quad with amd fake HT...
Plus this very site went on about the efficiency, as did all the little amd power fan boys. I saw it myself.
So the liars at amd pr knew exactly what they were doing, and played the amd fan boys like a stradivarius.
CeriseCogburn - Monday, June 11, 2012 - link
Yes I only had to listen to the amd fan boys for over three months go on and on about how amd engineers could pack so many transistors in so little die space, the efficiency being very excellent and "proved once again" they have the "best hardware".
Just imagine how the PR mancheans giggled reading the fanboys blubbering on.
Then, since the performance was so bad, the amd fans went into a tirade on windows inefficiency and tin foil hat Intel and Microsoft conspiracy theories, then we saw the endless "scheduling issue" lie rear its ugly head, and the windows patch, that did crap squiggly for it.
The PR liars did a wonderful job snowballing the fan base. They deserve a raise for the boldness and effectiveness of the massive duping they delivered all the EXPERTS...
Marc HFR - Friday, December 2, 2011 - link
At the International Solid-State Circuits Conference 2011, AMD told us that Module (Logic + L2) = 213 million transistors
8 MB L3 cache uses at least 6 transistors / bit : 402 million
It's 1.254 billions WITHOUT Hypertransport / DDR3 IO and Northbridge...
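The arithmetic behind that lower bound can be sketched as follows. The 213 million per module and 6 transistors per bit are the figures quoted in the comment; treating "8 MB" as binary megabytes and counting data bits only (no tags, ECC or redundancy) are assumptions made here to reproduce the 402 million figure.

```python
MODULES = 4
MODULE_TRANSISTORS = 213e6       # logic + L2 per module, as quoted from ISSCC 2011
L3_BITS = 8 * 2**20 * 8          # 8 MB of L3, binary megabytes assumed
TRANSISTORS_PER_BIT = 6          # 6T SRAM cell, data bits only (assumption)

l3_transistors = L3_BITS * TRANSISTORS_PER_BIT
lower_bound = MODULES * MODULE_TRANSISTORS + l3_transistors

print(f"L3: {l3_transistors / 1e6:.1f}M, modules + L3: {lower_bound / 1e9:.3f}B")
```

Under these assumptions the modules and L3 alone already exceed 1.2 billion, before any I/O or northbridge logic is counted.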
Conficio - Friday, December 2, 2011 - link
Unfortunately the transistor count is not mentioned in the table for Llano.
However, Llano, a 4C part with the old core type, has double the density of Bulldozer in the density graph?
I'd think if AMD is capable of such a dense design and it is advantageous, they'd use it for their flagship processor.
In other words, can you add the Llano numbers to the first table and verify that the density is correct?
Thanks!
Marc HFR - Friday, December 2, 2011 - link
1.45B for 228mm2
But the 1.45B seems way too high
For example Athlon II X2 CPU is 234 Millions transistors, and Redwood GPU is 627 Millions. 234x2 + 627 = 1.095 Billions and in this number we get double IMC etc...
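A small sketch of that comparison, using only the numbers given in the comment (the variable names are illustrative). It treats Llano as two Athlon II X2 dies plus one Redwood GPU, which by the commenter's own note overcounts shared blocks such as the memory controller.

```python
ATHLON_II_X2 = 234e6     # transistors, as quoted
REDWOOD_GPU = 627e6      # transistors, as quoted
LLANO_QUOTED = 1.45e9    # transistors, as quoted for the 228 mm^2 die

estimate = 2 * ATHLON_II_X2 + REDWOOD_GPU
gap = LLANO_QUOTED - estimate

print(f"estimate: {estimate / 1e9:.3f}B, unexplained: {gap / 1e6:.0f}M")
```

The quoted figure is about a third higher than this building-block estimate.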
tipoo - Friday, December 2, 2011 - link
Probably due to the on-die GPU portion of Llano; since GPUs have so much redundant hardware it's easier to make them nice and dense.
Evleos - Friday, December 2, 2011 - link
How could anyone believe that it was 2.4 billion?
http://en.wikipedia.org/wiki/List_of_future_AMD_mi...
The_Countess - Friday, December 2, 2011 - link
2 x 1.2 = 2.4 billion for the dual-die server parts?
I can easily see how that could lead to confusion.
chromatix - Friday, December 2, 2011 - link
Okay, what we know is that cache (and DRAM) are extremely transistor-dense, GPU compute area is fairly dense, and CPU compute area is much less dense (because it doesn't make regular patterns). Crossbar switches and other routing stuff is perhaps the least dense of all - it's all wires.
As a rough estimate, caches require 64 transistors per byte, hence 64 million transistors per megabyte - so Deneb's 8MB total makes 512 million transistors just in the cache, Bulldozer doubles that to 16MB and 1024 million transistors for cache.
Subtracting the appropriate cache sizes from the original Deneb and Bulldozer figures left Bulldozer with twice the transistor count per core - not per module, per *core* - than Deneb. With no performance improvement per clock per core to show for it, I thought that was a really strange result.
Subtracting 800 million transistors from Bulldozer makes that comparison much more interesting. Deneb gets 246M over four cores, giving 61.5M transistors per core. Bulldozer gets only about 200M transistors over four *modules*, making on average 50M transistors per module, 25M transistors per core.
So somehow, Bulldozer's modules are actually more efficient in transistor count than Deneb's, despite the longer pipeline and containing two threads! A slight reduction in IPC per core is therefore entirely justified.
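The subtraction in this comment can be reproduced with a short sketch. The 64 million transistors per megabyte is the commenter's rule of thumb; Deneb's 758 million total is not stated directly and is inferred here from the 246M + 512M split in the comment. Note that the exact remainder for Bulldozer is 176 million, which the comment rounds up to about 200 million.

```python
CACHE_TRANSISTORS_PER_MB = 64e6   # commenter's rule of thumb (64 per byte)

def non_cache(total: float, cache_mb: float) -> float:
    """Transistors left after subtracting the estimated cache transistors."""
    return total - cache_mb * CACHE_TRANSISTORS_PER_MB

deneb_per_core = non_cache(758e6, 8) / 4          # 758M inferred from 246M + 512M
bd_old_per_core = non_cache(2.0e9, 16) / 8        # original 2B figure, 8 cores
bd_new_per_module = non_cache(1.2e9, 16) / 4      # corrected 1.2B figure, 4 modules
bd_new_per_core = bd_new_per_module / 2

for name, value in [("Deneb/core", deneb_per_core),
                    ("BD 2B/core", bd_old_per_core),
                    ("BD 1.2B/module", bd_new_per_module),
                    ("BD 1.2B/core", bd_new_per_core)]:
    print(f"{name}: {value / 1e6:.1f}M")
```

With the original 2B figure the per-core count comes out at roughly twice Deneb's, which is the "strange result" described above; with 1.2B it drops well below it.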
Marc HFR - Friday, December 2, 2011 - link
Bulldozer module (including L2 cache) is 213 million transistors according to AMD at the 2011 International Solid-State Circuits Conference.
85 million excluding L2 cache according to your data (64 million transistors per L2 megabyte). It's much more than 50M ...
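For reference, the 85 million figure follows from the quoted numbers like this (a sketch that reuses the 64 million per megabyte rule of thumb from the parent comment, not a measured cache size):

```python
MODULE_TOTAL = 213e6              # logic + 2 MB L2, as quoted
L2_MB = 2
CACHE_TRANSISTORS_PER_MB = 64e6   # parent comment's rule of thumb

module_logic = MODULE_TOTAL - L2_MB * CACHE_TRANSISTORS_PER_MB
all_modules_logic = 4 * module_logic

print(f"per module: {module_logic / 1e6:.0f}M, four modules: {all_modules_logic / 1e6:.0f}M")
```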
twhittet - Friday, December 2, 2011 - link
A 40% reduction in transistor count makes perfect sense, because it's about 40% slower than I thought it should have been.
I remember looking at the charts at the beginning, and wondering how the hell it was slower, clock for clock, than Thuban, with more than twice the amount of transistors.
dew111 - Friday, December 2, 2011 - link
I'm somewhat relieved at this news as well. It doesn't change bulldozer's performance, but it sure makes it look better for future variants to increase performance and power efficiency. If AMD can't beat Intel with 2x the transistor count, they would be in huge trouble. Luckily, with 1.33x the transistor count, they can trounce Intel in many multithreaded workloads. This makes a lot more sense, as it's what the architecture was designed to do. Bulldozer was meant to add more 'cores' with fewer transistors, and it appears with the real transistor count they have achieved this.
Aone - Friday, December 2, 2011 - link
AMD should have corrected the transistor count of ONE module, which with 2MB L2 has 213M transistors, because if we do the calculation: 213M (tr./one module) * 4 = 852M transistors. 1200M - 852M = 348M.
Is it possible that 348M transistors could serve 8MB L3 plus uncore parts?
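One way to test that question is to compare the 348 million remainder against a bare 6T SRAM estimate for the L3. The sketch below assumes binary megabytes and data bits only, so it is a lower bound on the L3 and says nothing precise about the uncore.

```python
TOTAL = 1200e6                    # corrected die total, as quoted
MODULE_TRANSISTORS = 213e6        # per module with 2 MB L2, as quoted

remaining = TOTAL - 4 * MODULE_TRANSISTORS

# Assumption: 8 MB L3 built from 6T cells, data bits only.
l3_6t = 8 * 2**20 * 8 * 6
shortfall = l3_6t - remaining
l3_fits = l3_6t <= remaining

print(f"remaining: {remaining / 1e6:.0f}M, 6T L3: {l3_6t / 1e6:.1f}M, fits: {l3_fits}")
```

Under these assumptions the L3 alone would not fit in the remainder, which suggests that either the 213M or the 1.2B figure is counted differently.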
Khato - Friday, December 2, 2011 - link
Do we have any reason to trust the AMD PR department right now? Because what it sounds like to me is that 1.2B may be the functional design transistor count, and 2B may be the actual floorplan transistor count with a nice huge 800M discrepancy because of their lackluster physical design. I mean, those huge 'dead' spaces between the actual logic blocks in the die shot (you can distinctly see each module, L3 cache, and uncore) are almost certainly automated signal routing, and with those kinds of distances it's guaranteed that you're going to have a lot of repeaters...
I can easily see AMD PR deciding that it looks bad to be using so many transistors to get such pathetic performance... so why not claim the other transistor number? There's no real way to confirm or deny their number. And they can justify it with the pathetic excuse that they're only using 1.2B transistors on the actual design, even if actual silicon has far, far more.
Phylyp - Friday, December 2, 2011 - link
I now have a question - how do they come up with these numbers? Is it an estimate? Is it a true count at the tape-out point?
If it's an estimate, I can understand how a changed/improved estimation technique can revise the numbers... though a 40% variation is extremely shoddy.
MS - Friday, December 2, 2011 - link
At 16 MB total L2+L3 cache, the transistor count for those two caches alone comes out as 1.2 B transistors
http://www.lostcircuits.com/forum/viewtopic.php?f=...
Khato - Friday, December 2, 2011 - link
Indeed. Though the linked post is slightly off as its calculation equates 16M to 16e6 instead of 16*2^20. Using the correct equation results in 1207959552 transistors...
So apparently AMD's PR department can count L2+L3 cache transistor numbers?
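The difference between the two conventions can be sketched as below. The factor of 72 transistors per byte is not stated in the thread; it is inferred here from the quoted result (1207959552 divided by 16*2^20), so treat it as an assumption about the linked calculation.

```python
TRANSISTORS_PER_BYTE = 72         # inferred from the quoted result, not from the source post
CACHE_MB = 16

decimal_count = CACHE_MB * 10**6 * TRANSISTORS_PER_BYTE   # 16M read as 16e6
binary_count = CACHE_MB * 2**20 * TRANSISTORS_PER_BYTE    # 16M read as 16 * 2^20

print(decimal_count, binary_count, binary_count - decimal_count)
```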
MS - Friday, December 2, 2011 - link
That's correct but for the proof of the concept I thought I'd use the simplified decimal metrics instead of the binary equation. Otherwise, who knows what kind of algebraic abuses I would see in the replies.
Khato - Friday, December 2, 2011 - link
Haha, fair enough. Either way, the end result is quite amusing.
Khato - Friday, December 2, 2011 - link
And now to reply to myself as I wake up a bit more...
The number is more likely around 900M for the L2+L3 cache as they appear to be using 6T sram as usual there. Still, the point stands that the 2B transistor number seems about right.
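A rough sketch of how a figure near 900M can arise with 6T cells. The comment does not show its working, so the choice between 8 data bits per byte and 9 bits per byte (one extra bit for parity or ECC) is an assumption made here for illustration.

```python
CACHE_BYTES = 16 * 2**20          # 16 MB of combined L2 + L3, binary megabytes
TRANSISTORS_PER_CELL = 6          # 6T SRAM cell

data_only = CACHE_BYTES * 8 * TRANSISTORS_PER_CELL       # 8 bits stored per byte
with_check_bit = CACHE_BYTES * 9 * TRANSISTORS_PER_CELL  # assumed 9th bit for parity/ECC

print(f"data only: {data_only / 1e6:.0f}M, with check bit: {with_check_bit / 1e6:.0f}M")
```

The 9-bit variant lands close to the "around 900M" mentioned above; the data-only variant gives roughly 805M.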
MS - Friday, December 2, 2011 - link
Yes, I updated my post accordingly as well.
Chaki Shante - Friday, December 2, 2011 - link
Correct me if I'm wrong, but I thought that transistor count was never an exact figure. Designers usually get a NAND equivalent gate count from their design tools, as if all gates were the same type, so this is a first approximation. Then the transistor count is derived by simply multiplying by a factor of 4, the number of transistors in one NAND gate. Can it explain the discrepancy?
MS - Friday, December 2, 2011 - link
Yes, it could, at least to a certain degree. Specifically, if AMD applies the "logic transistor" footprint to the overall used die size, i.e. the parts that really have some functional transistors as opposed to some areas that are interspersed as fillers - physical connections (which there are plenty of on BD). This might give you one number for the transistor count.
Now, as mentioned in various locations, cache / SRAM transistor footprint is substantially smaller than logic or, to put it differently, the packing is much denser. So if you applied logic transistor density to cache area, you get a pretty wrong number - it will depend, among other things, also on the specific SRAM cell and so on.
So yes, in general you are correct and I have a hunch that the 2 B vs. 1.2 B would come out quite well if you just do that type of "Stupid math" without differentiating between die areas. In that case, both numbers would be wrong and the real transistor count would probably be more in the area of 1.75-1.8 B transistors total.
I have left messages with AMD but received not a single word of feedback.
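The NAND-equivalent conversion described in this exchange is a single multiplication, sketched below. The factor of 4 is the one given in the comment (a 2-input CMOS NAND); the gate counts printed are simply what the two published totals would imply if they had been derived this way, not figures from AMD.

```python
TRANSISTORS_PER_NAND2 = 4         # 2-input CMOS NAND gate, as stated in the comment

def transistors_from_gates(nand_equivalent_gates: float) -> float:
    """First-approximation transistor count from a NAND-equivalent gate count."""
    return nand_equivalent_gates * TRANSISTORS_PER_NAND2

def gates_from_transistors(transistors: float) -> float:
    """Inverse conversion, to see what gate count a published total would imply."""
    return transistors / TRANSISTORS_PER_NAND2

implied_gates_1_2b = gates_from_transistors(1.2e9)
implied_gates_2b = gates_from_transistors(2.0e9)

print(f"{implied_gates_1_2b / 1e6:.0f}M gates vs {implied_gates_2b / 1e6:.0f}M gates")
```

Since SRAM arrays are not built from NAND gates, a conversion like this only makes sense for the logic portion of the die, which is the point made in the reply above.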
FeuchterFutzi - Saturday, December 3, 2011 - link
http://www.planet3dnow.de/cgi-bin/ne...?id=1321455...
in googlish:
Wednesday 16 November 2011
16:01 - Author: Dr @
AMD gives Bulldozer a shrinking cure - a virtual one
Anyone who expected that a new revision of the Bulldozer was waiting in the wings will be disappointed. For the launch of the first "Bulldozer"-based desktop processor, AMD FX (codenamed "Zambezi"), AMD had communicated to the press a die size of 315 mm² and a transistor count of around 2 billion. When The Register, in the context of the launch of the new Opteron 6200 ("Interlagos") and 4200 ("Valencia"), spoke of only 2.4 billion transistors for "Interlagos", some users in our forum asked how this number fits with the previous statement. After all, this figure is 40% below the original value of about 4 billion.
We therefore asked AMD, which confirmed that The Register's statement is accurate. Because all processors of the first "Bulldozer" generation are based on the same "Orochi" die, a die size of 315 mm² and 1.2 billion transistors apply to all versions. Since the Opteron 6200 is built as an MCM (Multi Chip Module) from two dies, the values have to be doubled per processor there. The virtual shrinking of the transistor count is thus simply due to a communication error. How it happened, they could not tell us yet.
Source: AMD
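The 40% figure in the translated article follows from doubling both per-die counts for the two-die Opteron 6200 package. A minimal sketch using only the numbers quoted above:

```python
DIES_PER_MCM = 2                  # Opteron 6200 is built from two dies
PER_DIE_ORIGINAL = 2.0e9          # originally communicated count per die
PER_DIE_CORRECTED = 1.2e9         # corrected count per die

mcm_original = DIES_PER_MCM * PER_DIE_ORIGINAL
mcm_corrected = DIES_PER_MCM * PER_DIE_CORRECTED
reduction = 1 - mcm_corrected / mcm_original

print(f"{mcm_corrected / 1e9:.1f}B vs {mcm_original / 1e9:.1f}B, reduction {reduction:.0%}")
```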
omi - Saturday, December 10, 2011 - link
so i also gotcha a thought in ma mind that aftr 2b transistor count why the tests are telling that bd is weaker than i7-2700k. Amd is just faking for the sales of bd they must have to apologize people for their fatal eror!sick!
C'DaleRider - Monday, February 2, 2015 - link
English. Do you speak it?