Comments Locked

54 Comments

  • Homeles - Tuesday, June 19, 2012 - link

    Looks like Intel's going to kick Nvidia while they're down... since they're lacking a competitive 28nm GPU when it comes to FP64.

    Still, with Intel being a whole process node ahead, it's disappointing to see that they're only putting out a card roughly equivalent to the 7970. This is only the beginning of a new war, though.
  • Khato - Tuesday, June 19, 2012 - link

    It's not roughly equivalent to the 7970 though. The Xeon Phi manages roughly 1 TFlops in actual performance (rmax) whereas the 7970 is roughly 1 TFlops in theoretical performance (rpeak). For GPUs, rmax is typically ~60% of rpeak.
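
    Just to illustrate the relationship (the figures below are hypothetical placeholders; only the ~60% rule of thumb is from the comment above):

        rpeak_tflops = 1.0              # theoretical peak from the spec sheet
        typical_gpu_efficiency = 0.60   # LINPACK Rmax is often ~60% of Rpeak on GPUs
        rmax_tflops = rpeak_tflops * typical_gpu_efficiency
        print(rmax_tflops)              # ~0.6 TFLOPS actually sustained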
  • Spunjji - Tuesday, June 19, 2012 - link

    According to Intel, it does. Given that one can buy and install a 7970 and one cannot yet buy the Laughabee card, I would say that AMD's "theoretical" performance is a lot more practical for the time being. By the time this thing comes out, nVidia and AMD will be rolling out second-gen 28nm parts.
  • DigitalFreak - Tuesday, June 19, 2012 - link

    Hush child. The grownups are talking.
  • Guspaz - Tuesday, June 19, 2012 - link

    He does have a point, though. We're talking about the difference between the theoretical performance of a shipping product versus the vendor-reported performance of an unshipped product. It's a pretty silly comparison, really.
  • taltamir - Monday, July 9, 2012 - link

    Such a putdown is not something a grownup does.
    Also everything he said is correct.
  • rickcain2320 - Thursday, July 12, 2012 - link

    Not to sound like a kid (or an out of touch grownup) but what is this Phi thing for?
  • Pirks - Monday, August 13, 2012 - link

    This fetus named DigitalStuck is named so for a reason, 'cause he's still stuck in vagina after so many years. Sometimes he tries to talk but labia majora keeps his hole plugged most of the time, fortunately for all of us.
  • raghu78 - Tuesday, June 19, 2012 - link

    So do you have any benchmarks to show that the Intel Xeon Phi achieves 1 TFLOP of actual DP performance? Until you have some real benchmarks, it's best not to comment. The Radeon HD 7970 has been reviewed and has proven its compute performance in many benchmarks like LuxMark and SiSoft Sandra. These chips have been praised for their compute performance.

    http://www.anandtech.com/show/5314/xfxs-radeon-hd-...

    http://www.tomshardware.com/reviews/geforce-gtx-68...

    Sisoft Sandra measures DP performance.
  • Kevin G - Tuesday, June 19, 2012 - link

    The Top 500 result scores 118 TFlops with 9,800 cores. Making the big assumption that all of the performance came from 50-core MIC cards, that'd put performance per card at 602 GFlops double precision. At 64 cores per card, double precision performance would be 770 GFlops per card. Chances are that part of the result also came from the Sandy Bridge CPUs; otherwise it would have made more sense to go with quad-core Xeons to make the power consumption figures look better. How much this would skew the results depends on the system configuration. Two Xeon E5-2670s per MIC card would inflate the performance-per-card rating more than one Xeon E5-2670 per four MIC cards would.

    There are a few factors that could raise those scores. As a prototype, clock speeds were likely conservative, and there is also the possibility of turbo coming into play. Furthermore, results for a single card and host will likely be higher due to the removal of network overhead.

    Regardless, these results paint Xeon Phi as merely competitive instead of having a decisive performance edge over its GPU counterparts.
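
    For anyone who wants to redo the back-of-envelope math, here's a quick sketch of it (same big assumption as above, i.e. all of the Rmax comes from the MIC cards):

        rmax_gflops = 118_000   # ~118 TFLOPS from the Top 500 entry
        total_cores = 9_800

        for cores_per_card in (50, 64):
            cards = total_cores / cores_per_card
            per_card = rmax_gflops / cards
            print(f"{cores_per_card} cores/card: {cards:.0f} cards, ~{per_card:.0f} GFLOPS DP each")
        # 50 cores/card -> ~602 GFLOPS per card; 64 cores/card -> ~770 GFLOPS per card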
  • wumpus - Tuesday, June 19, 2012 - link

    If Intel is beating their chest over "theoretical MFLOPS", I would simply assume that it's because they can only claim "machoflops" (haven't heard that term in forever) instead of actually pushing the doubles through.

    One other issue is that if they are gunning for the top500 list, LuxMark and SiSoft don't matter; they need to use LINPACK. There may be internal issues with publishing numbers that aren't based on LINPACK, or that aren't as high as the competition's (using something else).

    Also, pointing to NVIDIA's GK110-based Tesla K20 is pretty much a joke considering it runs at 20% of the FP power of the AMD and Intel systems mentioned (for single-precision DSP work it should be unstoppable, but don't expect it to be useful for much HPC work).

    Finally, I wonder what it must be like to work at AMD or Nvidia and watch Intel casually launch a swing-for-the-fences product that challenges your bread and butter. They might have a 15 year history of complete fail (on these high end coprocessors), but it looks like the engineers/groups on the project change and you have to worry each time they try.
  • Braincruser - Tuesday, June 19, 2012 - link

    Considering the rumors I have heard so far, Tesla K20 will be focused on double-precision calculation, delivering around 50-80% of its single-precision floating point performance in double. Around 1.5 TFlops. But these are only rumors, and as such I would avoid them till we see it in action.
  • mczak - Tuesday, June 19, 2012 - link

    The chip details about gk110 aren't rumors any more - 15 SMX (with 192 SP ALUs and 64 DP ALUs each) are confirmed. So 1/3 DP rate. The exact flops rate though isn't known, since neither clock speed nor the actual active unit count (there's a good chance at least one SMX is always disabled) is known. But it should end up in the neighbourhood of 4 GFlops single / 1.3 GFlops double.
  • dragonsqrrl - Tuesday, June 19, 2012 - link

    ... you mean TFLOPs right?
  • mczak - Wednesday, June 20, 2012 - link

    oops yes. 20 years ago it would have been GFlops :-)
  • Khato - Tuesday, June 19, 2012 - link

    Technically no, because Intel's only numbers are per-node, which leaves a question of whether they're allowing the CPUs to contribute or not. If they are including the CPUs, then a single Xeon Phi gets around 700 GFlops in linpack (rmax).

    And yes, I consider a presentation by Intel using the industry standard benchmark to be a 'real benchmark'. Far more real than what the GPU companies typically throw around in their PR materials.
  • Ryan Smith - Tuesday, June 19, 2012 - link

    Intel has already hit 1TFLOPs on LINPACK, though it's not clear whether this is being shown in a live demo or not.

    http://www.hpcwire.com/hpcwire/2012-06-18/intel_wi...
  • mczak - Tuesday, June 19, 2012 - link

    Disregarding the issues of "real" vs. "theoretical" flops (which we don't really know enough about; if for instance Intel has a 512-bit memory interface, that could indeed also give an advantage), this is only for DP flops. But I think SP flops shouldn't be completely neglected, and the 7970 (as well as the Nvidia K10, though of course that one stinks at DP) very easily beats Knights Corner there.
    There's a lot more than raw DP flops that counts, though, so it may still be quite OK. It doesn't have any of the graphics "baggage", and the "many-core" approach is certainly a bit different.
  • Haserath - Tuesday, June 19, 2012 - link

    If Intel's main selling point is "easier to code for," they probably don't have much of an advantage otherwise.
  • Assimilator87 - Tuesday, June 19, 2012 - link

    But can it Fold?
  • maximumGPU - Tuesday, June 19, 2012 - link

    How much easier will it be? With all the advancements in GPU programming, and with Microsoft integrating C++ AMP (Accelerated Massive Parallelism) into VS2012, Intel would have trouble selling these if that's their strongest argument.
  • Jaybus - Tuesday, June 19, 2012 - link

    It is algorithm design that is easier, not tool usage or language features.

    It depends on the problem. GPGPU is only good at data-parallel algorithms. If you don't have a lot of data that can be broken into many chunks that can each be processed independently, then it won't work well. Developing for GPUs is an ongoing attempt to eliminate branching, because branching can very easily stall the pipeline. In other words, it is often better to pre-calculate all possibilities in parallel, then choose the correct one in the end. It can quickly get complicated trying to remove if / then logic.

    MIC, though, uses general-purpose CPU cores that don't have the same issues with branching, yet each has a 16-wide vector unit. While not nearly as wide as a GPU, it is still sort of the best of both worlds. The flexibility makes it easier to program. And, for some problems that are not so data-parallel, it makes things much easier.
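
    As a toy illustration of the "pre-calculate all possibilities, then choose" pattern (NumPy standing in for a GPU kernel here, purely for clarity):

        import numpy as np

        x = np.random.rand(1_000_000)

        # Branchy CPU-style logic: if x > 0.5 take sin(x), else cos(x).
        # GPU-friendly, branch-free version: evaluate both, then select per element.
        y = np.where(x > 0.5, np.sin(x), np.cos(x))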
  • dragonsqrrl - Tuesday, June 19, 2012 - link

    How is Intel kicking Nvidia while they're down? You're speaking as though Xeon Phi is already available, while the latest road maps indicate that Nvidia's Tesla K20 will be launching first. And I'm not sure if you've realized this, but the theoretical fp64 performance of a fully enabled gk110 should be quite a bit higher than 1 TFLOP, assuming reasonable clocks. gk110's DP performance can operate at 1/3 fp32, and gk104 is already capable of pushing 3 TFLOPs fp32 with 1536 cores. So even assuming the gk110 in K20 will be clocked significantly lower (which is pretty much a certainty), Nvidia should have absolutely no problem exceeding 1TFLOP theoretical fp64 performance. Real world performance is another story though. For that we'll just have to wait for benchmarks.
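
    A rough sketch of that arithmetic, with the clock speed and enabled unit count as pure guesses (neither was public at the time):

        def peak_tflops(cores, clock_mhz, flops_per_clock=2):
            # cores * FLOPs per clock (FMA) * clock, converted to TFLOPS
            return cores * flops_per_clock * clock_mhz / 1e6

        # Sanity check against GK104: 1536 cores at ~1000 MHz -> ~3 TFLOPS fp32
        print(peak_tflops(1536, 1000))

        # Hypothetical fully enabled GK110: 15 SMX * 192 = 2880 cores, DP at 1/3 of SP
        for clock_mhz in (700, 800):
            fp32 = peak_tflops(2880, clock_mhz)
            print(clock_mhz, round(fp32, 2), round(fp32 / 3, 2))   # fp32 / fp64 TFLOPS
        # Even at ~700 MHz that's ~4.0 TFLOPS fp32 and ~1.3 TFLOPS fp64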

    As for the HD7970, I'm not even sure how it's relevant. Pros in the market for a Tesla or Xeon Phi won't even consider an HD7970 as an option. It has neither the industry nor the driver support to be a viable option in this area. However, like Ryan said, given AMD's shift in focus with Southern Islands we may very well see a viable option based on GCN before the year is out.
  • HighTech4US - Wednesday, October 31, 2012 - link

    Intel seems to be kicking their own backside if all they can obtain is 1 TF DP from their 22nm process.

    Nvidia's K20 (GK110) is getting 1.3 TF DP on TSMC's 28nm process.

    http://www.hpcwire.com/hpcwire/2012-10-29/titan_se...

    http://www.anandtech.com/show/6421/inside-the-tita...

    We're basing our numbers off of the figures published by HPCWire.

    http://www.hpcwire.com/hpcwire/2012-10-29/titan_se...

    For a given clockspeed of 732MHz and DP performance of 1.3TFLOPs, it has to be 14 SMXes. The math doesn't work for anything else.
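
    A quick check of that math (using the 64 FP64 ALUs per SMX mentioned earlier in the thread and 2 FLOPs per clock from FMA):

        clock_mhz = 732
        for smx in (13, 14, 15):
            dp_tflops = smx * 64 * 2 * clock_mhz / 1e6
            print(f"{smx} SMX: {dp_tflops:.2f} TFLOPS DP")
        # 13 -> 1.22, 14 -> 1.31, 15 -> 1.41; only 14 SMXes line up with ~1.3 TFLOPS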
  • Casper42 - Tuesday, June 19, 2012 - link

    It really has nothing to do with the common Xeon platform known today.
    Granted the Xeon started way back around the Pentium II era and the MIC uses modified Pentium cores, but I find it a little sad that with their marketing budget they couldn't come up with a better name.

    Atom, Core, Xeon
    Seems like all they needed was a good 4 letter name that vaguely resembles something from a Science Textbook.
  • A5 - Tuesday, June 19, 2012 - link

    Xeon = Workstation. Makes sense to me?
  • Casper42 - Tuesday, June 19, 2012 - link

    Except that Xeon primarily = Server, not workstation.
  • Daeros - Tuesday, June 19, 2012 - link

    Is anyone else wondering why Intel quit with the GPU project? Was it fear of more anti-trust litigation? From the comparisons I have seen, Intel is able to more than compete in the iGPU arena in terms of performance/die area, and I find it hard to believe Intel would experience fabrication woes on the order of what GloFo has gone through. Just wondering...
  • fluxtatic - Tuesday, June 19, 2012 - link

    Legend has it that a wild GPU driver-writing accident killed Intel's father. To this day, Intel can't bring themselves to write a proper graphics driver. The horror is just too much.
  • Spunjji - Tuesday, June 19, 2012 - link

    +1 :D
  • diamonddog - Tuesday, June 19, 2012 - link

    Finally, someone with a sense of humour! :D
  • IntelUser2000 - Tuesday, June 19, 2012 - link

    The original Larrabee was based on a mature 45nm process. The chip was having its own problems: it wasn't performant enough, and there were doubts about whether going into the resource-intensive yet low-margin video card market made sense.

    Larrabee did software rendering for everything except the texture unit. Soon after they gave up on graphics for Larrabee, one thing they touted for Sandy Bridge's iGPU was that some form of fixed-function hardware is necessary for good performance.
  • Kevin G - Tuesday, June 19, 2012 - link

    It wasn't raw compute performance that held Larrabee back. That functionality worked, from all accounts. What held the designs back was the software stack as a GPU. Intel simply couldn't create a driver that allowed the chip to perform at a competitive level. The chip was reportedly power hungry and used a 6 + 8 pin setup in the few public demos where it was seen.

    I do think it was wise to cancel Larrabee when they did, as it would have also had the impact of forking the x86 ISA further. This time around, the vector extensions are themselves an extension of AVX instead of an incompatible competitor. While an optimized piece of MIC code may not run on Sandy Bridge/Ivy Bridge, AVX code written with those chips in mind will now work on the MIC card.
  • A5 - Tuesday, June 19, 2012 - link

    Intel quit making a GPU because it wasn't going to be competitive in the market.
  • HighTech4US - Tuesday, June 19, 2012 - link

    An Inconvenient Truth: Intel Larrabee story revealed

    http://www.brightsideofnews.com/news/2009/10/12/an...
  • iwod - Tuesday, June 19, 2012 - link

    Intel's advantage in manufacturing and x86 is keeping them in a near-monopoly state. And it won't be long before Nvidia is kicked out of the HPC market.
  • duploxxx - Tuesday, June 19, 2012 - link

    IT won't abandon Quadro or Tesla that fast. They are really settled in for many years; even ATI with their FirePro brand, which often provides a better price/performance ratio, isn't able to gain much market share. Typical human behaviour: they go for a brand name.
  • Spunjji - Tuesday, June 19, 2012 - link

    As an IT reseller I can confirm that. I recommend AMD FirePro to price-conscious users, and almost invariably they prefer to buy lower-grade Quadro cards, even if they don't intend to use them for 3D rendering. The mind boggles.
  • MrSpadge - Tuesday, June 19, 2012 - link

    Part of the reason is software. CUDA is orders of magnitude better utilized than anything AMD offers. Not sure about the Quadros, but nVidia's software support is also strong here.
  • TC2 - Tuesday, June 19, 2012 - link

    correct!!!
    but don't miss that KK is uA targeting only HPC!!!
  • Kevin G - Tuesday, June 19, 2012 - link

    Nothing Intel has shown off yet would raise the ire of anti-trust regulators. Just give it a generation or two, when PCI-e bandwidth becomes a compute bottleneck and latencies become too high. The natural resolution would be to move the MIC chip from a PCI-e card to the motherboard, where it would connect to Intel Xeon chips over QPI. That is far more restrictive in terms of platform, though it may not be enough for regulators to actually step in, as it is the natural path for this technology to take.
  • Jaybus - Tuesday, June 19, 2012 - link

    No. The natural plan is to move the MIC onto the same die with a few modern Xeon cores. At that point, no PCIe card could compete, because the bandwidth would be too high. Also, it removes the memory-to-memory copies altogether, since the MIC cores and Xeon cores would have access to the same RAM. All cores would participate in the same shared memory ring bus, which is a 1024-bit (512-bit bidirectional) bus on the current MIC.
  • HighTech4US - Wednesday, October 31, 2012 - link

    Nvidia K20 (GK110) 1.3 TFlops DP built on 28nm process available NOW.

    Intel Phi built on 22nm process may have 1 TFlops DP and is nowhere to be seen.

    http://www.hpcwire.com/hpcwire/2012-10-29/titan_se...
  • arthur449 - Tuesday, June 19, 2012 - link

    Did they seriously use "synergistically" in that graphic?

    Hold on a moment. I need to go club a few marketing majors.
  • dgingeri - Tuesday, June 19, 2012 - link

    Anyone else see where this is going?

    In five years, we'll have an Intel quad- to octo-core chip with integrated video and about 20 to 40 of these "assistant"-type cores on one piece of silicon. I have no doubt.

    All the while, AMD already has it with Trinity. Intel will claim it is a first, but it won't really be. AMD just won't be able to fully capitalize on it, leaving them in a secondary market.
  • Broheim - Tuesday, June 19, 2012 - link

    since when can trinity run native x86 code?
  • Broheim - Tuesday, June 19, 2012 - link

    Just to clarify, I was referring to the GPU part of Trinity.
  • Meghan54 - Tuesday, June 19, 2012 - link

    On page 1, you wrote, "Intel is still holding their cards close to their chest at this time..."

    At least get the cliche correct. It should be "...holding their cards close to their vest.....", not chest.

    I know it's an old cliche and vest sounds a lot like chest, but if you're going to use a cliche, use the correct one.

    Just sayin'.....
  • slk123 - Tuesday, June 19, 2012 - link

    It can be either chest or vest. Personally I have never heard anyone say vest.
  • geddarkstorm - Tuesday, June 19, 2012 - link

    Holding something "close to the chest" has been the phrase for the past few decades, about keeping details secret or private. Another variant is "playing my cards close to my chest", which of course comes from poker.

    I have never heard nor seen "close to the vest" used. No idea where you're getting this.
  • jecastejon - Tuesday, June 19, 2012 - link

    I'm very interested in any solution to speed up 3D renderings, as are many other CG artists; the obvious ones being an 18-core dual Xeon machine with a top Quadro, a second dedicated Xeon machine, or even a small render farm. I don't see Tesla listed on Nvidia's own page with high-end 3D rendering software and I don't know why, but I guess it may be because it is a GPU architecture not fully supported by more complex software rendering engines like, say, Mental Ray or the Maya Software Renderer.

    Could this Xeon Phi be more transparent to current high-end software rendering? Would I be able to install one or more of these PCIe cards into the same Xeon machine to specifically speed up a software rendering engine?

    To me the advantage of this co-processor is that I would have more freedom to choose a very fast 6-core Xeon for multitasking in a 3D-specific environment, move the co-processors to newer Xeon architectures (I am not sure), and not have to start all over with every new generation. I might end up with a more compact full-tower Xeon system, or maybe with a second, more independent dedicated co-processor machine. Sorry if I am asking for too much...
  • silverblue - Tuesday, June 19, 2012 - link

    ...how much is this going to cost? How much power is it going to use? Are the drivers going to be any good? I know these aren't questions anybody can answer right now, but I don't expect something sporting 50+ "simple" x86 cores to be cheap or frugal. Still, you never know.

    I'm not sure why you mentioned the 7970 though, given that it's a desktop card. The FirePro W9000 is really what we need to compare it to: a 6GB GDDR5, 1GHz Tahiti XT card rated at, oddly, 1 TFLOPS for double precision. The K20 is rated at an astounding 1.7 TFLOPS, so Intel and AMD had better have something ready to tackle that one.
  • drawer77 - Wednesday, June 20, 2012 - link

    Many retail processors cost no less than 300 dollars. In my opinion, this card is going to cost more than 10,000 dollars. There is definitely a market for a low-end version of this card.
  • Wolfpup - Monday, July 23, 2012 - link

    And if so, how much does it cost? :-D
