
  • OEMG - Friday, October 14, 2016 - link

    I guess Intel's fine with good ol' PCIe. It's actually crazy how many players are coming up with their own interconnects.
  • patrickjp93 - Friday, October 14, 2016 - link

    Omnipath and Omniscale.
  • johnpombrio - Friday, October 14, 2016 - link

    Funny thing is that IBM is also in bed with NVidia using the NVLink and the P100 GPUs with a "5X faster than PCI" bus. So which bus is IBM pushing here? Intel completely owns the server market so exactly who is this for? The China server market?
  • SarahKerrigan - Saturday, October 15, 2016 - link

    Not quite. Non-x86 server sales account for a little under 15% of server revenue total, and IBM makes up most of that. That's several billion dollars per year worth of hardware.
  • iwod - Saturday, October 15, 2016 - link

    This is a lot higher than I thought. Where is that data from? Considering x86 has 95%+ of server units shipped, that means the ~5% of shipments that are non-x86 represent 15% of revenue.
  • SarahKerrigan - Saturday, October 15, 2016 - link

    x86 is actually over 99% of units shipped. Non-x86 systems tend to be mainframes and commercial UNIX systems with a very high per-unit price (a single mainframe can run into the millions of dollars.) So, a much higher percentage of revenue than volume.

    IDC and Gartner both have data supporting this.
  • lefty2 - Friday, October 14, 2016 - link

    So.. basically, it's a faster bus than PCIe. Why put all that bullshit at the start of the article? This is nothing to do with machine learning or "heterogeneous computing".
  • diehardmacfan - Friday, October 14, 2016 - link

    The need for faster interconnects absolutely has to do with heterogeneous computing. This standard isn't for your average gaming GPU.
  • lefty2 - Friday, October 14, 2016 - link

    No, it's not. It can be used in any application that needs higher bus speed. Also, it could be used for high-end gaming GPUs (why not?)
  • Yojimbo - Friday, October 14, 2016 - link

    High-end gaming GPUs don't need higher bus speeds. Besides this is targeted towards data center servers. It is made with heterogeneous computing in mind. Even without considering the technical decisions made and how they affect various use cases, this consortium doesn't include Intel so it's going to have to exist in a non-Intel ecosystem. You're not likely to see CAPI in consumer hardware any time soon.
  • emn13 - Friday, October 14, 2016 - link

    They might benefit from lower latency, however, to be able to shuffle even smaller tasks to the GPU (or conversely, to let traditionally GPU-only tasks benefit from short bursts of branchy logic on the CPU).

    Multi GPU might also be easier.
  • [email protected] - Friday, October 14, 2016 - link

    I don't see why not? It would certainly be used by AMD consumer hardware. That includes laptops, desktops, and gaming consoles.
  • fallaha56 - Sunday, October 16, 2016 - link

    Er, no, it seems highly relevant for games: advanced AI, physics and async compute models are now incoming, plus multicore is finally becoming a reality.

    You just made a Bill Gates "640KB should be enough for anybody" comment.
  • JohanAnandtech - Friday, October 14, 2016 - link

    I would appreciate it if you asked your questions a bit more civilly. You call it "bullshit", but big data and machine learning are among the most important battlegrounds that will decide who gains market share in the server market.
  • lefty2 - Friday, October 14, 2016 - link

    I apologise.
    The article indicates that it's a faster replacement for PCIe and NVLink, but from SarahKerrigan's comments that does not seem to be true.
  • Michael Bay - Saturday, October 15, 2016 - link

    >muh big data
    >muh machine learning

    Ah, the newfangled ways for IT managers to justify ridiculous spending for no useful outcome.
    Nobody other than those already present will "get market share in the server market" anyway, which is to say Intel will keep dominating it. Notice their absence from the list.
  • JohanAnandtech - Saturday, October 15, 2016 - link

    Your comment sounds a lot like the ones I heard from older IT people back in the mid-nineties: who needs the Internet, we have very solid high-end machines and internal networks here that take care of our IT. Every decent bank is already using big data technology and machine learning to know the customer they have in front of them.
  • Michael Bay - Saturday, October 15, 2016 - link

    ...and here's the typical IT manager non-answer.
    Thank you!
  • fallaha56 - Sunday, October 16, 2016 - link

    @johan absolutely ;)

    '640k should be enough for anybody'
  • Kevin G - Friday, October 14, 2016 - link

    It isn't about just bandwidth but coherency and latency. That is what can enable seamless heterogeneous compute. OpenCAPI provides both.
  • wyewye - Saturday, October 15, 2016 - link

    I second that. We really don't need your multiple pages of idiotic "philosophy", Johan. This entire article is only "25Gbits per second per lane". The rest is complete garbage.
  • tuxRoller - Sunday, October 16, 2016 - link

    It's actually 16-112GT/s/lane, and you can aggregate up to 256 lanes. That's a lot faster.
    It's also not even close to the whole story.
  • tuxRoller - Sunday, October 16, 2016 - link

    Ugh. Ignore. Wrong article.
  • Meteor2 - Wednesday, December 21, 2016 - link

    Would you talk to your mother like that? Be more civil.

    Johan is completely correct with his analysis and you'd do well to listen if you don't want to be behind the times.
  • fanofanand - Friday, October 14, 2016 - link

    Forgive my ignorance, but isn't PCIe 4.0 supposed to double the bandwidth to 32Gbits/s? If that's the case, what good is this new interconnect? Can it supply more power than a traditional PCIe lane? I just don't understand the "why" behind this.
  • diehardmacfan - Friday, October 14, 2016 - link

    PCI-E 4.0 is 32Gbit/s with 16 lanes, this is 25GB/sec with 1 lane.
  • LoneRat - Friday, October 14, 2016 - link

    It's 25Gbit/s, not GByte/s.
    BTW this interface is almost obsolete, since it doesn't offer anything superior to PCIe 3.0. It is more like a "half-node" between PCIe 3.0 at 16Gbps and 4.0 at 32Gbps. Intel may just need to push 4.0 out earlier to battle it. Since PCIe 4.0 is backward compatible with 3.0 while OCAPI is a completely new interface (and I doubt it will be compatible with the PCIe standard), OCAPI won't be as popular as PCIe.
    By the way, OCAPI might become a competing standard to PCIe, but implementing 2 different standards on one die of a SoC would be ridiculous. Unless IBM can find a way to emulate/support PCIe on OCAPI, or unless OCAPI is a more efficient route, the new interface will be dead like IBM's other non-standard, non-compatible interfaces in the old days.
  • close - Friday, October 14, 2016 - link

    This isn't just another interface for your gaming card.

    PCIe 3.0 offers 8Gbps and PCIe 4.0 doubles that to 16Gbps over one lane versus the 25Gbps for Open CAPI. Add the full coherency to that and you get something that addresses exactly the issues the industry has right now with the available interconnects.

    You're right, it won't be very popular among the guys looking to get a GeForce GTX2080 for their PC...
  • fatpenguin - Friday, October 14, 2016 - link

    This isn't accurate - PCIe 3.0 offers 8GT/sec per lane (Gen 2 is effectively 4Gbit/sec after its 8b/10b encoding). PCIe 4 is supposed to offer 16GT/sec per lane. With 16 lanes, it's 32GB/sec.

    Also, this is 25Gbit/sec, not 25GB/sec. 16 lanes would give 50GB/sec of bandwidth.

    I don't understand where the article's claims of 16Gbit/sec per lane for PCIe 3.0 come from. That's where PCIe 4.0 comes in.

    Either way, this is considerably faster per lane, and likely offers less overhead / latency than PCIe, which is where the majority of performance improvements lie...PCIe has quite a bit of overhead for small transactions.

    What about mechanical changes? I'd be a bit surprised if they are trying to achieve this rate with the same constraints (edge card connector, etc).
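
    For readers who want to check the arithmetic, here is a quick sketch in Python (assuming x16 links for all three; the PCIe figures include 128b/130b encoding, while the OpenCAPI figure uses the raw 25Gbit/s per-lane rate, since its encoding overhead isn't stated in this thread):

    # Rough one-direction bandwidth, in GB/s, for a multi-lane link.
    def link_gbytes_per_sec(gbit_per_lane, lanes, encoding=1.0):
        return gbit_per_lane * encoding * lanes / 8  # bits -> bytes

    pcie3 = link_gbytes_per_sec(8, 16, 128 / 130)    # ~15.8 GB/s
    pcie4 = link_gbytes_per_sec(16, 16, 128 / 130)   # ~31.5 GB/s
    ocapi = link_gbytes_per_sec(25, 16)              # 50 GB/s (raw rate)

    print(f"PCIe 3.0 x16: {pcie3:.1f} GB/s")
    print(f"PCIe 4.0 x16: {pcie4:.1f} GB/s")
    print(f"OpenCAPI x16: {ocapi:.1f} GB/s")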
  • rhysiam - Friday, October 14, 2016 - link

    That data rate for PCIe 3.0 made me pause too. It's bidirectional, so my guess is that Johan was quoting it as (almost) 8Gb per second, per lane, per direction, so 16Gb total bandwidth (counting send and receive).
  • SarahKerrigan - Friday, October 14, 2016 - link

    The major difference between CAPI and PCIe is that CAPI is fully coherent and requires no additional address translation between the CPU and the attached device. This significantly reduces latency. Today's CAPI actually just uses the PCIe physical layer but runs its own protocol on top of it, so it's not just about bandwidth. Some cards, as far as I know, can autodetect CAPI and switch into CAPI mode when it's available and run in PCIe mode when it's not; afaik Mellanox's CAPI-compatible HBAs work like this.

    That being said, I expect OpenCAPI (which IBM called New CAPI in the P9 presentation) to be a net bandwidth win too, via having more lanes per slot - since it appears, per the POWER9 presentation, to be using the same physical layer as NVLink 2.0.
  • fanofanand - Friday, October 14, 2016 - link

    Thank you macfan and Sarah, I appreciate the insight! With the ability to couple lanes together, I can see where this would be an order of magnitude faster, and if they get a latency reduction then yeah this is a big deal.
  • p1esk - Friday, October 14, 2016 - link

    What does "CAPI is fully coherent" mean?
  • fallaha56 - Sunday, October 16, 2016 - link

    coherency
  • Yojimbo - Friday, October 14, 2016 - link

    This article fails to mention the fact that NVIDIA is a member of the OpenCAPI consortium at the "contributor level", which is the same level Xilinx has and seems to be a level below the "board level" held by the founders of the consortium: AMD, Google, IBM, Mellanox, and Micron. HPE is also a contributor-level member, and Dell EMC is an "observer level" member. Since NVIDIA has the lion's share of the accelerator market, that is a rather significant fact. It's quite glaring that NVIDIA is not a member of the CCIX consortium.

    The CCIX consortium contains many of the same players, but seems to have formed around Xilinx. The OpenCAPI consortium presumably formed around IBM. I haven't tried to figure out what the differences between their proposals are yet.
  • JohanAnandtech - Friday, October 14, 2016 - link

    Thanks, a very good addition. I updated the article.
  • dave102 - Friday, October 14, 2016 - link

    It is pretty clear the author of this article does not know what he is talking about. These players are all individual contributors to the data center space and thus must come up with an open interconnect to allow them to play together at high enough performance. The claim that Intel must "catch up" is total bullshit. Intel has 97% market share in the data center.

    Intel has a fabric product in Omni-Path, and guess what, it's integrated. They have an HBM solution, and it's integrated. They have a GPU competitor in the Knights series, and guess what, it's bootable and coherent with all of the products I just mentioned. The fact that all of these come standard on a Xeon or Xeon Phi makes the argument that they are "behind" for not coming up with a next-generation PCIe standard totally irrelevant.

    Way to miss the boat on this one Johan.
  • JohanAnandtech - Friday, October 14, 2016 - link

    The point is that OpenCAPI has wide industry support, while Omni-Path is mostly Intel networking and MICs. So Intel has to answer this, despite having Omni-Path. Intel excels in CPU design, but it is very unlikely they will have the best solution for every accelerator. In fact, as far as I know, Tesla is a lot more successful than Phi. And I do not know who ran over your cat today, but can we please keep the tone a bit more civil? Immediately stating that I "don't know what I am talking about" is not really a good way to start a discussion.
  • name99 - Friday, October 14, 2016 - link

    The correct analogy would be to something like what happened in the 90s as more and more tasks moved off the mainframe and onto servers. IBM was in the position you give for Intel today --- solutions to all sorts of things, with names like SAA and SNA, that today mean something to only a few cognoscenti.

    Having a portfolio of solutions is not a panacea. What matters includes things like total system cost, and the ability to mix and match alternative plugin components.
    One would expect that Intel is as aware of this history as IBM is (though sometimes I wonder, given the way they behave). Which means the outcome is not necessarily as determined as twenty years ago.
    BUT Intel has definite weaknesses. Their ongoing fragmentation of the user base does them few favors (what's the value in supposedly having binary compatibility between CPUs if the most relevant part of the ISA, the various AVX flavors, is not so compatible?). And they're going to become ever more vulnerable on cost. Not from IBM at the high end, but from a thousand cuts as Chinese POWER clones and ARM servers come online.
    (Remember, the Chinese government can't be sold high-end Xeons:
    http://www.pcworld.com/article/2908692/us-blocks-i...
    which means they are GOING to develop their own equivalents in this performance sector, regardless of the startup costs. And those startups are likely going to use ARMv8 or POWER ISA --- they're damn well not going to use x86 for both technical and business reasons. We've already seen the very first fruits of this with Phytium's initial offerings --- which are very much learning chips, not the end-point.)

    People analyzing this effort as a waste of time are living in the last ten years. They need to broaden their horizons both backwards (to how Intel got to its position in the first place) and forwards (not to next quarter, but to say the 2020 timeframe).
  • Michael Bay - Saturday, October 15, 2016 - link

    Thousand cuts by a lame duck must be made a new meme.
    And should POWER become something a little more tangible than IBM's bullshit marketing, you can bet everything on it being blocked for sale just as well.
  • eachus - Friday, October 14, 2016 - link

    There are several differences between these proposals and PCI Express. The first, and most important, is cache coherency. Keeping caches coherent does cost bandwidth, but it makes for much simpler code on each end, and net-net much faster communications.

    Also protocols are optimized around the size of information blocks transferred. Cache-coherency traffic will require some small quick transfers to verify consistency, but these protocols envision large block transfers once the handshaking is complete.

    The other major difference is addressing. For all of these new protocols, you need huge virtual address spaces, where part of the (physical) address tells which device manages the corresponding virtual addresses. Every device needs to be able to do virtual to physical mapping, potentially for the entire virtual address space. These maps, of course will be multilevel, and occasionally packets will have to be sent back and forth between the device decoding the virtual address, and the device which manages that part of the map. Not all devices need to manage virtual memory tables, but all will have TLBs (translation lookaside buffers) to do virtual to physical mapping quickly for most cases. (AKA locality of reference. Most code and data accesses a few small areas of addresses most of the time.)
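
    A minimal sketch in Python of that TLB lookup path (the page size, table contents, and single-level walk here are invented for illustration; real page tables are multi-level, as noted above):

    PAGE_SHIFT = 12                       # assume 4 KiB pages
    PAGE_MASK = (1 << PAGE_SHIFT) - 1

    page_table = {0x00400: 0x9A000,       # virtual page -> physical frame (made-up entries)
                  0x00401: 0x9A001}
    tlb = {}                              # small, fast cache of recent translations

    def translate(vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        if vpn in tlb:                    # TLB hit: the common case, thanks to locality of reference
            pfn = tlb[vpn]
        else:                             # TLB miss: walk the (here, single-level) page table
            pfn = page_table[vpn]         # a KeyError here would correspond to a page fault
            tlb[vpn] = pfn                # cache the translation for next time
        return (pfn << PAGE_SHIFT) | offset

    print(hex(translate(0x00400ABC)))     # miss, then cached -> 0x9a000abc
    print(hex(translate(0x00400DEF)))     # hit on the same page -> 0x9a000def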
  • zangheiv - Friday, October 14, 2016 - link

    This is of particular importance to AMD because it's ideal to have large amounts of storage and memory located on the GPU board, closer to the respective compute source, coherently connected to the CPU, which is in turn coherently connected to other servers in the cluster. Dual-Zen servers will have up to 64 cores connected to several Radeon GPUs, so whether the architecture is for HPC, HW virtualization or micro-segmentation, this open architecture makes very good sense.
  • johnpombrio - Friday, October 14, 2016 - link

    Does this remind anyone else of Micro Channel by IBM?
  • JohanAnandtech - Saturday, October 15, 2016 - link

    Micro Channel was IBM's attempt to make part of the PC technology proprietary again. Exactly the opposite of OpenCAPI.
  • gobears99 - Saturday, October 15, 2016 - link

    I've noticed that Anandtech has yet to cover Intel's Xeon+FPGA systems. They play in the same space as CAPI (are there shipping CAPI solutions other than FPGAs???). Intel rolled out the first generation of the academic program for Xeon+FPGA over a year ago. They also presented details at a workshop held during ISCA2016. Slides available https://cpufpga.files.wordpress.com/2016/04/harp_i...
  • zodiacfml - Saturday, October 15, 2016 - link

    I feel this is just to ensure that there will be an interconnect available if the simultaneous use of various types of accelerators proves popular.
    Intel seems to be in a wait-and-see mode these days, except for their non-volatile memory, XPoint. It is quite understandable considering they can just throw money at a problem.
  • zodiacfml - Saturday, October 15, 2016 - link

    Speaking of throwing money at a problem, they acquired Altera recently.
