42 Comments

  • Yojimbo - Monday, February 3, 2020 - link

    Habana was just purchased. The correct interpretation is not that Nervana was folded to whittle down a fractured portfolio, but rather that the whole reason Habana was purchased was because of the failure of Nervana.
  • jeremyshaw - Monday, February 3, 2020 - link

    The next question is how long will it take before Intel makes a failure of Habana?
  • p1esk - Tuesday, February 4, 2020 - link

    Probably as long as it takes Nvidia to ship next gen Tesla cards.

    10x100GE does look insane though! Does anyone have a clue how it could possibly be used?
  • Yojimbo - Tuesday, February 4, 2020 - link

    When training very large models there is a lot of cross-talk. NVIDIA makes fat nodes, but if models need to scale bigger than a few nodes then there is an internode bandwidth limitation. However, it's been a well-known issue for years and something NVIDIA has been working on in their software stack. I have a feeling that one of the key new features of NVIDIA's next generation data center GPUs is better scalability for AI training with large clusters.
  • p1esk - Tuesday, February 4, 2020 - link

    How would you implement the physical 10x100GE connections? Bridges? 10 fiber cables? A custom motherboard?
  • Eliadbu - Tuesday, February 4, 2020 - link

    There is an implementation for a rack with 8 of those accelerators, called HLS-1; it's similar to NVIDIA's DGX. Each accelerator is connected to every other accelerator in the node, so 7 ports are used for connections within the node, and the other 3 ports of each accelerator are routed outside the node to connect to a switch or to other nodes. This is just one implementation; since those cards use a common standard, it is really up to the customer to choose how to deploy them.
  • Yojimbo - Tuesday, February 4, 2020 - link

    There are ports coming off the server. I guess the configurators can use fiber or copper or whatever they think is appropriate. According to their website, in the HLS-1 server each Gaudi HL-205 Mezzanine card uses 7 of the 100Gb ethernet connections for in-node communications. Each one has 3 connections to ports coming off the server. So the total ports coming off the node is 24x100Gb. When they expand to multiple nodes in the cluster they use ethernet switches to lash together the nodes through the ethernet connections coming off the nodes. There is no CPU in the server so it needs to be connected somehow to one. I don't know how that connection is made, whether through the PCI express switches or through ethernet. They have a graphic showing 6 HLS-1, 6 CPU servers, and 1 ethernet switch for a full rack.
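
    A quick sketch of that port arithmetic (a back-of-the-envelope calculation only; the 8-card, 7-internal/3-external split comes from Habana's published HLS-1 description, the rest is just unit conversion):

```python
# Back-of-the-envelope port/bandwidth arithmetic for an HLS-1-style node.
CARDS_PER_NODE = 8            # Gaudi HL-205 mezzanine cards per HLS-1
PORTS_PER_CARD = 10           # 100GbE ports per card
INTRA_NODE_PORTS = 7          # card-to-card links inside the chassis
EXTERNAL_PORTS = PORTS_PER_CARD - INTRA_NODE_PORTS   # 3 per card
PORT_SPEED_GBPS = 100         # gigabits per second per port

external_ports_per_node = CARDS_PER_NODE * EXTERNAL_PORTS            # 24 ports
external_bw_tbps = external_ports_per_node * PORT_SPEED_GBPS / 1000  # 2.4 Tb/s

print(f"External ports per node: {external_ports_per_node}x100GbE")
print(f"Aggregate off-node bandwidth: {external_bw_tbps} Tb/s per node")
```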
  • p1esk - Tuesday, February 4, 2020 - link

    I wonder what kind of Ethernet switch can handle a full rack of servers each pumping 2.4 Tb/s. I suspect a switch like that costs more than the entire rack.
  • Kevin G - Tuesday, February 4, 2020 - link

    Not entirely sure that that is the exact direction nVidia is going.

    Several nvSwitch ASICs let upwards of 32 GV100s communicate within two hops of each other, in a fully coherent fashion, as a single node. Per IBM's roadmap, the next generation of nvLink is also around the corner; it boosts bandwidth and is a clear indication that Ampere will also support it at the high end.

    nVidia did purchase Mellanox, so they do have some plans beyond just nvLink for communication, but anything integrated from that acquisition is likely still years away. (I would still expect some portion of the Mellanox roadmap to continue short term.)
  • HStewart - Tuesday, February 4, 2020 - link

    Have you ever thought that Intel is combining the best of Habana and Nervana to make something completely new?
  • Korguz - Tuesday, February 4, 2020 - link

    ever thought that nervana sucked.. intel realized it... so instead of trying to fix it, they just bought someone else and continued on with that. come on hstewart... not everything intel does is gold.. even your beloved intel screws up with new tech... the bad part of this is that the industry lost 2 independent companies in this area...
  • HStewart - Wednesday, February 5, 2020 - link

    Let's say it was a different company - say AMD purchased it and decided to switch products like this - I almost quarantine that you, Korguz, would say it was a smart decision for the industry. Even though the same thing happened - if it is Intel it is bad for the industry; if it is others, it is not.
  • Korguz - Wednesday, February 5, 2020 - link

    " i almost quarantine " ???? what does that have to do with this ??
    yes.. but the same thing can be said about you.. if amd did this.. you would be all over this saying how much amd screwed up... but you don't appear to be doing that here with intel.. seems to me you think it's ok that intel did this... and the industry would still have lost 2 independent companies..
  • m53 - Tuesday, February 4, 2020 - link

    Intel was the largest investor in Habana, but now it owns a 100% share. Here is an article from 2018:

    https://www.prnewswire.com/news-releases/habana-la...
  • mode_13h - Tuesday, February 4, 2020 - link

    At least, with the cancellation of Knights Mill, Intel seems to have gotten over its not-invented-here syndrome.

    That connectivity is pretty crazy, though. Did they license someone else's Ethernet IP? That would be a good argument in favor of it.
  • zmatt - Tuesday, February 4, 2020 - link

    Unless my math is wrong, that 10x 100GbE exceeds the total bandwidth of the 16x PCIe 4.0 bus by a fair margin. How are you supposed to actually use all of that bandwidth?
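
    For what it's worth, a quick sketch of that comparison (per-direction figures; this assumes PCIe 4.0's 16 GT/s per lane with 128b/130b encoding, and ignores protocol overhead on both sides):

```python
# Rough check: 10x 100GbE vs. a PCIe 4.0 x16 link, per direction.
pcie4_lane_gbps = 16 * (128 / 130)        # ~15.75 Gb/s usable per lane
pcie4_x16_gbps = pcie4_lane_gbps * 16     # ~252 Gb/s per direction

eth_gbps = 10 * 100                       # 1000 Gb/s per direction

print(f"PCIe 4.0 x16: ~{pcie4_x16_gbps:.0f} Gb/s per direction")
print(f"10x 100GbE:   {eth_gbps} Gb/s per direction "
      f"(~{eth_gbps / pcie4_x16_gbps:.1f}x the PCIe link)")
```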
  • e1jones - Tuesday, February 4, 2020 - link

    Most likely because it all stays on the card and that bandwidth is to scale to other cards/nodes.
  • Yojimbo - Tuesday, February 4, 2020 - link

    7 of those connections are used for intra-node communication. And the data doesn't go over the PCI Express bus; they use RDMA over Converged Ethernet (RoCE) to move the data around.
  • mode_13h - Wednesday, February 5, 2020 - link

    As others have said, it's for inter-node connections. Probably in a fashion comparable to NVLink.

    https://en.wikipedia.org/wiki/NVLink

    Nvidia's Tesla V100 has 6 links @ 25 GB/sec per link, per direction. So, aggregate of 150 GB/sec per chip, per direction.

    By comparison, 10x 100 Gb/sec = 125 GB/sec per direction. So, almost on par with Nvidia's datacenter/HPC GPU from 2.5 years ago. Just to put it in perspective.
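
    A small sketch of the unit conversion behind those numbers (nothing vendor-specific beyond the link counts and speeds already quoted above):

```python
# V100: 6 NVLink links at 25 GB/s per link, per direction.
v100_gbytes_per_s = 6 * 25                    # 150 GB/s per direction

# Gaudi: 10 ports of 100 Gb/s Ethernet; divide by 8 to go from gigabits to gigabytes.
gaudi_gbits_per_s = 10 * 100                  # 1000 Gb/s per direction
gaudi_gbytes_per_s = gaudi_gbits_per_s / 8    # 125 GB/s per direction

print(f"V100 NVLink aggregate: {v100_gbytes_per_s} GB/s per direction")
print(f"Gaudi 10x100GbE:       {gaudi_gbytes_per_s:.0f} GB/s per direction")
```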
  • peevee - Tuesday, February 4, 2020 - link

    Why did they buy Habana when they already had Nervana?

    Somebody at Intel got a nice kickback.

    Intel is a headless chicken.
  • III-V - Tuesday, February 4, 2020 - link

    >Why did they buy Habana when they already had Nervana?

    IP
  • HStewart - Tuesday, February 4, 2020 - link

    To combine the best of both technologies; it's probably going to be a new product, and eventually people will think they killed Habana too. I don't work for Intel so I am not sure - but that seems like the logical thing to do.
  • Eliadbu - Tuesday, February 4, 2020 - link

    Simply because Nervana wasn't making the cut. Other companies and engineers probably told Intel that Nervana's offering wasn't good enough, so they had 2 choices: either try to improve the product, which would take years, or purchase something already available that is superior. The second option is the most reasonable, because AI hardware is improving rapidly, and betting that you will outperform your rivals (Nvidia) with a future product while your current offering is inferior is very risky. Will they kill Habana? That depends on how its products perform over the next few years, but Intel won't have many more chances to buy another company, because as time goes on fewer companies will keep up and it will be very hard for new players to get into the competition.
  • HStewart - Tuesday, February 4, 2020 - link

    Still, they can always use the technology of both companies to come up with a new product. They did this with the PLA stuff and came up with EMIB and Foveros.
  • Yojimbo - Tuesday, February 4, 2020 - link

    On the other hand, they bought both Cray's Dragonfly and QLogic and tried to combine them and ended up killing both of them.
  • HStewart - Wednesday, February 5, 2020 - link

    That is not the same - Cray did not purchase Dragonfly. Cray uses Intel CPUs, the latest version of which was upgraded for Aurora - AMD has a similar supercomputer system in the mix - very old news.

    https://www.eweek.com/servers/cray-opts-for-intel-...
  • Yojimbo - Wednesday, February 5, 2020 - link

    Cray didn't purchase Dragonfly; Intel purchased it from Cray. Well, actually they purchased Gemini and Aries - Dragonfly was just the name of the topology; I used the wrong name. But the point is that Intel purchased stuff from Cray, they purchased QLogic, and they tried to combine them, or so they claimed, to create Omni-Path. Omni-Path failed and they ended up killing both the QLogic InfiniBand business and the Cray interconnect business.

    BTW, the Aurora machine is an Intel machine, not a Cray machine. Intel is the primary contractor, and they use Cray's new interconnect, which Cray developed to replace the Gemini and Aries technology that was sold to Intel.

    In any case, the example is very much appropriate to what people suggested about Intel combining technologies. In reality, though, Habana is a product that's already very far along, further than Nervana was, and it's very unlikely that anything from Nervana ends up in Habana products. Nervana was just a failed venture.
  • Spunjji - Wednesday, February 5, 2020 - link

    It's a little naive to expect that there's any meaningful way they could "combine the best of both".

    At best, I expect they'll hang on to Nervana's IP to block off competitors.
  • Kevin G - Tuesday, February 4, 2020 - link

    How much of this was the failure of Nervana, and how much of it was the failure of Intel to get 10 nm based Nervana chips out the door in a timely fashion? Would the Nervana chip have been cancelled if 10 nm were on track as originally forecast (i.e. we'd be on the eve of 7 nm right now)?

    While entirely de-emphasized, Intel did promote the idea of a Xeon chip with an on-package Nervana accelerator at some point in the future. Considering that this idea was quietly swept under the rug months ago, it's safe to say this had already been killed, but the signs pointed squarely to fab issues rather than design (how many years ago did Ice Lake tape out?).
  • Spunjji - Wednesday, February 5, 2020 - link

    They haven't even pushed out the TSMC 16nm+ version of Nervana's accelerator, and the effort involved in changing to an entirely different foundry's process is not insignificant, so I'd argue it's likely that Intel's fab woes are unrelated.
  • m53 - Tuesday, February 4, 2020 - link

    "Nvidia finest AI chip, the V100 GPU, manages something over 3,247 images/second at 2.5ms latency (which drops to 1,548 images/second if it wants to match Goya’s self-reported 1.3ms latency)."

    https://www.nextplatform.com/2019/01/28/ai-chip-st...

    "Two Nervana NNP-I chips achieve 10,567 inputs per second in ResNet-50, according to third-party benchmarks. Within the same power bracket, one Habana Goya reaches 14,451 inputs per second."

    https://www.techspot.com/news/83826-intel-sacrific...
  • p1esk - Wednesday, February 5, 2020 - link

    Unlike Nervana/Habana, the V100 can do a lot more than deep learning, considering its FP64 capability.
  • mode_13h - Wednesday, February 5, 2020 - link

    True. Also, the tensor cores in the Turing series can do int8 inferencing at double the throughput of the V100's, which are fp16-based.

    I think Nvidia's bread-and-butter GPU for cloud-based inferencing is the TU104-based Tesla T4 - not the V100.
  • mode_13h - Wednesday, February 5, 2020 - link

    BTW, I know the Turing Tensor cores can *also* do fp16, but the V100's Tensor cores can't do int8.
  • m53 - Thursday, February 6, 2020 - link

    @p1esk: Well, CPUs are way more flexible than GPUs, but there is still a big market for GPUs because of their better perf/watt in highly parallel workloads like graphics processing, game streaming, etc. Similarly, AI chips are specialized for AI workloads and perform much better than a GPU for the same power usage. Many hyperscalers are currently using Nvidia GPUs for such AI workloads since there weren't many choices for dedicated AI hardware. Facebook is a good example: it currently uses Nvidia GPUs for AI workloads like face detection and tagging but is now looking for dedicated AI hardware. Google used to use Nvidia hardware for AI inference but has moved to TPUs. So yes, dedicated AI hardware has a big market where flexibility doesn't matter. That's the target market for these Habana chips.
  • p1esk - Friday, February 7, 2020 - link

    That's a good point, if you're talking about hardware only. However, the software aspect is just as important, and as far as software goes, Nvidia is far ahead of the competition. Google is well positioned because it already had a popular DL software platform (TensorFlow), so it made sense for it to build its own hardware targeting that software. Intel has a much harder task - it needs to earn trust, meaning that not only must it produce much faster hardware than the competition (Tesla Ampere), it must also deliver rock-solid DL framework support for its cards, and somehow I'm not holding my breath. "Slightly faster" cards than Nvidia coupled with mediocre software are just not gonna fly.
  • mode_13h - Friday, February 7, 2020 - link

    Look at how much of a TU104 die is occupied by ray-tracing cores and other graphics-specific hardware blocks. Among other things, this overhead makes Nvidia more expensive (unless they decide to fab a chip without it that's just tensor + CUDA cores).
  • mode_13h - Friday, February 7, 2020 - link

    And NVDEC blocks - one benefit GPUs have over most AI chips is hardware-accelerated video decoding.
  • m53 - Sunday, February 9, 2020 - link

    "Google is well positioned because it already had a popular DL software platform (Tensorflow)" --> Well tensorflow is open source. So others can use it too.

    ""Slightly faster" cards than Nvidia coupled with mediocre software is just not gonna fly." --> Agreed. But from the benchmarks it is not slightly faster. It is order of magnitude faster for same watt and I expect it to be far cheaper since it doesn't have any extra unused hardware. Also AI is a nascent market. It will evolve drastically in the next few years. The software situation will also change dramatically. I don't see a general purpose GPU to keep lion's share in this evolving market.
  • mode_13h - Wednesday, February 5, 2020 - link

    What's funny about this is that ResNet is pretty small, by comparison with networks that a lot of people are using, these days.

    However, the really silly thing is that they're comparing at small batch size. Who cares about small batch size? Realtime/embedded (e.g. robotics, self-driving, etc.) - not cloud. AFAIK, Habana's products are all aimed at the cloud. In the cloud, you're dealing with big enough flows of data that you can afford a decent batch size, making that comparison fairly unrealistic.
