
  • Vlad_Da_Great - Monday, November 16, 2015 - link

    It seems INTC is attacking the HPC and co-processor market hard. NVDA is in trouble. "The downside of the flat mode means that the developer has to maintain and keep track of what data goes where, increasing software design and maintenance costs." Duh, that is why the libraries are developed and the developer doesn't need to know anything; just pull the strings and drop in the require or include.
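
    Case in point for flat mode: Intel's memkind/hbwmalloc library is meant to hide exactly that NUMA bookkeeping. A minimal sketch, assuming the memkind package is installed (the buffer size is just illustrative):

    // Build with: gcc demo.c -lmemkind
    #include <stdio.h>
    #include <hbwmalloc.h>

    int main(void) {
        size_t n = 1 << 20;

        /* hbw_malloc asks for MCDRAM; under the default "preferred"
         * policy it falls back to DDR4 if none is available, so the
         * caller never tracks which memory the data landed in. */
        double *buf = hbw_malloc(n * sizeof *buf);
        if (!buf) return 1;

        for (size_t i = 0; i < n; i++)
            buf[i] = (double)i;          /* bandwidth-hungry data */

        printf("%f\n", buf[n - 1]);
        hbw_free(buf);
        return 0;
    }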
  • ABR - Tuesday, November 17, 2015 - link

    I wouldn't put it so much as Nvidia is "in trouble" as Nvidia has some competition. It's going to get more interesting, though. Both are coming into this relatively smaller market from different directions, each adapting its hardware from its original purposes: Intel from above, Nvidia from below, each working to build the software support around it. AMD continues to work its angle with heterogeneous systems. Despite x86 it's actually Nvidia with the early lead on the software side, but they'll have to work to preserve it. Interesting times.
  • BurntMyBacon - Wednesday, November 18, 2015 - link

    @ABR: "Despite x86 it's actually Nvidia with the early lead on the software side, but they'll have to work to preserve it."

    I generally agree with your post, except I'm not so sure about this. There are many supercomputers out there with nothing but Intel Xeons or AMD Opterons, and they've been out much longer than the Tesla processors. I'd guess that Intel is further along on the software side. There is some work to be done to adapt code from many complex cores to many simple cores, but they are all OoO x86 cores with largely the same features. I would think that these adaptations would be easier than adapting for CUDA, with its own changes and caveats across successive generations of GPUs. There is no denying that nVidia has done a good job building up their software, though. Limiting it to just CUDA/OpenCL, I have no doubt that their ecosystem is more robust.

    I think that AMD's heterogeneous vision is ideally the best, but AMD is having a hard time converting that vision into a desirable product. Given that HSA isn't closed (ARM vendors are planning to use it), nVidia and Intel could have already capitalized on AMD's vision if they weren't so preoccupied with making sure that their technologies remain proprietary. I guess standardization and interoperability don't generate sales in today's market.
  • ABR - Wednesday, November 18, 2015 - link

    @BurntMyBacon: I also pretty much agree with the points you make. The question, though, is where exactly the sweet spot lies in the market they are defining and trying to move into. It actually isn't traditional supercomputing, even though Nvidia is making efforts there. Right now it's smaller-scale stuff growing out of people wanting to do more with their workstations now that CPU speeds have stagnated: the scientists, the creatives, the crypto guys, a groundswell of people picking up GPUs to get those order-of-magnitude improvements that used to come biannually in PCs. This is where CUDA has been strong. Few have the budget to build and maintain a "Beowulf" cluster, let alone something bigger, but slapping a few Tesla cards into a box has been a lower barrier to entry. Low enough to justify writing software and letting an ecosystem grow. And the data center / cloud GPU compute trends are growing out of this same category of users. The cloud is most alluring to those wanting to stretch a thin budget far. This is the growth area being fought for.
  • patrickjp93 - Saturday, November 21, 2015 - link

    Intel's tech isn't proprietary. OpenMP has been around since the late 1990s and has been an open standard from the start. It's the bread and butter of Intel-based supercomputers today. It's also vastly superior to HSA implementations in programmability.
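
    To illustrate the programmability point, bread-and-butter OpenMP is nothing more than an annotated loop. A minimal sketch (the function and variable names are placeholders):

    #include <omp.h>

    /* The same loop runs unchanged on a Xeon or a Xeon Phi; the
     * pragma spreads iterations across the available hardware threads. */
    void scale(const double *x, double *y, double a, int n) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] = a * x[i];
    }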
  • patrickjp93 - Saturday, November 21, 2015 - link

    I'm sorry, but Intel definitely has the software ecosystem advantage. Intel-based supercomputers have been built around OpenMP for years. Do you know how easy it is to adapt code to send it to the Xeon Phi? You literally wrap it with one pragma and two braces.

    #pragma offload target(mic) in(some_vector : length(N)) out(result_vector : length(N))
    {
        // the same OpenMP/C++ parallel code you had in here before
    }
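
    And if you'd rather stick to the open standard than Intel's own offload pragma, the OpenMP 4.x form is essentially the same shape (a sketch; some_vector, result_vector and N are the same placeholders as above):

    /* map clauses copy the arrays to and from the coprocessor
     * around the enclosed region. */
    #pragma omp target map(to: some_vector[0:N]) map(from: result_vector[0:N])
    {
        // the same OpenMP/C++ parallel code as before
    }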
  • vFunct - Tuesday, November 17, 2015 - link

    This + 32GB of MCDRAM + several TB of X-Point would make a killer cloud server.
  • bds71 - Tuesday, November 17, 2015 - link

    Replace 24 of those cores with GPU cores and, when stacked RAM becomes available, stack XPoint on top of the MCDRAM. You now have CPU, GPU, RAM, and storage on a single chip = a complete PC!
  • questionlp - Tuesday, November 17, 2015 - link

    "As the diagram stands, the MCDRAM and the regular DDR4 (up to six channels of 386GB of DDR4-2400) are wholly separate, indicating a bi-memory model."

    That should be 384GB, not 386GB.
  • iAPX - Friday, November 20, 2015 - link

    16GB is good for small to medium-sized vectorized problems, and 384GB is a huge step forward for highly parallelized architectures. Still, the low bandwidth of DDR4 and the implicitly sequential nature of x86 code processing on Knights Landing (compared to GPUs, which launch MANY pseudo-threads to try to hide memory latencies) will create bubbles in the pipelines (huge bubbles!) on cache misses or direct DDR4 accesses.

    The DDR4 itself, with its 6 channels, is low-bandwidth (<100GB/s) memory compared to current graphics cards; in fact, it's up against $99-and-under graphics cards!
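
    As a rough sanity check on that figure: 6 channels × 2400 MT/s × 8 bytes per transfer = 115.2 GB/s theoretical peak, so sustained DDR4 throughput under 100GB/s is plausible.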

    The question is whether 16GB of MCDRAM will be enough (it could be, as nearly all GPGPU sub-systems don't have that much memory) or whether programmers will use the huge amount of DDR4 available.
    I think the answer is in the hands of the developers.
