Comments Locked

15 Comments

  • aryonoco - Tuesday, August 21, 2018 - link

    So let me get this straight:

    This is an in-order CPU that targets high frequencies, and the designers think they can overcome an in-order design's limitations with a clever compiler? Somehow they'll be able to do what Intel never achieved with Itanium?

    And, it's a new ISA, so you have to port everything to it to start with?

    And they think QEMU emulation is going to solve their problems?

    What are they smoking?! And if these people can get VC money for their crazy pipe dreams, why am I not living in a 19-bedroom mansion?
  • eastcoast_pete - Wednesday, August 22, 2018 - link

    Don't know the answers to your questions, except the last one. Here they are: 1. Because you need to make even more outrageous claims in a colorful slide set to pitch your stuff, and 2. Before said pitch, botox your facial muscles, so you won't smirk or smile when you ask for a gazillion dollars in funding.
    Let me know how that works out (:
  • SarahKerrigan - Tuesday, August 21, 2018 - link

    Man, that functional unit config is weird. 1 LSU, 1 LU, 1 SU? What the heck kind of workload will ever issue two stores and one load?

    I also take exception to their criticism of IPF; IPF *did* encode dependent operations, albeit indirectly, using the templating mechanism (an IPF instruction group can be indefinitely wide, as long as it has no internal dependencies; when you run into dependencies you stick in a template that defines a stop.)

    To me, this looks like the child of IPF and Power6 (mainly for the runahead parts) - neither of which were incredibly effective microarchitectures at general-purpose workloads through their lifetimes - but I'm willing to be convinced.
  • name99 - Thursday, August 23, 2018 - link

    I think you people are all missing the point.
    The words that matter here are "We are not VLIW, every node defines sub-graphs and dependent instructions."
    This is an Explicit Data Graph Execution design:
    https://en.wikipedia.org/wiki/Explicit_data_graph_...

    MS is researching the exact same sort of thing.

    One problem I *think* exists with such designs is that they are not easily modified forward (i.e. it's difficult to maintain binary compatibility while improving the design each time your process allows you to use more transistors).
    BUT I think their assumption is that binary compatibility has had its day -- any modern code (especially in the HPC area they seem to be targeting first) is in source, distribution via floppy disks and CDs is dead, and limiting your design so you can keep doing things the way they were done in the 1990s is just silly.

    (This was, of course, the same sort of premise behind Java and .NET.
    Those two "failed" [relative to their ambitions] in different ways, but JS succeeded...
    Apple seems to be experimenting ever so slowly with the same sort of idea. Right now it's somewhat unambitious --- some post-submission touch-up in the iOS store, and submitting apps in some sort of IR to the Watch store.
    Clearly this consumer level of abstraction is a harder problem, because developers don't want to distribute source, even to the app store; and there are more variations and more things that can go wrong with consumer distribution of an IR. We don't even know how robust Apple's Watch IR will be when the aWatch goes 64-bit [presumably this year?].

    But I think the win that's available if you are not forced into endless backward compatibility is becoming ever clearer --- think how fast GPUs have been able to evolve because they do not have this constraint. Which means it's just a question of which companies are smart enough to be the first to exploit this...

    Perhaps it will be Tachyum in the HPC space, I have no idea.

    I expect it will be Apple in the consumer space. MS had their chance, had everything lined up with .NET, but being MS they, of course..., fscked it up because, 15 years after CLR's first release, they STILL can't decide how much to commit to it.

    Apple may discover that their current IR is sub-optimal for truly substantial ISA updates; but they are flexible and can and will change. For example they might switch to using SIL as the preferred distribution IR, something that will get ever more feasible as Swift fills out its remaining missing functionality.)
  • Elstar - Sunday, August 26, 2018 - link

    The IR Apple uses is just LLVM IR, which is a fairly transparent IR. You can't take IR generated by clang or Swift and magically switch endian modes or pointer size, for example, because the ObjC++ or Swift code can generate different ABI data structures depending on the endianness or pointer size. Even if those problems were solved, most vector code is ISA-specific, and therefore ISA-specific LLVM IR is generated. You could try to translate one ISA's vector model to another's, but the results would never be as fast as people would want or expect.
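
    A minimal C sketch of that point (my own illustration, not anything from Apple's or LLVM's toolchain): pointer size and byte order get baked into data layout and folded constants at IR-generation time, so IR emitted for one target can't simply be retargeted to another.

    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical example: the same struct has a different ABI layout on
       different targets, so IR produced from this source is target-specific. */
    struct node {
        struct node *next;  /* 4 bytes on a 32-bit target, 8 on a 64-bit one */
        uint32_t     value; /* in-memory byte order follows the target's endianness */
    };

    int main(void) {
        /* sizeof/offsetof are folded to target-specific constants when the
           front end emits IR, so "just reuse the IR elsewhere" doesn't work. */
        printf("sizeof(struct node)   = %zu\n", sizeof(struct node));
        printf("offsetof(node, value) = %zu\n", offsetof(struct node, value));

        uint32_t probe = 1;
        printf("little-endian target  = %d\n", *(const uint8_t *)&probe == 1);
        return 0;
    }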
  • Elstar - Tuesday, August 21, 2018 - link

    This feels like the classic "fake it until you make it" Silicon Valley presentation. I doubt that most of the promises made in this presentation will ever ship.
  • webdoctors - Wednesday, August 22, 2018 - link

    I previously designed something similar, and the difficult part was hitting high frequencies because of the timing-critical datapath. 1 GHz was feasible, but 4 GHz...

    The overhead of translating between ISAs also caused a perf hit, so the published numbers are very suspicious.
  • eastcoast_pete - Wednesday, August 22, 2018 - link

    I agree that it would be more than a little reassuring if they had taped out a small run of anything similar to this, even on 22 nm and not hitting 4 GHz; if it hit >2 GHz in an early prototype, that'd be fine. Predictions and simulations are subject to challenge by reality.
  • V900 - Wednesday, August 22, 2018 - link

    “Faster than Xeon” and “smaller than ARM” sound a little too good to be true.

    What architecture is this? An ARM respin or a brand new one?
  • eastcoast_pete - Wednesday, August 22, 2018 - link

    Interesting design, even after the hype discount (faster than Xeon, smaller than ARM?). The problem with such designs tends to be the dreaded compiler. From what I can gather, this chip will live or die by its compiler, as that is what is supposed to deliver out-of-order-like performance from this in-order design.

    @Ian: Some questions: I know this is "Hot Chips", but have they/will they present at "Hot Compiler" or anything similar? What is known about the compiler?
    Tachyum made/makes a big deal about their AI prowess. Did they say what makes their chip so much better for that than existing solutions? And, by better, I don't just mean faster, but performance/Wh.
    Lastly, has TSMC confirmed that Tachyum has signed a contract with them, or is it all vaporware/"we plan to.." right now?
  • DieWurst - Wednesday, August 22, 2018 - link

    I ran out of popcorn half-way through the presentation, it was that good.
  • nils_ - Thursday, August 23, 2018 - link

    This sounds extremely ambitious, and frankly too good to be true. Where is the code? Where are the patches for Linux and the compilers?
  • name99 - Thursday, August 23, 2018 - link

    They have been hiring LLVM and GCC engineers. You can easily verify this with a web search. I expect they have not published the patches for the obvious reason that they were (and remain) essentially secret; they're not yet ready to give details.

    This doesn't mean they're legit; but their behavior is hardly unusual in this respect. Hell, good luck finding a machine model for an Apple design more recent than the A7 in the public LLVM repository...
  • nils_ - Friday, August 24, 2018 - link

    This is the classic trap that many SoC vendors fall into by not submitting their patches early enough. To have the CPU supported by Linux/GCC they should already be submitting patches, even for a 2019-2020 release, and then get one or two popular distributions on board (which themselves are sometimes extremely slow to update to new kernels).
  • bananaforscale - Saturday, August 25, 2018 - link

    "In a Facebook 100MW datacenter, 442k servers. 40% idle means 265k idle servers per day"

    No, 40% of 442k is ~177k. 265k would be the number of *active* servers.
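
    A trivial check of that arithmetic in C (a sketch, using the slide's 442k-server and 40%-idle figures):

    #include <stdio.h>

    int main(void) {
        const double servers = 442000.0;  /* server count quoted from the slide */
        const double idle    = 0.40;      /* claimed idle fraction */

        printf("idle servers:   %.0f\n", servers * idle);          /* ~176,800 */
        printf("active servers: %.0f\n", servers * (1.0 - idle));  /* ~265,200 */
        return 0;
    }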
