20 Comments

  • GeoffreyA - Friday, July 19, 2024 - link

    Excellent names! Wormhole and Blackhole.
  • PeachNCream - Sunday, July 21, 2024 - link

    Yeah, accurate too, since humanity is sending an awful lot of power into the black hole of AI. MS and Alphabet/Google have both noted that AI energy consumption has significantly increased their reliance on burning non-renewables, and we're currently, as a planet, consuming whole nation-states' worth of power to run things we keep telling ourselves constitute AI but are really not independently intelligent.

    I suppose the alternative is letting foolish people drive around in excessively large personal transport vehicles and run 800+ W computing devices for the sole purpose of playing video games. At least we're not doing that too... oh wait, never mind.
  • GeoffreyA - Sunday, July 21, 2024 - link

    All that matters to these corporations is making dollars, and at present, the road to riches is AI Street. So what if an inordinate amount of power is used in the process? We'll sweep that under the carpet and later greenwash it. "Because we care" (audience applauding).

    On a serious note, I think the exorbitant energy use of current deep neural networks is a barrier (downplayed, of course). They are far behind the brain in this regard. Nature chose a biological, analogue implementation, using far less energy and doing much more. I do not doubt the breakthrough will use such an approach, getting ever nearer to the brain's efficiency. Today's networks are at the MS-DOS stage of what will eventually be Windows 10.
  • boozed - Sunday, July 21, 2024 - link

    They *think* the road to riches is AI. It's eating amazing amounts of VC and could very well turn out to be a black hole.
  • GeoffreyA - Monday, July 22, 2024 - link

    Exactly, they *think* it is. It could well fizzle away before we know it. Maybe cloud will make a return to the limelight.
  • Santoval - Friday, July 26, 2024 - link

    Neuromorphic processors have existed for about a decade now. They are about a thousand times more energy efficient and massively parallel, and they lack a von Neumann bottleneck, since logic, memory, and I/O are all tightly intertwined, much like in the human brain.

    Their energy efficiency and complexity lie somewhere between primate brains and von Neumann processors, and they look like the hardware analogue of software AI/LLMs.

    So they should be ideally suited for them. And yet every AI company uses Nvidia's obscenely energy inefficient GPUs. Why? Due to addiction to CUDA?
  • GeoffreyA - Saturday, July 27, 2024 - link

    Thanks for the description. You've given me something to read about over the weekend. The concept of neuromorphic processors is what I was vaguely thinking of. It seems remarkable to me that the technology exists (I had thought it decades away) but is being overlooked in favour of the same old way of thinking. As for CUDA, it is regrettable that so many machine-learning libraries rely on it; that is in nobody's interest but Nvidia's.

    I would be glad to see neuromorphic processors gaining traction. I believe today's LLMs are the primitive analogues of parts of our brain. If these could be extended, joined with the neuromorphic system, and consciousness cracked, doubtless it would lead to strong AI. Put differently, I think we are the same system but at an advanced stage.
  • Santoval - Saturday, July 27, 2024 - link

    An additional reason for their very slow adoption might be that they have a different programming paradigm, one AI developers are not familiar with. There are no conventional back-and-forths between memory and logic, after all.
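
    Just to give a flavour of that style, here is a toy leaky integrate-and-fire update in plain Python. It is purely illustrative and vendor-agnostic: it reflects no real neuromorphic SDK, only the event-driven idea of state living with the neuron rather than being shuttled between separate memory and logic.

        # Toy leaky integrate-and-fire neuron; purely illustrative, not any vendor's API.
        def lif_step(potential, input_current, leak=0.9, threshold=1.0):
            potential = potential * leak + input_current   # decay, then integrate the input
            if potential >= threshold:                     # the neuron fires an event (a spike)
                return 0.0, True                           # reset and emit the spike
            return potential, False                        # otherwise stay silent

        v, spikes = 0.0, []
        for t, current in enumerate([0.2, 0.6, 0.5, 0.1]):
            v, fired = lif_step(v, current)
            spikes.append((t, fired))
        print(spikes)   # [(0, False), (1, False), (2, True), (3, False)]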

    Imagine *all* memory sitting at the L0 cache / register level of a conventional processor and being just as fast.

    It doesn't look like OpenAI, at least, is comfortable with its dependence on Nvidia; otherwise it would not have developed Triton to compete with CUDA.

    Triton is still at an early stage of development, though, and OpenAI initially released it only with Nvidia support. If it takes off, it will be a viable competitor to CUDA, and a cross-platform one at that.

    Despite Triton being Python-like, OpenAI claims it is at least as fast as CUDA. The only other cross-platform solution is OpenCL, but how many use that for AI versus CUDA?
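
    To give a sense of what "Python-like" means in practice, here is a minimal vector-add kernel in the style of Triton's public tutorials. It's a sketch rather than OpenAI's code; the block size and launch grid are just illustrative, and today it still needs an Nvidia GPU to run.

        import torch
        import triton
        import triton.language as tl

        @triton.jit
        def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
            pid = tl.program_id(axis=0)                         # which program instance am I?
            offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
            mask = offsets < n_elements                         # guard the tail of the array
            x = tl.load(x_ptr + offsets, mask=mask)
            y = tl.load(y_ptr + offsets, mask=mask)
            tl.store(out_ptr + offsets, x + y, mask=mask)

        def add(x, y):
            out = torch.empty_like(x)
            n = x.numel()
            grid = (triton.cdiv(n, 1024),)                      # one program per 1024 elements
            add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
            return out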
  • GeoffreyA - Saturday, July 27, 2024 - link

    Machine learning is hard as it is and programmers are only now coming to terms with it. A new paradigm needs time and major benefits, and the present approach is the path of least resistance.

    I doubt, but hope, that Triton will usurp CUDA's place. We need a real competitor, that's for sure. CUDA was just too good and accessible: Nvidia built the ecosystem, the libraries, and the tools; everything was just there, and ML began gravitating towards it. OpenCL seems to be going the way of the dinosaur: one doesn't see it much any more. I have seen Vulkan used, in Real-ESRGAN, but this isn't common. There is a library that translates CUDA for AMD and Intel GPUs, but it is neither complete nor the answer.
  • charlesg - Monday, July 22, 2024 - link

    Last I checked, it's a free world.

    One large difference between you and me is that I couldn't care less about what you think or do.

    Unless you desire to force your distorted viewpoint on others, which, based on your regular posts, appears to be the case.
  • flyingpants265 - Monday, July 22, 2024 - link

    This is a purely meaningless post.
  • Dante Verizon - Friday, July 19, 2024 - link

    What is the chip's lithography? It doesn't seem efficient to me compared to the MI300/H100.
  • mode_13h - Friday, July 19, 2024 - link

    The board-level specs are here: https://tenstorrent.com/hardware/wormhole

    The chips are detailed here: https://www.semianalysis.com/p/tenstorrent-wormhol...

    According to that, the chips are 12 nm.
  • Dante Verizon - Friday, July 19, 2024 - link

    Not bad for a 12nm chip. At that price, it should appeal to some niche consumers.
  • nandnandnand - Friday, July 19, 2024 - link

    How does it compare to an RTX 4090 for the operations both support? At those prices, and being a PCIe card, consumers could use it in their desktops.
  • mode_13h - Saturday, July 20, 2024 - link

    The RTX 4090 is rated at 292/330 fp16 TFLOPS (base/boost; non-sparse) @ 450W.
    The Tenstorrent n300s is rated at 131 fp16 TFLOPS @ 300W and sells for $1400.
    Both have 24 GB of on-board GDDR6.

    So, it's not really competitive, but then I guess the silicon dates back to 2021 and matched up pretty well against the RTX 3090. The main reason to go with Tenstorrent is probably as a development vehicle, in preparation for their future chips.
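
    Put as rough perf-per-watt and perf-per-dollar using the figures above (the RTX 4090 price here is an assumed ~$1600 street price, not an official number):

        # Back-of-the-envelope comparison from the numbers quoted above.
        cards = {
            "RTX 4090 (boost)":  {"tflops_fp16": 330, "watts": 450, "price_usd": 1600},  # price assumed
            "Tenstorrent n300s": {"tflops_fp16": 131, "watts": 300, "price_usd": 1400},
        }
        for name, c in cards.items():
            print(f"{name}: {c['tflops_fp16'] / c['watts']:.2f} TFLOPS/W, "
                  f"{1000 * c['tflops_fp16'] / c['price_usd']:.0f} TFLOPS per $1000")
        # ~0.73 vs ~0.44 TFLOPS/W, and ~206 vs ~94 TFLOPS per $1000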
  • Terry_Craig - Sunday, July 21, 2024 - link

    "By contrast, Nvidia's H100 supports FP8 and its peak performance is massive 1,670 TFLOPS (3,341 TFLOPS with sparsity) at 300W, which is a big difference from Tenstorrent's Wormhole n300."

    The H100's TDP is 700 W: https://www.nvidia.com/en-us/data-center/h100/ *The FP8 numbers mentioned on that page use sparsity.

    Plus, it is worth remembering that the theoretical number is not always achieved in practice, especially when it comes to GPUs:

    "I noticed in CUDA 12.1 update 1 that FP8 matrix multiples are now supported on Ada chips when using cuBLASLt. However, when I tried a benchmark on an RTX 4090 I was only able to achieve 1/2 of the rated throughput, around ~330-340 TFLOPS. My benchmark was a straightforward modification of the cuBLASLt FP8 sample 124 to use larger matrices, run more iterations and use CUDA streams. I primarily tried N = N = K = 8192, but other sizes had similar behavior. I tried this with both FP16 and FP32 output and got the same result, although I was only able to use FP32 for the compute type as this is the only supported mode in cuBLASLt right now.

    My result is quite far off from the specified 660 TFLOPs in the Ada whitepaper 211 for FP8 tensor TFLOPs with FP32 accumulate. Is there a mistake in the white paper, or is there some incorrect throttling of FP8->FP32 operations going on (much like how FP16 → FP32 operations are half-rate on GeForce cards)?" >

    https://forums.developer.nvidia.com/t/ada-geforce-...
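
    For context, the effective number in that kind of benchmark is normally computed as 2*M*N*K floating-point operations divided by the measured kernel time. A quick sketch, with purely illustrative timings rather than measured values:

        # Effective throughput of a square GEMM: 2*M*N*K FLOPs over the measured time.
        M = N = K = 8192
        flops = 2 * M * N * K                      # about 1.1e12 FLOPs per GEMM of this size

        for elapsed_s in (3.3e-3, 1.7e-3):         # illustrative kernel times, not measurements
            print(f"{elapsed_s * 1e3:.1f} ms  ->  {flops / elapsed_s / 1e12:.0f} TFLOPS effective")
        # Roughly 3.3 ms matches the ~330 TFLOPS observed; the time would have to
        # halve to reach the ~660 TFLOPS whitepaper figure.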
  • mode_13h - Monday, July 22, 2024 - link

    > Is there a mistake in the white paper, or is there some incorrect throttling of
    > FP8->FP32 operations going on (much like how FP16 → FP32 operations
    > are half-rate on GeForce cards)?"

    I had this exact thought. I doubt the excuse made by the Moderator that it's being power/thermal-limited, since the specified performance at base clocks is just about 11.5% less than boost, not half!
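
    For reference, that 11.5% figure falls straight out of the base/boost FP16 numbers quoted earlier in the thread:

        base, boost = 292, 330   # RTX 4090 fp16 TFLOPS, base vs boost (non-sparse)
        print(f"base is {(1 - base / boost) * 100:.1f}% below boost")   # -> 11.5%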
  • Rudde - Thursday, July 25, 2024 - link

    Why not also quote the follow-up post that mentions Nvidia has corrected the whitepaper numbers? https://forums.developer.nvidia.com/t/fp8-fp16-acc...
  • SanX - Tuesday, July 30, 2024 - link

    The price is good, but everything else is not even close to the four-year-old A100. A better name for it would be Ahole.
