Non-determinism in GPT-4 is caused by Sparse MoE

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

    Could this work well with distributed solutions like petals?

    https://github.com/bigscience-workshop/petals

    I don't understand how petals can work though. I thought LLMs were typically quite monolithic.

  2. tensorflow

    An Open Source Machine Learning Framework for Everyone

    Right, but that's not an inherent GPU determinism issue. It's a software issue.

    The comment at https://github.com/tensorflow/tensorflow/issues/3103#issueco... is correct: it's not necessary, it's a choice.

    Your line of reasoning appears to be "GPUs are inherently non-deterministic, so don't be quick to judge someone's code", which, as far as I can tell, is dead wrong.

    Admittedly, there are some cases and instructions that may result in non-determinism where it is inherently necessary. The author should think carefully before introducing non-determinism. There are many scenarios where it is irrelevant, but ultimately the issue we are discussing here isn't the GPU's fault. (A short sketch of the reduction-order effect behind this point follows after this list.)

  3. curated-transformers

    🤖 A PyTorch library of curated Transformer models and their composable components

    Yeah. In curated transformers [1] we are seeing completely deterministic output across multiple popular transformer architectures on a single GPU (there can be variance between GPUs due to different kernels).

    One source of non-determinism we do see with a temperature of 0 is that once you have quantized weights, many predicted pieces will have the same probability, including multiple pieces tied for the highest probability. The sampler (if you are not using a greedy decoder) will then sample from among those pieces (see the tie-breaking sketch after this list).

    In other words, a temperature of 0 is a poor man’s greedy decoding. (It is totally possible that OpenAI’s implementation switches to a greedy decoder with a temperature of 0).

    [1] https://github.com/explosion/curated-transformers
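
The reduction-order point from the tensorflow thread above can be illustrated without a GPU: floating-point addition is not associative, so any parallel reduction whose summation order depends on thread scheduling (e.g. atomicAdd) can give slightly different results from run to run. Below is a minimal sketch in plain Python; the framework switches named in the comments are assumptions about recent TensorFlow and PyTorch versions, not something stated in the original discussion.

```python
# Minimal sketch: run-to-run variance in parallel reductions comes from
# floating-point addition being non-associative, not from the GPU
# "misbehaving". The summation order is a software choice.
import random

values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

def reduce_in_order(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

# Same numbers, different summation order (as with atomicAdd on a GPU,
# where thread scheduling decides the order).
ordered = reduce_in_order(values)
shuffled = values[:]
random.shuffle(shuffled)
reordered = reduce_in_order(shuffled)

print(ordered, reordered, ordered == reordered)  # usually differs in the last bits

# Frameworks expose the deterministic choice explicitly, e.g. (assuming
# recent versions; both calls are opt-in, not defaults):
#   tf.config.experimental.enable_op_determinism()
#   torch.use_deterministic_algorithms(True)
```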
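
The tie-breaking behaviour described in the curated-transformers comment can likewise be sketched with made-up numbers: if quantization leaves several tokens tied at the top probability, a sampler can return any of them even at a very low temperature, while greedy argmax decoding with a fixed tie-break always returns the same token. The token ids and probabilities below are purely illustrative, not taken from any real model.

```python
# Minimal sketch of the tie-breaking point: several tokens share the top
# probability after quantization; a low-temperature sampler still picks
# randomly among them, while greedy decoding is deterministic.
import math
import random

# Suppose quantization collapsed the logits so three tokens tie at the top.
probs = {101: 0.25, 202: 0.25, 303: 0.25, 404: 0.15, 505: 0.10}

def greedy(probs):
    # Deterministic: highest probability wins, ties broken by token id.
    return max(sorted(probs), key=lambda t: probs[t])

def sample_with_temperature(probs, temperature=0.01):
    # Dividing log-probabilities by a tiny temperature sharpens the
    # distribution, but the three tied tokens stay equally likely,
    # so the draw among them is still random.
    tokens = list(probs)
    logits = [math.log(probs[t]) / temperature for t in tokens]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    return random.choices(tokens, weights=weights, k=1)[0]

print(greedy(probs))                                        # always 101
print({sample_with_temperature(probs) for _ in range(20)})  # usually {101, 202, 303}
```

This is consistent with the suggestion above that an implementation may simply switch to a greedy decoder when the temperature is 0.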

Related posts

  • Build, Innovate & Collaborate: Setting Up TensorFlow for Open Source Contribution! 🚀✨

    1 project | dev.to | 3 Nov 2024
  • Create ML models that can run in any environment

    1 project | news.ycombinator.com | 3 Oct 2024
  • TensorFlow: Democratizing Machine Learning with Open Source Power

    1 project | news.ycombinator.com | 5 Aug 2024
  • Release TensorFlow 2.17.0 · TensorFlow/TensorFlow

    1 project | news.ycombinator.com | 24 Jul 2024
  • Side Quest Devblog #1: These Fakes are getting Deep

    3 projects | dev.to | 29 Apr 2024
