-
Interesting, my mind immediately went to block diffusion [0], but I think you are probably right.
[0] https://m-arriola.com/bd3lms/
-
Stream
Stream - Scalable APIs for Chat, Feeds, Moderation, & Video. Stream helps developers build engaging apps that scale to millions with performant and flexible Chat, Feeds, Moderation, and Video APIs and SDKs powered by a global edge network and enterprise-grade infrastructure.
-
diffusion-vs-ar
[ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning"
-
https://github.com/KellerJordan/modded-nanogpt
Also, I read yesterday that at some point, the embeddings across all of the models are (very) comparable/similar, and a simple converter can be trained, and if both of these statements are true maybe we could just train everything much faster just by sharing fixed embeddings and attentions.
-
d1
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
Obviously its not at the scale of the top auto-regressive models yet but there are some OSS models https://github.com/dllm-reasoning/d1