Here is my Base64 encoder shader:
https://github.com/Rezmason/excel_97_egg/blob/main/glsl/base...
I got it down to about thirteen lines of GLSL:
https://github.com/Rezmason/excel_97_egg/blob/main/glsl/base...
I use it for Cursed Mode of my side project, which renders the WebGL framebuffer to a 640x480 indexed color BMP, about 15 times per second:
https://rezmason.github.io/excel_97_egg/?cursed=1
There are some additional interesting details, and a surprising amount of variation in those details, once you start really digging into things.
If the length of your input data isn't exactly a multiple of 3 bytes, then encoding it will use either 2 or 3 base64 characters to encode the final 1 or 2 bytes. Since each base64 character carries 6 bits, this means you'll be using either 12 or 18 bits to represent 8 or 16 bits of data, which leaves an extra 4 or 2 bits that don't encode anything.
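To make the arithmetic concrete, here is a small Python sketch. The helper name `tail_stats` is made up for illustration; it just counts how many characters and spare bits a final group of 1 or 2 bytes needs.

```python
import base64

def tail_stats(n_bytes: int) -> tuple[int, int, int]:
    """For a final group of 1 or 2 bytes, return (chars, bits_available, spare_bits)."""
    data_bits = 8 * n_bytes
    chars = -(-data_bits // 6)          # ceiling division: 6 bits per character
    bits_available = 6 * chars
    return chars, bits_available, bits_available - data_bits

for n in (1, 2):
    chars, bits_available, spare = tail_stats(n)
    encoded = base64.b64encode(b"\xff" * n).decode("ascii")
    print(f"{n} byte(s): {chars} chars, {bits_available} bits, {spare} spare, e.g. {encoded}")
```

The `=` characters in the printed examples are only padding to a multiple of 4 characters; the spare bits live inside the last non-padding character.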
In RFC 4648, encoders are required to set those bits to 0, but decoders only "MAY" choose to reject input which does not have those set to 0. In practice, nothing rejects those by default, and as far as I know only Ruby, Rust, and Go allow you to fail on such inputs - Python has a "validate" option, but it doesn't validate those bits.
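A quick way to see this in Python: `QQ==` and `QR==` differ only in the unused trailing bits, and the default decoder returns the same byte for both. The checker below, `has_nonzero_padding_bits`, is a hypothetical helper of my own (not part of any library), sketched on the assumption that the input uses the standard alphabet and is otherwise well-formed.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def has_nonzero_padding_bits(encoded: str) -> bool:
    """Return True if the unused trailing bits of the last character are not all zero."""
    body = encoded.rstrip("=")
    tail = len(body) % 4
    if tail == 0:
        return False                     # only whole 4-character groups, no spare bits
    if tail == 1:
        raise ValueError("a single trailing character cannot encode a full byte")
    spare = 4 if tail == 2 else 2        # 2 chars carry 1 byte, 3 chars carry 2 bytes
    last_value = ALPHABET.index(body[-1])
    return last_value & ((1 << spare) - 1) != 0

print(base64.b64decode("QQ=="), base64.b64decode("QR=="))
print(has_nonzero_padding_bits("QQ=="), has_nonzero_padding_bits("QR=="))
```

A strict decoder could run a check like this before decoding and raise instead of silently dropping the stray bits.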
The other major difference is in handling of whitespace and other non-base64 characters. A surprising number of implementations, including Python, allow arbitrary characters in the input, and silently ignore them. That's a problem if you get the alphabet wrong - for example, in Python `base64.standard_b64decode(base64.urlsafe_b64encode(b'\xFF\xFE\xFD\xFC'))` will silently give you the wrong output, rather than an error. Ouch!
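Here is that alphabet mix-up spelled out as a runnable sketch, using only the standard `base64` and `binascii` modules. The variable names are mine. It also shows that passing `validate=True` at least catches the stray `_` characters.

```python
import base64
import binascii

data = b"\xff\xfe\xfd\xfc"
urlsafe = base64.urlsafe_b64encode(data)        # b'__79_A=='

# The lenient default drops the '_' characters and decodes what is left.
lenient = base64.standard_b64decode(urlsafe)
print(lenient == data)

# With validate=True, characters outside the standard alphabet raise an error.
try:
    base64.b64decode(urlsafe, validate=True)
    rejected = False
except binascii.Error:
    rejected = True
print(rejected)

# Decoding with the matching alphabet round-trips correctly.
print(base64.urlsafe_b64decode(urlsafe) == data)
```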
Another fun fact is that Ruby's base64 encoder will put linebreaks every 60 characters, which is a wild choice because no standard encoding requires lines that short except PEM, but PEM requires _exactly_ 64 characters per line.
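For comparison, PEM-style wrapping is only a few lines if you do it by hand. This is a minimal Python sketch with a hypothetical helper name, `wrap_base64`. It assumes the usual reading that every line is exactly 64 characters except the last one, which holds whatever remains.

```python
import base64

def wrap_base64(data: bytes, width: int = 64) -> str:
    """Base64-encode data and break the text into lines of `width` characters."""
    encoded = base64.b64encode(data).decode("ascii")
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))

# 100 zero bytes encode to 136 characters: two full lines plus a short final line.
print(wrap_base64(bytes(100)))
```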
I have a writeup of some of the differences among programming languages and some JavaScript libraries here [1], because I'm working on getting a better base64 added to JS [2].
[1] https://gist.github.com/bakkot/16cae276209da91b652c2cb3f612a...
[2] https://github.com/tc39/proposal-arraybuffer-base64
Cool! This caught my interest so I wrote a little program to compute that fixpoint up to a specified precision: https://github.com/cls/b64fix