parquet-wasm
proposal-arraybuffer-base64
Metric | parquet-wasm | proposal-arraybuffer-base64
---|---|---
Mentions | 6 | 5
Stars | 464 | 215
Growth | - | 4.2%
Activity | 9.0 | 7.6
Last commit | 3 days ago | 18 days ago
Language | Rust | HTML
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
parquet-wasm
- FLaNK AI Weekly for 29 April 2024
- Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data
- Goodbye, Node.js Buffer
nodejs-polars is Node-specific and uses native FFI. Polars can be compiled to Wasm but doesn't yet have a JS API out of the box.
As for the fastest way to serialize Pandas data to the browser, you should use Parquet; it's the fastest to write on the Python side and read on the JS side, while also being compressed. See https://github.com/kylebarron/parquet-wasm (full disclosure, I wrote this)
- Rust 1.63.0
I'm building WebAssembly bindings to existing Rust libraries [0] and lower-dependency geospatial tools [1]. Rust makes it very easy to bind Rust code to both WebAssembly and Python. And by avoiding some large C geospatial dependencies we can get reliable performance in both Wasm and Python using the exact same codebase.
[0]: https://github.com/kylebarron/parquet-wasm
[1]: https://github.com/kylebarron/geopolars
- Why isn’t there a decent file format for tabular data?
- Recommendations when publishing a WASM library
Looks to be a great resource. I've been working on a WASM implementation of reading and writing Apache Parquet [0] and, being new to WASM, I've found it difficult to find the best way of distributing a WASM package that works both on Node and through bundlers like Webpack.
[0]: https://github.com/kylebarron/parquet-wasm
proposal-arraybuffer-base64
- Updates from the 100th TC39 meeting
Uint8Array to/from Base64: Uint8Array<->base64/hex.
- Goodbye, Node.js Buffer
The proposal for native base64 support for Uint8Arrays is mine. I'm glad to see people are interested in using it. (So am I!)
For a status update, for the last year or two the main blocker has been a conflict between a desire to have streaming support and a desire to keep the API small and simple. That's now resolved [1] by dropping streaming support, assuming I can demonstrate a reasonably efficient streaming implementation on top of the one-shot implementation, which won't be hard unless "reasonably efficient" means "with zero copies", in which case we'll need to keep arguing about it.
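To illustrate the idea of layering streaming on top of a one-shot encoder, here is a minimal sketch in Python rather than JavaScript. It is not the proposal's API; `stream_b64encode` is a hypothetical helper name, and the approach (carrying over the 0 to 2 bytes that do not fill a 3-byte group) is only one plausible way to do it. Note that it copies data when joining the carry to the next chunk, so it is not a zero-copy implementation.

```python
import base64
from typing import Iterable, Iterator


def stream_b64encode(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield base64 output for a stream of chunks using only a one-shot encoder.

    Hypothetical helper for illustration; not part of any library or proposal.
    """
    carry = b""  # 0 to 2 bytes left over from the previous chunk
    for chunk in chunks:
        data = carry + chunk  # this concatenation copies, so it is not zero-copy
        usable = len(data) - (len(data) % 3)  # longest prefix made of whole 3-byte groups
        if usable:
            # Whole 3-byte groups never produce '=' padding.
            yield base64.b64encode(data[:usable])
        carry = data[usable:]
    if carry:
        # Only the final piece can end in '=' padding.
        yield base64.b64encode(carry)


chunks = [b"he", b"llo, ", b"wor", b"ld!"]
streamed = b"".join(stream_b64encode(chunks))
one_shot = base64.b64encode(b"".join(chunks))
print(streamed == one_shot)
```

Because every piece except the last covers whole 3-byte groups, concatenating the pieces gives the same text as encoding the whole input at once.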
I've also been working on documenting [2] the differences between various base64 implementations in other languages and in JS libraries to ensure we have a decent picture of the landscape when designing this.
With luck, I hope to advance the proposal to stage 3 ("ready for implementations") within the next two meetings of TC39 - so either next month or January. Realistically it will probably take a little longer than that, and of course implementations take a while. But it's moving along.
[1] https://github.com/tc39/proposal-arraybuffer-base64/issues/1...
[2] https://gist.github.com/bakkot/16cae276209da91b652c2cb3f612a...
- Base64 Encoding, Explained
There's some additional interesting details, and a surprising amount of variation in those details, once you start really digging into things.
If the length of your input data isn't exactly a multiple of 3 bytes, then encoding it will use either 2 or 3 base64 characters to encode the final 1 or 2 bytes. Since each base64 character is 6 bits, this means you'll be using either 12 or 18 bits to represent 8 or 16 bits. Which means you have an extra 4 or 2 bits which don't encode anything.
In the RFC, encoders are required to set those bits to 0, but decoders only "MAY" choose to reject input which does not have those set to 0. In practice, nothing rejects those by default, and as far as I know only Ruby, Rust, and Go allow you to fail on such inputs - Python has a "validate" option, but it doesn't validate those bits.
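A small Python sketch of the unused trailing bits may help here. It relies only on the standard `base64` module in its default (non-validating) mode; `is_canonical` is a hypothetical helper name, and the re-encode-and-compare check is just one simple way to detect non-zero trailing bits, assuming the input otherwise contains only valid characters.

```python
import base64

# One input byte: 8 data bits spread over 2 characters (12 bits), so 4 bits are unused.
canonical_1 = base64.b64encode(b"\xff")      # encoder sets the unused bits to 0
# Two input bytes: 16 data bits spread over 3 characters (18 bits), so 2 bits are unused.
canonical_2 = base64.b64encode(b"\xff\xff")

# Same data, but with a non-zero unused bit in the last character ('x' instead of 'w').
sloppy_1 = b"/x=="
sloppy_2 = b"//9="

# The default decoder silently drops the unused bits instead of rejecting them.
decoded_1 = base64.b64decode(sloppy_1)
decoded_2 = base64.b64decode(sloppy_2)


def is_canonical(encoded: bytes) -> bool:
    """Hypothetical strictness check: decoding and re-encoding must round-trip exactly."""
    return base64.b64encode(base64.b64decode(encoded)) == encoded


print(canonical_1, decoded_1 == b"\xff", is_canonical(sloppy_1))
print(canonical_2, decoded_2 == b"\xff\xff", is_canonical(sloppy_2))
```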
The other major difference is in handling of whitespace and other non-base64 characters. A surprising number of implementations, including Python, allow arbitrary characters in the input, and silently ignore them. That's a problem if you get the alphabet wrong - for example, in Python `base64.standard_b64decode(base64.urlsafe_b64encode(b'\xFF\xFE\xFD\xFC'))` will silently give you the wrong output, rather than an error. Ouch!
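The alphabet mix-up above can be reproduced with a short script. This is a sketch using only the standard `base64` and `binascii` modules; it also shows that passing `validate=True` makes the decoder reject the URL-safe characters instead of skipping them.

```python
import base64
import binascii

raw = b"\xff\xfe\xfd\xfc"
urlsafe = base64.urlsafe_b64encode(raw)  # uses '-' and '_' in place of '+' and '/'

# Default mode: characters outside the standard alphabet are silently discarded,
# so the call succeeds but returns different bytes.
decoded = base64.standard_b64decode(urlsafe)
print(urlsafe, decoded == raw)

# With validate=True the non-alphabet characters cause an error instead.
try:
    base64.b64decode(urlsafe, validate=True)
    rejected = False
except binascii.Error:
    rejected = True
print(rejected)

# Decoding with the matching alphabet round-trips correctly.
print(base64.urlsafe_b64decode(urlsafe) == raw)
```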
Another fun fact is that Ruby's base64 encoder will put linebreaks every 60 characters, which is a wild choice because no standard encoding requires lines that short except PEM, but PEM requires _exactly_ 64 characters per line.
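For comparison, a minimal Python sketch of wrapping base64 output at a fixed width is shown below. `wrap_base64` is a hypothetical helper, and the 64-character default simply follows the PEM line length mentioned above; it does not produce a complete PEM file (no header or footer lines).

```python
import base64


def wrap_base64(data: bytes, width: int = 64) -> str:
    """Encode data as base64 and break the text into lines of exactly `width`
    characters, except possibly the last line. Hypothetical helper for illustration."""
    text = base64.b64encode(data).decode("ascii")
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


wrapped = wrap_base64(bytes(100))
line_lengths = [len(line) for line in wrapped.split("\n")]
print(line_lengths)

# Python's own line-wrapping encoder uses the MIME convention of 76 characters per line.
mime_lengths = [len(line) for line in base64.encodebytes(bytes(100)).splitlines()]
print(mime_lengths)
```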
I have a writeup of some of the differences among programming languages and some JavaScript libraries here [1], because I'm working on getting a better base64 added to JS [2].
[1] https://gist.github.com/bakkot/16cae276209da91b652c2cb3f612a...
[2] https://github.com/tc39/proposal-arraybuffer-base64
- Updates from the 96th TC39 meeting
Base64 for Uint8Array: ArrayBuffer to/from Base64
- Updates from the 84th meeting of TC39
ArrayBuffer to/from base64: ArrayBuffer <-> base64 string functions.
What are some alternatives?
datasette-stripe - A web SQL interface to your Stripe account using Datasette.
nodejs-polars - nodejs front-end of polars
quickjs-emscripten - Safely execute untrusted Javascript in your Javascript, and execute synchronous code that uses async functions
proposal-intl-numberformat-v3 - Additional features for Intl.NumberFormat to solve key pain points.
transmitic - Encrypted, peer to peer, file transfer program :: https://discord.gg/tRT3J6T :: https://www.reddit.com/r/transmitic/ :: https://twitter.com/transmitic
proposal-array-from-async - Draft specification for a proposed Array.fromAsync method in JavaScript.
geopolars - Geospatial extensions for Polars
proposal-async-iterator-helpers - Methods for working with async iterators in ECMAScript
odiff - The fastest pixel-by-pixel image visual difference tool in the world.
excel_97_egg - A web port of the magic carpet simulator hidden within Microsoft Excel 97
rson - Rust Object Notation
proposal-regexp-atomic-operators