| | arquero | proposal-arraybuffer-base64 |
|---|---|---|
| Mentions | 8 | 5 |
| Stars | 1,191 | 215 |
| Growth | 1.5% | 4.2% |
| Activity | 4.6 | 7.6 |
| Last commit | about 1 month ago | 18 days ago |
| Language | JavaScript | HTML |
| License | BSD 3-Clause "New" or "Revised" License | MIT License |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
arquero
-
Show HN: Matrices – explore, visualize, and share large datasets
Hey HN, I'm excited to share a new side project I've been working on.
The product is called Matrices. You can check it out here: https://matrices.com/.
With Matrices, you can *explore*, *visualize*, and *share* large (100k rows) datasets, all without code. Filter data down to just what you want, visualize it with built-in charts, and share your results with one click.
You can use it today (no login or waitlist or anything). Just copy and paste your data from a Google Sheet or CSV file.
It's hard to describe the feeling of "gliding over data" you get with Matrices, so I'd rather *show* you. This 75-second video gives a sense of how it works: https://www.youtube.com/watch?v=Rrh9_I3Ux8E.
Data is stored locally in your browser until you publish it, though a small sample does go to the OpenAI APIs for AI-assisted features.
I started building Matrices because I wanted a tool that made it easy to explore new datasets. When I'm first trying to dig into data, I'll have one question... that leads to another... that will invariably lead to five more questions. It's sort of a fractal process, and I couldn't find many good options that were fast, responsive, and visual.
I figured this crowd would be interested in the tech stack as well: it uses arquero [1] bindings over Apache Arrow for in-memory analytics, and visx [2] for visualizations. I'd like to add duckdb-wasm support at some point to open up a wider set of databases. Data is serialized as Parquet to save a bit on bandwidth + storage.
Give it a spin, and let me know what you think. This is my first 'serious frontend project' so I appreciate any and all feedback and bug reports. Feel free to comment here (I'll be around most of the day), or shoot me a note: [email protected]
[1]: https://uwdata.github.io/arquero/
-
Goodbye, Node.js Buffer
https://github.com/uwdata/arquero
- Arquero is a JavaScript library for query processing and transformation of array-backed data tables
- Arquero – data tables wrangling in JavaScript
-
Hal9: Data Science with JavaScript
Transformations: We found that JavaScript in combination with D3.js has a pretty decent set of data transformation functions; however, it comes nowhere near Pandas or dplyr. We found out about Tidy.js quite early, loved it, and adopted it. The combination of Tidy.js, D3.js, and Plot.js is absolutely amazing for visualizations and data wrangling with small datasets, say 10-100K rows. We were very happy with this for a while; however, once you move away from visualizations into real-world data analysis, we found 100K rows restrictive, and it gets worse with 100 or 1K columns. So we switched gears and started using Arquero.js, which happens to be columnar and enabled us to process 1M+ rows in the browser, a decent size for real-world data analysis.
- Arquero – Query processing and transformation of array-backed data tables
-
Apache Arrow 3.0.0 Release
Take a look at the arquero library from a research group at University of Washington (the same group that D3 came out of). https://github.com/uwdata/arquero
proposal-arraybuffer-base64
-
Updates from the 100th TC39 meeting
Uint8Array to/from Base64: Uint8Array<->base64/hex.
-
Goodbye, Node.js Buffer
The proposal for native base64 support for Uint8Arrays is mine. I'm glad to see people are interested in using it. (So am I!)
For a status update, for the last year or two the main blocker has been a conflict between a desire to have streaming support and a desire to keep the API small and simple. That's now resolved [1] by dropping streaming support, assuming I can demonstrate a reasonably efficient streaming implementation on top of the one-shot implementation, which won't be hard unless "reasonably efficient" means "with zero copies", in which case we'll need to keep arguing about it.
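Layering streaming on top of a one-shot encoder is simpler than it sounds, because base64 maps each complete 3-byte group to a standalone 4-character quad. A minimal sketch of that idea in Python (using the stdlib's `base64.b64encode` as a stand-in for the proposed one-shot API, with a single extra copy for the carried-over remainder):

```python
import base64

def streaming_b64encode(chunks):
    """Encode an iterable of byte chunks, yielding base64 pieces
    whose concatenation equals the one-shot encoding."""
    buf = b''
    for chunk in chunks:
        buf += chunk
        # Only encode whole 3-byte groups; carry the remainder forward
        # so no quad ever straddles a chunk boundary with padding.
        n = len(buf) - (len(buf) % 3)
        if n:
            yield base64.b64encode(buf[:n])
            buf = buf[n:]
    if buf:
        # Flush the final 1-2 leftover bytes (this is where '=' padding appears).
        yield base64.b64encode(buf)
```

The "zero copies" debate mentioned above is about avoiding even the small remainder buffer this sketch keeps between chunks.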
I've also been working on documenting [2] the differences between various base64 implementations in other languages and in JS libraries to ensure we have a decent picture of the landscape when designing this.
With luck, I hope to advance the proposal to stage 3 ("ready for implementations") within the next two meetings of TC39 - so either next month or January. Realistically it will probably take a little longer than that, and of course implementations take a while. But it's moving along.
[1] https://github.com/tc39/proposal-arraybuffer-base64/issues/1...
[2] https://gist.github.com/bakkot/16cae276209da91b652c2cb3f612a...
-
Base64 Encoding, Explained
There are some additional interesting details, and a surprising amount of variation in those details, once you start really digging into things.
If the length of your input data isn't exactly a multiple of 3 bytes, then encoding it will use either 2 or 3 base64 characters to encode the final 1 or 2 bytes. Since each base64 character is 6 bits, this means you'll be using either 12 or 18 bits to represent 8 or 16 bits. Which means you have an extra 4 or 2 bits which don't encode anything.
In the RFC, encoders are required to set those bits to 0, but decoders only "MAY" choose to reject input which does not have those set to 0. In practice, nothing rejects those by default, and as far as I know only Ruby, Rust, and Go allow you to fail on such inputs - Python has a "validate" option, but it doesn't validate those bits.
The other major difference is in handling of whitespace and other non-base64 characters. A surprising number of implementations, including Python, allow arbitrary characters in the input, and silently ignore them. That's a problem if you get the alphabet wrong - for example, in Python `base64.standard_b64decode(base64.urlsafe_b64encode(b'\xFF\xFE\xFD\xFC'))` will silently give you the wrong output, rather than an error. Ouch!
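That Python round trip runs exactly as described: the urlsafe alphabet's `_` falls outside the standard alphabet, so the lenient standard decoder silently discards it and reassembles the remaining characters into entirely different bytes:

```python
import base64

data = b'\xFF\xFE\xFD\xFC'
encoded = base64.urlsafe_b64encode(data)      # b'__79_A=='
decoded = base64.standard_b64decode(encoded)  # '_' chars are silently dropped

# No exception is raised, but the round trip is corrupted.
assert decoded != data
```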
Another fun fact is that Ruby's base64 encoder will put linebreaks every 60 characters, which is a wild choice because no standard encoding requires lines that short except PEM, but PEM requires _exactly_ 64 characters per line.
I have a writeup of some of the differences among programming languages and some JavaScript libraries here [1], because I'm working on getting a better base64 added to JS [2].
[1] https://gist.github.com/bakkot/16cae276209da91b652c2cb3f612a...
[2] https://github.com/tc39/proposal-arraybuffer-base64
-
Updates from the 96th TC39 meeting
Base64 for Uint8Array: ArrayBuffer to/from Base64
-
Updates from the 84th meeting of TC39
ArrayBuffer to/from base64: ArrayBuffer <-> base64 string functions.
What are some alternatives?
perspective - A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
nodejs-polars - Node.js front end of polars
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
proposal-intl-numberformat-v3 - Additional features for Intl.NumberFormat to solve key pain points.
hal9ai - Hal9 — Data apps powered by code and LLMs [Moved to: https://github.com/hal9ai/hal9]
proposal-array-from-async - Draft specification for a proposed Array.fromAsync method in JavaScript.
regression-js - Curve Fitting in JavaScript.
proposal-async-iterator-helpers - Methods for working with async iterators in ECMAScript
arrow-julia - Official Julia implementation of Apache Arrow
excel_97_egg - A web port of the magic carpet simulator hidden within Microsoft Excel 97
cylon - Cylon is a fast, scalable, distributed-memory parallel runtime with a Pandas-like DataFrame.
proposal-regexp-atomic-operators