arquero
proposal-zero-copy-arraybuffer-list
arquero | proposal-zero-copy-arraybuffer-list | |
---|---|---|
8 | 2 | |
1,191 | 2 | |
1.5% | - | |
4.6 | 6.0 | |
about 1 month ago | 7 months ago | |
JavaScript | ||
BSD 3-clause "New" or "Revised" License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
arquero
-
Show HN: Matrices – explore, visualize, and share large datasets
Hey HN, I'm excited to share a new side project I've been working on.
The product is called Matrices. You can check it out here: https://matrices.com/.
With Matrices, you can *explore*, *visualize*, and *share* large (100k rows) datasets–all without code. Filter data down to just what you want, visualize it with built-in charts, and share your results with one click.
You can use it today (no login or waitlist or anything). Just copy and paste your data from a google sheet or CSV file.
It's hard to describe the feeling of "gliding over data" you get with Matrices, so I'd rather *show* you how it works instead. This 75s video will give you a sense of how it works: https://www.youtube.com/watch?v=Rrh9_I3Ux8E.
Data is stored locally in your browser until you publish it, though small sample does go to the OpenAI APIs for AI-assisted features.
I started building Matrices because I wanted a tool that made it easy to explore new datasets. When I'm first trying to dig into data, I'll have one question... that leads to another... that will invariably lead to five more questions. It's sort of a fractal process, and I couldn't find many good options that were fast, responsive, and visual.
I figured this crowd would be interested in tech stack as well, it's using arquero [1] bindings over apache arrow for in-memory analytics, and visx [2] for visualizations. I'd like to add duckdb-wasm support at some point to open up a wider set of databases. Data is serialized as parquet to save a bit on bandwidth + storage.
Give it a spin, and let me know what you think. This is my first 'serious frontend project' so I appreciate any and all feedback and bug reports. Feel free to comment here (I'll be around most of the day), or shoot me a note: [email protected]
[1]: https://uwdata.github.io/arquero/
-
Goodbye, Node.js Buffer
https://github.com/uwdata/arquero
- Arquero is a JavaScript library for query processing and transformation of array-backed data tables
- Arquero – data tables wrangling in JavaScript
-
Hal9: Data Science with JavaScript
Transformations: We found out that JavaScript in combination with D3.js has a pretty decent set of data transformation functions; however, it comes nowhere near to Pandas or dplyr. We found out about Tidy.js quite early, loved it, and adopted it. The combination of Tidy.js and D3.js and Plot.js is absolutely amazing for visualizations and data wrangling with small datasets, say 10-100K rows. We were very happy with this for a while; however, once you move away from visualizations into real-world data analysis, we found out 100K rows restrictive, which gets worse when having 100 or 1K columns. So we switched gears and started using Arquero.js, which happens to be columnar and enabled us to process +1M rows in the browser, descent size for real-world data analysis.
- Arquero – Query processing and transformation of array-backed data tables
-
Apache Arrow 3.0.0 Release
Take a look at the arquero library from a research group at University of Washington (the same group that D3 came out of). https://github.com/uwdata/arquero
proposal-zero-copy-arraybuffer-list
-
Goodbye, Node.js Buffer
Yeah, in your case I think most of the complexity is actually on the ReadableStream side, not the base64 side.
The thing that I'd actually want for your case is either a TransformStream for byte stream <-> base64 stream (which I expect will come eventually, once the simple case gets done), or something which would let you read the entire stream into Uint8Array or ArrayBuffer, which is a long-standing suggestion [1].
---
> Why does de-chunking a byte array need to be complicated
Keep in mind the concat proposal is _very_ early. If you think it would be useful to be able to concat Uint8Arrays and have that implicitly concatenate the underlying buffers, [2] is the place to open an issue.
---
> You have made me realize I don't even know what the right venue is to vote on stuff. How should I signal to TC39 that e.g. Array.fromAsync is a good idea?
Unfortunately, it's different places for different things. Streams are not TC39 at all; the right place for suggestions there is in the WHATWG streams repo [3]. Usually there's already an existing issue and you can add your use case as a comment in the relevant issue. TC39 proposals all have their own Github repositories, and you can open a new issue with your use case.
Concrete use cases are much more helpful than just "this is a good idea". Though `fromAsync` in particular everyone agrees is good, and it mostly just needs implementations, which are ongoing; see e.g. [4]. If you _really_ want to advance a stage 3 proposal, you can contribute a PR to Chrome or Firefox with an implementation - but for nontrivial proposals that's usually hard. For TC39 in particular, use cases are only really valuable pre-stage-3 proposals.
[1] https://github.com/whatwg/streams/issues/1019
[2] https://github.com/jasnell/proposal-zero-copy-arraybuffer-li...
[3] https://github.com/whatwg/streams
[4] https://bugs.chromium.org/p/v8/issues/detail?id=13321
What are some alternatives?
perspective - A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
WebGL-Fluid-Simulation - Play with fluids in your browser (works even on mobile)
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
proposal-arraybuffer-base64 - TC39 proposal for Uint8Array<->base64/hex
hal9ai - Hal9 — Data apps powered by code and LLMs [Moved to: https://github.com/hal9ai/hal9]
falcon - Brushing and linking for big data
regression-js - Curve Fitting in JavaScript.
nodejs-polars - nodejs front-end of polars
arrow-julia - Official Julia implementation of Apache Arrow
proposal-async-iterator-helpers - Methods for working with async iterators in ECMAScript
cylon - Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame.
streams - Streams Standard