falcon
parquet-wasm
falcon | parquet-wasm | |
---|---|---|
2 | 6 | |
925 | 471 | |
0.6% | - | |
7.8 | 9.0 | |
17 days ago | 4 days ago | |
Jupyter Notebook | Rust | |
GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
falcon
- Goodbye, Node.js Buffer
-
Launch HN: Drifting in Space (YC W22) – A server process for every user
Good questions!
> Why do you need one process per user? / Wouldn't this "event loop" actually be more efficient that one user/process, as there would be less context switching cost from the OS?
We're particularly interested in apps that are often CPU-bound, so a traditional event-loop would be blocked for long periods of time. A typical solution is to put the work into a thread, so there would still be a context switch, albeit a smaller one.
The process-per-user approach makes the most sense when a significant amount of the data used by each user does not overlap with other users. VS Code (in client/server mode) is a good example of this -- the overhead of siloing each process is relatively low compared to the benefits it gives. We think more data-heavy apps will make the same trade-offs.
> Can I just keep a map of (connection, thread_id) on my server, and spawn one thread per user on my own server?
If you don't have to scale beyond one server, this approach works fine, but it makes scaling horizontally complicated because you suddenly can't just use a plain old load balancer. It's not just about routing requests to the right server; deciding which server to run the threads on becomes complicated because you ideally want to decide based on the server load of each. We started going down this path, realized we'd end up re-inventing Kubernetes, so decided to embrace it instead.
> Could I just load up my server with many cores, and give each user a SQLite database which runs each query in its own thread? This way a multi GB database would not be loaded into RAM, the query would filter it down to a result set.
If, for a particular use case, it's economical to keep the data ready in a database that supports the query pattern users will make, it's probably not a good fit for a session-lived backend. In database terms, where our architecture makes sense is when you need to create an index on a dataset (or subset of a dataset) during the runtime of an application. For example, if you have thousands of large parquet files in blob storage and you want a user to be able to load one and run [Falcon](https://github.com/vega/falcon)-type analysis on it.
parquet-wasm
- FLaNK AI Weekly for 29 April 2024
- Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data
-
Goodbye, Node.js Buffer
nodejs-polars is node-specific and uses native FFI. polars can be compiled to Wasm but doesn't yet have a js API out of the box.
As for the fastest way to serialize data to Pandas data to the browser, you should use Parquet; it's the fastest to write on the Python side and read on the JS side, while also being compressed. See https://github.com/kylebarron/parquet-wasm (full disclosure, I wrote this)
-
Rust 1.63.0
I'm building WebAssembly bindings to existing Rust libraries [0] and lower-dependency geospatial tools [1]. Rust makes it very easy to bind rust code to both WebAssembly and Python. And by avoiding some large C geospatial dependencies we can get reliable performance in both wasm and Python using the exact same codebase.
[0]: https://github.com/kylebarron/parquet-wasm
[1]: https://github.com/kylebarron/geopolars
- Why isn’t there a decent file format for tabular data?
-
Recommendations when publishing a WASM library
Looks to be a great resource. I've been working on a WASM implementation of reading and writing Apache Parquet [0] and it's been difficult being new to WASM to find the best way of distributing the WASM that works on Node and through bundlers like Webpack.
[0]: https://github.com/kylebarron/parquet-wasm
What are some alternatives?
stateroom - A lightweight framework for building WebSocket-based application backends.
datasette-stripe - A web SQL interface to your Stripe account using Datasette.
nodejs-polars - nodejs front-end of polars
quickjs-emscripten - Safely execute untrusted Javascript in your Javascript, and execute synchronous code that uses async functions
streams - Streams Standard
transmitic - Encrypted, peer to peer, file transfer program :: https://discord.gg/tRT3J6T :: https://www.reddit.com/r/transmitic/ :: https://twitter.com/transmitic
proposal-zero-copy-arraybuffer-list - A proposal for zero-copy ArrayBuffer lists
geopolars - Geospatial extensions for Polars
proposal-arraybuffer-base64 - TC39 proposal for Uint8Array<->base64/hex
odiff - The fastest pixel-by-pixel image visual difference tool in the world.
spawner - Session backend orchestrator for ambitious browser-based apps. [Moved to: https://github.com/drifting-in-space/plane]
rson - Rust Object Notation