tad
bsv
tad
- Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL
Very impressive project and vision! Love the demo!
I am also ex-GS and worked on what I am fairly sure is the table display tool you're describing. I tried to carry the essential aspects of that work (multi-level pivots, with drill-down to the leaf level, and all interactive events and analytics supported by db queries) to Tad (https://www.tadviewer.com/, https://github.com/antonycourtney/tad), another open source project powered by DuckDB.
An embeddable version of Tad, powered by DuckDB WASM, is used as the results viewer in the MotherDuck Web UI (https://app.motherduck.com/).
If you're interested in embedding Tad in Pretzel, or leveraging pieces of it in your work, or collaborating on other aspects of DuckDB WASM powered UIs, please get in touch!
- Building a database to search Excel files
- Consider Using CSV
Since this is about CSV, this is an obligatory tool for larger ones:
* https://github.com/antonycourtney/tad
bsv
- Do You Know How Much Your Computer Can Do in a Second?
grep is the beginning, not the end. it’s a great performance baseline to meet, and then beat[1]. computers are insanely fast!
the startups using grep on aws are undercutting those doing slower things on aws. this must be why aws architects never talk about grep.
1. https://github.com/nathants/bsv
- Generic dynamic array in 60 lines of C
awesome! i do the same thing for arrays[1] and maps[2].
stuff like this is great when you are trying to find the performance ceiling of some workload. literally nothing to hide.
1. https://github.com/nathants/bsv/blob/master/util/array.h
2. https://github.com/nathants/bsv/blob/master/util/map.h
- Consider Using CSV
i had a lot of fun exploring the performance ceiling of csv and csv-like formats. turns out binary encoding of size-prefixed byte arrays is fast[1].
csv is just a sequence of 2d byte arrays. probably avoid if dealing with heterogeneous external data. possibly use if dealing with homogeneous internal data.
1. https://github.com/nathants/bsv
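To make the "size-prefixed byte arrays" idea concrete, here is a minimal sketch in Python. The layout (a little-endian u16 field count, then a u32 length before each field) and the names encode_row and decode_rows are assumptions chosen for illustration. This is not bsv's actual on-disk format; see the repository for that.

```python
import struct

def encode_row(fields):
    """Encode one row (a list of bytes values).

    Assumed layout, for illustration only (not bsv's real format):
    u16 field count, then for each field a u32 length followed by the payload.
    """
    out = [struct.pack("<H", len(fields))]
    for field in fields:
        out.append(struct.pack("<I", len(field)))
        out.append(field)
    return b"".join(out)

def decode_rows(buf):
    """Decode a buffer of concatenated rows back into lists of bytes."""
    rows = []
    pos = 0
    while pos < len(buf):
        (count,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        row = []
        for _ in range(count):
            (size,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            # The length prefix says exactly where the field ends,
            # so no delimiter scanning, quoting, or escaping is needed.
            row.append(buf[pos:pos + size])
            pos += size
        rows.append(row)
    return rows

if __name__ == "__main__":
    data = encode_row([b"a,b", b"line\nbreak"]) + encode_row([b"x"])
    print(decode_rows(data))
```

Because every field carries its own length, commas and newlines inside a value need no special handling, and a reader can skip over a field without inspecting its bytes. That is one plausible reason this style of encoding parses faster than delimiter-based text.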
- Big Data file formats
- GitHub - SixArm/usv: USV: Unicode Separated Values
i like this idea, and do something similar: https://github.com/nathants/bsv
- Ask HN: Have you created programs for only your personal use?
What are some alternatives?
parquet-go - pure golang library for reading/writing parquet file
rill - Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
ndjson.github.io - Info Website for NDJSON
pretzelai - Open-source, browser-local data exploration using DuckDB-Wasm and PRQL
js-bson - BSON Parser for node and browser
duckdb-wasm - WebAssembly version of DuckDB
KeenWrite - Free, open-source, cross-platform desktop Markdown text editor with live preview, string interpolation, and math.
csv-to-ml - 🧌 Upload a CSV file and get an ML model
xsv - A fast CSV command line toolkit written in Rust.