scroll
zsv
scroll
- [OC] Cancer in the United States: Heatmap Visualizations
-
Ask HN: What are you building that is taking multiple years to make usable?
It took me many years to get Scroll (https://scroll.pub/) to the point where I love it and am confident it will be the dominant language for writing going forward (replacing markdown).
I first had to invent Tree Notation (2017), which I got wrong on my first two tries (2012's Note and 2013's Space). Then I needed to invent Grammar (2017), and then I made the predecessor to Scroll called Dumbdown (2019). 2 years after that I shipped the first version of Scroll (2021).
Now we are on Scroll version 58 and it's blazing fast, very simple, extremely extendible, and scales very well.
It was 90% me for a while, but recently it has been very much a team effort.
It took a while to get right because it's a whole new kind of language: I made a lot of mistakes that I had to undo, and it took time to figure out exactly what was special about it and how to double down on that.
- Ask HN: With recent layoffs, how would you advise new grads entering the market?
-
Anyone interested in starting a local newspaper using new tech?
I recently started 2 new newspapers: https://longbeach.pub/ and http://hawaii.pub/. Very different from traditional newspapers in that they are: public domain, open source (view source on every page), and built using a new language (https://scroll.pub/).
-
Argdown: A simple syntax for complex argumentation
Another cool site I found recently (via the replit guy) is https://www.rootclaim.com/
Very cool way to present arguments.
I'm thinking of taking that, as well as Argdown, and building some easy-to-use keywords in Scroll: https://scroll.pub/
-
We Need to Know LR and Recursive Descent Parsing Techniques
> Context-free grammars, and their associated parsing techniques, don't align well with real-world compilers, and thus we should deemphasise CFGs (Context-Free Grammars) and their associated parsing algorithms.
I think CFGs are highly overrated. Top-down recursive descent parsers are simple and let you craft more human languages. I think building a top-down parser is something every dev should do. It's a simple technique with tremendous power.
I think the source code for Scroll (https://github.com/breck7/scroll/tree/main/grammar) demonstrates how liberating moving away from CFGs can be. Easy to extend, compose, build new backends, debug, et cetera. Parser, compiler, and interpreter for each node all in one place. Swap nodes around between languages. Great evolutionary characteristics.
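To illustrate the top-down technique praised above, here is a minimal recursive descent parser in Python for a small, hypothetical arithmetic grammar (not Scroll's grammar or code). Each grammar rule becomes one function that calls the others, and this sketch evaluates as it parses, so adding a rule means adding one method.

```python
import re

# Hypothetical toy grammar (not Scroll's):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\S))")


def tokenize(text):
    """Split text into integers and single-character operator tokens."""
    return [int(number) if number else op for number, op in TOKEN_RE.findall(text)]


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self):
        # One method per rule: expr handles + and - (left-associative).
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term binds tighter than expr, so * and / are applied first.
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.take()
        if isinstance(token, int):
            return token
        if token == "(":
            value = self.expr()  # recursion handles nesting
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")
```

For example, `Parser("2 * (3 + 4) - 5").parse()` evaluates to `9`. Operator precedence falls out of which function calls which, with no parse tables involved.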
I'll stop there (realizing I need to improve the docs and write a blog post).
-
I am building a new kind of newspaper and so have been collecting and studying old newspapers. Here is one from my collection, an issue of the Columbian Centinel (Boston), from 1795, when George Washington was president. The classifieds make me laugh. Lots of Schooners for sale.
- Uses a new language called Scroll: https://scroll.pub/
-
Start a Fucking Blog
Also, put down Markdown and give our Scroll a try: https://scroll.pub
It now powers sites like my own blog (https://breckyunits.com/), knowledge bases like PLDB.com, and our first new public domain daily newspaper called the Long Beach Pub (https://longbeach.pub/1-3-2023.html).
-
Programming languages in 25 days, Part 2: Reflections on language design
> Java, Go, Javascript, Rust, etc are all regularly written with whitespace, and have tools to enforce such formatting, but they don't derive information from it.
Ah, you reminded me. A curious phenomenon I've observed with Prettier in JS and fmt in Go is that languages are moving to standardized whitespace but, as you said, not yet deriving information from it. I don't know enough about Java or Rust, but I suspect both have adopted a Prettier/fmt-like convention where all code is formatted on save. So it seems like we are moving to a world where it will be a simple flip of a switch for popular languages to start extracting meaning from whitespace.
> Also, Python has existed for decades and still there is little further adoption of indentation-sensitivity. It doesn't seem like a wave of indentation-sensitive languages will be coming any time soon.
I think it's coming big time this year. I think our Scroll (https://scroll.pub/) will catch fire and be the go-to language instead of Markdown by the end of the year. Then, with the increasing success of TreeBase (powering PLDB and others), we will start to see JSON fall for config formats and document storage databases. A lot more will happen too (data vis will be a big one), but those two I'm reasonably certain will happen in 2023.
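As a rough illustration of a language "deriving information from whitespace", here is a minimal Python sketch that turns indentation alone into a tree. It assumes one leading space per nesting level, loosely in the spirit of Tree Notation; it is not Scroll's or Tree Notation's actual parser.

```python
def parse_indented(text):
    """Turn indentation into a tree of {"line", "children"} dicts.

    Assumes one leading space per nesting level (an illustrative choice);
    blank lines are skipped.
    """
    root = {"line": None, "children": []}
    stack = [root]  # stack[d] is the parent for a line at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue
        content = raw.lstrip(" ")
        depth = len(raw) - len(content)
        if depth >= len(stack):
            raise ValueError(f"line indented too deeply: {raw!r}")
        node = {"line": content, "children": []}
        stack[depth]["children"].append(node)
        del stack[depth + 1:]  # close any deeper branches that just ended
        stack.append(node)     # this node may parent the next, deeper line
    return root["children"]
```

The structure comes entirely from leading whitespace, with no braces or closing tags to match, which is what makes formatter-enforced indentation a natural carrier of meaning.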
-
Ask HN: Programs that saved you 100 hours? (2022 edition)
GoAccess: https://goaccess.io/. I don't miss Google Analytics at all.
Loom. I don't think it's open source, but I'm digging it and will be excited when a public domain competitor comes out.
Our https://scroll.pub/. It's far beyond markdown at this point. I am able to not only write better but also maintain thousands of pages of content by hand (well, most of the credit for that belongs to Apple M1s, Sublime Text, git, MacOS, and Github). The stuff we are doing with it now would just not be possible with anything else, and what we're coming out with next year is super exciting. It's all public domain.
zsv
-
Analyzing multi-gigabyte JSON files locally
If the data is tabular in nature, consider converting it to sqlite3 so you can make use of indexing, or to CSV to make use of high-performance tools like xsv or zsv (I'm an author of the latter).
https://github.com/BurntSushi/xsv
https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql...
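A minimal Python sketch of the sqlite3 route, assuming the input is JSON Lines (one flat JSON object per line, so the file can be streamed instead of loaded whole). The table name `records` and the column list are hypothetical; nested values would need to be serialized (e.g. with `json.dumps`) before insertion.

```python
import json
import sqlite3


def load_json_lines(lines, conn, columns):
    """Stream JSON Lines into a table named `records` (hypothetical) with the given columns.

    Assumes flat objects with scalar values; missing keys become NULL.
    """
    col_list = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS records ({col_list})")
    objects = (json.loads(line) for line in lines if line.strip())
    rows = (tuple(obj.get(c) for c in columns) for obj in objects)
    # executemany consumes the generator lazily, one row at a time.
    conn.executemany(f"INSERT INTO records ({col_list}) VALUES ({marks})", rows)
    conn.commit()


def add_index(conn, column):
    """Index one column so lookups on it can use the index instead of scanning every row."""
    conn.execute(f'CREATE INDEX IF NOT EXISTS "idx_{column}" ON records ("{column}")')
```

With a real file, passing an open file handle (e.g. `open("data.jsonl")`, a hypothetical name) as `lines` streams it line by line.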
-
Show HN: Up to 100x Faster FastAPI with simdjson and io_uring on Linux 5.19
Parsing CSV doesn't have to be slow if you use something like xsv or zsv (https://github.com/liquidaty/zsv) (disclaimer: I'm an author). CSV parsers are fast enough that unless you are doing something ultra-trivial such as "count rows", your bottleneck will be elsewhere.
The benefits of CSV are:
- human readable
- does not need to be typed (sometimes, data in the raw such as date-formatted data is not amenable to typing without introducing a pre-processing layer that gets you further from the original data)
- accessible to anyone: you don't need to be a data person to double-click and open in Excel or similar
The main drawback is that if your data is already typed, CSV does not communicate what the type is. You can alleviate this through various approaches such as is described at https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql..., though I wouldn't disagree that if you can be assured that your starting data conforms to non-text data types, there are probably better formats than CSV.
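To make the typing drawback concrete: Python's standard `csv` module returns every field as a string, so types have to be supplied out of band. The column names and the `SCHEMA` mapping below are hypothetical.

```python
import csv
import io
from datetime import date

raw = "id,amount,day\n1,9.50,2023-01-03\n"

rows = list(csv.DictReader(io.StringIO(raw)))
# Every value comes back as str: {'id': '1', 'amount': '9.50', 'day': '2023-01-03'}

# Types must come from somewhere else, e.g. a hypothetical per-column schema.
SCHEMA = {"id": int, "amount": float, "day": date.fromisoformat}

typed = [{name: SCHEMA[name](value) for name, value in row.items()} for row in rows]
```

The flip side is the "does not need to be typed" benefit: until you choose to apply `SCHEMA`, the raw text (such as the date string) is preserved exactly as written.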
The main benefit of Arrow, IMHO, is less as a format for transmitting / communicating and more as a format for data at rest that benefits from higher-performance column-based reads and compression.
- Yq is a portable yq: command-line YAML, JSON, XML, CSV and properties processor
-
csvkit: Command-line tools for working with CSV
I wanted so much to use csvkit and all the features it had, but its horrendous performance made it unscalable and therefore the more I used it, the more technical debt I accumulated.
This was one of the reasons I wrote zsv (https://github.com/liquidaty/zsv). Maybe csvkit could incorporate the zsv engine and we could get the best of both worlds?
Examples (using majestic million csv):
---
- Ask HN: Programs that saved you 100 hours? (2022 edition)
-
Show HN: Split CSV into multiple files to avoid the Excel's 1M row limitation
This of course assumes that each line is a single record, so you'll need some preprocessing if your CSV might contain embedded line-ends. For the preprocessing, you can use something like the `2tsv` command of https://github.com/liquidaty/zsv (disclaimer: I'm its author), which converts CSV to TSV and replaces newline with \n.
You can also use something like `xsv split` (see https://lib.rs/crates/xsv), which frankly is probably your best option as of today (though zsv will be getting its own shard command soon).
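For comparison, a minimal Python sketch that splits a CSV without a separate preprocessing step: the standard `csv` module already treats a quoted field with embedded line breaks as part of one record. Each output file gets the header plus at most `rows_per_file` records, defaulting to one less than Excel's 1,048,576-row worksheet limit. The `part_N.csv` naming is hypothetical, and this is not how `2tsv`, `xsv split`, or zsv work internally.

```python
import csv
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in current Excel versions


def split_csv(src, out_dir, rows_per_file=EXCEL_MAX_ROWS - 1):
    """Split `src` into out_dir/part_1.csv, part_2.csv, ... (hypothetical naming).

    Each part repeats the header and holds at most `rows_per_file` records.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    try:
        with open(src, newline="") as f:
            reader = csv.reader(f)  # quoted fields may span several physical lines
            header = next(reader, None)
            if header is None:
                return paths  # empty input
            count = rows_per_file  # forces a new part on the first record
            for record in reader:
                if count == rows_per_file:
                    if out is not None:
                        out.close()
                    path = out_dir / f"part_{len(paths) + 1}.csv"
                    out = open(path, "w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    paths.append(path)
                    count = 0
                writer.writerow(record)  # re-quotes fields that contain newlines
                count += 1
    finally:
        if out is not None:
            out.close()
    return paths
```

Counting parsed records rather than physical lines is what keeps a multi-line field from being cut in half at a file boundary.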
- Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet
-
Ask HN: Best way to find help creating technical doc (open- or closed-source)?
I am looking for one-time help creating documentation (e.g. man pages, tutorials) for an open source project (e.g. https://github.com/liquidaty/zsv), as well as product documentation for commercial products, but there is not enough need for a full-time job. It requires familiarity with, for lack of a better term, data janitorial work, and preferably with methods of auto-generating documentation. Any suggestions as to forums or other ways to find folks who might fit the bill for ad-hoc or part-time work of this nature?
-
Q – Run SQL Directly on CSV or TSV Files
Nice work. I am a fan of tools like this and look forward to giving this a try.
However, in my first attempted query (version 3.1.6 on macOS), I ran into significant performance limitations and, more importantly, it did not give correct output.
In particular, running on a narrow table with 1 million rows (the same one used in the xsv examples) using the command "select country, count() from worldcitiespop_mil.csv group by country" takes 12 seconds just to return an incorrect error: 'no such column: country'.
Using sqlite3, it takes two seconds or so to load and less than a second to run, and gives me the correct result.
Using https://github.com/liquidaty/zsv (disclaimer: I'm one of its authors), I get the correct results in 0.95 seconds with the one-liner `zsv sql 'select country, count() from data group by country' worldcitiespop_mil.csv`.
I look forward to trying it again sometime soon.
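The plain sqlite3 route described above can also be scripted with Python's standard library. This sketch assumes a header row, simple column names, and the same number of fields on every row; it loads everything into an in-memory table named `data` (borrowed from the zsv one-liner purely for symmetry) and is not how q or zsv work.

```python
import csv
import sqlite3


def group_count(csv_path, column):
    """Load a CSV into an in-memory SQLite table `data` and count rows per value of `column`."""
    conn = sqlite3.connect(":memory:")
    try:
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join(f'"{h}"' for h in header)
            conn.execute(f"CREATE TABLE data ({cols})")
            marks = ", ".join("?" for _ in header)
            # Rows stream straight from the CSV reader into SQLite.
            conn.executemany(f"INSERT INTO data VALUES ({marks})", reader)
        query = f'SELECT "{column}", count(*) FROM data GROUP BY "{column}" ORDER BY 1'
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Calling `group_count("worldcitiespop_mil.csv", "country")` would mirror the query in the comment, returning `(country, count)` pairs sorted by country.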
-
A Trillion Prices
All this banter arguing over CSV, JSON, sqlite seems unnecessary when you can just push format X through a pipe and get whichever format Y you want back out: https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql...
(disclaimer: I'm one of the zsv authors)
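As a small illustration of the "format X in, format Y out" idea, here is a Python sketch of one direction: CSV in, JSON array of objects out. It is a standard-library stand-in, not zsv's converter, and every value stays a string because CSV carries no types.

```python
import csv
import json


def csv_to_json(src, dst):
    """Write the CSV read from file-like `src` to `dst` as a JSON array of objects keyed by header."""
    # list() holds all rows in memory; a streaming writer would suit very large inputs better.
    json.dump(list(csv.DictReader(src)), dst, indent=2)
    dst.write("\n")
```

Wired to `sys.stdin` and `sys.stdout` (i.e. `csv_to_json(sys.stdin, sys.stdout)`), it can sit in the middle of a shell pipeline.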
What are some alternatives?
breckyunits.com - Breck Yunits' Blog
visidata - A terminal spreadsheet multitool for discovering and arranging data
Zato - ESB, SOA, REST, APIs and Cloud Integrations in Python
duckdb - DuckDB is an in-process SQL OLAP Database Management System
CameraTraps - PyTorch Wildlife: a Collaborative Deep Learning Framework for Conservation.
lnav - Log file navigator
djot - A light markup language
tsv-utils - eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
sumatrapdf - SumatraPDF reader
ClickHouse - ClickHouse® is a free analytics DBMS for big data
ppg.report - Weather report tailored for paramotor pilots, available worldwide. 🌏 Combines winds aloft, nearby Terminal Aerodrome Forecasts, hourly forecast, NWS active alerts, FAA TFRs, SIGMETs, G-AIRMETs and CWAs
nio - Low Overhead Numerical/Native IO library & tools