A desktop application for viewing and analyzing tabular data
Since this is about CSV, this is an obligatory tool for larger ones:
Free, open-source, cross-platform desktop Markdown text editor with live preview, string interpolation, and math.
pure golang library for reading/writing parquet file
> It's so complex to work with, that unless you're specifically in data science, it's both unheard of and unusable.
FWIW, in my experience at a "data analytics platform" company, it's reasonably popular for data-heavy workflows, because Parquet is well-defined and file sizes are a fraction of their CSV equivalents.
> Is it a limitation of the format itself?
I don't think so. In other languages, you can generally read/write Parquet files without a ton of dependencies (e.g. https://github.com/xitongsys/parquet-go).
BSON Parser for node and browser
Info Website for NDJSON
No one uses that format for streamed JSON; see ndjson and jsonl.
The size complaint is overblown, as repeated fields are compressed away.
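To make the ndjson/jsonl point concrete, here is a minimal sketch using only the Python standard library. The sample records and the helper names `dump_ndjson` and `iter_ndjson` are made up for illustration. It writes one compact JSON object per line, reads the records back one at a time, and gzips the text to see how much of the repeated key names is squeezed out.

```python
import gzip
import io
import json
from typing import Iterable, Iterator


def dump_ndjson(records: Iterable[dict]) -> str:
    """Serialize records as newline-delimited JSON: one compact object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def iter_ndjson(stream: io.TextIOBase) -> Iterator[dict]:
    """Yield one record per non-blank line, without loading the whole stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample data: every record repeats the same three keys.
records = [{"id": i, "name": f"user{i}", "value": i % 7} for i in range(1000)]
text = dump_ndjson(records)

# Compare the raw size with the gzip-compressed size of the same bytes.
raw_size = len(text.encode("utf-8"))
gzip_size = len(gzip.compress(text.encode("utf-8")))

# Records come back one at a time, in their original order.
decoded = list(iter_ndjson(io.StringIO(text)))
```

Because each line is a complete JSON document, a consumer can start processing before the producer has finished. How much gzip saves depends on the data, so compare `raw_size` and `gzip_size` on your own files rather than trusting this toy input.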
As other folks rightfully commented, CSV is a minefield. One should assume every CSV file is broken in some way. They also don't enumerate any of the downsides of CSV.
What people should consider is using formats like Avro or Parquet that carry their schema with them, so the data can be loaded and analyzed without having to manually deal with column meaning.
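One common way a CSV file ends up "broken" is a field that contains a comma, a quote, or a newline. The sketch below uses a made-up row and only Python's standard `csv` module. It shows that splitting on newlines and commas mangles such a row, while a quoting-aware parser round-trips it.

```python
import csv
import io

# Hypothetical row containing a comma, quotes, and a newline inside fields.
row = ["42", 'Smith, "Jo" Jr.', "line one\nline two"]

# Write the row with the csv module, which quotes and escapes as needed.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
encoded = buf.getvalue()

# Naive parsing: split on line breaks, then on commas.
naive = [line.split(",") for line in encoded.splitlines()]

# The csv module honors the quoting rules, so the row round-trips intact.
parsed = list(csv.reader(io.StringIO(encoded, newline="")))
```

Here `naive` sees two lines instead of one, with the first split into four pieces, while `parsed` holds the original single row. This covers only well-formed quoting; files from other tools may use different dialects or be malformed outright.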
A fast CSV command line toolkit written in Rust.
For manipulating CSV from the terminal, check out https://github.com/BurntSushi/xsv
maximum performance data processing (by nathants)
i had a lot of fun exploring the performance ceiling of csv and csv-like formats. turns out binary encoding of size-prefixed byte arrays is fast.
csv is just a sequence of 2d byte arrays. probably avoid if dealing with heterogeneous external data. possibly use if dealing with homogeneous internal data.
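A rough sketch of what "size prefixed byte arrays" can look like, in Python with the standard `struct` module. The layout here (a little-endian uint32 field count, then a uint32 length before each field) and the function names are assumptions for illustration, not the format of the project mentioned above.

```python
import struct
from typing import List


def encode_row(fields: List[bytes]) -> bytes:
    """Encode a row as: uint32 field count, then uint32 length + bytes per field."""
    out = [struct.pack("<I", len(fields))]
    for field in fields:
        out.append(struct.pack("<I", len(field)))
        out.append(field)
    return b"".join(out)


def decode_rows(data: bytes) -> List[List[bytes]]:
    """Decode consecutive rows by following the length prefixes."""
    rows = []
    offset = 0
    while offset < len(data):
        (count,) = struct.unpack_from("<I", data, offset)
        offset += 4
        row = []
        for _ in range(count):
            (size,) = struct.unpack_from("<I", data, offset)
            offset += 4
            row.append(data[offset:offset + size])
            offset += size
        rows.append(row)
    return rows


# Hypothetical rows; commas, quotes, and newlines need no escaping here.
rows = [[b"a,b", b'say "hi"', b"two\nlines"], [b"", b"x"]]
blob = b"".join(encode_row(r) for r in rows)
```

Since every field states its own length, the decoder never scans for delimiters or unescapes quotes, which is plausibly where the speed comes from. The trade-off is that the file is no longer readable in a text editor.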