Consider Using CSV

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • tad

    A desktop application for viewing and analyzing tabular data

  • Since this is about CSV, here is the obligatory tool for larger files:

    * https://github.com/antonycourtney/tad

  • KeenWrite

    (Discontinued) Free, open-source, cross-platform desktop Markdown text editor with live preview, string interpolation, and math.

  • parquet-go

    Pure Go library for reading and writing Parquet files

  • > It's so complex to work with, that unless you're specifically in data science, it's both unheard of and unusable.

    FWIW, in my experience at a "data analytics platform" company, it's reasonably popular for data-heavy workflows because Parquet is well-defined and file sizes are a fraction of their CSV equivalents.

    > Is it a limitation of the format itself?

    I don't think so. In other languages, you can generally read/write Parquet files without a ton of dependencies (e.g. https://github.com/xitongsys/parquet-go).

  • js-bson

    BSON Parser for node and browser

  • ndjson.github.io

    Info Website for NDJSON

  • No one uses that format for streamed JSON; see ndjson and jsonl

    http://ndjson.org/

    The size complaint is overblown, as repeated fields are compressed away.

    As other folks rightfully commented, CSV is a minefield. One should assume every CSV file is broken in some way. They also don't enumerate any of the downsides of CSV.

    What people should consider is using formats like Avro or Parquet that carry their schema with them, so the data can be loaded and analyzed without having to manually deal with column meaning.
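To make the ndjson/jsonl point in the comment above concrete, here is a minimal sketch using only the Python standard library. It writes one JSON object per line, reads the records back one line at a time, and gzips the result to see how much the repeated field names cost after compression. The sample records and the helper names `write_ndjson` and `read_ndjson` are made up for illustration; real data may compress very differently.

```python
import gzip
import io
import json


def write_ndjson(records, stream):
    """Write one JSON object per line (the ndjson / jsonl convention)."""
    for record in records:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        stream.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_ndjson(stream):
    """Yield records line by line, so the whole input never has to sit in memory."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


# Hypothetical sample data: every record repeats the same field names.
records = [
    {"id": i, "name": f"user{i}", "active": i % 2 == 0}
    for i in range(1000)
]

buffer = io.StringIO()
write_ndjson(records, buffer)

raw = buffer.getvalue().encode("utf-8")
compressed = gzip.compress(raw)

buffer.seek(0)
round_tripped = list(read_ndjson(buffer))

print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```

Because each line is a complete JSON document, a consumer can start processing before the producer has finished writing, which is the property a single large JSON array lacks.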

  • xsv

    A fast CSV command line toolkit written in Rust.

  • For manipulating CSV from the terminal, check out https://github.com/BurntSushi/xsv

  • bsv

    maximum performance data processing (by nathants)

  • i had a lot of fun exploring the performance ceiling of csv and csv like formats. turns out binary encoding of size prefixed byte arrays is fast[1].

    csv is just a sequence of 2d byte arrays. probably avoid if dealing with heterogeneous external data. possibly use if dealing with homogeneous internal data.

    [1] https://github.com/nathants/bsv
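The idea of size-prefixed byte arrays mentioned in the comment above can be sketched in a few lines of Python. This is an illustration of the general technique only, not bsv's actual on-disk layout: the choice of a 16-bit field count and 32-bit little-endian lengths, and the names `write_row` and `read_rows`, are assumptions made for this example.

```python
import io
import struct


def write_row(stream, fields):
    """Write a row as: field count (uint16), then each field as uint32 length + raw bytes."""
    stream.write(struct.pack("<H", len(fields)))
    for field in fields:
        stream.write(struct.pack("<I", len(field)))
        stream.write(field)


def read_rows(stream):
    """Yield rows as lists of bytes until the stream is exhausted."""
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        (count,) = struct.unpack("<H", header)
        row = []
        for _ in range(count):
            (size,) = struct.unpack("<I", stream.read(4))
            row.append(stream.read(size))
        yield row


# Fields containing commas, quotes, and newlines need no escaping,
# because the reader never scans the payload for delimiters.
rows = [
    [b"id", b"comment"],
    [b"1", b'said "hi", then left'],
    [b"2", b"line one\nline two"],
]

buffer = io.BytesIO()
for row in rows:
    write_row(buffer, row)

buffer.seek(0)
decoded = list(read_rows(buffer))
```

Since the reader knows each field's length up front, it can skip or copy fields without inspecting their contents. The cost is that the file is no longer human-readable or editable in a text editor.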
