It's Time to Retire the CSV

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • cyanide

    BSON documents in the Elixir language

  • > I'm saying that when you decode an Avro document, the result that comes out (presuming you don't tell the Avro decoder anything special about custom types your runtime supports and how it should map them) is a JSON document.

    Semantic point: it's not a "document".

    There are tools which will decode Avro and output the data in JSON (typically using the JSON encoding of Avro: https://avro.apache.org/docs/current/spec.html#json_encoding), but the ADT that is created is by no means a JSON document. The ADT that is created has more complex semantics than JSON; JSON is not the canonical representation.

    > By which I don't mean JSON-encoded text, but rather an in-memory ADT that has the exact set of types that exist in JSON, no more and no less.

    Except Avro's set of data types is not the exact set of types that exist in JSON. The first clue to this might be that the Avro spec includes mappings that list how primitive Avro types are mapped to JSON types.

    > Or, to put that another way, Avro is a way to encode JSON-typed data, just as "JSON text", or https://bsonspec.org/, is a way to encode JSON-typed data

    BSON, by design, was meant to be a more efficient way to encode JSON data, so yes, it is a way to encode JSON-typed data. Avro, however, was not defined as a way to encode JSON data. It was defined as a way to encode data (with a degree of specialization for the case of Hadoop sequence files, where you are generally storing a large number of small records in one file).

    A simple counterexample: Avro has a "float" type, which is a 32-bit IEEE 754 floating-point number. Neither JSON nor BSON has that type.

    Technically, JSON doesn't really have types, it has values, but even if you pretend that JavaScript's types are JSON's types, there's nothing "canonical" about JavaScript's types for Avro.

    Yes, you can represent JSON data in Avro, and Avro data in JSON, much as you can represent the same data in any two serialization formats. But Avro's data model is very much defined independently of JSON's data model (as you'd expect).
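
    To make the "float" counterexample concrete, here is a small standard-library Python sketch. It follows the Avro spec's binary encoding (a float is written as 4 little-endian bytes, a double as 8) and builds a one-field BSON document by hand, where the only floating-point element type is the 64-bit double (type byte 0x01). The helper names are made up for illustration; this is not code from any Avro or BSON library.

```python
import json
import struct

def avro_float(x: float) -> bytes:
    """Avro binary 'float': IEEE 754 single precision, 4 bytes, little-endian."""
    return struct.pack("<f", x)

def avro_double(x: float) -> bytes:
    """Avro binary 'double': IEEE 754 double precision, 8 bytes, little-endian."""
    return struct.pack("<d", x)

def bson_double_doc(key: str, x: float) -> bytes:
    """A BSON document holding one element of type 0x01 (64-bit double).

    Layout: int32 total length, element list, trailing 0x00.
    Element: type byte, key as a NUL-terminated C string, 8-byte value.
    """
    element = b"\x01" + key.encode("utf-8") + b"\x00" + struct.pack("<d", x)
    total = 4 + len(element) + 1  # the length field counts itself and the final NUL
    return struct.pack("<i", total) + element + b"\x00"

# Decoding an Avro float yields a value that was rounded to 32 bits.
as_float32 = struct.unpack("<f", avro_float(0.1))[0]

# Mapped to JSON (or stored as a BSON double) it is just a number;
# nothing records that the schema said "float" rather than "double".
print(len(avro_float(0.1)), len(avro_double(0.1)), len(bson_double_doc("x", 0.1)))  # 4 8 16
print(json.dumps(as_float32))  # 0.10000000149011612
```

    Going the other way, a JSON or BSON consumer that re-encodes this value has no way to know it should be written back as a 4-byte Avro float unless the Avro schema supplies that information.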

  • csv2sqlite

  • CSV files are terrible, but I love them. I love sites that offer an "Export to CSV" option, because I know I can take that export and start working with it immediately. I can give that CSV file to my Dad, who can open it in Excel, or I can run a single command[0] to import it into a sqlite database.

    It is a lowest-common-denominator format. That type of thing is incredibly hard to kill unless you can replace it with something simpler. Good luck with that.

    [0]: https://github.com/psanford/csv2sqlite
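
    For a sense of what such an import involves, here is a rough Python equivalent using only the standard library's csv and sqlite3 modules. It is not csv2sqlite's code or behavior, and the function and table names are hypothetical. It also shows why CSV is both loved and hated: the csv module copes with quoted fields containing commas and even newlines, but every value comes back as text, so this sketch declares every column TEXT.

```python
import csv
import io
import sqlite3
from typing import TextIO

def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def import_csv(conn: sqlite3.Connection, table: str, fp: TextIO) -> int:
    """Load CSV rows into a new table named `table`; return the number of rows.

    The first row is taken as the header. Every column is declared TEXT
    because CSV carries no type information.
    """
    reader = csv.reader(fp)
    header = next(reader)
    columns = ", ".join(f"{quote_ident(h)} TEXT" for h in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    cur = conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader
    )
    conn.commit()
    return cur.rowcount

# Usage: quoted fields may contain commas and even newlines.
conn = sqlite3.connect(":memory:")
sample = 'name,city\n"Smith, Jane",Boston\nBob,"New\nYork"\n'
count = import_csv(conn, "people", io.StringIO(sample))
print(count, conn.execute("SELECT name, city FROM people").fetchall())
```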

  • ndjson.github.io

    Informational website for NDJSON

  • Indeed, this has already been done: http://ndjson.org/

    To be fair, it's not an objectionable format. Using line breaks to separate objects makes it streamable, and you don't need to enclose the whole thing in an array: each line is a valid JSON document on its own.
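
    A minimal sketch of reading and writing newline-delimited JSON with Python's standard json module: one json.dumps per record, one json.loads per line. The function names are hypothetical, and skipping blank lines is a leniency chosen here rather than something taken from ndjson.org.

```python
import io
import json
from typing import Any, Iterable, Iterator, TextIO

def write_ndjson(records: Iterable[Any], fp: TextIO) -> None:
    """Write one JSON value per line."""
    for record in records:
        # json.dumps escapes newlines inside strings, so a record never spans lines.
        fp.write(json.dumps(record) + "\n")

def read_ndjson(fp: TextIO) -> Iterator[Any]:
    """Yield one parsed value per non-blank line, without loading the whole file."""
    for line in fp:
        line = line.strip()
        if line:  # lenient: ignore blank lines
            yield json.loads(line)

buf = io.StringIO()
records = [{"id": 1, "note": "two\nlines"}, {"id": 2, "note": "plain"}]
write_ndjson(records, buf)
buf.seek(0)
print(list(read_ndjson(buf)))
```

    Because each line is parsed independently, a reader can process an arbitrarily long file record by record, and a writer can append records without rewriting a closing bracket.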

  • datasette

    An open source multi-tool for exploring and publishing data

  • One of my goals with https://datasette.io is to offer a better alternative for publishing data than sharing a link to a CSV file.

    The trick is that if you compile data into a SQLite file and then deploy the Datasette web application with a bundled copy of that database file, users who need CSV can still have it: every Datasette table and query offers a CSV export.

    But... you can also get the data out as JSON. Or you can reshape it (rename columns, etc.) with a SQL query and export the new shape.

    Or you can install plugins like https://datasette.io/plugins/datasette-yaml or https://datasette.io/plugins/datasette-ics or https://datasette.io/plugins/datasette-atom to enable other formats.
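
    The underlying idea (keep the data in SQLite, reshape it with SQL, export whatever format a reader wants) can be illustrated with a rough standard-library sketch. This is not how Datasette is implemented and uses none of its API; the table, columns, and function name are hypothetical. The query renames a column with AS, and the same result is emitted as CSV and as JSON.

```python
import csv
import io
import json
import sqlite3
from typing import Tuple

def export_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> Tuple[str, str]:
    """Run a query and return its result as (CSV text, JSON text)."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]  # reflects any AS aliases
    rows = cur.fetchall()

    csv_buf = io.StringIO()
    writer = csv.writer(csv_buf)
    writer.writerow(columns)
    writer.writerows(rows)

    json_text = json.dumps([dict(zip(columns, row)) for row in rows])
    return csv_buf.getvalue(), json_text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (id INTEGER, common_name TEXT)")
conn.executemany("INSERT INTO plants VALUES (?, ?)", [(1, "fern"), (2, "moss")])

# Reshape with SQL, then export the new shape in two formats.
csv_text, json_text = export_query(
    conn, "SELECT id, common_name AS name FROM plants ORDER BY id"
)
print(csv_text)
print(json_text)
```

    Note that the JSON output keeps the integer type of id, while the CSV output cannot.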

  • json

    JSON for Modern C++

  • csvz

    The hot new standard in open databases

  • naya

    A fast streaming JSON parser written in Python

    Yes, Python's bundled json module does not support that style of parsing. A quick search turned up several suggestions to try NAYA [1] if you need it in Python (it's a no-dependency Python 3 library), or one of the C-backed wrapper libraries NAYA mentions at the bottom of its README if you can afford native dependencies.

    [1] https://github.com/danielyule/naya
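
    For a sense of what incremental parsing involves, here is a rough standard-library sketch that yields the elements of one large top-level JSON array one at a time while reading the input in chunks. It is not NAYA's API, and the function name and chunk_size parameter are hypothetical. It relies on json.JSONDecoder.raw_decode called with a start index, an optional argument that CPython's implementation accepts even though the docs only show the one-argument form. Each element is still decoded whole, so a single huge element is not streamed, and a malformed element can cause the rest of the input to be buffered before the error is raised.

```python
import io
import json
from typing import Any, Iterator, TextIO

_WHITESPACE = " \t\n\r"
_AFTER_VALUE = _WHITESPACE + ",]"  # characters that may legally follow an array element
_decoder = json.JSONDecoder()

def iter_json_array(fp: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array, reading fp in chunks."""
    buf, pos, eof = "", 0, False

    def read_more() -> bool:
        # Drop consumed text and append the next chunk; False once input is exhausted.
        nonlocal buf, pos, eof
        chunk = fp.read(chunk_size)
        if not chunk:
            eof = True
            return False
        buf, pos = buf[pos:] + chunk, 0
        return True

    def skip_whitespace() -> None:
        nonlocal pos
        while True:
            while pos < len(buf) and buf[pos] in _WHITESPACE:
                pos += 1
            if pos < len(buf) or not read_more():
                return

    skip_whitespace()
    if pos >= len(buf) or buf[pos] != "[":
        raise ValueError("expected a top-level JSON array")
    pos += 1
    skip_whitespace()
    if pos < len(buf) and buf[pos] == "]":
        return

    while True:
        skip_whitespace()
        while True:
            try:
                value, end = _decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if read_more():  # element is probably cut off at the chunk boundary
                    continue
                raise
            # A number such as "12" or "1." may continue in the next chunk,
            # so only accept it once a delimiter (or end of input) follows.
            if not eof and (end == len(buf) or buf[end] not in _AFTER_VALUE):
                if read_more():
                    continue
            break
        pos = end
        yield value
        skip_whitespace()
        if pos >= len(buf):
            raise ValueError("unexpected end of input inside array")
        delimiter = buf[pos]
        pos += 1
        if delimiter == "]":
            return
        if delimiter != ",":
            raise ValueError(f"expected ',' or ']', found {delimiter!r}")

data = '[1, 2.5, {"a": [true, null]}, "x,y", 1234567]'
print(list(iter_json_array(io.StringIO(data), chunk_size=3)))
```

    A real streaming parser such as NAYA tokenizes below the element level instead; this sketch only avoids holding the whole array in memory at once.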
