Friends don't let friends export to CSV

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • usv

    Unicode Separated Values (USV) data markup for units, records, groups, files, streaming, and more.

  • I can't remember the last time I, or anyone I've ever worked with for that matter, ever typed up a CSV from scratch. The whole point of USV is that the delimiters can't normally be typed so you don't have to worry about escaping.

    USV supports displayable delimiters (see https://github.com/SixArm/usv), so for the much more common case of editing an existing CSV in a text editor, you can just copy and paste.
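
For illustration, here is a minimal Python sketch (not the official USV tooling) of how a parser can split USV text on its visible unit (U+241F) and record (U+241E) separators; group and file separators and the escape mechanism are left out.

```python
# Minimal USV parsing sketch: split on the visible separator characters.
# This hand-rolls the logic rather than using any official USV library,
# and ignores group/file separators and escaping.
UNIT = "\u241F"    # "␟" symbol for unit separator
RECORD = "\u241E"  # "␞" symbol for record separator

def parse_usv(text: str) -> list[list[str]]:
    """Split USV text into records of units."""
    records = [r for r in text.split(RECORD) if r.strip()]
    return [record.strip("\n").split(UNIT) for record in records]

sample = f"a{UNIT}b{UNIT}c{RECORD}\n1{UNIT}2{UNIT}3{RECORD}\n"
print(parse_usv(sample))  # [['a', 'b', 'c'], ['1', '2', '3']]
```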

  • Sep

    World's Fastest .NET CSV Parser. Modern, minimal, fast, zero-allocation reading and writing of separated values (`csv`, `tsv`, etc.). Cross-platform, trimmable, and AOT/NativeAOT compatible. (by nietras)

  • If you ever need to parse CSV really fast and happen to know C#, there is an incredible vectorized parser for that: https://github.com/nietras/Sep/

  • csvy

    Import and Export CSV Data With a YAML Metadata Header

    There is CSVY, which lets you set a delimiter, schema, column types, etc., has libraries in many languages, and is natively supported in R.

    It's also backwards-compatible with most CSV parsers.

    https://github.com/leeper/csvy
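
As a rough illustration of the format (not using the csvy libraries themselves), the sketch below strips a `---`-delimited YAML header off the top of a file and hands the rest to pandas; it assumes PyYAML and pandas are available and that the header lines are not `#`-commented.

```python
# Hand-rolled CSVY reader sketch: YAML front matter between "---" lines,
# followed by ordinary CSV. The real csvy libraries also handle
# '#'-commented headers and richer schema handling.
import io

import pandas as pd
import yaml  # PyYAML

def read_csvy(path: str):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    meta = {}
    body = text
    if text.startswith("---"):
        # Everything between the first two "---" markers is YAML metadata.
        _, header, body = text.split("---", 2)
        meta = yaml.safe_load(header) or {}
    df = pd.read_csv(io.StringIO(body.lstrip("\n")))
    return meta, df

# meta, df = read_csvy("data.csvy")
```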

  • Fiona

    Fiona reads and writes geographic data files

  • Your issue is that you're using the default (old) binding to GDAL, based on Fiona [0].

    You need to use pyogrio [1], its vectorized counterpart, instead. Make sure you pass `engine="pyogrio"` when calling `to_file` [2]. Fiona loops in Python, while pyogrio is fully compiled, so pyogrio is usually about 10-15x faster than Fiona. Pyogrio 0.8 is expected to be another ~2-4x faster still [3]. (A usage sketch follows after the links below.)

    [0]: https://github.com/Toblerity/Fiona

    [1]: https://github.com/geopandas/pyogrio

    [2]: https://geopandas.org/en/stable/docs/reference/api/geopandas...

    [3]: https://github.com/geopandas/pyogrio/pull/346
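
A minimal usage sketch (file paths are placeholders; requires geopandas 0.11+ with pyogrio installed):

```python
import geopandas as gpd

# Read and write through the vectorized pyogrio engine instead of the
# default Fiona path; "input.shp" and "output.gpkg" are placeholder paths.
gdf = gpd.read_file("input.shp", engine="pyogrio")
gdf.to_file("output.gpkg", driver="GPKG", engine="pyogrio")

# pyogrio can also be called directly:
# import pyogrio
# gdf = pyogrio.read_dataframe("input.shp")
# pyogrio.write_dataframe(gdf, "output.gpkg")
```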

  • pyogrio

    Vectorized vector I/O using OGR

  • geoparquet

    Specification for storing geospatial vector data (point, line, polygon) in Parquet

  • That's why I'm working on the GeoParquet spec [0]! It gives you both compression-by-default and super fast reads and writes! So it's usually as small as gzipped CSV, if not smaller, while being faster to read and write than GeoPackage.

    Try using `GeoDataFrame.to_parquet` and `geopandas.read_parquet` (sketched below).

    [0]: https://github.com/opengeospatial/geoparquet
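
A minimal round-trip sketch (paths are placeholders; needs geopandas with pyarrow installed):

```python
import geopandas as gpd

# Round-trip a GeoDataFrame through GeoParquet; "input.gpkg" and
# "cities.parquet" are placeholder paths.
gdf = gpd.read_file("input.gpkg")
gdf.to_parquet("cities.parquet")            # GeoParquet, compressed by default
gdf2 = gpd.read_parquet("cities.parquet")   # fast, typed read back into a GeoDataFrame
```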
