Leveraging SIMD: Splitting CSV Files at 3Gb/s

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • CPython

    The Python programming language

  • If you're doing user-supplied CSVs, definitely... but if you are ingesting CSVs from a known source with a known format, it can definitely make sense to use a high-speed optimized ingester.

    One might wonder whether it would be worth the time to look into optimising the CSV handling in the runtimes of various languages.

    - PHP isn't optimized anywhere, but at least it's C: https://github.com/php/php-src/blob/1c0e613cf1a24cdc159861e4...

    - Python's is even worse as it's implemented in native Python instead of C: https://github.com/python/cpython/blob/main/Lib/csv.py

    - Java doesn't have a "standard" way at all (https://www.baeldung.com/java-csv-file-array), and OpenCSV seems the usual object-oriented hell (https://sourceforge.net/p/opencsv/source/ci/master/tree/src/...).

    - Ruby's CSV is native Ruby: https://github.com/ruby/ruby/blob/bd65757f394255ceeb2c958e87...
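
As a rough illustration of the trade-off described above (a sketch only, not the SIMD technique from the linked post): when the input is known to contain no quoted fields, a plain split is enough, while user-supplied data needs a full parser. The function names `split_known_format` and `parse_general` and the sample data are hypothetical, introduced here for illustration.

```python
import csv
import io


def split_known_format(text: str, delimiter: str = ",") -> list[list[str]]:
    """Fast path. Assumes a trusted format: no quoted fields and no
    delimiters or newlines embedded inside a field."""
    return [line.split(delimiter) for line in text.splitlines() if line]


def parse_general(text: str) -> list[list[str]]:
    """General path. csv.reader handles quoting and embedded delimiters."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample inputs.
trusted = "id,name,score\n1,alice,90\n2,bob,85\n"
untrusted = 'id,comment\n1,"hello, world"\n'

# On trusted input both approaches agree.
trusted_fast = split_known_format(trusted)
trusted_full = parse_general(trusted)

# On input with a quoted comma, the naive split breaks the field apart.
untrusted_fast = split_known_format(untrusted)
untrusted_full = parse_general(untrusted)
```

The fast path is only safe while the format assumption holds; once a quoted comma appears, the naive split returns three fields for a two-column row.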

  • PHPT

    The PHP Interpreter

  • ruby

    The Ruby Programming Language

  • Text-CSV_XS

    perl5 module for composition and decomposition of comma-separated values

  • Perl's best-known library, Text::CSV, has both a pure-Perl and a C implementation.

    Here is the C version:

    https://github.com/Tux/Text-CSV_XS/blob/master/CSV_XS.xs

  • zsv

    zsv+lib: world's fastest (simd) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
