How fast can you parse a CSV file in C#?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rust

    Empowering everyone to build reliable and efficient software.

    Worked on a relevant project for Meta.

    There’s a lot of overhead as soon as you involve a filesystem rather than a block device, even on a dedicated disk, and particularly with btrfs. I don’t know whether the same is true of macOS and APFS; this isn’t the area I usually work in. However, copy-on-write filesystems (which I believe APFS is) are somewhat predisposed to fragment files as part of the dedup process. I don’t know whether APFS runs it online in some way, but if it does, it could have affected the article author’s results.

    The standard library’s implementation details can also have a huge impact, e.g. as I observed in Rust on a prior project when I started fiddling with the read buffer size:

    https://github.com/rust-lang/rust/issues/49921

    The other issue I see is that their I/O is implicitly synchronous and requires a memory copy. They might see better performance if they memory-map the file, which can probably solve both issues. Then, if C# allows it, they can parse the CSV in place. With a language like Rust you can even do this trivially in a zero-copy manner, though I suspect it’s more involved in C#, since it requires setting up strings / parsing that point into the memory-mapped file.

    At that point, the OS should be theoretically able to serve up the cached file for the application to do some logic with, without ever needing to copy the full contents again into separate strings.
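
    The zero-copy idea in this comment can be sketched roughly as follows. This is a minimal illustration, not the commenter's code: it assumes the file contents are already available as one contiguous string slice (a memory map would provide that, but is not shown because the Rust standard library has no mmap API), it ignores quoting, and the names `split_fields` and `parse_borrowed` are hypothetical.

    ```rust
    /// Splits one line into fields that borrow from `line`; nothing is copied.
    /// Assumption: no quoted fields, so a plain comma split is enough.
    fn split_fields(line: &str) -> Vec<&str> {
        line.split(',').collect()
    }

    /// Parses every non-empty line of `data` into borrowed field slices.
    fn parse_borrowed(data: &str) -> Vec<Vec<&str>> {
        data.lines()
            .filter(|l| !l.is_empty())
            .map(split_fields)
            .collect()
    }

    fn main() {
        // Stand-in for the mapped file contents.
        let data = "id,name\n1,alice\n2,bob\n";
        let rows = parse_borrowed(data);

        // Each field is a view into `data`, so its address lies inside the buffer.
        let field = rows[1][1];
        let offset = field.as_ptr() as usize - data.as_ptr() as usize;
        println!("{} at byte offset {}", field, offset); // prints "alice at byte offset 10"
    }
    ```

    Because the fields are `&str` slices tied to the lifetime of `data`, the borrow checker keeps the backing buffer alive for as long as any parsed row is in use.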

  • 1brc

    C# has an abstraction for memory-mapped files. You can also always use raw pointers and directly call the corresponding OS APIs with interop too.

    However, the fastest C# implementations in the 1BRC challenge were inconclusive on whether memory-mapping is faster than the RandomAccess.Read API (which is basically a thin wrapper over read/pread calls): https://github.com/noahfalk/1brc/?tab=readme-ov-file#file-re...

    You can relatively easily reach 2 GiB/s reads with RandomAccess/FileStream as long as a sufficiently large buffer is used. FileStream's default settings already provide quite good performance and use an adaptive buffer size under the hood. Memory-mapping is convenient, but it's not a silver bullet: page-faulting, then mapping the page and filling it with data by performing the read within kernel space, is not necessarily cheaper than passing a pointer to a buffer to read into.

    The challenges in Rust and C# are going to be very similar in this type of task, since C# can just pin the GC-allocated arrays to read into, call into malloc, or 'stackalloc' the temporary buffer inline, and the rest of the implementation will be subject to more or less identical constraints.
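
    The "read into one large, reusable buffer" approach described here might look like the sketch below, written in Rust since the comment argues the constraints are much the same in both languages. The function name `count_newlines` is hypothetical, the 1 MiB buffer size is an arbitrary assumption rather than a measured optimum, and an in-memory byte slice stands in for a real file.

    ```rust
    use std::io::{self, Read};

    /// Counts b'\n' bytes by repeatedly reading into one reusable buffer.
    /// `buf_size` is the tunable the comment refers to; larger values mean
    /// fewer read calls per byte processed.
    fn count_newlines<R: Read>(mut reader: R, buf_size: usize) -> io::Result<usize> {
        let mut buf = vec![0u8; buf_size];
        let mut count = 0;
        loop {
            let n = reader.read(&mut buf)?;
            if n == 0 {
                break; // end of input
            }
            count += buf[..n].iter().filter(|&&b| b == b'\n').count();
        }
        Ok(count)
    }

    fn main() -> io::Result<()> {
        // In-memory stand-in for a file; std::fs::File implements Read as well.
        let data = b"a,b\n1,2\n3,4\n";
        let lines = count_newlines(&data[..], 1 << 20)?;
        println!("{} newline-terminated lines", lines);
        Ok(())
    }
    ```

    The buffer is allocated once and reused for every read, so the only per-chunk cost beyond the read itself is the scan over the bytes just filled.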

  • simdcsv

    A fast SIMD parser for CSV files

    Lemire himself also worked with Geoff Langdale on a C++ CSV parser that uses SIMD to accelerate parsing. [1]

    [1] https://github.com/geofflangdale/simdcsv

  • zsv

    zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser

    I haven't yet seen any of these beat https://github.com/liquidaty/zsv when real-world constraints are applied (e.g. we no longer assume that line ends are always \n, or that there are no double-quote characters or embedded commas/newlines/double quotes). And maybe under the artificial conditions as well.
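
    To make the "real-world constraints" concrete, here is a small scalar sketch of what a parser has to handle once quoting is allowed: embedded commas, embedded newlines, doubled quotes, and both \n and \r\n line ends. It is not how zsv or any SIMD parser works; `parse_csv` is a hypothetical name, and the code favors clarity over speed.

    ```rust
    /// Parses CSV text with double-quote quoting into rows of owned fields.
    /// Limitation of this sketch: a final line consisting only of `""`
    /// with no trailing newline is dropped.
    fn parse_csv(input: &str) -> Vec<Vec<String>> {
        let mut rows = Vec::new();
        let mut row = Vec::new();
        let mut field = String::new();
        let mut in_quotes = false;
        let mut chars = input.chars().peekable();

        while let Some(c) = chars.next() {
            if in_quotes {
                match c {
                    // A doubled quote inside a quoted field is a literal quote.
                    '"' if chars.peek() == Some(&'"') => {
                        chars.next();
                        field.push('"');
                    }
                    '"' => in_quotes = false,
                    // Commas and newlines are ordinary data while quoted.
                    _ => field.push(c),
                }
            } else {
                match c {
                    '"' => in_quotes = true,
                    ',' => row.push(std::mem::take(&mut field)),
                    // For \r\n, skip the \r and let the \n end the row.
                    '\r' if chars.peek() == Some(&'\n') => {}
                    '\n' | '\r' => {
                        row.push(std::mem::take(&mut field));
                        rows.push(std::mem::take(&mut row));
                    }
                    _ => field.push(c),
                }
            }
        }
        // Flush the last row when the input has no trailing newline.
        if !field.is_empty() || !row.is_empty() {
            row.push(field);
            rows.push(row);
        }
        rows
    }

    fn main() {
        let input = "a,\"b,c\"\r\n\"x\"\"y\",\"line1\nline2\"\n";
        for row in parse_csv(input) {
            println!("{:?}", row);
        }
    }
    ```

    The per-character branching on `in_quotes` is the part that breaks the "just scan for commas and \n" shortcut, which is why benchmarks that assume unquoted input can flatter a parser.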

