Arena-Based Parsers

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • 1brc

    1BRC in .NET among fastest on Linux (by buybackoff)

  • It may seem unexpected given all the hype around Go, but it is a surprisingly poor choice for this. If you want a more convenient language than C++ or Rust but still want the ability to reach optimal performance, C# could serve you much better.

    Go underperforms at trivial XML parsing: https://news.ycombinator.com/item?id=40283721

    If you push it, C# can extract optimal HW utilization when parsing text, beating C++: https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-... (Go was not on the list because it was that much slower)

  • fastxml

    Discontinued Golang fast XML parser

  • A quick Google search finds https://github.com/ffenix113/fastxml, which seems to use the 'tips and tricks' of arena parsing and the like. Any idea how it compares once memory allocations are out of the picture and you are mostly measuring how well the compiler handles basic byte manipulation?
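
For readers unfamiliar with the arena parsing mentioned in the comment above, here is a minimal sketch in Go of the general idea. It is an illustration under stated assumptions, not fastxml's actual implementation: the `Node`, `Arena`, and `ParseTags` names are hypothetical, and the input format is a heavily simplified XML-like syntax (no attributes, comments, CDATA, or self-closing tags). All nodes live in one reusable slice, and names are stored as offsets into the input instead of copied strings, so parsing allocates very little per element.

```go
package main

import "fmt"

// Node is one parsed element (hypothetical type for this sketch).
// It stores offsets into the input instead of copied strings, and an
// index into the arena instead of a pointer.
type Node struct {
	NameStart, NameEnd int // element name is input[NameStart:NameEnd]
	Parent             int // arena index of the parent, -1 at the top level
}

// Arena owns every Node in a single contiguous slice.
type Arena struct {
	nodes []Node
}

// add appends a node and returns its index in the arena.
func (a *Arena) add(n Node) int {
	a.nodes = append(a.nodes, n)
	return len(a.nodes) - 1
}

// Reset drops all nodes but keeps the backing array for the next parse.
func (a *Arena) Reset() { a.nodes = a.nodes[:0] }

// ParseTags records every opening tag of the simplified input in the arena.
// It stops silently at an unterminated tag.
func ParseTags(input []byte, a *Arena) {
	parent := -1
	for i := 0; i < len(input); i++ {
		if input[i] != '<' {
			continue // text content is skipped in this sketch
		}
		end := i + 1
		for end < len(input) && input[end] != '>' {
			end++
		}
		if end == len(input) {
			return // no closing '>' found
		}
		if i+1 < end && input[i+1] == '/' {
			// Closing tag: step back up to the parent element.
			if parent >= 0 {
				parent = a.nodes[parent].Parent
			}
		} else {
			// Opening tag: allocate from the arena and descend into it.
			parent = a.add(Node{NameStart: i + 1, NameEnd: end, Parent: parent})
		}
		i = end
	}
}

func main() {
	input := []byte("<a><b>text</b><c></c></a>")
	arena := &Arena{nodes: make([]Node, 0, 16)}
	ParseTags(input, arena)
	for idx, n := range arena.nodes {
		fmt.Printf("%d: %s (parent %d)\n", idx, input[n.NameStart:n.NameEnd], n.Parent)
	}
}
```

Calling `Reset` between documents lets the same backing array be reused, which is where most of the allocation savings in this style of parser would come from.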

  • go

    The Go programming language

  • The description says it is not production-ready, and the repository is also archived.

    If you pull out all the stops in each respective language, C# will always end up winning at parsing text, as it offers C structs, pointers, zero-cost interop, Rust-style struct generics, a cross-platform SIMD API, and simply has a better compiler. You can win back some performance in Go by writing the hot parts in Go's ASM dialect, at much greater effort and for a specific platform.

    For example, Go has to resort to this https://github.com/golang/go/blob/4ed358b57efdad9ed710be7f4f... in order to scan memory efficiently, while in C# you write the following once and it compiles to all supported ISAs with their respective SIMD instructions for a given vector width: https://github.com/dotnet/runtime/blob/56e67a7aacb8a644cc6b8... (there is a lot of code because C# covers a much wider range of scenarios and does not accept sacrificing performance on odd lengths and edge cases, which Go does).

    Another example is computing CRC32: for Go you have to write ASM https://github.com/golang/go/blob/4ed358b57efdad9ed710be7f4f..., while in C# you simply write a standard vectorized routine once https://github.com/dotnet/runtime/blob/56e67a7aacb8a644cc6b8... (its codegen is competitive with hand-intrinsified C++ code).

    There is a lot more of this. Performance, and the low-level primitives needed to achieve it, have been an area of focus for .NET for a long time, so it is disheartening to see one tenth of the effort in Go receive so much of the spotlight.
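
For context on the two operations the comment above compares, here is a small sketch of how they look from ordinary Go code, using only the standard library calls `bytes.IndexByte` and `crc32.ChecksumIEEE`. The helper names `CountLines` and `Checksum` are hypothetical. Whether these calls end up in the exact assembly files linked in the comment is an assumption, since those links are truncated; the sketch only shows the user-facing side and makes no performance claim.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// CountLines counts newline-terminated lines by repeatedly calling
// bytes.IndexByte, leaving the byte scan to the standard library
// instead of looping over individual bytes in user code.
func CountLines(data []byte) int {
	count := 0
	for {
		i := bytes.IndexByte(data, '\n')
		if i < 0 {
			return count // no more newlines
		}
		count++
		data = data[i+1:] // continue after the newline just found
	}
}

// Checksum returns the CRC-32 of data using the IEEE polynomial.
func Checksum(data []byte) uint32 {
	return crc32.ChecksumIEEE(data)
}

func main() {
	fmt.Println(CountLines([]byte("one\ntwo\nthree\n"))) // prints 3
	fmt.Printf("%08x\n", Checksum([]byte("123456789")))  // prints cbf43926
}
```

The value `cbf43926` is the standard CRC-32 check value for the ASCII string `123456789`, which makes it a convenient sanity test for any implementation.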

  • .NET Runtime

    .NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
