Show HN: Rust nom parsing Starcraft2 Replays into Arrow for Polars data analysis

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • s2protocol-rs

    Starcraft 2 Protocol Replay Reader

  • For example, see https://github.com/sebosp/s2protocol-rs/blob/755098fb86ab6b1... I hacked my way around the JSON protocol specification: an enum has n variants, and using a base-2 logarithm I can find the number of bits I need to read to uniquely identify each variant. I wonder what that kind of serialization is called.
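
    The bit-width idea above can be sketched as follows: a tag distinguishing n variants needs ceil(log2(n)) bits, and the decoder reads exactly that many bits before choosing a variant. This is a minimal illustration, not code from s2protocol-rs or Blizzard's s2protocol. It assumes the tag is a plain unsigned integer read most-significant-bit first; that bit order and the names `bits_for_variants`, `BitReader`, and `read_variant` are my own assumptions.

```rust
/// Bits needed for a tag that distinguishes `n` enum variants: ceil(log2(n)).
/// A single-variant enum needs 0 bits.
fn bits_for_variants(n: u32) -> u32 {
    assert!(n > 0, "an enum needs at least one variant");
    // The largest tag is n - 1; its bit length equals ceil(log2(n)).
    u32::BITS - (n - 1).leading_zeros()
}

/// Minimal bit reader over a byte slice, reading MSB-first (an assumption).
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // current position in bits
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        BitReader { data, pos: 0 }
    }

    /// Read `count` bits (count <= 32) as an unsigned integer; None if out of data.
    fn read_bits(&mut self, count: u32) -> Option<u32> {
        let mut value = 0u32;
        for _ in 0..count {
            let byte = *self.data.get(self.pos / 8)?;
            let bit = (byte >> (7 - self.pos % 8)) & 1;
            value = (value << 1) | u32::from(bit);
            self.pos += 1;
        }
        Some(value)
    }
}

/// Read the variant index of an enum with `n` variants, rejecting out-of-range tags.
fn read_variant(reader: &mut BitReader, n: u32) -> Option<u32> {
    let tag = reader.read_bits(bits_for_variants(n))?;
    if tag < n {
        Some(tag)
    } else {
        None
    }
}
```

    Because the tag width is rounded up, some bit patterns are unused when n is not a power of two (with 5 variants, tags 5 to 7 are invalid), so the sketch rejects them rather than guessing.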

  • nom

    Rust parser combinator framework

  • I may be the only one not familiar, but nom refers to https://github.com/rust-bakery/nom, which looks like a pretty handy way to parse binary data in Rust.
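
    To give a flavour of the parser-combinator style nom uses for binary data, here is a dependency-free sketch in plain Rust. It mimics the shape of nom parsers (each function takes the input slice and returns the unconsumed rest together with the parsed value), but `PResult`, `le_u32`, `take`, `length_prefixed`, and `many0` below are hand-written stand-ins, not nom's API. nom ships real counterparts such as `nom::number::complete::le_u32`, `nom::bytes::complete::take`, and `nom::multi::many0`; check its documentation for their exact signatures.

```rust
/// nom-style result: on success, the unconsumed input and the parsed value.
/// Hand-rolled for illustration; this is NOT nom's `IResult` type.
type PResult<'a, T> = Result<(&'a [u8], T), String>;

/// Parse a little-endian u32 from the front of the input.
fn le_u32(input: &[u8]) -> PResult<'_, u32> {
    if input.len() < 4 {
        return Err(format!("le_u32: need 4 bytes, got {}", input.len()));
    }
    let (head, rest) = input.split_at(4);
    Ok((rest, u32::from_le_bytes([head[0], head[1], head[2], head[3]])))
}

/// Take exactly `n` bytes from the front of the input.
fn take(input: &[u8], n: usize) -> PResult<'_, &[u8]> {
    if input.len() < n {
        return Err(format!("take: need {} bytes, got {}", n, input.len()));
    }
    let (head, rest) = input.split_at(n);
    Ok((rest, head))
}

/// Combine the two: a u32 length prefix followed by that many bytes.
fn length_prefixed(input: &[u8]) -> PResult<'_, &[u8]> {
    let (rest, len) = le_u32(input)?;
    take(rest, len as usize)
}

/// Apply `parser` until it fails (or stops consuming input), collecting results.
fn many0<'a, T, F>(mut input: &'a [u8], parser: F) -> (&'a [u8], Vec<T>)
where
    F: Fn(&'a [u8]) -> PResult<'a, T>,
{
    let mut items = Vec::new();
    while let Ok((rest, item)) = parser(input) {
        if rest.len() == input.len() {
            break; // no progress: stop instead of looping forever
        }
        items.push(item);
        input = rest;
    }
    (input, items)
}
```

    The appeal of the style is that small parsers compose by threading the remaining input through, so `length_prefixed` is just `le_u32` followed by `take`, and `many0` lifts any parser to a sequence.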

  • Tims-PackageServer

    Lightweight Package Server for WoltLab Community Framework

  • I'm using nom to parse simple access control list (ACL) expressions in https://github.com/wbbaddons/Tims-PackageServer/blob/master/.... You can see what they look like in the tests of the linked file.

    The DSL supports comparison operators, '&&', '||', and nested expressions, and it rejects expressions that mix '&&' and '||' unless parentheses make the precedence explicit. This DSL should count as 'non-trivial', while still being simple enough to understand easily.
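
    The "no mixing without parentheses" rule can be enforced directly in the grammar. Below is a minimal hand-written recursive-descent sketch in plain Rust (no nom), assuming a hypothetical syntax of `word op word` comparisons; it is not the Tims-PackageServer grammar, whose real syntax lives in the linked file. Each group collects operands joined by a single operator kind, and meeting the other operator before the group ends is an error.

```rust
/// Tokens of a hypothetical ACL syntax, e.g. `version >= 2 && user == admin`.
/// `Word` holds identifiers/values; `Op` holds ==, !=, <, <=, >, >=.
#[derive(Debug, Clone, PartialEq)]
enum Token { Word(String), Op(String), And, Or, LParen, RParen }

/// A comparison, or a flat group of operands joined by ONE operator kind.
#[derive(Debug, PartialEq)]
enum Expr {
    Cmp(String, String, String), // (left, operator, right)
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let is_word = |c: char| c.is_alphanumeric() || c == '_' || c == '.';
    let (mut i, mut out) = (0, Vec::new());
    while i < chars.len() {
        let (c, next) = (chars[i], chars.get(i + 1).copied());
        match (c, next) {
            _ if c.is_whitespace() => i += 1,
            ('&', Some('&')) => { out.push(Token::And); i += 2; }
            ('|', Some('|')) => { out.push(Token::Or); i += 2; }
            ('(', _) => { out.push(Token::LParen); i += 1; }
            (')', _) => { out.push(Token::RParen); i += 1; }
            ('=' | '!' | '<' | '>', Some('=')) => { out.push(Token::Op(format!("{c}="))); i += 2; }
            ('<' | '>', _) => { out.push(Token::Op(c.to_string())); i += 1; }
            _ if is_word(c) => {
                let start = i;
                while i < chars.len() && is_word(chars[i]) { i += 1; }
                out.push(Token::Word(chars[start..i].iter().collect()));
            }
            _ => return Err(format!("unexpected character {c:?} at position {i}")),
        }
    }
    Ok(out)
}

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    fn peek(&self) -> Option<&Token> { self.tokens.get(self.pos) }

    fn bump(&mut self) -> Option<Token> {
        let t = self.tokens.get(self.pos).cloned();
        if t.is_some() { self.pos += 1; }
        t
    }

    /// expr := operand ( ("&&" operand)* | ("||" operand)* )
    fn expr(&mut self) -> Result<Expr, String> {
        let first = self.operand()?;
        let joiner = match self.peek() {
            Some(Token::And) => Token::And,
            Some(Token::Or) => Token::Or,
            _ => return Ok(first),
        };
        let mut items = vec![first];
        while let Some(t) = self.peek() {
            if *t == joiner {
                self.bump();
                items.push(self.operand()?);
            } else if matches!(t, Token::And | Token::Or) {
                return Err("mixing '&&' and '||' requires parentheses".to_string());
            } else {
                break;
            }
        }
        Ok(if joiner == Token::And { Expr::And(items) } else { Expr::Or(items) })
    }

    /// operand := "(" expr ")" | Word Op Word
    fn operand(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(Token::LParen) => {
                let inner = self.expr()?;
                match self.bump() {
                    Some(Token::RParen) => Ok(inner),
                    other => Err(format!("expected ')', found {other:?}")),
                }
            }
            Some(Token::Word(left)) => match (self.bump(), self.bump()) {
                (Some(Token::Op(op)), Some(Token::Word(right))) => Ok(Expr::Cmp(left, op, right)),
                other => Err(format!("malformed comparison after {left:?}: {other:?}")),
            },
            other => Err(format!("expected an operand, found {other:?}")),
        }
    }
}

fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser { tokens: tokenize(src)?, pos: 0 };
    let expr = p.expr()?;
    match p.peek() {
        None => Ok(expr),
        Some(t) => Err(format!("unexpected trailing token {t:?}")),
    }
}
```

    With this design, `a == 1 && b == 2 || c == 3` is rejected, while `(a == 1 && b == 2) || c == 3` parses to an `Or` whose first operand is an `And` group, so precedence never has to be implicit.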

  • mpq

    Decoder/parser of Blizzard's MPQ archive file format

  • SC2 replays are MPQ files, a proprietary archive format created and used by Blizzard. An MPQ archive may contain multiple files, each possibly compressed with a different method and optionally encrypted. I wrote a library to parse the MPQ files that SC2 replays are packaged in: https://github.com/icza/mpq. I also wrote an SC2 replay parser that is more or less a port of the official s2protocol: https://github.com/icza/s2prot
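
    As a starting point for reading the MPQ container described above, here is a sketch that parses only the fixed header fields. It follows the publicly documented layout as I understand it: SC2 replays begin with a user-data block (signature "MPQ\x1B") whose offset field points at the actual MPQ header (signature "MPQ\x1A"), and all integers are little-endian. The struct and field names are my own labels; decompression, decryption, and the hash/block tables are deliberately out of scope.

```rust
/// User-data block ("MPQ\x1B") that SC2 replays place before the MPQ header.
#[derive(Debug, PartialEq)]
struct UserData {
    user_data_size: u32,
    mpq_header_offset: u32, // relative to this block, which sits at file start
    user_data_header_size: u32,
}

/// Fixed 32-byte part of the MPQ header ("MPQ\x1A").
#[derive(Debug, PartialEq)]
struct MpqHeader {
    header_size: u32,
    archive_size: u32,
    format_version: u16,
    sector_size_shift: u16,
    hash_table_offset: u32, // relative to the start of the MPQ header
    block_table_offset: u32,
    hash_table_entries: u32,
    block_table_entries: u32,
}

impl MpqHeader {
    /// Sector size in bytes: 512 << sector_size_shift.
    fn sector_size(&self) -> u32 {
        512u32 << self.sector_size_shift
    }
}

fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    let s = b.get(off..off + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parse an optional user-data block followed by the MPQ header.
/// Returns (user data, header, file offset of the header).
fn parse_mpq_start(file: &[u8]) -> Result<(Option<UserData>, MpqHeader, usize), String> {
    let truncated = || "file is truncated".to_string();
    let magic = file.get(0..4).ok_or_else(truncated)?;
    let (user_data, h) = if magic == b"MPQ\x1B" {
        let ud = UserData {
            user_data_size: u32_at(file, 4).ok_or_else(truncated)?,
            mpq_header_offset: u32_at(file, 8).ok_or_else(truncated)?,
            user_data_header_size: u32_at(file, 12).ok_or_else(truncated)?,
        };
        let off = ud.mpq_header_offset as usize;
        (Some(ud), off)
    } else {
        (None, 0)
    };
    if file.get(h..h + 4) != Some(&b"MPQ\x1A"[..]) {
        return Err(format!("no MPQ header signature at offset {h}"));
    }
    let r32 = |o: usize| u32_at(file, h + o).ok_or_else(truncated);
    let r16 = |o: usize| u16_at(file, h + o).ok_or_else(truncated);
    let header = MpqHeader {
        header_size: r32(4)?,
        archive_size: r32(8)?,
        format_version: r16(12)?,
        sector_size_shift: r16(14)?,
        hash_table_offset: r32(16)?,
        block_table_offset: r32(20)?,
        hash_table_entries: r32(24)?,
        block_table_entries: r32(28)?,
    };
    Ok((user_data, header, h))
}
```

    Newer MPQ format versions extend this header with more fields; a real reader would branch on `format_version` and `header_size` before trusting anything beyond these first 32 bytes.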

  • s2prot

    Decoder/parser of Blizzard's StarCraft II replay file format (*.SC2Replay)

  • pdx-tools

    View maps, graphs, and tables of your save and compete in a casual, evergreen leaderboard of EU4 achievement speed runs. Upload and share your save with the world.

  • Thanks for sharing, very inspiring. I love Rust for parsing video game replays / save files. I've authored a Rocket League replay parser (boxcars) and an entire suite of web visualizations (via WebAssembly) for EU4 called pdx.tools https://pdx.tools

    It's not easy to work with proprietary formats, but they've both become pretty popular, so I would 100% recommend sinking more time into this project as long as it scratches your itch. Gamers are always looking for more stats and deeper insights.

  • parse-rosetta-rs

    Comparing parser APIs

  • For a very rough comparison of parsers, see https://github.com/rosetta-rs/parse-rosetta-rs

  • rust-parser

  • This is insanely cool! Very impressed you managed to implement a full parser in Rust.

    I implemented a basic one in Rust a while back: https://github.com/ZephyrBlu/rust-parser

    And a full one in Python with a few bells and whistles ages ago: https://github.com/ZephyrBlu/zephyrus-sc2-parser

    I don't maintain either of them, though :( and the Rust one is super rough.

    SC2 is a very interesting area for data analysis, but at the same time I found it very challenging. There is so much nuance and inconsistency across games that it can be really hard to accurately categorize builds or measure build timings.

    The area I ended up focusing on was builds, and I feel like I did some interesting stuff there: https://sc2.gg/reports/top-openings-2022/.

    I found personal statistics less interesting than aggregate statistics. Even pro games are very volatile, and ladder games even more so. It's extremely hard to get a reliable signal out of them if you're trying to track things across games. Even simple metrics like Collection Rate are poor indicators without significant categorization work (matchup, build, opponent build, etc.).
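
    The categorization point can be illustrated with a small grouping sketch: instead of averaging Collection Rate over all games, average it per (matchup, build, opponent build) bucket so that only comparable games are compared. The `GameStat` struct and its fields are hypothetical, and how builds get classified in the first place (the hard part described above) is out of scope. In a DataFrame library such as Polars, the same idea is a group-by followed by a mean aggregation.

```rust
use std::collections::BTreeMap;

/// One row per player per game; the struct and its fields are hypothetical.
struct GameStat<'a> {
    matchup: &'a str,        // e.g. "TvZ"
    build: &'a str,          // label from some (hypothetical) build classifier
    opponent_build: &'a str, // same classifier applied to the opponent
    collection_rate: f64,    // Collection Rate sampled at a fixed game time
}

/// Category key: (matchup, build, opponent build).
type Category<'a> = (&'a str, &'a str, &'a str);

/// Mean Collection Rate per category, so values are only compared within
/// groups of similar games instead of across all games at once.
fn mean_by_category<'a>(rows: &[GameStat<'a>]) -> BTreeMap<Category<'a>, f64> {
    // Accumulate (sum, count) per category.
    let mut acc: BTreeMap<Category<'a>, (f64, usize)> = BTreeMap::new();
    for r in rows {
        let entry = acc
            .entry((r.matchup, r.build, r.opponent_build))
            .or_insert((0.0, 0));
        entry.0 += r.collection_rate;
        entry.1 += 1;
    }
    acc.into_iter()
        .map(|(key, (sum, n))| (key, sum / n as f64))
        .collect()
}
```

    A `BTreeMap` keeps the output ordered by category, which makes reports reproducible; the finer the key, the fewer games land in each bucket, which is exactly the signal-versus-sample-size tension the comment describes.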

  • zephyrus-sc2-parser

    A parser for .SC2Replay files
