Texting Robots: Taming robots.txt with Rust and 34 million tests

This page summarizes the projects mentioned and recommended in the original post on /r/rust

  • texting_robots

    Texting Robots: A Rust native `robots.txt` parser with thorough unit testing

  • I hope the Texting Robots codebase is already in a reasonable starting place, but as this is my first crate I'd love feedback! The library's error handling is a particular spot of uncertainty for me. For extensions I have a WASM + WASI proof of concept but am uncertain how to proceed re: getting it into a state other languages could use as a more minimal library. FFI might be the more certain road for now, but WASM would take away the pain of compilation for the end user with only a minimal performance impact.

  • lol-html

    Low output latency streaming HTML parser/rewriter with CSS selector-based API

  • Thanks again and happy to answer any questions! My current unreleased Rust projects include a web crawler that uses Tokio + Tokio Console + Reqwest with this crate for robots.txt and a fast text extraction library using lol-html that I am planning to sprinkle with some minimal ML to get Readability.js style intelligent extraction (with training in Python). See Fathom for an example of the ML approach I'll likely take.

  • PyO3

    Rust bindings for the Python interpreter

  • As the author I just wanted to say thanks to the Rust community. You've been incredibly welcoming as a community (even nudging along this work with upvotes) and Rust itself has been a joy to work with. Much of my life has had Python as my default fallback (first language and recently in machine learning) but Rust is starting to take over. I've always dropped to C / Cython in the past when I needed speed but that was rare. Rust started as PyO3 extensions to Python and has slowly consumed more and more of the code I write =]

  • readability

    A standalone version of the readability lib


