How SerpApi sped up data extraction from HTML from 3s to 800ms (or How to profile and optimize Ruby code and C extension)

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • flamescope

    FlameScope is a visualization tool for exploring different time ranges as Flame Graphs.

  • I searched the web for how to profile C extensions for Ruby, and C code in general, and found Brendan Gregg’s tutorial on Linux perf. That was my first time using the Linux perf profiler. I also tried gperftools and pprof, because I had seen them used, and flamescope, because it was made by Brendan Gregg. There are many similar tools, and over the two weeks or so I spent on this it was hard to figure out which one to use.

  • flamegraph

    Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3 (by flamegraph-rs)

  • FlameScope shows the same information as flamegraph. Both tools probably use the same underlying tooling to generate the chart.

  • perf_data_converter

    Tool to convert Linux perf files to the profile.proto format used by pprof

  • I installed perf_data_converter to be able to load the perf.data report into pprof.

  • nokogiri-rust

    Discontinued Ruby FFI wrapper around scraper crate to be used instead of Nokogiri. Status: proof of concept.

  • As an experiment, I made an FFI wrapper around the Rust scraper crate. In the proof of concept, at_css.text calls are 60 times faster than Nokogiri's.

  • bcc

    BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more

  • I haven’t tried using bcc for tracing and profiling.

  • lexbor

    Lexbor is development of an open source HTML Renderer library. https://lexbor.com

  • I’m glad to have the opportunity to contribute to an open-source project that is used by thousands of people. Hopefully, at some point we will speed up Nokogiri (or the XML parser it uses) to match the performance of html5ever or lexbor. 800 ms to extract data from HTML is still too much.

  • rbspy

    Sampling CPU profiler for Ruby

  • c function is not very helpful to find the performance problem, so we dug deeper.

  • Nokogiri

    Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.

  • It worked because CFLAGS are passed through in several places in ext/nokogiri/extconf.rb.

  • linux

    Linux kernel source tree

  • I haven’t read all of the perf documentation.

  • oga

  • Julien Khaleghy also tried the Oga gem instead of Nokogiri. It was about six times faster.

  • ruby-ll

  • However, some tests failed with LL::ParserError, raised by ruby-ll, which Oga uses.
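The flamegraph and FlameScope entries both turn sampled call stacks into flame graphs. Flame graph generators such as Brendan Gregg's flamegraph.pl commonly take their input in a "folded" text format. In that format each unique stack gets one line: the frames are joined with semicolons from root to leaf, followed by a space and the number of samples. The minimal Ruby sketch that follows builds that format from an in-memory list of samples. The frame names are hypothetical. Real tooling, such as stackcollapse-perf.pl for perf output, first has to parse the profiler's raw output.

```ruby
# Collapse sampled call stacks into the "folded" format used by flame graph
# tools: one line per unique stack, frames joined with ";" from root to leaf,
# followed by a space and the number of times that stack was sampled.
def fold_stacks(samples)
  counts = Hash.new(0)
  samples.each { |stack| counts[stack.join(";")] += 1 }
  # Sort by stack string so the output is deterministic.
  counts.sort.map { |stack, n| "#{stack} #{n}" }
end

# Hypothetical samples; each array is one stack, outermost frame first.
samples = [
  %w[main parse extract_text],
  %w[main parse extract_text],
  %w[main parse css_search],
  %w[main render]
]
puts fold_stacks(samples)
```

The width of each frame in a flame graph is proportional to these counts. This is why the same folded data renders as the same chart regardless of which front end draws it.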
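rbspy is a sampling profiler. Instead of instrumenting every call, it periodically records where the program is and counts how often each location shows up. The toy sketch that follows illustrates that idea inside a single Ruby process, using a background thread and Thread#backtrace_locations. rbspy itself reads the stacks of a running Ruby process from the outside, so this shows the principle, not how rbspy is implemented. The busy_work method is a hypothetical workload.

```ruby
# Toy in-process sampling profiler, for illustration only.
# A background thread wakes up every `interval` seconds, grabs the target
# thread's current backtrace, and counts the label of the innermost frame.
def sample_profile(interval: 0.001)
  target = Thread.current
  counts = Hash.new(0)
  running = true

  sampler = Thread.new do
    while running
      top = target.backtrace_locations&.first # nil once the thread has finished
      counts[top.label] += 1 if top
      sleep interval
    end
  end

  yield
  counts
ensure
  running = false # tell the sampler loop to stop
  sampler&.join
end

# Hypothetical workload standing in for real extraction code.
def busy_work
  20.times do
    200_000.times.sum
    sleep 0.005 # releases the GVL, giving the sampler thread a chance to run
  end
end

counts = sample_profile { busy_work }
counts.sort_by { |_, n| -n }.first(5).each { |label, n| puts "#{n}\t#{label}" }
```

Like any Ruby-level profiler, this sampler sees a method implemented in a C extension as a single frame and cannot show what happens inside it. A native profiler such as perf is needed for that. The sampler thread can only take a sample while it holds the GVL, so samples cluster around points where the main thread releases it (here, sleep), and the counts are skewed accordingly.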
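Speedup figures such as "60 times faster" or "about six times faster" come from timing two implementations on the same input. The following is a minimal way to do that in Ruby. It assumes a monotonic clock and takes the median of a few runs to damp noise. The two blocks being compared are hypothetical stand-ins, not Nokogiri, Oga or the scraper wrapper.

```ruby
# Median wall-clock time of several runs of the given block. A monotonic
# clock is used so that system clock adjustments cannot distort the result.
# For an even number of runs this returns the upper of the two middle values.
def median_time(runs: 5)
  times = Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  times.sort[runs / 2]
end

# Hypothetical "slow" and "fast" ways of extracting the same information.
html = "<p>" + ("x" * 10_000) + "</p>"
slow = median_time { 20.times { html.scan(/x/).size } }
fast = median_time { 20.times { html.count("x") } }
puts format("speedup: %.1fx", slow / fast)
```

Using the median rather than the mean keeps a single slow run, for example one interrupted by garbage collection, from dominating the reported speedup.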
