Querying parsed HTML in BigQuery

custom-metrics

1 15 6.9 JavaScript

Custom metrics to use with WebPageTest agents

To avoid this headache in HTTP Archive analyses, we've resorted to custom metrics. These are executed on each page at runtime, and they have been really effective: they let us analyze both the fully rendered page and the static HTML. But one big limitation of custom metrics is that they only work at runtime, so if we want to change the code or analyze an older dataset, we're out of luck.
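
As a rough illustration of why this kind of logic is tied to runtime, here is a minimal sketch of what a metric's logic might look like. The function name `headSummary` and the shape of its return value are hypothetical, and the wrapper WebPageTest actually expects around a custom metric is not shown in the source. The point is only that the code reads from a live `document`, so it cannot be rerun later against a stored dataset.

```javascript
// Hypothetical metric logic: summarize the <head> of whatever document it is given.
// In a browser it would be called as headSummary(document) after the page has rendered.
function headSummary(doc) {
  const children = Array.from(doc.head.children);
  return JSON.stringify({
    total: children.length,
    scripts: children.filter((el) => el.tagName === 'SCRIPT').length,
  });
}

// Outside a browser there is no `document`, so a minimal stand-in object
// (an assumption made purely for this sketch) is used to exercise the logic.
const fakeDocument = {
  head: {
    children: [{ tagName: 'TITLE' }, { tagName: 'SCRIPT' }, { tagName: 'SCRIPT' }],
  },
};

console.log(headSummary(fakeDocument));
```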

capo.js

5 871 7.3 JavaScript

Get your <head> in order

While looking for a way to implement capo.js in BigQuery to understand how pages in HTTP Archive are ordered, I came across the Cheerio library, which is a jQuery-like interface over an HTML parser.

httparchive.org

12 317 8.7 JavaScript

The HTTP Archive website hosted on App Engine

A longstanding problem in the HTTP Archive dataset has been extracting insights from blobs of HTML in BigQuery. For example, take the source code of example.com:

cheerio

50 27,780 9.7 TypeScript

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.


NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number means a more popular project.
