Cached Chrome Top Million Websites

ClickHouse

208 34,153 10.0 C++

ClickHouse® is a free analytics DBMS for big data

If you are interested in the research on technologies used on the Internet, I recommend playing with the "Minicrawl" dataset.
It contains data about ~7 million top websites, and for every website, it also contains:
- the full content of the main page;
- the verbose output of curl, containing various timing info; the HTTP headers, protocol info...
Using this dataset, you can build a service similar to https://builtwith.com/ for your research.
Data: https://clickhouse-public-datasets.s3.amazonaws.com/minicraw... (129 GB compressed, ~1 TB uncompressed).
Description: https://github.com/ClickHouse/ClickHouse/issues/18842
You can easily try it with clickhouse-local without downloading:
  $ curl https://clickhouse.com/ | sh
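
As a rough illustration of the builtwith-style analysis mentioned above, here is a minimal Python sketch that guesses technologies from one site's HTTP headers and main-page content. The rule list and the `detect_technologies` function are hypothetical and are not part of the dataset or of ClickHouse; the sketch assumes the headers and page body have already been extracted from the dataset as a dict and a string.

```python
import re

# Hypothetical fingerprint rules: (technology, where to look, pattern).
# "header:<name>" checks one response header; "body" checks the page content.
RULES = [
    ("nginx", "header:server", re.compile(r"nginx", re.I)),
    ("PHP", "header:x-powered-by", re.compile(r"php", re.I)),
    ("WordPress", "body", re.compile(r"wp-content|wp-includes", re.I)),
    ("jQuery", "body", re.compile(r"jquery[\w.-]*\.js", re.I)),
]


def detect_technologies(headers, body):
    """Return the sorted names of technologies whose rule matches."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    found = set()
    for tech, target, pattern in RULES:
        if target == "body":
            text = body
        else:
            text = lowered.get(target.split(":", 1)[1], "")
        if pattern.search(text):
            found.add(tech)
    return sorted(found)
```

A real service would need a far larger rule set and would run the matching over every row of the dataset; this only shows the shape of the per-site check.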

crux-top-lists

6 710 6.2 Python

Downloadable snapshots of the Chrome Top Million Websites pulled from public CrUX data in Google BigQuery.
hn-search

1,618 524 2.9 TypeScript

Hacker News Search

It's a tough thing to balance, but generally, bringing in someone's personal details as ammunition in an internet argument is not ok on HN (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...). I'm not saying those are never relevant, but
[editing...]

github-explorer

13 129 4.3 HTML

Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask)

Yes, it's continuously updated.
The source code is here: https://github.com/ClickHouse/github-explorer
This shell script updates it: https://github.com/ClickHouse/github-explorer/blob/main/upda...

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.