Spidergram is a collection of tools that my company, Autogram, has built or enabled over the past several years to support our work automating content inventories for large websites: it's part web crawler, part domain model, and part mad science. We released the first public beta today.

This page summarizes the projects mentioned and recommended in the original post on /r/webscraping.

  • spidergram

    Structural analysis tools for complex web sites

  • SheetJS js-xlsx

    📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

  • SheetJS for quickly generating complex reports in the familiar "workbook full of spreadsheets" style.

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • Apify's Crawlee project, with a specific focus on Playwright. We decided to focus on it for now because the majority of our projects involve some kind of cross-browser evaluation for clients, and Playwright's ability to swap in Safari and Firefox rendering engines was a huge help.

  • oclif

    CLI for generating, building, and releasing oclif CLIs. Built by Salesforce.

  • Oclif to quickly click together CLI tools for kicking off and monitoring crawls, generating reports, etc.

  • crawlee

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

