What's new in the Webscraping Ecosystem? From OxyCon 2022

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • estela

    estela, an elastic web scraping cluster 🕸

  • Estela: A webscraping framework on top of Kubernetes, which manages scaling (by Breno Colom)

  • structlog

    Simple, powerful, and fast logging for Python.

  • Structlog: A Python library to structure your log entries

  • Protobuf

    Protocol Buffers - Google's data interchange format

  • Protobuf: Accelerate data flows with structured formats like Protobuf instead of JSON/CSV

  • crawlee

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

  • Crawlee: A new webscraping framework by Apify
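
The Protobuf entry above suggests replacing JSON/CSV with a structured binary format. As a rough illustration of why that can shrink payloads, the sketch below compares a JSON encoding of one record with a fixed binary layout. It uses Python's standard `struct` module as a stand-in, not Protobuf itself (Protobuf uses field tags and variable-length integers), and the record and its field names are hypothetical.

```python
import json
import struct

# Hypothetical scraped record; the field names are illustrative only.
record = {"product_id": 48213, "price": 19.99, "in_stock": True}

# Text encoding: field names and punctuation travel with every record.
json_bytes = json.dumps(record).encode("utf-8")

# Binary encoding with a layout agreed on in advance (the "schema"):
# little-endian unsigned 32-bit int, 64-bit float, 1-byte bool.
LAYOUT = struct.Struct("<Id?")
binary_bytes = LAYOUT.pack(
    record["product_id"], record["price"], record["in_stock"]
)

# Decoding requires the same layout on the receiving side.
product_id, price, in_stock = LAYOUT.unpack(binary_bytes)
decoded = {"product_id": product_id, "price": price, "in_stock": in_stock}

print(f"JSON: {len(json_bytes)} bytes, binary: {len(binary_bytes)} bytes")
```

The trade-off is that the binary payload is not self-describing: producer and consumer must share the layout, which is the role a `.proto` schema plays in Protobuf.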

