Running web scraping service securely

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • When it comes to web page resource scraping, Secutils.dev relies on a separate component - secutils-dev/secutils-web-scraper. I've built it on top of Playwright since I need to handle both resources that are statically defined in the HTML and those that are loaded dynamically. Leveraging Playwright, backed by a real browser, instead of parsing the static HTML opens up a ton of opportunities to turn a simple web resource scraper into a much more intelligent tool capable of handling all sorts of use cases: recording and replaying HARs, imitating user activity, and more.

  • secutils-web-scraper

    The web scraper component of Secutils.dev

  • secutils

    Secutils.dev is an open-source, versatile, yet simple security toolbox for engineers and researchers (by secutils-dev)

  • In my previous post, I shared the update regarding the upcoming "Q3 2023 - Jul-Sep" milestone. While I briefly covered how I implemented the notifications subsystem in Secutils.dev, there are a few other important changes I've been working on for this milestone. One of these changes is related to the fact that I’m preparing to allow Secutils.dev users to inject custom JavaScript scripts into the web pages they track resources for (yay 🎉). As a result, I've spent some time hardening the Web Scraper environment's security and wanted to share what you should keep in mind if you’re building a service that needs to scrape arbitrary web pages.
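
The distinction drawn above between resources that are statically defined in the HTML and those that are loaded dynamically can be illustrated with a small sketch. It is not taken from secutils-web-scraper: it uses only Python's standard `html.parser`, and the names `StaticResourceParser`, `extract_static_resources`, and `SAMPLE_HTML` are hypothetical. It shows what a static parser sees, and why a script injected at runtime is invisible to it, which is the gap a real browser driven by Playwright is meant to close.

```python
from html.parser import HTMLParser


class StaticResourceParser(HTMLParser):
    """Collects script and stylesheet URLs that appear directly in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            # External script referenced via a src attribute.
            self.scripts.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            # Only <link> elements whose rel includes "stylesheet" count.
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(attributes["href"])


def extract_static_resources(html):
    """Return the resources a static HTML parser can discover."""
    parser = StaticResourceParser()
    parser.feed(html)
    parser.close()
    return {"scripts": parser.scripts, "stylesheets": parser.stylesheets}


# Hypothetical page: one stylesheet and one script are declared in the markup,
# while a second script is only added to the DOM when the inline code runs.
SAMPLE_HTML = """
<html>
  <head>
    <link rel="stylesheet" href="/static/app.css">
    <script src="/static/app.js"></script>
    <script>
      const s = document.createElement("script");
      s.src = "/static/lazy.js";
      document.head.appendChild(s);
    </script>
  </head>
</html>
"""

resources = extract_static_resources(SAMPLE_HTML)
print(resources)
```

Here `/static/lazy.js` never shows up in the result, because it exists only after the inline script executes. Discovering it requires actually running the page, for example in a browser controlled by Playwright.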

Related posts

  • How to track anything on the internet or use Playwright for fun and profit

    5 projects | dev.to | 16 Jan 2024
  • Explore web applications through their content security policy (CSP)

    1 project | dev.to | 28 Nov 2023
  • Q4 2023 iteration: tracking arbitrary web content, user-specific webhook subdomains, inherited CSP, and more

    1 project | dev.to | 31 Oct 2023
  • Announcing 1.0.0-alpha.3 release: more powerful resource tracking, notifications and content sharing

    2 projects | dev.to | 24 Oct 2023
  • Building a scheduler for a Rust application

    1 project | dev.to | 26 Sep 2023