-
Playwright
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
-
secutils
Secutils.dev is an open-source, versatile, yet simple security toolbox for engineers and researchers (by secutils-dev)
When it comes to scraping web page resources, Secutils.dev relies on a separate component, secutils-dev/secutils-web-scraper. I've built it on top of Playwright because I need to handle both resources that are statically defined in the HTML and those that are loaded dynamically. Using Playwright, which is backed by a real browser, instead of parsing the static HTML opens up a ton of opportunities to turn a simple web resource scraper into a much more intelligent tool capable of handling all sorts of use cases: recording and replaying HARs (HTTP Archive files), imitating user activity, and more.
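To illustrate why parsing static HTML is not enough, here is a minimal TypeScript sketch. It is not the actual secutils-web-scraper code. It assumes two inputs: the raw HTML of a page and the list of URLs the browser actually requested while loading it. The names `extractStaticUrls`, `classifyResources`, and `TrackedResource` are hypothetical, and the regex-based extraction is a deliberate simplification.

```typescript
// Hypothetical sketch: compares resources declared in static HTML with
// resources a real browser requested, to find the dynamically loaded ones.

interface TrackedResource {
  url: string;
  source: "static" | "dynamic";
}

// Naively extracts <script src> and <link href> URLs from raw HTML and
// resolves them against the page URL. Assumption: a regex is good enough
// for illustration; a browser-based scraper can use the live DOM instead.
function extractStaticUrls(html: string, baseUrl: string): string[] {
  const pattern = /<(?:script|link)\b[^>]*\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    urls.push(new URL(match[1], baseUrl).href);
  }
  return urls;
}

// Marks every URL found in the static HTML as "static" and every other URL
// the browser requested at runtime as "dynamic". Duplicates are skipped.
function classifyResources(staticUrls: string[], requestedUrls: string[]): TrackedResource[] {
  const seen: { [url: string]: boolean } = {};
  const result: TrackedResource[] = [];
  for (const url of staticUrls) {
    if (!seen[url]) {
      seen[url] = true;
      result.push({ url, source: "static" });
    }
  }
  for (const url of requestedUrls) {
    if (!seen[url]) {
      seen[url] = true;
      result.push({ url, source: "dynamic" });
    }
  }
  return result;
}
```

In a Playwright-based scraper, `requestedUrls` could be collected by subscribing to the page's `request` event, for example `page.on("request", (request) => requested.push(request.url()))`, before calling `page.goto(...)`. Anything a script fetches after load then shows up as `dynamic`, which a static HTML parser would never see.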
In my previous post, I shared an update on the upcoming "Q3 2023 - Jul-Sep" milestone. While I briefly covered how I implemented the notifications subsystem in Secutils.dev, there are a few other important changes I've been working on for this milestone. One of them stems from the fact that I'm preparing to let Secutils.dev users inject custom JavaScript into the web pages they track resources for (yay 🎉). As a result, I've spent some time hardening the security of the Web Scraper environment, and I want to share what you should keep in mind if you're building a service that needs to scrape arbitrary web pages.