Our great sponsors
-
SurveyJS
Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Do you remember the Firefox Reader View? It's a feature that removes all unnecessary components like buttons, menus, images, and so on, from a website, focusing on the readable content of the page. The library powering this feature is called Readability.js, which is open source.
We parse the url query string parameter from the API Gateway event and validate it to be a real URL. Then, we use Puppeteer to launch a new browser instance and open a new page. This new page is closed at the end of the function while the browser instance stays open until the Lambda is terminated.
The most crucial question was, of course, how to run Chrome on Lambda. Fortunately, much of the groundwork for running Chrome on Lambda had been laid by others. I used the @sparticuz/chromium package to run Chromium in headless mode. However, Chromium is a rather big dependency, so to speed up deployments, I created a Lambda Layer.
Readability.js requires a DOM object to parse the readable content from a website. That's why we create a DOM object with JSDOM and provide the HTML from the page and its current URL. By the way, the browser may have had to follow HTTP redirects, so the current URL doesn't necessarily have to be the one we provided initially. The parse function of the library returns the following result:
View on GitHub