Web Scraping with Javascript and Node.js

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • httpbin

    HTTP Request & Response Service, written in Python + Flask.

  • We will use httpbin for testing. It offers several endpoints that respond with the request's headers, the caller's IP address, and much more.

  • nvm

    Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions

  • For the code to work, you will need Node (or nvm) and npm installed. Some systems have them preinstalled. After that, install all the necessary libraries by running npm install.

  • cheerio

    The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

  • Cheerio is a "fast, flexible & lean implementation of core jQuery." It lets us find nodes with selectors, get text or attributes, and many other things. We will pass the HTML to cheerio and then query it as we would in a browser environment.

  • puppeteer

    Node.js API for Chrome

  • Until now, every page was fetched with axios.get, which can be inadequate in some cases: for example, when we need JavaScript to load and execute, or need to interact with the page through the mouse or keyboard. Avoiding headless browsers is preferable for performance reasons, but sometimes there is no other choice. Selenium, Puppeteer, and Playwright are the best-known and most widely used libraries. The snippet in the original post shows only the User-Agent, but since it is a real browser, the request includes the entire header set (Accept, Accept-Encoding, etc.).

  • axios

    Promise based HTTP client for the browser and node.js

  • Axios is a "promise based HTTP client" that we will use to get the HTML from a URL. It allows several options such as headers and proxies, which we will cover later. If you use TypeScript, they "include TypeScript definitions and a type guard for Axios errors."

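The httpbin entry above is used for testing because it echoes requests back. A minimal sketch of that idea, assuming Node 18+ for the built-in `fetch` (used here instead of axios only so the snippet needs no packages): it calls https://httpbin.org/headers with a custom User-Agent and reports which User-Agent httpbin says it received. `echoedUserAgent` and `findHeader` are hypothetical names, and the `fetchImpl` parameter exists only so the request can be stubbed offline.

```javascript
// Requires Node 18+ for the global fetch; no npm packages needed.
const HTTPBIN_HEADERS_URL = "https://httpbin.org/headers";

// Look up a header in httpbin's { headers: { ... } } payload, ignoring case.
function findHeader(payload, name) {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(payload.headers ?? {})) {
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

// Hypothetical helper: send a request with a custom User-Agent and return the
// User-Agent that httpbin reports receiving. fetchImpl defaults to the global
// fetch but can be replaced with a stub (e.g. in tests).
async function echoedUserAgent(userAgent, fetchImpl = fetch) {
  const response = await fetchImpl(HTTPBIN_HEADERS_URL, {
    headers: { "User-Agent": userAgent },
  });
  if (!response.ok) {
    throw new Error(`httpbin responded with HTTP ${response.status}`);
  }
  const payload = await response.json();
  return findHeader(payload, "User-Agent");
}
```

For example, `echoedUserAgent("my-scraper/1.0").then(console.log)` should log the same string back, as long as nothing between the script and httpbin rewrites the header.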
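To illustrate the cheerio workflow described above (load the HTML, then query it with jQuery-style selectors), here is a sketch that collects every link on a page. It assumes cheerio 1.x installed with `npm install cheerio`; `extractLinks` and `absolutize` are hypothetical helpers, and relative `href` values are resolved with the standard `URL` constructor.

```javascript
// Assumes `npm install cheerio` (1.x still supports require()).

// Resolve a possibly relative href against the URL of the page it came from.
// Note: new URL() throws on malformed hrefs; a real scraper may want to skip those.
function absolutize(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Hypothetical helper: returns [{ text, href }] for every <a> that has an href.
function extractLinks(html, pageUrl) {
  const cheerio = require("cheerio"); // required lazily so absolutize() works without it
  const $ = cheerio.load(html);
  return $("a[href]")
    .map((_, el) => ({
      text: $(el).text().trim(),
      href: absolutize($(el).attr("href"), pageUrl),
    }))
    .get(); // convert the cheerio selection into a plain array
}
```

In a scraper, the HTML would typically come from `axios.get`, whose response body is in `data`, e.g. `extractLinks(data, url)`.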
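The puppeteer entry notes that a real browser sends the full header set, not just the User-Agent. A sketch for checking that, assuming `npm install puppeteer` and network access: it opens https://httpbin.org/headers in headless Chrome and returns the parsed header object. `browserHeaders` and `headersOnlyIn` are hypothetical names; the Puppeteer calls used (`launch`, `newPage`, `goto`, and the response's `ok`, `status`, `json`) are part of its public API.

```javascript
// Assumes `npm install puppeteer`, which downloads a compatible Chrome build.

// Header names present in browserHeaders but missing from clientHeaders
// (compared case-insensitively).
function headersOnlyIn(browserHeaders, clientHeaders) {
  const clientNames = new Set(Object.keys(clientHeaders).map((n) => n.toLowerCase()));
  return Object.keys(browserHeaders).filter((n) => !clientNames.has(n.toLowerCase()));
}

// Hypothetical helper: load httpbin's /headers endpoint in headless Chrome and
// return the headers object it echoes back.
async function browserHeaders(url = "https://httpbin.org/headers") {
  const puppeteer = require("puppeteer"); // required lazily so headersOnlyIn() works without it
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    const response = await page.goto(url, { waitUntil: "networkidle0" });
    if (!response || !response.ok()) {
      throw new Error(`Navigation failed: ${response ? response.status() : "no response"}`);
    }
    const payload = await response.json(); // httpbin wraps them as { headers: { ... } }
    return payload.headers;
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

Passing the result, together with the headers from a plain axios request to the same URL, into `headersOnlyIn` should list the extra headers the browser adds; the exact set depends on the Chrome version.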
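The axios entry mentions custom headers and proxies. A minimal sketch of both, assuming `npm install axios`: `buildRequestConfig` and `proxyFromUrl` are hypothetical helpers that produce a config object using axios's `headers`, `timeout`, and `proxy` options (the `proxy` option takes `{ protocol, host, port, auth }`), and `getHtml` passes that config to `axios.get`.

```javascript
// Assumes `npm install axios`.

// Convert a proxy URL such as "http://user:pass@10.0.0.1:8080" into the
// { protocol, host, port, auth } shape that axios's `proxy` option expects.
function proxyFromUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  const proxy = {
    protocol: url.protocol.replace(/:$/, ""), // "http:" -> "http"
    host: url.hostname,
    // URL drops default ports, so fall back to the scheme's default.
    port: Number(url.port || (url.protocol === "https:" ? 443 : 80)),
  };
  if (url.username) {
    proxy.auth = {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    };
  }
  return proxy;
}

// Hypothetical helper: request options with an optional User-Agent and proxy.
function buildRequestConfig({ userAgent, proxyUrl, timeoutMs = 10000 } = {}) {
  const config = { headers: {}, timeout: timeoutMs };
  if (userAgent) config.headers["User-Agent"] = userAgent;
  if (proxyUrl) config.proxy = proxyFromUrl(proxyUrl);
  return config;
}

// Fetch a page; axios puts the response body in `data`.
async function getHtml(pageUrl, config = {}) {
  const axios = require("axios"); // required lazily so the helpers above work without it
  const { data } = await axios.get(pageUrl, config);
  return data;
}
```

For example, `getHtml("https://httpbin.org/ip", buildRequestConfig({ proxyUrl: "http://user:pass@proxy.example:8080" }))` should report the proxy's address in httpbin's `origin` field, once the placeholder proxy URL is replaced with a working one. Since httpbin returns JSON, axios parses it and `getHtml` returns an object rather than an HTML string.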


Related posts

  • My First Server and REST API: Essentials for Frontenders

    2 projects | dev.to | 12 Jan 2023
  • [AskJS] I want to use Javascript to get a specific text from a website. .getElementbyId

    3 projects | /r/learnjavascript | 26 Aug 2022
  • How to Add CRM to Your QR Code Application

    3 projects | dev.to | 20 Apr 2022
  • Simple Cookies with Node.js and any frontend JavaScript framework

    2 projects | dev.to | 26 Mar 2022
  • The Fetch API is finally coming to Node.js

    2 projects | dev.to | 10 Mar 2022