Got-scraping Alternatives
Similar projects and alternatives to got-scraping
-
google-search-results-php
Google Search Results PHP API via Serp Api
-
header-generator
NodeJs package for generating browser-like headers.
-
SurveyJS
Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.
-
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
-
-
-
parsel
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
got-scraping reviews and mentions
-
How to Crawl the Web with Scrapy
While I agree that Scrapy is a great tool for beginner tutorials and easy entry into scraping, it's becoming difficult to use it in real world scenarios because almost all the large players now employ some anti-bot or anti-scraping protection.
A great example above all is Cloudflare. You simply can't convince Cloudflare you're a human with Scrapy alone. Scrapy has only experimental support of HTTP2 and does not support proxies over HTTP2 (https://github.com/scrapy/scrapy/issues/5213). Yet, all browsers use HTTP2 now, which means all normal users use HTTP2... You get the point.
What we use now is Got Scraping (https://github.com/apify/got-scraping). It's a special purpose extension of Got (HTTP client with 18 mil weekly downloads) that masks its HTTP communication as if it was coming from a real browser. Of course, this will not get you as far as Puppeteer or Playwright (headless browsers), but it improved our scraping tremendously. If you need a full crawling library, see the Apify SDK (https://sdk.apify.com) which uses Got Scraping under the hood.
- Show HN: Web scraping focused HTTP client for Node.js
Stats
The primary programming language of got-scraping is TypeScript.