cheerio vs Nokogiri

cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML. (by cheeriojs)

cheerio.js.org

Nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. (by sparklemotion)

Tags: Parsers, HTML/XML Parsing, Ruby, Nokogiri, XML, Sax, ruby-gem, Xslt, xerces, Libxml2, libxslt

nokogiri.org

Project          cheerio        Nokogiri
Mentions         50             20
Stars            27,780         6,105
Growth           0.5%           0.2%
Activity         9.7            9.4
Latest Commit    8 days ago     8 days ago
Language         TypeScript     C
License          MIT License    MIT License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.

cheerio

Posts with mentions or reviews of cheerio. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-02.

8 NPM Packages for JavaScript Beginners [2024][+tutorials]
6 projects | dev.to | 2 Apr 2024

Cheerio is your ticket to the world of server-side magic, allowing you to manipulate HTML and XML documents with jQuery-like syntax. It’s perfect for web scraping, data extraction, or just making sense of the mess that is web content. With Cheerio, you get to play around with the DOM, use CSS selectors, and basically do all the cool things you'd do in the browser, but server-side.

How to scrape Amazon products
4 projects | dev.to | 1 Apr 2024

In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.

Creating and deploying web scraper using Apify
1 project | dev.to | 27 Mar 2024

Used libraries:
Axios - a promise-based HTTP client for making requests to the specified URL.
Cheerio - a library for parsing and manipulating HTML, used here to extract data from downloaded HTML content.
Apify SDK - a library for building Apify Actors, used here to initialize actor environments, get input data, and push extracted data to the dataset.

Htmlq: Like Jq, but for HTML
2 projects | news.ycombinator.com | 19 Mar 2024

Nice. I've used Cheerio for this in the past: https://github.com/cheeriojs/cheerio?tab=readme-ov-file#sele...

Automating Data Collection with Apify: From Script to Deployment
4 projects | dev.to | 17 Mar 2024

For this article, I will be using the TypeScript Starter template as shown in the screenshot above. This comes with Nodejs, Cheerio, Axios

Web Scraping in Python – The Complete Guide
11 projects | news.ycombinator.com | 20 Feb 2024

> I'm not sure why Python web scraping is so popular compared to Node.js web scraping
Take this with a grain of salt, since I am fully cognizant that I'm the outlier in most of these conversations, but Scrapy is A++ the no-kidding best framework for this activity that has been created thus far. So, if there was scrapyjs maybe I'd look into it, but there's not (that I'm aware of) so here we are. This conversation often comes up in any such "well, I just use requests & ..." conversation and if one is happy with main.py and a bunch of requests invocations, I'm glad for you, but I don't want to try and cobble together all the side-band stuff that Scrapy and its ecosystem provide for me in a reusable and predictable way
Also, often those conversations conflate the server side language with the "scrape using headed browser" language which happens to be the same one. So, if one is using cheerio <https://github.com/cheeriojs/cheerio> then sure node can be a fine thing - if the blog post is all "fire up puppeteer, what can go wrong?!" then there is the road to ruin of doing battle with all kinds of detection problems since it's kind of a browser but kind of not
I, under no circumstances, want the target site running their JS during my crawl runs. I fully accept responsibility for reproducing any XHR or auth or whatever to find the 3 URLs that I care about, without downloading every thumbnail and marketing JS and beacon and and and. I'm also cognizant that my traffic will thus stand out since it uniquely does not make the beacon and marketing calls, but my experience has been that I get the ban hammer less often with my target fetches than trying to pretend to be a browser with a human on the keyboard/mouse but is not

Web Scraping in Node.js Using Axios,Cheerio and Json2csv
3 projects | dev.to | 20 Nov 2023

Web scraping is a powerful technique used to extract data from websites. In this tutorial, we'll explore how to perform web scraping using Node.js, with Axios for making HTTP requests, Cheerio for parsing HTML content, and json2csv for converting JSON data to CSV. We'll scrape product data from a sample website, "https://scrapeme.live/shop/".

Portadom: A Unified Interface for DOM Manipulation
4 projects | dev.to | 30 Aug 2023

Web scraping, while immensely useful, often requires developers to navigate a sea of tools and libraries, each with its own quirks and intricacies. Whether it's JSDOM, Cheerio, Playwright, or even just plain old vanilla JS in the DevTools console, moving between these platforms can be a challenge.

Querying parsed HTML in BigQuery
4 projects | dev.to | 26 May 2023

While looking for a way to implement capo.js in BigQuery to understand how pages in HTTP Archive are ordered, I came across the Cheerio library, which is a jQuery-like interface over an HTML parser.

JavaScript Web Crawler with Node.js: A Step-By-Step Tutorial
3 projects | dev.to | 17 Apr 2023

Cheerio is a JavaScript tool for parsing HTML and XML in Node.js. It provides APIs for traversing and manipulating the DOM of a webpage.
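
The jQuery-like parsing and traversal that these posts describe can be sketched roughly as follows. This is a minimal sketch, assuming the `cheerio` npm package is installed; the HTML fragment, the class names, and the `Product` type are hypothetical and exist only for illustration.

```typescript
import * as cheerio from "cheerio";

// Hypothetical HTML fragment standing in for a downloaded page.
const html = `
  <ul id="products">
    <li class="product"><span class="name">Widget</span><span class="price">9.99</span></li>
    <li class="product"><span class="name">Gadget</span><span class="price">19.50</span></li>
  </ul>`;

// Hypothetical shape for the extracted data.
interface Product {
  name: string;
  price: number;
}

// Parse the markup once; `$` then behaves much like jQuery's selector function.
const $ = cheerio.load(html);

// Select every product item with a CSS selector and read its child elements.
const products: Product[] = $("li.product")
  .toArray()
  .map((el) => ({
    name: $(el).find(".name").text().trim(),
    price: Number($(el).find(".price").text()),
  }));

// Manipulation works on the parsed document as well: tag the first item.
$("li.product").first().addClass("featured");
const featuredCount: number = $("li.product.featured").length;

console.log(products, featuredCount);
```

Nothing here runs the page's JavaScript or fetches anything over the network; cheerio only parses and queries the markup it is given, which is why the posts above pair it with an HTTP client such as Axios.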

Nokogiri

Posts with mentions or reviews of Nokogiri. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-20.

Web Scraping in Python – The Complete Guide
11 projects | news.ycombinator.com | 20 Feb 2024

Did you know Nokogiri now has opt-in HTML5 parsing?
7 projects | /r/ruby | 5 Jun 2023

release planning: v1.16.0 · Issue #2897 · sparklemotion/nokogiri

As a Go developer, I’m surprised Crystal isn’t more popular
8 projects | /r/crystal_programming | 26 May 2023

What's holding me back from going all in with Crystal is I have a lot of pre-existing Ruby code, and porting Ruby code to Crystal can be tricky. For example, Crystal lacks an Enumerator class (aka generators) due to captured block semantics. I also wish the shards ecosystem was a little more mature; for example there's multiple HTML parsing libraries, but none have all of the features that Ruby's Nokogiri has. For new greenfield backend projects, I would totally use Crystal.

Two months into learning Ruby, it is the most beautiful language I ever learned
5 projects | /r/ruby | 25 Feb 2023

Welcome! Ruby isn't exactly "dying", but the hype/popularity is definitely fading. This is primarily because Ruby is no longer "new", most of Ruby's popularity came from Rails, and now Rails is no longer the "new hotness". However, Ruby still has lots of awesome features and lots of awesome other libraries and frameworks, such as the new fancy irb gem that uses reline, nokogiri, chunky_png, the async gems, Dragon Ruby, SciRuby, Ronin, and the new Hanami web framework.

What should I be learning?
3 projects | /r/ruby | 25 Oct 2022

Comparable maintained Kimurai alternative?
1 project | /r/ruby | 4 Jun 2022

In "Your Name" (2016), Mitsuha and Tesshi are seen turning a tree into their makeshift café, which is why one of the trees in the town is later missing
1 project | /r/MovieDetails | 20 May 2022

great for hacking at xml

Ditch Your Version Manager
18 projects | news.ycombinator.com | 19 Sep 2021

Mike has worked hard over the years to have Nokogiri come with its dependencies. It does come with libxml and all that is required.
From https://nokogiri.org
> These dependencies are met by default by Nokogiri's packaged versions of the libxml2 and libxslt source code, but a configuration option --use-system-libraries is provided to allow specification of alternative library locations.
Some authors work hard to have their tools do the right thing and consistently.

Web scraping with rails
7 projects | /r/rails | 16 Sep 2021

If the page is rendered as html you can use Nokogiri. It has great support and is pretty easy to get started with too.

Nokogiri 1.12 supports HTML5 parsing (after assimilating Nokogumbo)
1 project | /r/ruby | 4 Aug 2021

And even now, pulling in a Java-based HTML5 parser is still probably easier than re-implementing in FFI, which is why I created https://github.com/sparklemotion/nokogiri/issues/2227 and would love to have this conversation there if possible.

What are some alternatives?

When comparing cheerio and Nokogiri you can also consider the following projects:

jsdom - A JavaScript implementation of various web standards, for use with Node.js

Oga - Oga is an XML/HTML parser written in Ruby.

puppeteer - Node.js API for Chrome

Ox - Ruby Optimized XML Parser

Electron - Build cross-platform desktop apps with JavaScript, HTML, and CSS

HTML::Pipeline - HTML processing filters and utilities

Prettyprint Object - Function to pretty-print an object with an ability to annotate every value.

Oj - Optimized JSON

Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

ROXML - ROXML is a module for binding Ruby classes to XML. It supports custom mapping and bidirectional marshalling between Ruby and XML using annotation-style class methods, via Nokogiri or LibXML.

webworker-threads - Lightweight Web Worker API implementation with native threads

HappyMapper - Object to XML mapping library, using Nokogiri (Fork from John Nunemaker's Happymapper)
