Scraper

Open-source projects categorized as Scraper

Top 23 Scraper Open-Source Projects

  • Huginn

    Create agents that monitor and act on your behalf. Your agents are standing by!

    Project mention: Alternative to ProcessMaker | reddit.com/r/selfhosted | 2022-05-10

    Huginn (last commit 2 months ago)

  • cheerio

    Fast, flexible, and lean implementation of core jQuery designed specifically for the server.

    Project mention: Show HN: A Full-Stack Web Framework Written in Go | news.ycombinator.com | 2022-05-13

    Sure, it's actually been a long journey since PHP.

    I switched from PHP to Node about 11 years ago. I was probably one of the first 100 Node.js developers. My biggest contribution there was creating https://github.com/cheeriojs/cheerio.

    Go spoils me because it "just works". The best way I can explain it is when I'm stuck on something in Go, it's almost always my bad, not something that Go is doing poorly. This is in contrast to the Node.js ecosystem where it feels like half of the problems I run into shouldn't be my responsibility.

    I relate a lot to this post by @kburke: https://kevin.burke.dev/kevin/one-year-of-node-js/

    My transition from Node.js to Go happened because I was working solo on https://standupjack.com/ and I kept running into issues with the message broker. It was using the most popular RabbitMQ library for Node.js, yet was still super flaky.

    About 6 years ago, I was chatting with TJ Holowaychuk and he kept singing Go's praises. One holiday I decided to rewrite the message broker in Go. I got it into production in 3 days and didn't have any more problems with it. That set me on the Go journey :)

  • lux

    👾 Fast and simple video download library and CLI tool written in Go

    Project mention: lux VS youtube - a user suggested alternative | libhunt.com/r/lux | 2022-02-02

  • colly

    Elegant Scraper and Crawler Framework for Golang

    Project mention: First project - The API for engineering multiple choice questions | reddit.com/r/golang | 2022-05-08

    There is a scraper struct that spawns goroutines. Each goroutine scrapes a web URL using colly, and the structured MCQ questions are served from API endpoints. Users can choose between a persistent Postgres model or an in-memory data structure to store the questions and retrieve them from the API. I have used dependency injection to inject the models via the interface into my application struct.

  • newspaper

    News, full-text, and article metadata extraction in Python 3. Advanced docs:

    Project mention: Website categorization - use cases, taxonomies, content extraction | dev.to | 2022-03-18

    There are also many ready-made libraries available for content extraction written in Python, which is more commonly used in data science, e.g. goose3 (https://github.com/goose3/goose3) and newspaper (https://github.com/codelucas/newspaper).

  • instagram-scraper

    Scrapes an instagram user's photos and videos

    Project mention: How To download Saves from instagram? | reddit.com/r/DataHoarder | 2022-04-08

    There was a FOSS program that worked. The problem is, if they detect anything remotely like scraping, they'll ban your account. I don't know what a good "pause" amount of time to set in the program would be to get around their flags. I think this was the one I used before (and got my account banned.) https://github.com/arc298/instagram-scraper And then if you don't login, they make you wait a really long time between downloads and you can't access some information (like your saves.)

  • Ferret

    Declarative web scraping

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    Project mention: Scraping - How to deal with page changes AI | reddit.com/r/webscraping | 2022-03-25

    It depends on the website, but autoscraper was used to calculate similar nodes given the text to search. Not sure how it works now but it's open source.

  • node-ytdl-core

    YouTube video downloader in javascript.

    Project mention: “YouTube-dl” and “Pirate Bay” back on DDG | news.ycombinator.com | 2022-04-17

  • OnlyFans

    Scrape all the media from an OnlyFans account - Updated regularly

    Project mention: I can't get onlyfans-dl to work | reddit.com/r/Piracy | 2021-12-01

    I use this; it works well. But yeah, if you want me to scrape it, I got you.

  • browser-fingerprinting

    Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

    Project mention: Hey! Another FREE guide for you! | reddit.com/r/proxies | 2022-04-18

    So here is a link to the original, true GitHub repo, so that at least the author might get reimbursed via his affiliate links and get rightfully paid rather than just plagiarized by other proxy blogs: https://github.com/niespodd/browser-fingerprinting

  • Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

    Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

    Project mention: Anyone wanna share free company Udemy/Udacity accounts? | reddit.com/r/cscareerquestions | 2021-12-31

    I would check this out if you're looking to get ahold of Udemy courses that get discounted to free. https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

  • rod

    A Devtools driver for web automation and scraping

    Project mention: Rod - A Devtools driver for web automation and scraping | reddit.com/r/github_trends | 2022-04-29

  • instagram-scraper

    Scrapes media, likes, followers, tags, and all metadata. Inspired by instagram-php-scraper, bot (by realsirjoe)

    Project mention: Help in scraping Instagram | reddit.com/r/webscraping | 2021-10-09

    Yes: https://github.com/realsirjoe/instagram-scraper

  • Geziyor

    Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

  • google-play-scraper

    Node.js scraper to get data from Google Play

    Project mention: Where can I find more SaaS companies? | reddit.com/r/SaaS | 2022-02-10

    Hacker News is a good board to trawl; companies announce there a lot, and places like TechCrunch and Product Hunt also feature SaaS companies. If you want a volume solution, https://github.com/facundoolano/google-play-scraper is good for scraping the Play Store, from which you can just search for what you want. I accumulated a database of 200K apps with it in a week.

  • JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

  • bulk-downloader-for-reddit

    Downloads and archives content from reddit

    Project mention: Any self hosted reddit archivers? | reddit.com/r/selfhosted | 2022-05-12

    no web-ui but saves data to json, xml or yaml: https://github.com/aliparlakci/bulk-downloader-for-reddit

  • Wombat

    Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

  • snscrape

    A social networking service scraper in Python

    Project mention: Is there a way to extract more than 3200 Tweets from a single user on Twitter? | reddit.com/r/DataHoarder | 2022-04-02

    https://github.com/JustAnotherArchivist/snscrape is what I use.

  • freeDictionaryAPI

    There was no free Dictionary API on the web when I wanted one for my friend, so I created one.

    Project mention: How to start a big JSON database? | reddit.com/r/learnprogramming | 2021-11-23

    I am making a word-of-the-day app that works in any* language. I found this AMAZING API that provides several different languages with definitions, origins, phonetics, sometimes audio, the works.

  • node-website-scraper

    Download website to local directory (including all css, images, js, etc.)

    Project mention: Website Scraper | NodeJS | dev.to | 2021-06-16

    Visit the official GitHub repository for more information here.

  • cinemagoer

    Cinemagoer is a Python package useful to retrieve and manage the data of the IMDb (to which we are not affiliated in any way) movie database about movies, people, characters and companies

    Project mention: [OC]IMDB Top 30 movies: cast death rate | reddit.com/r/dataisbeautiful | 2022-01-17

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-05-13.
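
The colly entry above describes a scraper struct that spawns goroutines, one per URL, with the storage model injected through an interface so that Postgres or in-memory storage can be swapped. A minimal sketch of that shape follows, rendered in Python rather than Go so that all examples here share one language: a thread pool stands in for goroutines, and `QuestionStore`, `InMemoryStore`, `Scraper` and `fake_fetch` are hypothetical names, not colly's API. The fetcher is injected as well, so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from threading import Lock
from typing import Callable, Protocol


@dataclass
class Question:
    text: str
    options: list[str]
    answer: str


class QuestionStore(Protocol):
    """Storage interface; a Postgres-backed class could implement the same two methods."""

    def add(self, questions: list[Question]) -> None: ...

    def all(self) -> list[Question]: ...


class InMemoryStore:
    """In-memory QuestionStore that is safe to call from several worker threads."""

    def __init__(self) -> None:
        self._items: list[Question] = []
        self._lock = Lock()

    def add(self, questions: list[Question]) -> None:
        with self._lock:  # workers write concurrently, so guard the shared list
            self._items.extend(questions)

    def all(self) -> list[Question]:
        with self._lock:
            return list(self._items)


class Scraper:
    """Fans URLs out to a worker pool; storage and fetching are injected dependencies."""

    def __init__(self, store: QuestionStore,
                 fetch: Callable[[str], list[Question]], workers: int = 4) -> None:
        self.store = store
        self.fetch = fetch
        self.workers = workers

    def _scrape_one(self, url: str) -> None:
        # Each worker fetches one URL and writes its results straight to the store.
        self.store.add(self.fetch(url))

    def run(self, urls: list[str]) -> None:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # list() drains the iterator so any worker exception is re-raised here.
            list(pool.map(self._scrape_one, urls))


def fake_fetch(url: str) -> list[Question]:
    """Hypothetical stand-in for an HTTP request plus HTML parsing step."""
    return [Question(text=f"Question from {url}", options=["A", "B"], answer="A")]


if __name__ == "__main__":
    store = InMemoryStore()
    Scraper(store, fake_fetch).run(["https://example.com/q1", "https://example.com/q2"])
    print(len(store.all()))  # 2
```

Swapping in a database-backed store only requires another class with the same `add` and `all` methods; `Scraper` itself does not change.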
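
Libraries such as newspaper and goose3, mentioned above, pull the title and main text out of an article page. The following is a deliberately naive sketch of that idea using only the standard library's `html.parser`, and it is not how either library works internally: it keeps the `<title>` and every `<p>` with at least `min_words` words (an assumed threshold), skipping `<script>` and `<style>` content. `ArticleExtractor` and `extract_article` are hypothetical names.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collects the <title> text and the text of each <p>, ignoring <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: list[str] = []
        self._in_title = False
        self._in_p = False
        self._skip = 0  # nesting depth inside <script>/<style>
        self._buf: list[str] = []

    def _flush(self) -> None:
        # Collapse whitespace and keep the paragraph only if it is non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p":
            if self._in_p:  # an unclosed <p> is implicitly ended by the next one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)

    def close(self):
        super().close()
        if self._in_p:  # keep a trailing paragraph that was never closed
            self._flush()


def extract_article(html: str, min_words: int = 5) -> dict:
    """Hypothetical helper: returns the page title and its longer paragraphs."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    # Short paragraphs (nav links, captions, bylines) are dropped by a word-count threshold.
    body = [p for p in parser.paragraphs if len(p.split()) >= min_words]
    return {"title": parser.title.strip(), "text": "\n\n".join(body)}
```

Real extractors score candidate blocks far more carefully (link density, DOM position, boilerplate patterns); the word-count filter here only illustrates the general shape.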
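
The first instagram-scraper comment above asks what "pause" to set between requests. The sketch below shows only the mechanics of enforcing a minimum gap, with optional random jitter, between consecutive requests; the `Throttle` class and its parameters are hypothetical, and it makes no claim about which interval, if any, stays within a given site's limits. The clock and sleep functions are injectable so the timing logic can be checked without actually waiting.

```python
import random
import time
from typing import Callable, Optional


class Throttle:
    """Keeps at least `min_interval` seconds (plus random jitter) between wait() calls."""

    def __init__(self, min_interval: float, jitter: float = 0.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> None:
        self.min_interval = min_interval
        self.jitter = jitter
        self.clock = clock  # injectable so tests can use a fake clock
        self.sleep = sleep
        self.rng = rng or random.Random()
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Call before each request; sleeps only for the part of the gap not yet elapsed."""
        slept = 0.0
        if self._last is not None:
            gap = self.min_interval + self.rng.uniform(0.0, self.jitter)
            slept = max(0.0, gap - (self.clock() - self._last))
            if slept > 0:
                self.sleep(slept)
        self._last = self.clock()
        return slept
```

In a scraping loop, `throttle.wait()` would be called immediately before each request; time already spent parsing the previous response counts toward the gap.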
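
The autoscraper comment above says the library calculates "similar nodes given the text to search". One simple way to approximate that idea, assumed here and not taken from autoscraper's source, is to find the element whose text equals a sample value, record its path of tag and class names from the root, and return the text of every element with the same path. `PathCollector` and `find_similar` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed onto the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Records (path, text) for every non-blank text node.

    A path is the chain of "tag" or "tag.class" steps from the document root.
    """

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.nodes: list[tuple[tuple[str, ...], str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".", 1)[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.nodes.append((tuple(self.stack), text))


def find_similar(html: str, sample: str) -> list[str]:
    """Return the text of every node sharing a path with a node whose text equals `sample`."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    target_paths = {path for path, text in parser.nodes if text == sample}
    return [text for path, text in parser.nodes if path in target_paths]
```

Given one product name from a listing as the sample, the sketch returns every name at the same position in the page structure, so a sample value stands in for a hand-written selector. A page that renames its classes would break the match, which is the "page changes" problem the original post asks about.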
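
JobFunnel's description above, "scrape job websites into a single spreadsheet with no duplicates", implies a merge-and-deduplicate step before rows are written out. A minimal sketch with the standard `csv` module follows; the field names and the rule that two postings are duplicates when their normalized title, company and location match are assumptions, not JobFunnel's actual logic. `dedup_key`, `merge_jobs` and `to_csv` are hypothetical names.

```python
import csv
import io
from typing import Iterable

FIELDS = ["title", "company", "location", "url"]  # assumed spreadsheet columns


def dedup_key(job: dict) -> tuple[str, ...]:
    # Assumed rule: same title + company + location is one posting,
    # ignoring case and extra whitespace.
    return tuple(" ".join(job.get(f, "").lower().split())
                 for f in ("title", "company", "location"))


def merge_jobs(*sources: Iterable[dict]) -> list[dict]:
    """Concatenate postings from several sites, keeping the first copy of each duplicate."""
    seen: set[tuple[str, ...]] = set()
    merged: list[dict] = []
    for source in sources:
        for job in source:
            key = dedup_key(job)
            if key not in seen:
                seen.add(key)
                merged.append(job)
    return merged


def to_csv(jobs: list[dict]) -> str:
    """Render postings as CSV text; unknown keys are ignored, missing ones left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(jobs)
    return buf.getvalue()
```

Keeping a `seen` set makes deduplication a single pass over all sources; the choice of key decides what counts as "the same job", and a stricter or looser key changes the result.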
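
The bulk-downloader-for-reddit comment above notes that archived data is saved as JSON, XML or YAML. The sketch below serializes a list of post records to the first two formats with the standard library only (YAML would need a third-party package such as PyYAML); the record fields and the XML layout are hypothetical, not the tool's actual output schema.

```python
import json
import xml.etree.ElementTree as ET


def to_json(posts: list[dict]) -> str:
    """Pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(posts, indent=2, ensure_ascii=False)


def to_xml(posts: list[dict]) -> str:
    """<posts><post><field>value</field>...</post></posts>; assumes keys are valid tag names."""
    root = ET.Element("posts")
    for post in posts:
        el = ET.SubElement(root, "post")
        for key, value in post.items():
            child = ET.SubElement(el, key)
            child.text = "" if value is None else str(value)  # ElementTree escapes &, <, >
    return ET.tostring(root, encoding="unicode")
```

JSON round-trips the original types (the score stays an integer), while this XML layout stores every value as text, so a reader would need to convert fields back itself.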
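
node-website-scraper, listed above, downloads a page together with its CSS, images and JS into a local directory. A first step for such a tool is finding those asset references and resolving them against the page URL; the sketch below does that with `html.parser` and `urllib.parse.urljoin`, then maps each URL to a path under an output folder. `AssetFinder`, `local_path` and the `site/<host>/<path>` layout are hypothetical and do not describe the library's behavior.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class AssetFinder(HTMLParser):
    """Collects absolute URLs of stylesheets, scripts and images referenced by one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.assets: list[str] = []  # insertion-ordered, without duplicates

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        ref = None
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower().split():
            ref = a.get("href")
        elif tag in ("script", "img"):
            ref = a.get("src")
        if ref:
            url = urljoin(self.base_url, ref)  # resolve relative references
            if url not in self.assets:
                self.assets.append(url)


def local_path(url: str, root: str = "site") -> str:
    """Map an asset URL to root/<host>/<path>; does not sanitize '..' segments."""
    parts = urlparse(url)
    path = parts.path.lstrip("/") or "index.html"
    return str(PurePosixPath(root, parts.netloc, path))
```

A complete downloader would also fetch each asset, rewrite the references in the saved HTML to the new local paths, and follow `url(...)` references inside CSS files.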

Index

What are some of the best open-source Scraper projects? This list will help you:

Rank Project Stars
1 Huginn 35,587
2 cheerio 25,068
3 lux 17,883
4 colly 16,585
5 newspaper 11,851
6 instagram-scraper 6,196
7 Ferret 4,973
8 autoscraper 4,367
9 node-ytdl-core 3,210
10 OnlyFans 2,914
11 browser-fingerprinting 2,887
12 Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE 2,638
13 rod 2,394
14 instagram-scraper 2,353
15 Geziyor 1,704
16 google-play-scraper 1,696
17 JobFunnel 1,553
18 bulk-downloader-for-reddit 1,353
19 Wombat 1,261
20 snscrape 1,246
21 freeDictionaryAPI 1,154
22 node-website-scraper 1,121
23 cinemagoer 969