Go Scraper

Open-source Go projects categorized as Scraper

Top 23 Go Scraper Projects

  • lux

    👾 Fast and simple video download library and CLI tool written in Go

    Project mention: Bilibili download stalls at around 30-60% | /r/youtubedl | 2023-05-18

    Not a fix, but I tend to use lux when downloading from bilibili. It is faster too.

  • colly

    Elegant Scraper and Crawler Framework for Golang

    Project mention: Scraping the full snippet from Google search result | dev.to | 2024-01-01

    SerpApi focuses on scraping search results. That's why we need extra help to scrape individual sites. We'll use GoColly package.

  • Ferret

    Declarative web scraping

  • rod

    A Devtools driver for web automation and scraping

    Project mention: Need help authenticating to Okta programatically. | /r/okta | 2023-07-03

    I have tried the following. 1. Log in to Okta via the browser programmatically using go-rod. I managed to do so successfully, but I'm failing to load Slack, as it's stuck on Slack's browser loader screen. 2. Authenticate via the Okta RESTful API. So far, I have managed to authenticate using {{domain}}/api/v1/authn, and then subsequently using MFA via the verify endpoint {{domain}}/api/v1/authn/factors/{{factorID}}/verify, which returns a sessionToken. From here, I can successfully create a sessionCookie, which has proven quite useless to me. Perhaps I am doing it wrong.

  • Geziyor

    Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

    Project mention: Show HN: I scraped 25M Shopify products to build a search engine | news.ycombinator.com | 2023-12-13

    As someone who has scraped millions of items myself, I had success using Geziyor (https://github.com/geziyor/geziyor) built in Go. Shopify sites are especially easy to scrape because they tend to share the same product data formatting and don't hide it behind JS rendering.

  • cariddi

    Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

    Project mention: cariddi v1.3.1 is out🥳 | /r/opensource | 2023-03-24

    cariddi is an open source (https://github.com/edoardottt/cariddi) web security tool. It takes a list of domains as input, crawls URLs, and scans for endpoints, secrets, API keys, file extensions, tokens and more.

  • mangal

    📖 The most advanced (yet simple) cli manga downloader in the entire universe! Lua scrapers, export formats, anilist integration, fancy TUI and more!

    Project mention: What application handles manga downloads? | /r/selfhosted | 2023-05-19

  • till

    DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

  • finance-go

    📊 Financial markets data library implemented in Go.

    Project mention: finance-go: NEW Data - star count:602.0 | /r/algoprojects | 2023-05-13

  • Dataflow kit

    Extract structured data from websites. Website scraping.

  • ant

    A web crawler for Go (by yields)

  • GMDB

    GMDB is the ultra-simple, cross-platform Movie Library with Features (Search, Take Note, Watch Later, Like, Import, Learn, Instantly Torrent Magnet Watch)

  • dorkscout

    DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets

    Project mention: Automated Google Dorking | /r/programiranje | 2023-04-14

  • demeter

    Demeter is a tool for scraping the calibre web ui

  • meteor

    Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog. (by raystack)

  • spidy

    Domain names collector - Crawl websites and collect domain names along with their availability status. (by twiny)

  • JsonGenius

    Get structured JSON data from any page.

    Project mention: Show HN: SingleAPI – Convert the Internet into your own API | news.ycombinator.com | 2023-10-17

    isn’t this just using jsongenius[1]

    [1] https://github.com/semanser/JsonGenius

  • scraply

    Scraply: a simple DOM scraper to fetch information from any HTML-based website

  • fitter

    A new way to collect information from APIs and websites (by PxyUp)

    Project mention: Show HN: Fitter – configurable open-source scraper | news.ycombinator.com | 2024-01-14

  • rrip

    Bulk image downloader for reddit.

    Project mention: rrip v0.5 - Go template filters / formatting, GNU style long options | /r/DataHoarder | 2023-08-03

  • reX

    Reverse Engineered Twitter's API (by zimovane)

    Project mention: GitHub - Amovane/reX: Reverse Engineered Twitter's API: Since twitter dev removed the API for accessing user followers and following, developers have found it difficult to obtain this data. Here, I'm sharing my reverse engineering solution | /r/bag_o_news | 2023-09-18

  • grab

    Configurable Scraper & Downloader, Powered by RegExp and Go (by everdrone)

  • ultimate-guitar-scraper

    A simple scraper for Ultimate-Guitar.com's mobile API, written in Go. (by Pilfer)

    Project mention: Freetar – an alternative front end for ultimate-guitar.com | news.ycombinator.com | 2023-11-29

    I added a PR[0] to a CLI project doing exactly this for all your saved tabs. Could probably be extended for all tabs on the site as well.

    [0]: https://github.com/Pilfer/ultimate-guitar-scraper/pull/2

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-14.

Index

What are some of the best open-source Scraper projects in Go? This list will help you:

 #  Project                   Stars
 1  lux                      23,229
 2  colly                    21,724
 3  Ferret                    5,582
 4  rod                       4,579
 5  Geziyor                   2,443
 6  cariddi                   1,254
 7  mangal                    1,072
 8  till                        803
 9  finance-go                  668
10  Dataflow kit                636
11  ant                         276
12  GMDB                        235
13  dorkscout                   216
14  demeter                     172
15  meteor                      163
16  spidy                       136
17  JsonGenius                  126
18  scraply                     125
19  fitter                       95
20  rrip                         68
21  reX                          64
22  grab                         63
23  ultimate-guitar-scraper      58