Go Scraper

Open-source Go projects categorized as Scraper

Top 23 Go Scraper Projects

  • lux

    👾 Fast and simple video download library and CLI tool written in Go

  • Project mention: Bilibili download stalls at around 30-60% | /r/youtubedl | 2023-05-18

    Not a fix, but I tend to use lux when downloading from bilibili. It is faster too.

  • colly

    Elegant Scraper and Crawler Framework for Golang

  • Project mention: Scraping the full snippet from Google search result | dev.to | 2024-01-01

    SerpApi focuses on scraping search results. That's why we need extra help to scrape individual sites. We'll use GoColly package.

  • Ferret

    Declarative web scraping

  • rod

    A Devtools driver for web automation and scraping

  • Project mention: Need help authenticating to Okta programatically. | /r/okta | 2023-07-03

    I have tried the following. 1. Logging in to Okta via the browser programmatically using go-rod, which I managed to do successfully, but I'm failing to load Slack, as it's stuck on Slack's browser loader screen. 2. Authenticating via the Okta RESTful API. So far, I have managed to authenticate using {{domain}}/api/v1/authn, and then subsequently with MFA via the verify endpoint {{domain}}/api/v1/authn/factors/{{factorID}}/verify, which returns a sessionToken. From here, I can successfully create a sessionCookie, which has proven quite useless to me. Perhaps I am doing it wrong.

  • Geziyor

    Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

  • Project mention: Show HN: I scraped 25M Shopify products to build a search engine | news.ycombinator.com | 2023-12-13

    As someone who has scraped millions of items myself, I had success using Geziyor (https://github.com/geziyor/geziyor) built in Go. Shopify sites are especially easy to scrape because they tend to share the same product data formatting and don't hide it behind JS rendering.

  • cariddi

    Take a list of domains, crawl URLs, and scan for endpoints, secrets, API keys, file extensions, tokens and more

  • mangal

    📖 The most advanced (yet simple) CLI manga downloader in the entire universe! Lua scrapers, export formats, AniList integration, fancy TUI and more!

  • Project mention: What application handles manga downloads? | /r/selfhosted | 2023-05-19
  • till

    DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

  • finance-go

    📊 Financial markets data library implemented in Go.

  • Project mention: finance-go: NEW Data - star count:602.0 | /r/algoprojects | 2023-05-13
  • Dataflow kit

    Extract structured data from web sites. Web sites scraping.

  • ant

    A web crawler for Go (by yields)

  • GMDB

    GMDB is the ultra-simple, cross-platform Movie Library with Features (Search, Take Note, Watch Later, Like, Import, Learn, Instantly Torrent Magnet Watch)

  • dorkscout

    DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets

  • demeter

    Demeter is a tool for scraping the calibre web ui

  • meteor

    Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog. (by raystack)

  • JsonGenius

    Get structured JSON data from any page.

  • Project mention: Show HN: SingleAPI – Convert the Internet into your own API | news.ycombinator.com | 2023-10-17

    isn't this just using jsongenius[1]

    [1] https://github.com/semanser/JsonGenius

  • spidy

    Domain names collector - Crawl websites and collect domain names along with their availability status. (by twiny)

  • scraply

    Scraply: a simple DOM scraper to fetch information from any HTML-based website

  • fitter

    A new way to collect information from APIs and websites (by PxyUp)

  • Project mention: Show HN: Fitter – configurable open-source scraper | news.ycombinator.com | 2024-01-14
  • rrip

    Bulk image downloader for reddit.

  • Project mention: rrip v0.5 - Go template filters / formatting, GNU style long options | /r/DataHoarder | 2023-08-03
  • ultimate-guitar-scraper

    A simple scraper for Ultimate-Guitar.com's mobile API, written in Go. (by Pilfer)

  • Project mention: Freetar – an alternative front end for ultimate-guitar.com | news.ycombinator.com | 2023-11-29

    I added a PR[0] to a CLI project doing exactly this for all your saved tabs. Could probably be extended for all tabs on the site as well.

    [0]: https://github.com/Pilfer/ultimate-guitar-scraper/pull/2

  • reX

    Reverse Engineered Twitter's API (by zmovane)

  • Project mention: GitHub - Amovane/reX: Reverse Engineered Twitter's API: Since twitter dev removed the API for accessing user followers and following, developers have found it difficult to obtain this data. Here, I'm sharing my reverse engineering solution | /r/bag_o_news | 2023-09-18
  • grab

    Configurable Scraper & Downloader, Powered by RegExp and Go (by everdrone)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Go Scraper related posts

  • Show HN: Fitter – configurable open-source scraper

    1 project | news.ycombinator.com | 14 Jan 2024
  • Scraping the full snippet from Google search result

    3 projects | dev.to | 1 Jan 2024
  • Show HN: Flyscrape – A standalone and scriptable web scraper in Go

    6 projects | news.ycombinator.com | 11 Nov 2023
  • Colly: Elegant Scraper and Crawler Framework for Golang

    1 project | news.ycombinator.com | 23 Aug 2023
  • rrip v0.5 - Go template filters / formatting, GNU style long options

    1 project | /r/DataHoarder | 3 Aug 2023
  • PxyUp/fitter: New way for collect information from the API's/Websites

    1 project | /r/golang | 3 Jul 2023
  • Show HN: Fitter – next generation web-scraper

    1 project | news.ycombinator.com | 28 Jun 2023

Index

What are some of the best open-source Scraper projects in Go? This list will help you:

Rank  Project                  Stars
1     lux                      25,312
2     colly                    22,205
3     Ferret                   5,620
4     rod                      4,808
5     Geziyor                  2,480
6     cariddi                  1,360
7     mangal                   1,176
8     till                     807
9     finance-go               686
10    Dataflow kit             636
11    ant                      276
12    GMDB                     234
13    dorkscout                216
14    demeter                  174
15    meteor                   171
16    JsonGenius               156
17    spidy                    142
18    scraply                  126
19    fitter                   98
20    rrip                     70
21    ultimate-guitar-scraper  69
22    reX                      63
23    grab                     63
