web-scraper

Open-source projects categorized as web-scraper

Top 23 web-scraper Open-Source Projects

  • awesome-crawler

    A collection of awesome web crawlers and spiders in different languages

    Project mention: More than 400 start.me OSINT websites! More than 10KB of sources! | /r/OSINT | 2023-04-11

  • 100ProjectsOfCode

    A list of practical knowledge-building projects.

    Project mention: Fired from an internship after 2 weeks | /r/cscareerquestions | 2023-06-02

    Work on a personal project. There's a list of 100 sample projects at https://github.com/arpit-omprakash/100ProjectsOfCode

  • soup

    Web Scraper in Go, similar to BeautifulSoup

  • lightnovel-crawler

    Generate and download e-books from online sources.

    Project mention: Help with Paperback IOS. | /r/mangapiracy | 2023-06-18

    Use Lightnovel crawler on a computer, either in a terminal or through their Discord bot, to find series across multiple LN / webnovel sites, then choose the format to download (epub, pdf, txt, and many more).

  • stealth

    Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy

    Project mention: Ask HN: Most interesting tech you built for just yourself? | news.ycombinator.com | 2023-04-27

    Two years ago I decided to build my own web browser, with the underlying idea to use the internet more efficiently (and to force cache everything).

    Took a while to find the architecture, but it's still an unfinished ambitious project. You can probably spend forever working on HTML and CSS fixes alone...

    [1] https://github.com/tholian-network/stealth

  • Monkey-DL (Anime Downloader)

    Bulk download your favourite anime episodes from your favourite anime websites

  • spidr

    A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use. (by postmodern)

  • web-scraping

    Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

    Project mention: web-scraping: NEW Data - star count:554.0 | /r/algoprojects | 2023-09-25

  • PHP Scraper

    A universal web-util for PHP.

  • google-maps-scraper

    Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place (by gosom)

    Project mention: Show HN: A Google Maps Scraper | news.ycombinator.com | 2023-12-03

  • basketball_reference_web_scraper

    NBA Stats API via Basketball Reference

  • summarizer

    A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.

  • awesome-web-scraper

    A collection of awesome web scrapers and crawlers.

  • facebook_page_scraper

    Scrapes the front end of Facebook pages with no limitations and provides a feature to turn the data into structured JSON or CSV

  • cascadia

    Command-line CSS selector tool built on the Go cascadia package

  • Senpwai

    A desktop app for tracking and batch downloading anime

    Project mention: Building W-9 Crafter | dev.to | 2024-03-28

    It's been a cool learning experience making a Product Hunt listing, a small demo video, and allll the social posts (Twitter, LinkedIn, etc).

  • get-sauce

    A command line program to download Hentai videos and images from multiple websites

    Project mention: How to run tar.gz ? | /r/linux4noobs | 2023-04-14

    So it's this https://github.com/gan-of-culture/get-sauce ?

  • public-roadmap

    Public roadmap for SerpApi, LLC (https://serpapi.com) (by serpapi)

    Project mention: AI Report #4: AutoGPT And Open-source lags behind Part 2 | news.ycombinator.com | 2023-06-15

    > The google search function is also limited. For comparison, SerpAPI masterfully scrapes Google Search using a proxy network and very intelligent parsing. In experiments using SerpAPI in combination with Microsoft’s guidance module, I got much farther than AutoGPT.

    Thanks for your kind words. We are working on SerpApi integration for Auto-GPT: https://github.com/serpapi/public-roadmap/issues/905

  • CobWeb-lnx

    CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

    Project mention: Quem já contribuiu e quem já usou projectos open-source? [Who has contributed to, and who has used, open-source projects?] | /r/devpt | 2023-06-30

  • yast

    Yet Another Streaming Tool

    Project mention: [OpenSource] I am building high performance Plex alternative in Go for Movies and TV Show | /r/golang | 2023-06-02

    I also built a similar tool; it lets you choose and play movies. I used webtorrent behind the scenes. https://github.com/qascade/yast

  • reddit-bots

    A collection of Reddit bots that I use to enhance the subreddits I manage.

  • tagalog-dictionary-scraper

    Builds a Tagalog dictionary by collecting Tagalog words from tagalog.pinoydictionary.com

  • mexican-jobs-2020

    Data ETL & Analysis on thousands of job listings from the official Mexican job board (2020 edition).

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-28.

Index

What are some of the best open-source web-scraper projects? This list will help you:

Rank Project Stars
1 awesome-crawler 6,023
2 100ProjectsOfCode 2,832
3 soup 2,125
4 lightnovel-crawler 1,258
5 stealth 986
6 Monkey-DL (Anime Downloader) 804
7 spidr 788
8 web-scraping 617
9 PHP Scraper 487
10 google-maps-scraper 469
11 basketball_reference_web_scraper 397
12 summarizer 267
13 awesome-web-scraper 231
14 facebook_page_scraper 183
15 cascadia 134
16 Senpwai 116
17 get-sauce 109
18 public-roadmap 43
19 CobWeb-lnx 38
20 yast 28
21 reddit-bots 23
22 tagalog-dictionary-scraper 22
23 mexican-jobs-2020 21