bbcrss VS shot-scraper

Compare bbcrss vs shot-scraper and see what their differences are.

bbcrss

Scrapes the headlines from BBC News indexes every five minutes (by jasoncartwright)
                bbcrss              shot-scraper
Mentions        1                   17
Stars           5                   1,789
Growth          -                   2.1%
Activity        10.0                7.3
Latest commit   about 1 year ago    5 months ago
Language        XSLT                Python
License         -                   Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

bbcrss

Posts with mentions or reviews of bbcrss. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-08-10.
  • Git scraping: track changes over time by scraping to a Git repository
    18 projects | news.ycombinator.com | 10 Aug 2023
    I've been promoting this idea for a few years now, and I've seen an increasing number of people put it into action.

    A fun way to track how people are using this is with the git-scraping topic on GitHub:

    https://github.com/topics/git-scraping?o=desc&s=updated

    That page orders repos tagged git-scraping by most-recently-updated, which shows which scrapers have run most recently.

    As I write this, repos that updated within just the last minute include:

    https://github.com/drzax/queensland-traffic-conditions

    https://github.com/jasoncartwright/bbcrss

    https://github.com/jackharrhy/metrobus-timetrack-history

    https://github.com/outages/bchydro-outages
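The git-scraping pattern described above works like this: fetch a source, overwrite a file, and commit only when the bytes change. bbcrss is listed above among the recently updated git-scraping repos. The Git history then becomes the record of how the source changed. Below is a minimal Python sketch of that loop. It assumes the target directory is already a Git repository with a configured commit identity, and that something external (cron, a CI scheduler) runs it periodically. `fetch`, `write_if_changed`, `scrape_once`, and the commit-message format are hypothetical names chosen for illustration, not code from bbcrss or any repo mentioned here.

```python
import subprocess
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def fetch(url: str) -> bytes:
    """Download the raw bytes of a page or JSON endpoint."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def write_if_changed(path: Path, data: bytes) -> bool:
    """Overwrite path with data; return True only if the content differs."""
    if path.exists() and path.read_bytes() == data:
        return False
    path.write_bytes(data)
    return True


def scrape_once(url: str, repo: Path, filename: str) -> bool:
    """Fetch url into repo/filename and commit it if anything changed.

    Assumes repo is an existing Git working tree (hypothetical setup).
    The commit history, not the file itself, is the record of change.
    """
    if not write_if_changed(repo / filename, fetch(url)):
        return False  # nothing new, so skip creating an empty commit
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    subprocess.run(["git", "add", filename], cwd=repo, check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Latest data: {stamp}"], cwd=repo, check=True
    )
    return True
```

For example, `scrape_once("https://example.com/data.json", Path("."), "data.json")` uses a placeholder URL. It would add a commit only when the response differs from the previously saved snapshot. Skipping unchanged fetches keeps the history limited to real changes.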

shot-scraper

Posts with mentions or reviews of shot-scraper. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-07-09.
  • Docs as Code
    6 projects | news.ycombinator.com | 9 Jul 2024
    I actually built my own Playwright screenshotting software with this idea in mind too: https://shot-scraper.datasette.io/ - I wrote about using that for my project documentation here: https://simonwillison.net/2022/Oct/14/automating-screenshots...

    Really it comes down to the team you are working with. If you have user-facing documentation authors who are happy with Markdown and Git you can probably get this to work.

  • I want to create IMDB for Open source projects
    6 projects | news.ycombinator.com | 15 Apr 2024
    I had one of these recently! https://github.com/simonw/shot-scraper/pull/133/files

    They're /incredibly/ rare though.

  • 2024-03-01 listening in on the neighborhood
    5 projects | news.ycombinator.com | 2 Mar 2024
    If anyone wants the raw data, it's available in the window._Flourish_data variable on https://flo.uri.sh/visualisation/16818696/embed

    Which means you can extract it with my https://shot-scraper.datasette.io/ tool like this:

        shot-scraper javascript \
  • Web Scraping in Python – The Complete Guide
    11 projects | news.ycombinator.com | 20 Feb 2024
    I strongly recommend adding Playwright to your set of tools for Python web scraping. It's by far the most powerful and best designed browser automation tool I've ever worked with.

    I use it for my shot-scraper CLI tool: https://shot-scraper.datasette.io/ - which lets you scrape web pages directly from the command line by running JavaScript against pages to extract JSON data: https://shot-scraper.datasette.io/en/stable/javascript.html

  • A command-line utility for taking automated screenshots of websites
    1 project | news.ycombinator.com | 15 Dec 2023
  • Don’t Build a General Purpose API to Power Your Own Front End (2021)
    3 projects | news.ycombinator.com | 20 Aug 2023
    This is exactly what the `Accept` HTTP header is for https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ac...

    I think the author is generally correct that all JSON should be provided in a single request, but if you want to prove it, you should be able to switch your Accept header between `application/json` and `text/html` and see nearly identical data.

    In fact, this is what both GitLab and GitHub do. Try it out!

    `curl -L https://github.com/simonw/shot-scraper` (text/html)

    `curl --header "Accept: application/json" -L https://github.com/simonw/shot-scraper` (application/json)

  • Git scraping: track changes over time by scraping to a Git repository
    18 projects | news.ycombinator.com | 10 Aug 2023
    Git is a key technology in this approach, because the value you get out of this form of scraping is the commit history - it's a way of turning a static source of information into a record of how that information changed over time.

    I think it's fine to use the term "scraping" to refer to downloading a JSON file.

    These days an increasing number of websites work by serving up JSON which is then turned into HTML by a client-side JavaScript app. The JSON often isn't a formally documented API, but you can grab it directly to avoid the extra step of processing the HTML.

    I do run Git scrapers that process HTML as well. A couple of examples:

    scrape-san-mateo-fire-dispatch https://github.com/simonw/scrape-san-mateo-fire-dispatch scrapes the HTML from http://www.firedispatch.com/iPhoneActiveIncident.asp?Agency=... and records both the original HTML and converted JSON in the repository.

    scrape-hacker-news-by-domain https://github.com/simonw/scrape-hacker-news-by-domain uses my https://shot-scraper.datasette.io/ browser automation tool to convert an HTML page on Hacker News into JSON and save that to the repo. I wrote more about how that works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/

  • Web Scraping via JavaScript Runtime Heap Snapshots (2022)
    1 project | news.ycombinator.com | 8 Aug 2023
  • Need help with downloading a section of multiple sites as pdf files.
    2 projects | /r/webscraping | 25 Mar 2023
    You can use shot-scraper: https://github.com/simonw/shot-scraper
  • Ask HN: Small scripts, hacks and automations you're proud of?
    49 projects | news.ycombinator.com | 12 Mar 2023
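The "Don't Build a General Purpose API" comment above points out that the `Accept` header lets a single URL return either HTML or JSON. Below is a simplified server-side sketch of that content negotiation. `parse_accept` and `negotiate` are hypothetical helpers, not how GitHub or GitLab actually implement it. The sketch also ignores RFC 9110's rule that more specific media ranges override broader ones.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type.lower(), q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header: Optional[str], offered: list[str]) -> Optional[str]:
    """Return the first offered type the client accepts, or None.

    offered must be non-empty; its first entry is the server's default.
    """
    if not header:
        return offered[0]
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        for candidate in offered:
            main = candidate.split("/")[0]
            # Match an exact type, a type wildcard like text/*, or */*
            if media_type in (candidate, f"{main}/*", "*/*"):
                return candidate
    return None
```

Under this sketch, curl's default `Accept: */*` would receive the first offered type (`text/html`). Passing `--header "Accept: application/json"` would receive `application/json`, which mirrors the two curl commands in that comment.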

What are some alternatives?

When comparing bbcrss and shot-scraper you can also consider the following projects:

gesetze-im-internet - Archive of German legal acts (weekly archive of gesetze-im-internet.de)

gmail-sidebar-drive - A simple Gmail add-on that displays all Drive folders and files in the sidebar.

hun_law_rs - Tool for parsing Hungarian laws (Rust version)

zettelkasten - Creating notes with the Zettelkasten note-taking method and storing all notes on GitHub

mastodon-scraping - Repository for scraping public information from Mastodon

scrape-san-mateo-fire-dispatch

queensland-traffic-conditions - A scraper that tracks changes to the published queensland traffic incidents data

fusionauth-site - Website and documentation for FusionAuth

gh-action-data-scraping - Shows how to use GitHub Actions to do periodic data scraping

scrape-hacker-news-by-domain - Scrape HN to track links from specific domains

torvenyek - Git repo of Hungarian laws

map-of-github - Inspirational Mapping
