datasette-app vs shot-scraper

datasette-app

The Datasette macOS application (by simonw)

datasette

Source Code

datasette.io

shot-scraper

A command-line utility for taking automated screenshots of websites (by simonw)

Playwright playwright-python Scraping screenshot-utility Screenshots

Source Code

shot-scraper.datasette.io

| Project | datasette-app | shot-scraper |
| --- | --- | --- |
| Mentions | 12 | 16 |
| Stars | 115 | 1,531 |
| Growth | - | - |
| Activity | 2.6 | 7.1 |
| Latest Commit | about 1 year ago | about 1 month ago |
| Language | JavaScript | Python |
| License | - | Apache License 2.0 |

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

datasette-app

Posts with mentions or reviews of datasette-app. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-08-20.

Welcome to Datasette Cloud
6 projects | news.ycombinator.com | 20 Aug 2023

Hah, Softbank isn't the goal here!
I realized that Datasette is the first project of my entire career where if I was still working on it in 15 years time I wouldn't feel bored yet. There's just SO MUCH scope for interesting applications of the core idea.
As such, I want to work on it for decades. But it's lonely working on it alone (the community around it has been growing and is delightful, but it's not the same as having a full-time team.)
So the question I'm trying to answer is how to make the project financially sustainable in the long-run - not just for myself, but so I can pay for a team to work on it with me.
There are plenty of other examples of open source projects that have turned SaaS hosting into a sustainable business model - WordPress and GitLab are just two of the best examples. It feels like it's a reasonably well-trodden path.
Plus... I want people to be able to use my software. Currently to use Datasette as an individual you either have to "pip" or "brew" install it, or you can try the macOS Electron app - https://datasette.io/desktop - but I want newsrooms to be able to use it to collaborate on data. And most newsrooms aren't well equipped to configure a Linux server.
So I realized that a hosted SaaS version can solve two issues at once: it can help the audience I care about actually benefit from the value of the software so far, and it provides a reasonably realistic path to financial sustainability for the project as a whole.
And yeah, I'd also like to make a ton of money out of it myself too!
Bing: “I will not harm you unless you harm me first”
2 projects | news.ycombinator.com | 15 Feb 2023

It would be nice if his stuff worked better, ironically. The Datasette app for Mac seems to be constantly stuck on loading (yes I have 0.2.2):
https://github.com/simonw/datasette-app/issues/139
And his screen capture library can't capture Canvas renderings:
https://simonwillison.net/2022/Mar/10/shot-scraper/
Lost two days at work on that.
Speaking of technology not working as expected.
Datasette is my data hammer
6 projects | news.ycombinator.com | 18 Jan 2023

I'd love to get the desktop app working on Linux and Windows.
I did manage to get a prototype working on Windows, despite having VERY little experience working on that platform: https://github.com/simonw/datasette-app/issues/71
The bit I'm stuck on is how to turn that prototype into an application with an installer that's signed so people can download and run it.
Automating screenshots for the Datasette documentation using shot-scraper
7 projects | news.ycombinator.com | 15 Oct 2022

I have trouble answering this question myself, and I created it!
The problem I have is that it can be applied to too many different problems.
I personally have used it for the following (a truncated summary):
- Publishing data online to allow other people to explore it, for example https://scotrail.datasette.io and https://russian-ira-facebook-ads.datasettes.com/
- Building websites, by combining it with custom templates. https://datasette.io and https://www.niche-museums.com and https://til.simonwillison.net are three examples
- Building my own combined search engine over a bunch of different data. https://github-to-sqlite.dogsheep.net is this for my GitHub issues and commits and issue comments across 100+ projects
- Similarly, building a code search engine across multiple repos (partly to demonstrate how far you can go with custom plugins): https://ripgrep.datasette.io
- Any time I have a CSV file I open it in the Datasette Desktop macOS app first to start exploring it: https://datasette.io/desktop
- As a prototyping tool. It's the fastest way I know of to get from some data files (CSV or JSON) to a working JSON API - and a GraphQL API too using this plugin: https://datasette.io/plugins/datasette-graphql
- Messing around with geospatial data - here's a write-up of my favourite experiment with that so far: https://simonwillison.net/2021/Jan/24/drawing-shapes-spatial...
This is a bewilderingly wide array of things! And I keep on finding new problems I can apply it to:
Of course, if all you have is a hammer, everything looks like a nail. But thanks to the plugin system (and the amazing flexibility of SQLite under the hood) I can reshape my hammer into all sorts of interesting shapes!
I've been trying to capture some of this at https://datasette.io/for
This is one of my biggest marketing challenges for the project though. If someone asks you for an elevator pitch you need to do better than spending 15 minutes talking through a wide ranging bulleted list!
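
The prototyping use case in the list above (going from a CSV file to a JSON API) can be illustrated with a minimal sketch. It covers only the first half, loading CSV into SQLite with the Python standard library; the sample data, the `people` table name and the `csv_to_sqlite` helper are hypothetical and not part of Datasette.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so spaces and quotes are handled safely."""
    return '"{}"'.format(name.replace('"', '""'))


def csv_to_sqlite(csv_text, conn, table):
    """Create a table from the CSV header row and insert the remaining rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(quote_identifier(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_identifier(table), columns))
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_identifier(table), placeholders),
        reader,
    )
    conn.commit()


# Hypothetical sample data; a real run would read a CSV file from disk.
SAMPLE = "name,city\nAda,London\nGrace,New York\n"

# ":memory:" keeps the demo self-contained; use a file such as "data.db"
# if the database should be opened by another tool afterwards.
conn = sqlite3.connect(":memory:")
csv_to_sqlite(SAMPLE, conn, "people")
rows = conn.execute("SELECT name, city FROM people ORDER BY name").fetchall()
print(rows)
```

If the database were written to a file such as `data.db`, running `datasette data.db` should then serve the table, with JSON expected at a path like `/data/people.json`.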
Upscayl – Free and Open Source AI Image Upscaler for Linux, macOS and Windows
10 projects | news.ycombinator.com | 28 Aug 2022
What’s the best cheap program to start??
3 projects | /r/learnSQL | 20 Apr 2022

You can use my Datasette software to explore the database: https://datasette.io/desktop - that's the Mac version but you can run the underlying software on Windows too.
Cool SQL projects?
1 project | /r/SQL | 18 Apr 2022

Then you can either run "pip install datasette" and "datasette healthkit.db" or you can install the Datasette Desktop app from https://datasette.io/desktop and use that to open the database file.
Need helping actually using SQL
2 projects | /r/SQL | 6 Apr 2022

You may find my Datasette Desktop Mac application useful: it provides a read-only interface over SQLite and can open both SQLite files and CSV files: https://datasette.io/desktop
JupyterLab Desktop App now available
11 projects | news.ycombinator.com | 22 Sep 2021

This is really interesting to see. I've been trying to solve a similar problem over the past few weeks - bundling up a Python web application as an installable Desktop app, in my case for https://datasette.io/desktop - so it's really interesting to see how they've approached the problem.
I ended up including a full copy of Python using https://github.com/indygreg/python-build-standalone - it looks like they've bundled Conda.
I wrote up detailed notes on how I solved the Python bundling problem in https://simonwillison.net/2021/Sep/8/datasette-desktop/#how-... and in https://til.simonwillison.net/electron/python-inside-electro...
Datasette Desktop 0.2.0: The annotated release notes
1 project | /r/Python | 14 Sep 2021

I've been having a ton of fun building this. The code is all open source at https://github.com/simonw/datasette-app - it's my first time working with Electron and the biggest task was figuring out how to bundle Python inside an Electron app, which I wrote about in detail here: https://til.simonwillison.net/electron/python-inside-electron

shot-scraper

Posts with mentions or reviews of shot-scraper. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-15.

I want to create IMDB for Open source projects
6 projects | news.ycombinator.com | 15 Apr 2024

I had one of these recently! https://github.com/simonw/shot-scraper/pull/133/files
They're /incredibly/ rare though.
2024-03-01 listening in on the neighborhood
5 projects | news.ycombinator.com | 2 Mar 2024
If anyone wants the raw data, it's available in the window._Flourish_data variable on https://flo.uri.sh/visualisation/16818696/embed
Which means you can extract it with my https://shot-scraper.datasette.io/ tool like this:
```
    shot-scraper javascript \
```
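
The command above is cut off in the source and is not reconstructed here. As a rough sketch of the general shape only, the following Python assumes the `shot-scraper javascript URL EXPRESSION` form from the shot-scraper documentation and that the tool prints the result as JSON. The `build_command` and `extract` helpers are hypothetical, and `extract` is defined but not called because it needs shot-scraper installed and network access.

```python
import json
import shlex
import subprocess

# URL and variable name are taken from the comment above.
URL = "https://flo.uri.sh/visualisation/16818696/embed"
EXPRESSION = "window._Flourish_data"


def build_command(url, expression):
    """Build the argument list for `shot-scraper javascript URL EXPRESSION`."""
    return ["shot-scraper", "javascript", url, expression]


def extract(url, expression):
    """Run shot-scraper and parse its standard output as JSON.

    Assumes shot-scraper is installed and on the PATH.
    """
    result = subprocess.run(
        build_command(url, expression),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


command = build_command(URL, EXPRESSION)
# Print the command as it could be typed in a shell.
print(shlex.join(command))
```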
Web Scraping in Python – The Complete Guide
11 projects | news.ycombinator.com | 20 Feb 2024

I strongly recommend adding Playwright to your set of tools for Python web scraping. It's by far the most powerful and best designed browser automation tool I've ever worked with.
I use it for my shot-scraper CLI tool: https://shot-scraper.datasette.io/ - which lets you scrape web pages directly from the command line by running JavaScript against pages to extract JSON data: https://shot-scraper.datasette.io/en/stable/javascript.html
A command-line utility for taking automated screenshots of websites
1 project | news.ycombinator.com | 15 Dec 2023
Don’t Build a General Purpose API to Power Your Own Front End (2021)
3 projects | news.ycombinator.com | 20 Aug 2023

This is exactly what the `Accept` HTTP header is for https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ac...
I think the author is generally correct that all JSON should be provided in a single request, but if you want to prove it, then you should be able to switch your `Accept` header between `application/json` and `text/html` and see nearly identical data.
In fact, this is what both GitLab and GitHub do. Try it out!
`curl -L https://github.com/simonw/shot-scraper` (text/html)
`curl --header "Accept: application/json" -L https://github.com/simonw/shot-scraper` (application/json)
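
The idea in this comment, one resource rendered as JSON or HTML depending on the `Accept` header, can be sketched from the server side as follows. This is a simplified illustration, not how GitHub or GitLab implement it: the `render` function and the sample record are hypothetical, and the matching ignores q-values and wildcards that full content negotiation would take into account.

```python
import json
from html import escape


def render(data, accept_header):
    """Return (content_type, body) for the same data, chosen by Accept.

    Simplified: only checks whether application/json is listed at all.
    """
    accepted = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    if "application/json" in accepted:
        return "application/json", json.dumps(data, sort_keys=True)
    # Fall back to HTML, escaping values so they are safe to embed.
    items = "".join(
        "<li>{}: {}</li>".format(escape(str(key)), escape(str(value)))
        for key, value in sorted(data.items())
    )
    return "text/html", "<ul>{}</ul>".format(items)


# Hypothetical record standing in for a repository page.
repo = {"name": "shot-scraper", "language": "Python"}

json_type, json_body = render(repo, "application/json")
html_type, html_body = render(repo, "text/html,application/xhtml+xml;q=0.9")
print(json_type, json_body)
print(html_type, html_body)
```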
Git scraping: track changes over time by scraping to a Git repository
18 projects | news.ycombinator.com | 10 Aug 2023

Git is a key technology in this approach, because the value you get out of this form of scraping is the commit history - it's a way of turning a static source of information into a record of how that information changed over time.
I think it's fine to use the term "scraping" to refer to downloading a JSON file.
These days an increasing number of websites work by serving up JSON which is then turned into HTML by a client-side JavaScript app. The JSON often isn't a formally documented API, but you can grab it directly to avoid the extra step of processing the HTML.
I do run Git scrapers that process HTML as well. A couple of examples:
scrape-san-mateo-fire-dispatch https://github.com/simonw/scrape-san-mateo-fire-dispatch scrapes the HTML from http://www.firedispatch.com/iPhoneActiveIncident.asp?Agency=... and records both the original HTML and converted JSON in the repository.
scrape-hacker-news-by-domain https://github.com/simonw/scrape-hacker-news-by-domain uses my https://shot-scraper.datasette.io/ browser automation tool to convert an HTML page on Hacker News into JSON and save that to the repo. I wrote more about how that works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
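
The core of the pattern described here, saving scraped data to a file and committing only when it differs, might be sketched like this. It is a minimal illustration under assumptions, not the code from either repository: `save_if_changed`, `commit`, the `incidents.json` file name and the sample records are all hypothetical, and `commit` is defined but not called because it needs an existing Git repository.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def save_if_changed(path, data):
    """Write data as stable, pretty-printed JSON; return True if the file changed."""
    # Sorted keys and fixed indentation keep diffs small between runs.
    new_text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    old_text = path.read_text() if path.exists() else None
    if new_text == old_text:
        return False
    path.write_text(new_text)
    return True


def commit(repo_dir, filename, message):
    """Commit the file with Git. Assumes repo_dir is already a Git repository."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)


# Demonstrate the change detection with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "incidents.json"
    first = save_if_changed(target, [{"id": 1, "status": "active"}])
    repeat = save_if_changed(target, [{"id": 1, "status": "active"}])
    update = save_if_changed(target, [{"id": 1, "status": "closed"}])

print(first, repeat, update)
```

Run on a schedule, each `True` result would be followed by a call to `commit`, so the repository history becomes the record of how the data changed.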
Web Scraping via JavaScript Runtime Heap Snapshots (2022)
1 project | news.ycombinator.com | 8 Aug 2023
Need help with downloading a section of multiple sites as pdf files.
2 projects | /r/webscraping | 25 Mar 2023

You can use shot-scraper: https://github.com/simonw/shot-scraper
Ask HN: Small scripts, hacks and automations you're proud of?
49 projects | news.ycombinator.com | 12 Mar 2023

I have a neat Hacker News scraping setup that I'm really pleased with.
The problem: I want to know when content from one of my sites is submitted to Hacker News, and keep track of the points and comments over time. I also want to be alerted when it happens.
Solution: https://github.com/simonw/scrape-hacker-news-by-domain/
This repo does a LOT of things.
It's an implementation of my Git scraping pattern - https://simonwillison.net/2020/Oct/9/git-scraping/ - in that it runs a script once an hour to check for more content.
It scrapes https://news.ycombinator.com/from?site=simonwillison.net (scraping the HTML because this particular feature isn't supported by the Hacker News API) using shot-scraper - a tool I built for command-line browser automation: https://shot-scraper.datasette.io/
The scraper works by running this JavaScript against the page and recording the resulting JSON to the Git repository: https://github.com/simonw/scrape-hacker-news-by-domain/blob/...
That solves the "monitor and record any changes" bit.
But... I want alerts when my content shows up.
I solve that using three more tools I built: https://datasette.io/ and https://datasette.io/plugins/datasette-atom and https://datasette.cloud/
This script here runs to push the latest scraped JSON to my SQLite database hosted using my in-development SaaS platform, Datasette Cloud: https://github.com/simonw/scrape-hacker-news-by-domain/blob/...
I defined this SQL view https://simon.datasette.cloud/data/hacker_news_posts_atom which shows the latest data in the format required by the datasette-atom plugin.
Which means I can subscribe to the resulting Atom feed (add .atom to that URL) in NetNewsWire and get alerted when my content shows up on Hacker News!
I wrote a bit more about how this all works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
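
The SQL view step can be illustrated with a small sketch. It assumes, based on the datasette-atom README, that the plugin expects `atom_id`, `atom_title` and `atom_updated` columns and accepts an optional `atom_link`. The `posts` table, its columns and the `posts_atom` view are hypothetical and do not reflect the actual schema behind the view linked above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table of scraped posts; the real schema is not shown in the source.
conn.execute(
    "CREATE TABLE posts ("
    "id TEXT PRIMARY KEY, title TEXT, url TEXT, points INTEGER, submitted TEXT)"
)
conn.execute(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
    ("1", "Example post", "https://example.com/", 10, "2023-03-12T10:00:00+00:00"),
)

# A view that renames columns into the shape assumed for datasette-atom.
conn.execute(
    """
    CREATE VIEW posts_atom AS
    SELECT
        'hn:' || id AS atom_id,
        title AS atom_title,
        submitted AS atom_updated,
        url AS atom_link
    FROM posts
    ORDER BY submitted DESC
    """
)

cursor = conn.execute("SELECT * FROM posts_atom")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```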
Show HN: Plus – Self Updating Screenshots
3 projects | news.ycombinator.com | 17 Jan 2023

Sounds a lot like Simon Willison's open source project shot-scraper
https://github.com/simonw/shot-scraper

What are some alternatives?

When comparing datasette-app and shot-scraper you can also consider the following projects:

til - Today I Learned

gmail-sidebar-drive - A simple gmail add on to display all the drive folders and files in sidebar.

fusionauth-site - Website and documentation for FusionAuth

zettelkasten - Creating notes with the zettelkasten note taking method and storing all notes on github

iron.nvim - Interactive Repl Over Neovim

scrape-san-mateo-fire-dispatch

vscode-nodebook - Node.js notebook

bbcrss - Scrapes the headlines from BBC News indexes every five minutes

vscode-jupyter - VS Code Jupyter extension

scrape-hacker-news-by-domain - Scrape HN to track links from specific domains

django-sql-dashboard - Django app for building dashboards using raw SQL queries

SeleniumBase - 📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.