shot-scraper
intellij-community
Metric | shot-scraper | intellij-community
---|---|---
Mentions | 16 | 101
Stars | 1,535 | 16,588
Growth | - | 0.6%
Activity | 7.1 | 10.0
Last commit | about 1 month ago | 3 days ago
Language | Python |
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
shot-scraper
-
I want to create IMDB for Open source projects
I had one of these recently! https://github.com/simonw/shot-scraper/pull/133/files
They're /incredibly/ rare though.
-
2024-03-01 listening in on the neighborhood
If anyone wants the raw data, it's available in window._Flourish_data variable on https://flo.uri.sh/visualisation/16818696/embed
Which means you can extract it with my https://shot-scraper.datasette.io/ tool like this:
shot-scraper javascript \
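The command above is cut off in this excerpt, so the following is not a reconstruction of it. It is a minimal Python sketch, assuming the documented `shot-scraper javascript URL EXPRESSION` form and that `shot-scraper` is installed and on the `PATH`. The helper names `build_command` and `run_javascript` are hypothetical.

```python
import json
import subprocess

# URL and variable name are taken from the comment above.
FLOURISH_URL = "https://flo.uri.sh/visualisation/16818696/embed"
FLOURISH_EXPRESSION = "window._Flourish_data"


def build_command(url: str, expression: str) -> list[str]:
    """Return the argv for `shot-scraper javascript URL EXPRESSION`."""
    return ["shot-scraper", "javascript", url, expression]


def run_javascript(url: str, expression: str):
    """Run shot-scraper and parse the JSON it prints to stdout.

    Assumes shot-scraper is installed; needs network access to load the page.
    """
    result = subprocess.run(
        build_command(url, expression),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Calling `run_javascript(FLOURISH_URL, FLOURISH_EXPRESSION)` would then return the page's data as ordinary Python objects, provided the variable is still exposed on that page.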
-
Web Scraping in Python – The Complete Guide
I strongly recommend adding Playwright to your set of tools for Python web scraping. It's by far the most powerful and best designed browser automation tool I've ever worked with.
I use it for my shot-scraper CLI tool: https://shot-scraper.datasette.io/ - which lets you scrape web pages directly from the command line by running JavaScript against pages to extract JSON data: https://shot-scraper.datasette.io/en/stable/javascript.html
- A command-line utility for taking automated screenshots of websites
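For readers who would rather call Playwright directly from Python, here is a rough sketch of the same "run JavaScript against a page and get JSON back" idea. It assumes the `playwright` package and a Chromium build are installed; `extract_json` and `scrape` are hypothetical names, and this is not how shot-scraper itself is implemented.

```python
def extract_json(page, expression: str):
    """Evaluate a JavaScript expression in the page and return the result.

    `page` is expected to be a Playwright sync-API Page. page.evaluate()
    hands back JSON-serializable values as plain Python objects.
    """
    return page.evaluate(expression)


def scrape(url: str, expression: str):
    """Open `url` in headless Chromium and evaluate `expression` in it."""
    # Imported lazily so extract_json() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return extract_json(page, expression)
        finally:
            browser.close()
```

For example, `scrape("https://shot-scraper.datasette.io/", "document.title")` should return the page title as a Python string.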
-
Don’t Build a General Purpose API to Power Your Own Front End (2021)
This is exactly what the `Accept` HTTP header is for https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ac...
I think the author is generally correct that all JSON should be provided in a single request, but if you want to prove it, you should be able to switch your `Accept` header between `application/json` and `text/html` and see nearly identical data.
In fact, this is what both GitLab and GitHub do. Try it out!
`curl -L https://github.com/simonw/shot-scraper` (text/html)
`curl --header "Accept: application/json" -L https://github.com/simonw/shot-scraper` (application/json)
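The two `curl` calls above can be approximated in Python with the standard library. This is only a sketch: `build_request` and `fetch` are hypothetical helper names, and what the server actually returns for each `Accept` value is not guaranteed here.

```python
import urllib.request

REPO_URL = "https://github.com/simonw/shot-scraper"


def build_request(url: str, accept: str) -> urllib.request.Request:
    """Build a GET request that asks for a specific representation."""
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch(url: str, accept: str):
    """Send the request and return (Content-Type, body). Needs network access.

    urlopen() follows redirects by default, similar to `curl -L`.
    """
    with urllib.request.urlopen(build_request(url, accept)) as response:
        return response.headers.get("Content-Type"), response.read()
```

Comparing `fetch(REPO_URL, "text/html")` with `fetch(REPO_URL, "application/json")` shows whether the same URL serves both representations.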
-
Git scraping: track changes over time by scraping to a Git repository
Git is a key technology in this approach, because the value you get out of this form of scraping is the commit history - it's a way of turning a static source of information into a record of how that information changed over time.
I think it's fine to use the term "scraping" to refer to downloading a JSON file.
These days an increasing number of websites work by serving up JSON which is then turned into HTML by a client-side JavaScript app. The JSON often isn't a formally documented API, but you can grab it directly to avoid the extra step of processing the HTML.
I do run Git scrapers that process HTML as well. A couple of examples:
scrape-san-mateo-fire-dispatch https://github.com/simonw/scrape-san-mateo-fire-dispatch scrapes the HTML from http://www.firedispatch.com/iPhoneActiveIncident.asp?Agency=... and records both the original HTML and converted JSON in the repository.
scrape-hacker-news-by-domain https://github.com/simonw/scrape-hacker-news-by-domain uses my https://shot-scraper.datasette.io/ browser automation tool to convert an HTML page on Hacker News into JSON and save that to the repo. I wrote more about how that works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
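The core of the pattern described above (save the latest copy, commit only when it differs) might look roughly like this in Python. It is a sketch under assumptions: the function names are hypothetical, `repo` is assumed to be an existing Git checkout with `git` on the `PATH`, and the real scrapers linked above may be organized quite differently.

```python
import subprocess
from pathlib import Path


def write_if_changed(path: Path, content: bytes) -> bool:
    """Write content to path; return True only if the file actually changed."""
    if path.exists() and path.read_bytes() == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return True


def commit(repo: Path, filename: str, message: str) -> None:
    """Stage and commit one file. Assumes `repo` is an existing Git checkout."""
    subprocess.run(["git", "-C", str(repo), "add", filename], check=True)
    subprocess.run(["git", "-C", str(repo), "commit", "-m", message], check=True)


def record_snapshot(repo: Path, filename: str, content: bytes) -> bool:
    """Save freshly scraped content and commit it only when it differs."""
    changed = write_if_changed(repo / filename, content)
    if changed:
        commit(repo, filename, f"Latest data for {filename}")
    return changed
```

Run on a schedule, each commit then marks a moment when the source changed, which is where the commit history gets its value.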
- Web Scraping via JavaScript Runtime Heap Snapshots (2022)
-
Need help with downloading a section of multiple sites as pdf files.
You can use shot-scraper: https://github.com/simonw/shot-scraper
-
Ask HN: Small scripts, hacks and automations you're proud of?
I have a neat Hacker News scraping setup that I'm really pleased with.
The problem: I want to know when content from one of my sites is submitted to Hacker News, and keep track of the points and comments over time. I also want to be alerted when it happens.
Solution: https://github.com/simonw/scrape-hacker-news-by-domain/
This repo does a LOT of things.
It's an implementation of my Git scraping pattern - https://simonwillison.net/2020/Oct/9/git-scraping/ - in that it runs a script once an hour to check for more content.
It scrapes https://news.ycombinator.com/from?site=simonwillison.net (scraping the HTML because this particular feature isn't supported by the Hacker News API) using shot-scraper - a tool I built for command-line browser automation: https://shot-scraper.datasette.io/
The scraper works by running this JavaScript against the page and recording the resulting JSON to the Git repository: https://github.com/simonw/scrape-hacker-news-by-domain/blob/...
That solves the "monitor and record any changes" bit.
But... I want alerts when my content shows up.
I solve that using three more tools I built: https://datasette.io/ and https://datasette.io/plugins/datasette-atom and https://datasette.cloud/
This script here runs to push the latest scraped JSON to my SQLite database hosted using my in-development SaaS platform, Datasette Cloud: https://github.com/simonw/scrape-hacker-news-by-domain/blob/...
I defined this SQL view https://simon.datasette.cloud/data/hacker_news_posts_atom which shows the latest data in the format required by the datasette-atom plugin.
Which means I can subscribe to the resulting Atom feed (add .atom to that URL) in NetNewsWire and get alerted when my content shows up on Hacker News!
I wrote a bit more about how this all works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
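To make the SQL view step more concrete, here is a small sketch using Python's built-in `sqlite3`. The table name, its columns, and the sample row are all hypothetical, and the actual view on Datasette Cloud is not shown in this comment. The `atom_id`, `atom_title` and `atom_updated` column names follow what the datasette-atom plugin expects as far as I understand its README, with `atom_link` as an optional extra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table holding the scraped posts
    CREATE TABLE hacker_news_posts (
        id INTEGER PRIMARY KEY,
        title TEXT,
        url TEXT,
        points INTEGER,
        submitted_at TEXT
    );

    -- View exposing the columns an Atom feed needs (assumed names)
    CREATE VIEW hacker_news_posts_atom AS
    SELECT
        id AS atom_id,
        title AS atom_title,
        submitted_at AS atom_updated,
        'https://news.ycombinator.com/item?id=' || id AS atom_link
    FROM hacker_news_posts
    ORDER BY submitted_at DESC;
    """
)

# One made-up row so the view has something to return
conn.execute(
    "INSERT INTO hacker_news_posts VALUES (?, ?, ?, ?, ?)",
    (1, "Example post", "https://simonwillison.net/", 10, "2022-12-02T00:00:00"),
)

cursor = conn.execute("SELECT * FROM hacker_news_posts_atom")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

A feed reader subscribed to the `.atom` version of such a view would see a new entry each time the scraper pushes a new row.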
-
Show HN: Plus – Self Updating Screenshots
Sounds a lot like Simon Willison's open source project shot-scraper
https://github.com/simonw/shot-scraper
intellij-community
-
Software Company HashiCorp Is Weighing a Potential Sale
Also, no BuSL stupidity, they're all Apache 2 AFAIK: https://github.com/JetBrains/intellij-community/blob/idea/23...
And the "all you can eat" toolbox license is just a staggeringly good deal, IMHO. It also comes with a "you can keep your license forever, just no updates" clause, which is way different from setting subscription-based licensing money on fire when your license expires. Whoever came up with that should be applauded, because it really drives down my "what about" anxiety of paying subscription money for IDEs.
-
The Fossil Sync Protocol
I readily admit I am not familiar enough with fossil to know about the impedance mismatch, but I'll point out that https://github.com/JetBrains/intellij-plugins/tree/idea/241.... https://github.com/JetBrains/intellij-community/tree/idea/24... https://github.com/JetBrains/intellij-community/tree/idea/24... https://github.com/JetBrains/intellij-community/tree/idea/24... may go a long way toward finding how they think about those operations
-
How to Develop an IntelliJ Plugin: A DIY Guide to Adding Drag and Drop with Custom DataFlavors
There is quite a bit going on in our view’s class, so we'll take it slow and go through its functions one by one, according to their importance. The first thing we need to do is to create the structure our items will fit into. com.intellij.ui.treeStructure.Tree seems to best match our needs, and that’s what we’ll use. In order to prepare it for what is coming, we need to configure it.
-
Operation K. Looking for bugs in the IntelliJ IDEA code
I think it's time to wrap it up. We've made a pull request to the IDEA developers, and I've accomplished the tasks I set out to do. I'm really happy to help the developers of my favorite IDE.
-
You are never taught how to build quality software
I offer, again, my JetBrains GrammarKit counterpoint from the last time that assertion came up <https://news.ycombinator.com/item?id=38192427>
>>>
I consider the JetBrains parsing system to be world class and they seem to hand-write very few parsers (instead building on this system: https://github.com/JetBrains/Grammar-Kit#readme )
- https://github.com/JetBrains/intellij-community/blob/idea/23... (the parser I'll concede, as they do seem to be hand-rolling that part)
- https://github.com/JetBrains/intellij-community/blob/idea/23... (same for its parser)
- https://github.com/JetBrains/intellij-community/blob/idea/23... and https://github.com/JetBrains/intellij-community/blob/idea/23...
- https://github.com/JetBrains/intellij-plugins/blob/idea/233.... and https://github.com/JetBrains/intellij-plugins/blob/idea/233....
-
Just paying Figma $15/month because nothing else fucking works
I had the same experience with OmniGraffle, https://www.omnigroup.com/omnigraffle
It just worked. There was support. I wouldn't dig a hole in the ground with my bare hands, so why wouldn't I use good tools? Of course I would like to use F/OSS for various reasons.
The model I absolutely love is JetBrains': their core product is OSS, Apache licensed. The whole thing, totally usable. https://github.com/JetBrains/intellij-community
The money I send their way does both: it pays for developers and it puts an amazing artifact in the world that others can use and learn from. If they weren't open source, I wouldn't pay for it. I don't know how many others are the same as me, but JetBrains really deserves credit here.
-
Show HN: Pg_yregress, Structured Testing for Postgres
# https://github.com/JetBrains/intellij-community/blob/idea/233.9802.14/json/src/jsonSchema/schema.json#L52
-
Java 21 makes me like Java again
and also FOSS (Apache 2): https://github.com/JetBrains/intellij-community (as well as PyCharm found in the "python" subdirectory)
- Predictive Debugging: A Game-Changing Look into the Future
- New Subreddit banner logo. Let me know if I need to fix something.
What are some alternatives?
gmail-sidebar-drive - A simple gmail add on to display all the drive folders and files in sidebar.
oh-my-posh - The most customisable and low-latency cross platform/shell prompt renderer
zettelkasten - Creating notes with the zettelkasten note taking method and storing all notes on github
pylance-release - Documentation and issues for Pylance
scrape-san-mateo-fire-dispatch
vscode-kotlin - Kotlin language support for VS Code
bbcrss - Scrapes the headlines from BBC News indexes every five minutes
kotlin-vim - Kotlin plugin for Vim. Featuring: syntax highlighting, basic indentation, Syntastic support
scrape-hacker-news-by-domain - Scrape HN to track links from specific domains
theia - Eclipse Theia is a cloud & desktop IDE framework implemented in TypeScript.
SeleniumBase - 📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
Apache NetBeans - Apache NetBeans