playwright-python vs ArchiveBox

| | playwright-python | ArchiveBox |
|---|---|---|
| Mentions | 35 | 269 |
| Stars | 13,613 | 24,882 |
| Growth | 1.1% | 1.7% |
| Activity | 9.2 | 9.8 |
| Latest commit | 5 days ago | 4 months ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
playwright-python
- How to scrape TikTok using Python
TikTok uses quite a lot of JavaScript on its site, both for displaying content and for analyzing user behavior, including detecting and blocking crawlers. Therefore, for crawling TikTok, we'll use a headless browser with Playwright.
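A minimal sketch of the headless-browser step, assuming `playwright` is installed and Chromium was downloaded with `playwright install chromium`. It only fetches the HTML after the page's JavaScript has run; it does nothing TikTok-specific (no parsing, scrolling, or handling of crawler detection). The helper name `fetch_rendered_html` is hypothetical.

```python
# Values Playwright accepts for the `wait_until` option of page.goto().
WAIT_STATES = {"load", "domcontentloaded", "networkidle", "commit"}


def fetch_rendered_html(url: str, wait_until: str = "networkidle", timeout_ms: int = 30_000) -> str:
    """Return a page's HTML after client-side rendering (hypothetical helper)."""
    if wait_until not in WAIT_STATES:
        raise ValueError(f"wait_until must be one of {sorted(WAIT_STATES)}")

    # Imported lazily so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until there are no network connections for at
            # least 500 ms, giving JavaScript-rendered content time to appear.
            page.goto(url, wait_until=wait_until, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Usage: `html = fetch_rendered_html("https://example.com")`. The result can then be handed to any HTML parser.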
- reviewing prelude's django starter template (by Sheena O'Connell)
- Google and Anthropic are working on AI agents - so I made an open source alternative
By integrating Ollama, Microsoft vision models, and Playwright, I've made a simple agent that can browse websites and data to answer your query.
- Ask HN: How to remove Ads from a downloaded HTML file to output an ad free file?
Do you have to use curl? It wouldn't render a lot of sites correctly anyway (anything that uses JS for rendering).
Can you run a Puppeteer/Playwright instance and add an ad blocker to it? E.g. https://github.com/ghostery/adblocker or https://github.com/microsoft/playwright-python/issues/782
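One way to try the Playwright route is request interception: `page.route()` registers a handler that can abort requests before they are sent. The sketch below blocks requests to a small, hypothetical list of ad domains (`AD_HOSTS`) and saves the rendered page; the function names are illustrative, and a real setup would load a maintained filter list rather than two example hosts.

```python
from urllib.parse import urlsplit

# Hypothetical example blocklist; real ad blockers use maintained filter lists.
AD_HOSTS = {"doubleclick.net", "googlesyndication.com"}


def is_blocked(url: str, blocked_hosts: set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlsplit(url).hostname
    if host is None:  # e.g. data: URLs have no host
        return False
    return any(host == d or host.endswith("." + d) for d in blocked_hosts)


def save_page_without_ads(url: str, out_path: str, blocked_hosts: set[str] = AD_HOSTS) -> None:
    """Render `url` with ad-host requests aborted and write the HTML to `out_path`."""
    # Imported lazily so is_blocked() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        # Abort requests to blocked hosts; let everything else through unchanged.
        if is_blocked(route.request.url, blocked_hosts):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.route("**/*", handle)
            page.goto(url, wait_until="load")
            with open(out_path, "w", encoding="utf-8") as f:
                f.write(page.content())
        finally:
            browser.close()
```

Blocking requests keeps third-party ad scripts, frames, and images from loading, but ad containers that are part of the page's own markup will still be in the saved HTML; a filter-list blocker such as the one linked above goes further.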
- Scrape Google Flights with Python
Playwright
- Login for web-scraping help
An alternative is to use a package like Playwright (or Selenium) to run a browser remotely and log in.
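A rough sketch of that approach with Playwright: log in once through a real browser, save the session (cookies and localStorage) with `context.storage_state(path=...)`, and reuse it later via `browser.new_context(storage_state=...)`. The `LoginForm` selectors and the post-login URL glob are hypothetical placeholders; every site's form differs, and sites with CAPTCHAs or multi-factor authentication need more than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoginForm:
    """Hypothetical selectors and post-login URL pattern; adjust for each site."""
    username: str = "input[name='username']"
    password: str = "input[name='password']"
    submit: str = "button[type='submit']"
    after_login: str = "**/dashboard**"  # glob for the page the site redirects to


def login_and_save_state(login_url: str, user: str, password: str,
                         state_path: str, form: Optional[LoginForm] = None) -> None:
    """Log in through a headless browser and save the session to state_path."""
    form = form or LoginForm()
    # Imported lazily so this module loads even where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context()
            page = context.new_page()
            page.goto(login_url)
            page.fill(form.username, user)
            page.fill(form.password, password)
            page.click(form.submit)
            # Wait until the browser has landed on the post-login page.
            page.wait_for_url(form.after_login)
            # Writes cookies + localStorage to a JSON file.
            context.storage_state(path=state_path)
        finally:
            browser.close()


def fetch_logged_in(url: str, state_path: str) -> str:
    """Open a page with the saved session instead of logging in again."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context(storage_state=state_path)
            page = context.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()
```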
- Show HN: Use cookies from Chrome (CDP) in cURL without copy pasting
Using the tools at hand is often the best approach. That said, I've spent most of the last 13 years of my career automating browsers. For years, I used Selenium with a variety of libraries. After switching to Puppeteer/Playwright, I have zero interest in going back lol. Playwright actually has first-party Python support. (Puppeteer has a port called Pyppeteer, but it's no longer maintained and the author recommends using Playwright.)
https://playwright.dev/python/
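In the spirit of that Show HN, Playwright's Python API can attach to an already-running Chrome over CDP with `chromium.connect_over_cdp()` and read cookies with `context.cookies(url)`, which returns only the cookies that match that URL. The sketch below turns them into a `Cookie:` header for curl. It assumes Chrome was started with `--remote-debugging-port=9222`; the helper names are hypothetical.

```python
def cookies_to_header(cookies: list[dict]) -> str:
    """Join Playwright cookie dicts ({'name': ..., 'value': ...}) into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def curl_command(url: str, cookies: list[dict]) -> list[str]:
    """Build a curl argv (for subprocess.run) that sends the given cookies."""
    return ["curl", "-H", f"Cookie: {cookies_to_header(cookies)}", url]


def cookies_from_running_chrome(url: str, endpoint: str = "http://localhost:9222") -> list[dict]:
    """Read the cookies a running Chrome would send to `url` (assumes remote debugging is on)."""
    # Imported lazily so the pure helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        # contexts[0] is the browser's default context, i.e. the normal profile session.
        context = browser.contexts[0]
        # Passing a URL filters to cookies whose domain and path match it.
        return context.cookies(url)
```

Usage: `subprocess.run(curl_command(url, cookies_from_running_chrome(url)), check=True)`.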
- Any extension to automate workflow in automatic1111?
- Can Requests be used to make a call to a js script? Need some guidance.
- I can't find any good Python Selenium tutorials out there. Anyone got any good links to video tutorials or even documentation?
This is pretty great for web automation https://playwright.dev/python/
ArchiveBox
- Linkwarden: FOSS self-hostable bookmarking with AI-tagging and page archival
I've used https://historio.us since 2011 and still pay for it to keep access to all the pages I've archived over the years. The price has been kept low enough that I can't bring myself to cancel it even though I've been using self-hosted https://archivebox.io/ for the last few years.
I always include an archived link whenever I reference something in documentation. That's my main use at the moment.
However, I also feel like I've gotten a lot of really good value when trying to learn a new development topic. Whenever I find something that looks like it might be useful, I archive it, and because everything is searchable, I end up with an index of really high-quality content once I actually know what I'm doing.
I find it hard to rediscover content via web search these days, and there's so much churn that having a personal archive of useful content is going to increase in value, at least in my opinion.
- Links copied from project READMEs now add "?tab=readme-ov-file" query parameter
The links the reporter is trying to use already don't work on mobile. If you want to link to the README file, link to the README file, e.g. https://github.com/ArchiveBox/ArchiveBox/blob/dev/README.md
I'll concede that this latter link is much longer than it perhaps should be, but I don't think the links the reporter used previously should ever have been used, as they don't work for a lot of people.
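If you already have links carrying the new parameter, removing it is a small URL rewrite. This sketch uses only the standard library's `urllib.parse`; the function name `strip_query_param` is hypothetical. It drops every occurrence of the named parameter and keeps other parameters and any `#fragment`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_param(url: str, param: str = "tab") -> str:
    """Remove all occurrences of `param` from the URL's query string."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    # urlunsplit omits the "?" entirely when the rebuilt query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Because the remaining parameters are re-encoded with `urlencode`, their percent-encoding may be normalized (for example, spaces become `+`).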
- Small Archives
- Ask HN: How Do You Bookmark?
2. Drop the link into my instance of ArchiveBox [0] and return to it a few weeks or months later or, more often than not, never again.
[0] https://archivebox.io/
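Step 2 above can be scripted. ArchiveBox's CLI adds a URL with `archivebox add`, run from inside a data directory created with `archivebox init`. The wrapper below is a hypothetical sketch for a local (non-Docker) install; Docker users would run the same subcommand through their container instead.

```python
import subprocess


def archivebox_add_command(url: str) -> list[str]:
    """Build the argv for `archivebox add <url>`."""
    return ["archivebox", "add", url]


def archive_link(url: str, data_dir: str) -> None:
    """Add one URL to the ArchiveBox collection in data_dir (hypothetical helper)."""
    # ArchiveBox works on the collection in the current working directory,
    # so run the command inside the data directory created by `archivebox init`.
    subprocess.run(archivebox_add_command(url), cwd=data_dir, check=True)
```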
- Is stuff online worth saving?
I use https://github.com/gildas-lormeau/SingleFile
I set it to tolerate longer processing times, and to open the file after saving so I can sanity check that it got everything. Works great at faithfully saving a page with images as it appears in browser, and saves so much time.
You might also have a look at https://github.com/ArchiveBox/ArchiveBox
- Ask HN: How to remove Ads from a downloaded HTML file to output an ad free file?
After taking a break and stepping away for a bit, I realized that I was recreating an archiving system for websites and that there are existing solutions that do the same thing.
I found https://github.com/ArchiveBox/ArchiveBox/, which is a self-hosted web archiving system. It covers most of my use cases (and I can extend it for additional functionality), so I am going to set it up and try it out.
Thanks all for the help.
- Internet Archive breached again through stolen access tokens
Is anyone using ArchiveBox regularly? It's a self-hosted archiving solution: not the ambitious decentralized system I think this comment has in mind, but a practical way for someone to run an archive for themselves. https://archivebox.io/
- ArchiveBox is evolving: the future of self-hosted internet archives
- Tell HN: The Wayback Machine is up, in read-only mode
- Web Archiving Projects
What are some alternatives?
Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
linkwarden - ⚡️⚡️⚡️ Self-hosted collaborative bookmark manager to collect, read, annotate, and fully preserve what matters, all in one place.
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)
SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file