phantomjs vs ArchiveBox
| | phantomjs | ArchiveBox |
|---|---|---|
| Mentions | 17 | 248 |
| Stars | 29,279 | 19,790 |
| Growth | - | 3.4% |
| Activity | 0.0 | 9.8 |
| Latest commit | over 1 year ago | 5 days ago |
| Language | C++ | Python |
| License | BSD 3-clause "New" or "Revised" License | MIT |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
phantomjs
-
XZ: A Microcosm of the interactions in Open Source projects
The points you make aren't unreasonable.
It is necessary to establish clear boundaries around what the maintainers can and cannot provide. If this is not done early in the project, the support burden eventually becomes too much to bear, at which point the maintainer transfers ownership and the project suffers catastrophic consequences, such as the xz backdoor we're talking about here, or mostly stalls and serves as an ego-boosting platform for the new maintainer, as was the case with PhantomJS[6].
This can also happen in your life, where a "friend" sees that you possess a certain skill, and then gradually tries to push an inordinate amount of their personal work related to this field onto you.
Personally, I think it's best to use an approach with extremely clear communication about what the maintainer can and cannot provide. This can be seen, for example, in yt-dlp[1], where users are told upfront that failing to provide the requested details will get them blocked, or in SQLite, where the project's positions on contributed patches[2] and support[3] are made equally clear.
Having a shouty BDFL like Torvalds can also help improve code quality[4] and push back on questionable contributions[5], though it is better if the shouty BDFL makes statements that are professional and less aggressive. For example, "Mauro, shut the fuck up"[7] would become "Mauro, your response is completely unbecoming for a Linux kernel maintainer, and is not in line with the promise of not breaking userspace."
[1] https://github.com/yt-dlp/yt-dlp/issues/new?assignees=&label...
[2] https://www.sqlite.org/copyright.html
[3] https://www.sqlite.org/support.html
[4] https://www.theregister.com/2024/01/29/linux_6_8_rc2/
[5] https://cse.umn.edu/cs/linux-incident
[6] https://github.com/ariya/phantomjs/issues/14541
[7] https://lkml.org/lkml/2012/12/23/75
-
Show HN: Generate a concatenated file of all CSS used on a given website
The last commit was in 2019, and it uses PhantomJS, which shut down development in 2018, to query the page.
https://github.com/ariya/phantomjs/issues/15344
-
youtube bandwidth throttled for cloud addresses?
Install PhantomJS and see if that improves things.
-
How to Bypass Cloudflare in 2023: The 8 Best Methods
Automated Browser Detection. Cloudflare queries the browser for properties that only exist in automated web browser environments. For example, the existence of the window.document.__selenium_unwrapped or window.callPhantom property indicates the usage of Selenium and PhantomJS, respectively. For obvious reasons, you're getting blocked if this is detected.
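To make this detection step concrete, here is a minimal Python sketch of the check logic. It assumes a hypothetical dict snapshot standing in for the properties a page script could read from `window`; the two marker paths come from the passage, while `AUTOMATION_MARKERS`, `has_path`, and `detect_automation` are illustrative names, not Cloudflare's actual code. In a real page, an equivalent check would run as JavaScript inside the browser.

```python
from typing import Dict, List, Tuple

# Marker property paths named in the passage, mapped to the tool each suggests.
AUTOMATION_MARKERS: Dict[Tuple[str, ...], str] = {
    ("window", "callPhantom"): "PhantomJS",
    ("window", "document", "__selenium_unwrapped"): "Selenium",
}


def has_path(snapshot: dict, path: Tuple[str, ...]) -> bool:
    """Return True if every key along `path` exists in the nested dict."""
    node = snapshot
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def detect_automation(snapshot: dict) -> List[str]:
    """Return the sorted names of tools whose marker properties are present."""
    return sorted(
        tool for path, tool in AUTOMATION_MARKERS.items() if has_path(snapshot, path)
    )


if __name__ == "__main__":
    # Hypothetical snapshot of a PhantomJS-controlled page.
    phantom_like = {"window": {"callPhantom": True, "document": {}}}
    print(detect_automation(phantom_like))  # ['PhantomJS']
```

The check tests for a property's existence rather than its value, mirroring how a JavaScript `"callPhantom" in window` test behaves.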
-
Ask HN: What's the best way to get all the HTML from a JavaScript site?
I know there is https://phantomjs.org/ but is there something else people use these days?
The issue is that for some websites curl works fine to get all the rendered HTML, but for others you don't get any content without a JavaScript engine.
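One rough way to tell the two cases apart before reaching for a headless browser is to check whether the fetched HTML contains scripts but almost no visible text. The sketch below uses Python's standard `html.parser`. The function name and the 200-character threshold are arbitrary assumptions, so treat it as a heuristic, not a reliable test.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies; count only text a reader would see.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def probably_needs_js(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether the page only renders its content via JavaScript."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars
```

A typical SPA shell such as `<div id="root"></div><script src="app.js"></script>` comes back `True`, which is the signal to switch from curl to a JavaScript-capable browser.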
-
Detecting PhantomJS headless browsers
Despite the popularity of Puppeteer and Headless Chrome, my team of threat researchers and I wondered, to what extent PhantomJS was still being used by bot developers. In this post, we share how we identified traffic associated with PhantomJS, the types of attacks performed, and its use in comparison to Puppeteer Extra Stealth.
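The post does not say which signals were used, so the following is only one simple, easily spoofed example: PhantomJS's default User-Agent string contains a `PhantomJS/<version>` token that a server can look for. Scripts can override the User-Agent, so this catches only unmodified setups; `phantomjs_version_from_ua` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the "PhantomJS/<version>" token, e.g. "... PhantomJS/2.1.1 Safari/538.1".
_PHANTOM_UA = re.compile(r"\bPhantomJS/(\d+(?:\.\d+)*)")


def phantomjs_version_from_ua(user_agent: str) -> Optional[str]:
    """Return the PhantomJS version named in a User-Agent header, or None."""
    match = _PHANTOM_UA.search(user_agent or "")
    return match.group(1) if match else None
```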
-
How to make a SPA SEO crawlable?
I've been working on making a SPA crawlable by Google based on Google's instructions. Even though there are quite a few general explanations, I couldn't find a thorough step-by-step tutorial with actual examples anywhere. Having finished this, I would like to share my solution so that others can use it and possibly improve it further. I am using MVC with Web API controllers and PhantomJS on the server side, and Durandal on the client side with pushState enabled; I also use Breezejs for client-server data interaction. I strongly recommend all of these, but I'll try to give an explanation general enough to also help people using other platforms.
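The post doesn't specify which Google instructions it follows. Assuming they are Google's AJAX crawling scheme (since deprecated), a pushState page opts in with `<meta name="fragment" content="!">`. The crawler then requests the same URL with `?_escaped_fragment_=` appended, and the server is expected to return a prerendered snapshot, for example one rendered with PhantomJS. The Python sketch below, written in place of the author's MVC/Web API stack, covers only the URL-mapping step. `pretty_url_for_crawler` is a hypothetical helper, and the rendering itself is left out.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ESCAPED_FRAGMENT = "_escaped_fragment_"


def pretty_url_for_crawler(url: str) -> Optional[str]:
    """If `url` is an AJAX-crawling request, return the URL to prerender."""
    parts = urlsplit(url)
    # keep_blank_values so that "?_escaped_fragment_=" (empty value) is detected.
    params = parse_qsl(parts.query, keep_blank_values=True)
    fragment_values = [v for k, v in params if k == ESCAPED_FRAGMENT]
    if not fragment_values:
        return None  # ordinary browser request: serve the SPA as usual
    remaining = urlencode([(k, v) for k, v in params if k != ESCAPED_FRAGMENT])
    state = fragment_values[0]
    # Empty value: pushState URL as-is. Non-empty: restore the "#!" hashbang state.
    fragment = "!" + state if state else ""
    return urlunsplit((parts.scheme, parts.netloc, parts.path, remaining, fragment))
```

For example, `https://example.com/products/42?_escaped_fragment_=` maps back to `https://example.com/products/42`, which the server would then render headlessly and return as static HTML.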
-
Malware/Virus protection?
Regarding youtube-dl, I remember someone mentioning they needed an external helper program called phantomjs to download from some sites. I really wouldn't recommend using phantomjs as it hasn't been updated since 2018 and I see it has known vulnerabilities too.
-
Building A Serverless Screenshot Service with Lambda
For this project we will need some extra binaries (PhantomJS in particular) to take the screenshots. We'll also use ImageMagick, but that is provided by AWS by default in the Lambda image, so we don't package it separately.
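Here is a sketch of the screenshot step in a Python Lambda handler. It assumes the PhantomJS binary and a screenshot script (for example, a copy of the `rasterize.js` example that shipped with PhantomJS) are bundled in the deployment package, and that ImageMagick's `convert` is on the PATH as the post describes. The `bin/` layout, the event shape, and the `screenshot_commands` helper are hypothetical. Output goes to /tmp because that is Lambda's writable directory.

```python
import os
import subprocess
from typing import List

# Assumption: binaries are packaged under LAMBDA_TASK_ROOT (hypothetical layout).
TASK_ROOT = os.environ.get("LAMBDA_TASK_ROOT", "/var/task")
PHANTOMJS = os.path.join(TASK_ROOT, "bin", "phantomjs")
RASTERIZE = os.path.join(TASK_ROOT, "rasterize.js")


def screenshot_commands(url: str, out_png: str, thumb_png: str, scale: str = "50%") -> List[List[str]]:
    """Build the PhantomJS capture command and the ImageMagick resize command."""
    return [
        [PHANTOMJS, RASTERIZE, url, out_png],
        ["convert", out_png, "-resize", scale, thumb_png],
    ]


def handler(event, context):
    url = event["url"]  # hypothetical event shape
    full, thumb = "/tmp/shot.png", "/tmp/thumb.png"
    for cmd in screenshot_commands(url, full, thumb):
        # check=True raises if either tool exits non-zero; timeout guards hangs.
        subprocess.run(cmd, check=True, timeout=60)
    return {"screenshot": full, "thumbnail": thumb}
```

Keeping the command construction separate from execution makes the argument lists easy to check locally without the binaries installed.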
-
yt-dlp release 2022.04.08
ERROR: [iq.com] apvtge3eng: PhantomJS executable not found in PATH, download it from http://phantomjs.org
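This error means yt-dlp could not find a `phantomjs` executable on the PATH. A tool can do the same kind of lookup with Python's `shutil.which`. The sketch below is not yt-dlp's code: `find_phantomjs` and `PhantomJSNotFound` are hypothetical names, and only the error message is copied from the log line.

```python
import shutil
from typing import Optional


class PhantomJSNotFound(RuntimeError):
    """Raised when no phantomjs executable can be found (hypothetical)."""


def find_phantomjs(path: Optional[str] = None) -> str:
    """Return the full path to phantomjs; `path` overrides the PATH variable."""
    exe = shutil.which("phantomjs", path=path)
    if exe is None:
        raise PhantomJSNotFound(
            "PhantomJS executable not found in PATH, download it from http://phantomjs.org"
        )
    return exe
```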
ArchiveBox
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
Two projects I greatly appreciate, allowing me to easily archive my bandcamp and GOG purchases (after the initial setup anyways):
https://github.com/easlice/bandcamp-downloader
https://github.com/Kalanyr/gogrepoc
And I recently learned about archivebox, which I think is going to be a fast favorite and finally let me clear out my mess of tabs/bookmarks: https://github.com/ArchiveBox/ArchiveBox
- YaCy, a distributed Web Search Engine, based on a peer-to-peer network
-
Vice website is shutting down
If you really want to save the content for yourself, use something like https://archivebox.io/
I've been running a local instance for a few years now and download/save tech articles all the time. I can search and find them as needed.
-
An Introduction to the WARC File
API is coming soon (relatively, it's still a one-man project)! Stay tuned https://github.com/ArchiveBox/ArchiveBox/issues/496
I have an event-sourcing refactor in progress now to allow us to pluginize functionality like the API (similar to Home Assistant with a plugin app store); it will take a month or two. Next up is the REST API using the new plugin system.
-
Ask HN: How can I back up an old vBulletin forum without admin access?
I guess your best chance is to use something like https://archivebox.io/.
-
ArchiveBox – open-source self-hosted web archiving
Yeah this is a cool project but it was discussed 2 days ago.
As mentioned by the maintainer there, they even maintain a list of alternatives, very classy:
https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-...
- ArchiveBox: Open-source self-hosted web archiving
- Linkhut: A Social Bookmarking Site
- Show HN: Rem: Remember Everything (open source)
- Bookmark manager with a focus on organization?
What are some alternatives?
puppeteer - Node.js API for Chrome
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
yt-dlp - A feature-rich command-line audio/video downloader
paimon-moe - Your best Genshin Impact companion! Help you plan what to farm with ascension calculator and database. Also track your progress with todo and wish counter.
Nightmare - A high-level browser automation library.
SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file
slimerjs - A scriptable browser like PhantomJS, based on Firefox
ArchivesSpace - The ArchivesSpace archives management tool
zombie - Insanely fast, full-stack, headless browser testing using node.js
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.