Internet-Places-Database
ArchiveBox
| | Internet-Places-Database | ArchiveBox |
|---|---|---|
| Mentions | 11 | 248 |
| Stars | 21 | 19,861 |
| Growth | - | 1.7% |
| Activity | 9.3 | 9.8 |
| Latest commit | 2 days ago | 3 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 only | MIT |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Internet-Places-Database
-
Google Search results polluted by buggy AI-written code frustrate coders
I started gathering domains to see for myself the state of the Internet
https://github.com/rumca-js/Internet-Places-Database
I have many observations.
One is that I cannot find any useful Amiga links. I had to search for them manually for quite some time. Some parts of the old internet still exist, but they are buried.
The second is that spam sites are everywhere, and not only AI-generated ones.
Next, personal sites exist, but they are often boring. 'CV sites' are also a waste of time for me. I wonder how many of them are fake.
Many sites have poorly set up HTML meta fields, titles, and descriptions. How is anybody supposed to find them?
I prefer reading a passionate personal site about programming tips to reading content farms. It is difficult to find such sites.
-
Show HN: OpenOrb, a curated search engine for Atom and RSS feeds
You can find many RSS feeds and links in my repository
https://github.com/rumca-js/Internet-Places-Database/tree/ma...
It also contains domain lists, with a tag indicating whether each site is personal or not.
-
We Need to Rewild the Internet
I have been running my personal web crawler since September 2022. I gather internet domains and assign meta information to them. My data comes from various sources. I assign the "personal" tag to any personal website and the "self-host" tag to any self-hosted program I find.
I have fewer than 30k personal websites.
The data is in the repository.
https://github.com/rumca-js/Internet-Places-Database
I still rely on Google, or Kagi, for many things. It is interesting to see what my crawler finds next. It is always a surprise to come across a new blog, or a forgotten forum of sorts.
This is how I discover genuinely new content on the internet. Certainly not through Google, which only seems to find the BBC or TechCrunch.
-
The internet is slipping out of our reach
Google will not be interested in fixing search. It also may not be possible because of AI spam. They would rather invest in DeepMind/Bard/Gemini than fix a technology that will be obsolete in a few years.
I have started scanning domains to see how many different places there are on the internet. Spoiler: not many.
We could try to create curated open databases for links, forums, and places, but in the AI era it will always be a niche.
Having said that, I think it is a good thing. If it is a niche, it will not be spoiled by casual users expecting simple behavior, or by corporations trying to control the output.
Start your blog
Start your curated lists of links.
Control your data. Share your data.
Link https://github.com/rumca-js/Internet-Places-Database
-
YaCy, a distributed Web Search Engine, based on a peer-to-peer network
There are already many projects about search:
- https://www.marginalia.nu/
- https://searchmysite.net/
- https://lucene.apache.org/
- https://www.elastic.co/ (Elasticsearch)
- https://presearch.com/
- https://stract.com/
- https://wiby.me/
I think all of these projects are fun. I would like to see one of them reach a mainstream level of attention.
I have also been gathering link metadata for some time. Maybe I will use it to feed an eventual self-hosted search engine, or a language model, if I decide to experiment with that (see the seed-merging sketch after this list).
- domain seeds: https://github.com/rumca-js/Internet-Places-Database
- bookmark seeds: https://github.com/rumca-js/RSS-Link-Database
- links for the year: https://github.com/rumca-js/RSS-Link-Database-2024
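If these dumps were used as a crawler frontier, the seed list could be built by merging them and deduplicating domains. A minimal sketch, assuming locally cloned repositories whose JSON files hold lists of entries with a "link" field (the layout and field name are my assumptions, not a documented schema):

```python
# Hypothetical sketch: merge the JSON dumps from the seed repositories above
# into one deduplicated domain list. The directory layout and the "link"
# field name are assumptions, not a documented schema.
import json
from pathlib import Path
from urllib.parse import urlparse

def collect_domains(*repo_dirs: str) -> set[str]:
    domains: set[str] = set()
    for repo in repo_dirs:
        for path in Path(repo).rglob("*.json"):
            try:
                entries = json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, UnicodeDecodeError):
                continue  # skip files that are not plain JSON entry lists
            if not isinstance(entries, list):
                continue
            for entry in entries:
                link = entry.get("link", "") if isinstance(entry, dict) else ""
                if link:
                    domains.add(urlparse(link).netloc)
    return domains

# Usage: point it at locally cloned copies of the repositories.
seeds = collect_domains("Internet-Places-Database", "RSS-Link-Database")
print(len(seeds), "unique seed domains")
```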
-
A search engine in 80 lines of Python
I have dabbled a little in this subject myself. Some of my notes:
- some RSS feeds are protected by Cloudflare. It is true, however, that this is not the case for the majority of blogs. If you would like to go further, Selenium would be one way to handle Cloudflare-protected links
- sometimes even headless Selenium is not enough, and a full-blown browser driven by Selenium is necessary to fool the protection
- sometimes even that is not enough
- that made me wonder why some RSS feeds are so heavily protected by Cloudflare, but who am I to judge?
- sometimes it is beneficial to change the user agent. I feel bad about setting my user agent to Chrome, but again, why are RSS feeds so heavily protected? (see the sketch after this list)
- you cannot parse or read the entire internet, so you always need to think about compromises. For example, in one of my projects I narrowed my searches to domains only. Now I can find most of the common domains and sort them by their "importance"
- RSS links do change. There need to be automated means to disable some feeds, to avoid checking inactive domains
- I do not see any configurable timeout for reading a page, but I am not familiar with aiohttp. Some pages might waste your time (see the timeout sketch below)
- I hate that some RSS feeds are not configured properly. Some sites do not provide a valid meta "link" with "application/rss+xml". Some RSS feeds have naive titles like "Home", or no title at all. Such a waste of opportunity
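On the timeout note above: aiohttp does expose a timeout via aiohttp.ClientTimeout, passed to the session. A minimal sketch of a fetcher with a capped read time and an explicit user agent (the URL and the user-agent string are placeholders, not anything from the projects discussed):

```python
import asyncio
import aiohttp

# Cap total request time and connect time for every request on this session.
TIMEOUT = aiohttp.ClientTimeout(total=10, connect=5)
# Placeholder user agent; identify your crawler honestly where you can.
HEADERS = {"User-Agent": "my-rss-crawler/0.1 (+https://example.com/bot)"}

async def fetch(url: str) -> str | None:
    async with aiohttp.ClientSession(timeout=TIMEOUT, headers=HEADERS) as session:
        try:
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.text()
        except (aiohttp.ClientError, asyncio.TimeoutError):
            return None  # timed out or failed; don't let one page waste your time

print(asyncio.run(fetch("https://example.com/feed.xml")) is not None)
```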
My RSS feed parser, link archiver, and web crawler: https://github.com/rumca-js/Django-link-archive. The file rsshistory/webtools.py could be especially interesting. It is not advanced programming craft, but it gets the job done.
Additionally, in another project I have collected around 2,378 personal sites. I collect domains in https://github.com/rumca-js/Internet-Places-Database/tree/ma... . These files are JSON. All personal sites have the tag "personal".
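For illustration, pulling just the personal sites out of those files might look like the following sketch; the "tags" and "link" field names and the local path are guesses at the format, not a documented schema:

```python
# Hypothetical sketch: list the sites tagged "personal" in the JSON dumps.
# The "tags" and "link" field names are assumed, not a documented schema.
import json
from pathlib import Path

def personal_sites(data_dir: str):
    for path in Path(data_dir).glob("*.json"):
        for entry in json.loads(path.read_text(encoding="utf-8")):
            if "personal" in entry.get("tags", []):
                yield entry.get("link")

# Usage: point at a local checkout of the data directory (placeholder path).
for link in personal_sites("Internet-Places-Database/data"):
    print(link)
```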
Most of the things are collected from:
https://nownownow.com/
https://searchmysite.net/
I also wanted to process domains from https://downloads.marginalia.nu/, but I haven't had time to work out the structure of the files.
-
Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search [pdf]
On the other hand, it is not 1995. Time has moved on. I wrote a simple RSS feed reader that also serves as a search engine for bookmarks.
I am able to run it in the attic on a Raspberry Pi. We do not have to rely so heavily on Google.
https://github.com/rumca-js/Django-link-archive
It is true that it does not serve as a Google or Kagi replacement for me. It is a very nice addition, though.
With a little bit of determination, I do not have to be so dependent on Google.
Here is also a dump of known domains. Some are personal.
https://github.com/rumca-js/Internet-Places-Database
...and my bookmarks
https://github.com/rumca-js/RSS-Link-Database
A few more years, and Google can go to hell.
-
Ask HN: What apps have you created for your own use?
[4] https://github.com/rumca-js/Django-link-archive
These are exported then to github repositories:
[5] https://github.com/rumca-js/RSS-Link-Database - bookmarks
[6] https://github.com/rumca-js/RSS-Link-Database-2023 - 2023 year news headlines
[7] https://github.com/rumca-js/Internet-Places-Database - all domains and RSS feeds known to me
-
The Small Website Discoverability Crisis
My own repositories:
- bookmarked entries https://github.com/rumca-js/RSS-Link-Database
- mostly domains https://github.com/rumca-js/Internet-Places-Database
- all 'news' from 2023 https://github.com/rumca-js/RSS-Link-Database-2023
I am using my own Django program to capture and manage links https://github.com/rumca-js/Django-link-archive.
- Show HN: List of Internet Domains
ArchiveBox
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
Two projects I greatly appreciate, which allow me to easily archive my Bandcamp and GOG purchases (after the initial setup, anyway):
https://github.com/easlice/bandcamp-downloader
https://github.com/Kalanyr/gogrepoc
And I recently learned about ArchiveBox, which I think is going to be a fast favorite and will finally let me clear out my mess of tabs/bookmarks: https://github.com/ArchiveBox/ArchiveBox
- YaCy, a distributed Web Search Engine, based on a peer-to-peer network
-
Vice website is shutting down
If you really want to save the content for yourself, use something like https://archivebox.io/
I've been running a local instance for a few years now and download/save tech articles all the time. I can search and find them as needed.
-
An Introduction to the WARC File
The API is coming soon (relatively speaking; it's still a one-man project)! Stay tuned: https://github.com/ArchiveBox/ArchiveBox/issues/496
I have an event-sourcing refactor in progress now that will let us pluginize functionality like the API (similar to Home Assistant with its plugin app store); it will take a month or two. Next up is the REST API, built on the new plugin system.
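For readers unfamiliar with the pattern, here is a toy sketch of event sourcing with plugins as subscribers. This is not ArchiveBox's actual code or architecture; every name below is invented for illustration:

```python
# Toy illustration of event sourcing with plugins as subscribers.
# This is NOT ArchiveBox's real architecture; all names are invented.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    kind: str      # e.g. "snapshot_added"
    payload: dict

@dataclass
class EventBus:
    log: list = field(default_factory=list)       # append-only, replayable log
    handlers: dict = field(default_factory=dict)  # event kind -> subscriber list

    def subscribe(self, kind: str, handler: Callable) -> None:
        self.handlers.setdefault(kind, []).append(handler)

    def emit(self, event: Event) -> None:
        self.log.append(event)  # state is derived by replaying this log
        for handler in self.handlers.get(event.kind, []):
            handler(event)

# A "plugin" is just a subscriber reacting to the events it cares about.
bus = EventBus()
bus.subscribe("snapshot_added", lambda e: print("archiving", e.payload["url"]))
bus.emit(Event("snapshot_added", {"url": "https://example.com"}))
```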
-
Ask HN: How can I back up an old vBulletin forum without admin access?
I guess your best chance is to use something like https://archivebox.io/.
-
ArchiveBox – open-source self-hosted web archiving
Yeah, this is a cool project, but it was already discussed 2 days ago.
As mentioned by the maintainer there, they even maintain a list of alternatives. Very classy:
https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-...
- ArchiveBox: Open-source self-hosted web archiving
- Linkhut: A Social Bookmarking Site
- Show HN: Rem: Remember Everything (open source)
- Bookmark manager with a focus on organization?
What are some alternatives?
polychrome.nvim - A colorscheme creation micro-framework for Neovim
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
webring - Make yourself a website
paimon-moe - Your best Genshin Impact companion! Help you plan what to farm with ascension calculator and database. Also track your progress with todo and wish counter.
RSS-Link-Database - Bookmarked archived links
SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file
notifeed - Watch RSS/Atom feeds and send push notifications/webhooks when new content is detected
ArchivesSpace - The ArchivesSpace archives management tool
webpub - Give me a website, I'll make you an epub.
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
clipzoomfx - Side-project for extracting highlights from (mostly sports) videos
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.