Internet-Places-Database Alternatives
Similar projects and alternatives to Internet-Places-Database
- ArchiveBox: 🗃 Open-source, self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc. and saves HTML, JS, PDFs, media, and more.
- chatgpt-shell: A multi-LLM Emacs shell (ChatGPT, Claude, Gemini, Kagi, Ollama, Perplexity) plus editing integrations.
- motion: A software motion detector. Home page: https://motion-project.github.io/ (by Motion-Project)
- vod2pod-rss: Converts a YouTube or Twitch channel into a podcast with ease. It creates a podcast RSS feed that can be listened to directly in any podcast client. VODs are transcoded to MP3 on the fly, so no server storage is needed.
- srgn: A grep-like tool that understands source-code syntax and allows manipulation in addition to search.
- soundfingerprinting: Open-source audio fingerprinting in .NET. An efficient acoustic-fingerprinting algorithm written purely in C#.
- kindle_clippings_webapp: A web application for importing, viewing, and tagging Kindle clippings. No account is required.
Internet-Places-Database reviews and mentions
-
Django bookmark management software
Internet Places Database
-
Public Suffix List
We may not realize how important certain initiatives are. The Public Suffix List is used, for example, in https://pypi.org/project/tldextract/, which my project also happens to use.
As another side note, I capture not only TLDs but also domains in https://github.com/rumca-js/Internet-Places-Database
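For illustration, a minimal sketch of how tldextract leans on the Public Suffix List to split a URL into subdomain, registered domain, and suffix; the example URL is arbitrary:

    import tldextract

    # tldextract consults the Public Suffix List, so multi-label
    # suffixes such as "co.uk" are handled correctly.
    parts = tldextract.extract("https://forums.bbc.co.uk/some/path")
    print(parts.subdomain)          # "forums"
    print(parts.domain)             # "bbc"
    print(parts.suffix)             # "co.uk" -- a PSL entry, not just "uk"
    print(parts.registered_domain)  # "bbc.co.uk"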
-
Full Text, Full Archive RSS Feeds for Any Blog
Similar goal, different approach. I wrote an RSS reader that captures link metadata from various RSS sources. The metadata are exported every day, into separate repositories: one for bookmarks, one for daily links, one for 'known domains'.
Written in Django.
I can always go back and parse the saved data. If a web page is no longer available, I fall back to the Internet Archive.
- https://github.com/rumca-js/Django-link-archive - RSS reader / web scraper
- https://github.com/rumca-js/RSS-Link-Database - bookmarks I found interesting
- https://github.com/rumca-js/RSS-Link-Database-2024 - day-by-day link storage
- https://github.com/rumca-js/Internet-Places-Database - domains found on the internet
After creating a Python package for web communication, which replaces requests for me and sometimes uses Selenium, I also wrote a CLI interface for reading RSS sources from the command line: https://github.com/rumca-js/yafr
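A hedged sketch of that fallback idea, using the Internet Archive's public availability API; the function name is illustrative and not taken from the repositories above:

    import requests

    def fetch_with_archive_fallback(url: str):
        # Try the live page first.
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException:
            pass  # live page unavailable; fall back to the Wayback Machine
        # Ask the Internet Archive for the closest snapshot of this URL.
        lookup = requests.get(
            "https://archive.org/wayback/available",
            params={"url": url},
            timeout=10,
        ).json()
        snapshot = lookup.get("archived_snapshots", {}).get("closest")
        if snapshot and snapshot.get("available"):
            return requests.get(snapshot["url"], timeout=10).text
        return None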
-
Does Sundar Pichai/Search team know how bad Google search is?
I think it is not only search that has gotten worse; the Internet has too. It is full of spam, full of grifters, full of content farms. The Dead Internet has come true.
In a pile of garbage it is hard to find interesting stuff.
That is why we have HN and subreddits: we try to find interesting stuff through collective effort. In some cases that collective effort is itself being monetized, so people are disenchanted with such solutions.
You can create your own Reddit clone, but it will not work out, because you do not have the user base.
I tried collecting domain names at least: https://github.com/rumca-js/Internet-Places-Database
In my system I can use the data to find domains, so that when I search for 'github' I find GitHub itself, and when I search for 'youtube' I do not get a ton of Minecraft videos.
-
WTF Happened to Blogs
Personal sites are tagged as 'personal' in this repository:
https://github.com/rumca-js/Internet-Places-Database
The question is: how do you find an interesting blog among 4000 blogs?
Some blogs are not tagged, but they are still easily found by searching for 'personal site', 'personal website', or 'personal blog' (a sketch of such a search follows below).
The blogs exist, but Google has no monetary incentive to show them to you when it can show you content farms with ads.
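A minimal sketch of that kind of search over the repository's JSON exports; the field names ("title", "tags") are assumptions about the export layout, not a documented schema:

    import json

    def find_personal_sites(path: str) -> list:
        # Load one of the repository's JSON export files (assumed layout).
        with open(path, encoding="utf-8") as handle:
            entries = json.load(handle)
        matches = []
        for entry in entries:
            tags = [tag.lower() for tag in entry.get("tags", [])]
            title = (entry.get("title") or "").lower()
            # Tagged entries first, then the textual fallbacks mentioned above.
            if "personal" in tags or any(
                phrase in title
                for phrase in ("personal site", "personal website", "personal blog")
            ):
                matches.append(entry)
        return matches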
-
Show HN: Crawl a modern website to a zip, serve the website from the zip
I agree with your points.
You might be interested in the web-scraping subreddit: https://www.reddit.com/r/webscraping/
My passion project is https://github.com/rumca-js/Django-link-archive
Currently I use only one thread for scraping; I do not need more, and it gets the job done. I also know too little to experiment further with Python's Celery tasks (a rough sketch of what that might look like is below).
My project can be used for various things, depending on your needs. Recently I have been playing with using it as a 'search engine': I scrape the Internet to find cool stuff. The results are in https://github.com/rumca-js/Internet-Places-Database. Not all domains are interesting, though.
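For what it's worth, a rough sketch of what moving the scraping onto Celery might look like; the broker URL and task body are illustrative assumptions, not code from Django-link-archive:

    import requests
    from celery import Celery

    # Assumed broker for the sketch; any Celery-supported broker would do.
    app = Celery("scraper", broker="redis://localhost:6379/0")

    @app.task
    def fetch_page(url: str) -> str:
        # Each task fetches one page; Celery workers run many tasks concurrently.
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text

    # Usage: fetch_page.delay("https://example.com") queues the fetch.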
-
So many feed readers, so many behaviors
https://github.com/rumca-js/Django-link-archive/blob/main/rs...
I know there are already spider and metadata-processing packages for Python, but I like having control over the process.
Call it an old man yelling at the cloud, but I also hate:
- sites blocking me with 403 because my user agent is not "mainstream". Why do I have to use undetected Chrome to read some RSS feeds? Why can't I use third-party clients? The contents can still carry adverts; I just want my own layout and buttons
- RSS feeds protected by Cloudflare, so tools cannot read them easily
- WordPress sites not using, or outright blocking, RSS functionality. Some sites could be more open that way, but no: the feeds are closed or removed
- sites that have a "/blog" location while the main domain is empty, nearly empty, or returns 404. Can I trust such a location?
- missing HTML metadata. I like YouTube: it lets me scrape metadata while protecting the video contents themselves, and that is good
- weird redirects: a domain with no contents and no description of what it is, just JavaScript redirects from the main domain to some odd locations within it
- URL shorteners and vanity links. You do not know where you will be transported. I understand they are counting clicks, but they sacrifice my security
- Google returning links wrapped in "https://www.google.com/url" syntax instead of direct URLs. YouTube does the same with "https://www.youtube.com/redirect". To me this is again a vulnerability (a sketch for unwrapping these follows below)
My ethical web scraper's results are published at https://github.com/rumca-js/Internet-Places-Database.
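A sketch of unwrapping those Google and YouTube redirect wrappers. Both currently carry the destination in the "q" query parameter, but that is an observation about their URLs, not a documented contract:

    from urllib.parse import parse_qs, urlparse

    # Hosts and paths observed to wrap outbound links in a "q" parameter.
    REDIRECT_WRAPPERS = {
        "www.google.com": "/url",
        "www.youtube.com": "/redirect",
    }

    def unwrap_redirect(url: str) -> str:
        parsed = urlparse(url)
        if REDIRECT_WRAPPERS.get(parsed.netloc) == parsed.path:
            target = parse_qs(parsed.query).get("q")
            if target:
                return target[0]
        return url  # not a known wrapper; return unchanged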
-
Google just updated its algorithm. The Internet will never be the same
I have been collecting personal sites for some time, stored as JSON files. The links carry the tag 'personal'.
https://github.com/rumca-js/Internet-Places-Database
-
Microsoft Bing Search, Copilot face downtime
That is exactly why I scrape the internet. I maintain all the domains I have found on GitHub:
https://github.com/rumca-js/Internet-Places-Database
I wanted to have all the links about Amiga, Commodore, and chiptune.
It is not a search engine. For now, it is only data.
Maybe this will help somebody, or somebody will be able to make better use of this data.
I have a demo app running on a Raspberry Pi. It may break immediately if too many people access it:
https://renegat0x0.ddns.net/apps/places
-
38% of webpages that existed in 2013 are no longer accessible a decade later
Stats
rumca-js/Internet-Places-Database is an open-source project licensed under the GNU General Public License v3.0 only, which is an OSI-approved license.
Popular Comparisons
- Internet-Places-Database VS vod2pod-rss
- Internet-Places-Database VS RSS-Link-Database
- Internet-Places-Database VS full-text-tabs-forever
- Internet-Places-Database VS polychrome.nvim
- Internet-Places-Database VS clipzoomfx
- Internet-Places-Database VS webring
- Internet-Places-Database VS notifeed
- Internet-Places-Database VS webpub
- Internet-Places-Database VS oatmeal
- Internet-Places-Database VS sunburn.nvim