topzemen vs Internet-Places-Database

Project          topzemen                                Internet-Places-Database
Mentions         1                                       18
Stars            3                                       29
Growth           -                                       -
Activity         5.0                                     9.3
Latest Commit    8 months ago                            6 days ago
Language         Python                                  -
License          GNU General Public License v3.0 only    GNU General Public License v3.0 only

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

topzemen

Posts with mentions or reviews of topzemen. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-12.

Ask HN: What apps have you created for your own use?
212 projects | news.ycombinator.com | 12 Dec 2023

I created adult entertainment apps to organize and have fun with images and videos:
- RuGiVi: https://github.com/pronopython/rugivi - Browse your collection of images on an endless screen. Tested with more than 700.000 images at once!
- Fapel System: https://github.com/pronopython/fapel-system - Organize your adult images and videos by just using hardlinks and directories.
- TopZemen: https://github.com/pronopython/topzemen - Let the images float on your screen or rain down next to your browser window.
- Fplyr: https://github.com/pronopython/fplyr - An audio player to play moaning sounds in the background.
Everything runs on Ubuntu Linux and, in part, on Windows too!

Internet-Places-Database

Posts with mentions or reviews of Internet-Places-Database. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-06-10.

Show HN: Crawl a modern website to a zip, serve the website from the zip
6 projects | news.ycombinator.com | 10 Jun 2024

I agree with your points.
You might be interested in the Reddit webscraping community: https://www.reddit.com/r/webscraping/
My passion project is https://github.com/rumca-js/Django-link-archive
Currently I use only one thread for scraping; I do not need more. It gets the job done. Also, I know too little to experiment further with Python "celery" threads.
My project can be used for various things, depending on your needs. Recently I have been experimenting with using it as a 'search engine'. I am scraping the Internet to find cool stuff. The results are in https://github.com/rumca-js/Internet-Places-Database. Not all domains are interesting, though.

So many feed readers, so many behaviors
4 projects | news.ycombinator.com | 28 May 2024

https://github.com/rumca-js/Django-link-archive/blob/main/rs...
I know that there are already spiders and metadata-processing packages for Python, but I like having control over the process.
Old man yelling at the cloud. I also hate:
- being blocked with a 403 because my user agent is not "mainstream". Why do I have to use chrome undetected to read some RSS feeds? Why can't I use third-party clients? The content can have adverts. I just want my own layout and buttons
- RSS feeds protected with Cloudflare, so tools cannot read feeds easily
- not using, or outright blocking, RSS functionality in WordPress. Some sites could be more open that way, but no. RSS feeds are closed or removed
- some sites have a "/blog" location, but the main domain is empty, or nearly empty, or returns 404. Can I trust such a location?
- HTML metadata not being available. I like YouTube: it allows me to scrape metadata but protects the video content, and that is good
- weird redirects. The domain does not have any content and does not describe what it is. It just has JavaScript redirects from the main domain to some weird locations within the domain
- URL shorteners, vanity links. You do not know where you will be transported. I understand they are counting sheep, but they sacrifice my security
- Google returning links with the syntax "https://www.google.com/url" rather than directly. YouTube does the same with the syntax "https://www.youtube.com/redirect". For me, again, this is a vulnerability
My ethical web scraper's results are placed in: https://github.com/rumca-js/Internet-Places-Database.
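The redirect-wrapper complaint in the list above can be illustrated with a small sketch. It is not taken from the poster's project. It assumes that the destination is carried in a "q" or "url" query parameter, which is how these wrapper links commonly look; the function name and the lookup table are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: each wrapper keeps the real destination in one of these
# query parameters. The table is illustrative, not exhaustive.
REDIRECT_WRAPPERS = {
    ("www.google.com", "/url"): ("q", "url"),
    ("www.youtube.com", "/redirect"): ("q",),
}


def unwrap_redirect(link: str) -> str:
    """Return the destination hidden in a known wrapper link, else the link unchanged."""
    parts = urlsplit(link)
    param_names = REDIRECT_WRAPPERS.get((parts.netloc, parts.path))
    if param_names is None:
        return link  # not a wrapper we know about
    query = parse_qs(parts.query)  # parse_qs also percent-decodes the values
    for name in param_names:
        values = query.get(name)
        if values:
            return values[0]
    return link  # wrapper without a recognizable destination


wrapped = "https://www.google.com/url?q=https%3A%2F%2Fexample.com%2Fpage&sa=U"
print(unwrap_redirect(wrapped))
```

Unwrapping the link before following it lets a crawler or reader show the real destination up front, which addresses the "you do not know where you will be transported" concern for these two cases only. URL shorteners would still need a network request to resolve.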
Google just updated its algorithm. The Internet will never be the same
1 project | news.ycombinator.com | 25 May 2024

I have been collecting personal sites for some time. The format is JSON files. Links have the tag 'personal'.
https://github.com/rumca-js/Internet-Places-Database

Microsoft Bing Search, Copilot face downtime
2 projects | news.ycombinator.com | 23 May 2024

That is exactly why I scrape the Internet. I maintain all the domains I find on GitHub:
https://github.com/rumca-js/Internet-Places-Database
I wanted to have all links about Amiga, Commodore, or chiptune.
It is not a search engine. For now, it is only data.
Maybe this will help somebody, or somebody will be able to use this data better.
I have a demo app running on an RPi. It may break immediately if too many people access it.
https://renegat0x0.ddns.net/apps/places

38% of webpages that existed in 2013 are no longer accessible a decade later
5 projects | news.ycombinator.com | 18 May 2024

Attention is limited. We cannot see everything on the Internet. We do not have enough time for that.
There is a lot of valuable and interesting data on the Internet, but it is not visible. Certainly a high-quality, low-profile blog that ended its development in 2015 will not be ranked high in Google.
Media platforms and search engines monetize content. YouTube channels need to churn out new content every week or so to stay relevant and to stay watchable.
Our society produces content, not quality, not products.
SEO can be gamed; it is impossible to create an objective index of valuable content. Bad actors will hack the game, spam results, and destroy quality to gain profit.
The Google search engine most often connects users with media sites, news sites, and middlemen. More often than not it does not connect users with the product directly. Type "search engine" into the search query and you may find not only search engines but also articles such as "Best search engines in 2024" or "best SEO tricks to boost your page".
Google does not have any incentive to fix this. Search engines are dead tech. They will be replaced by chatbots in a few years. People will not search for content; content will be generated on demand.
Some time ago I created my own repository of domain names: https://github.com/rumca-js/Internet-Places-Database
I wanted to find "wargames"-related pages. It is nearly impossible to find anything interesting concerning Warhammer on the normie internet (not Facebook).
The second thing is that I cannot find anything "amiga" related.
This solved my initial problem. I have also found out that many interesting pages are gone. I think that Google directing our attention toward "content" broke good-quality pages.
Right now I use Google less and less, because I use my bookmark manager more and more.
https://github.com/rumca-js/Django-link-archive
My solutions may not be as complex as Common Crawl, but they are enough for me. For now. I am still working on my program. It has been a fun and interesting experience for me, and I learned a lot: about the Open Graph protocol, about schema, about web scraping, etc. Maybe this will inspire people to be more self-sufficient and to self-host more.
In times of walled gardens we need more standards and more open data to keep what remains of the old wild west of the Internet.

A list of open source games
5 projects | news.ycombinator.com | 13 May 2024

Sorry for spamming, but I also maintain a list.
In my repo I maintain domains in JSON format. The tags "video game" and "open source" together also provide a list of open source games.
Another useful tag is "self-host", which lists self-hostable programs, etc.
Link:
https://github.com/rumca-js/Internet-Places-Database
Some games are tagged "video game port" because they are reimplementations of existing games rather than something new.
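The tag combinations described above could be queried along these lines. This is a minimal sketch only: the field names "link" and "tags" and the sample entries are assumptions made for illustration, and the real JSON files in the repository may be laid out differently.

```python
import json

# Hypothetical sample data. The "link" and "tags" field names are assumed,
# not taken from the repository.
SAMPLE = """
[
  {"link": "https://example.org/game", "tags": ["video game", "open source"]},
  {"link": "https://example.org/port", "tags": ["video game", "open source", "video game port"]},
  {"link": "https://example.org/wiki", "tags": ["self-host", "open source"]},
  {"link": "https://example.org/shop", "tags": ["video game"]}
]
"""


def filter_by_tags(entries, required_tags):
    """Keep only the entries that carry every tag in required_tags."""
    wanted = set(required_tags)
    return [entry for entry in entries if wanted.issubset(entry.get("tags", []))]


entries = json.loads(SAMPLE)
open_source_games = filter_by_tags(entries, ["video game", "open source"])
self_hostable = filter_by_tags(entries, ["self-host"])

for entry in open_source_games:
    print(entry["link"])
```

Requiring all tags at once (a subset check) is what turns two broad tags into a narrow list. Excluding "video game port" entries would need one extra condition on top of this.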
Google Search results polluted by buggy AI-written code frustrate coders
1 project | news.ycombinator.com | 1 May 2024

I started gathering domains to see for myself the state of the Internet
https://github.com/rumca-js/Internet-Places-Database
I have many observations.
One is that I cannot see any useful Amiga links. I had to search for them manually for some time. Some parts of the old internet exist, but are buried.
Second is that spam sites are everywhere, and not only AI-generated ones.
Next is that personal sites exist, but they are often boring. Also 'CV sites' are a waste of time for me. I wonder how many of them are fake.
Many sites have poorly set up HTML meta fields: title, description. How is anybody supposed to find them?
I prefer going to a passionate personal site about programming tips over reading content farms. It is difficult to find such sites.
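To make the point about meta fields concrete, here is a rough sketch of how a crawler might check whether a page declares a title and a description. It uses only Python's standard html.parser module. The class and function names are hypothetical, and this is not the poster's actual code.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> dict:
    """Return the page title and description, using None for a missing field."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip() or None, "description": parser.description}


page = '<html><head><title> My blog </title><meta name="description" content="Programming tips"></head></html>'
print(extract_meta(page))
```

A page for which both values come back as None gives a crawler, or a search engine, very little to index, which is the problem described above.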
Show HN: OpenOrb, a curated search engine for Atom and RSS feeds
7 projects | news.ycombinator.com | 22 Apr 2024

You can find many RSS feeds and links in my repository
https://github.com/rumca-js/Internet-Places-Database/tree/ma...
It also contains domain lists, which include a tag indicating whether a site is personal or not.

We Need to Rewild the Internet
2 projects | news.ycombinator.com | 16 Apr 2024

I have been running my personal web crawler since September 2022. I gather Internet domains and assign them meta information. There are various sources for my data. I assign the "personal" tag to any personal website. I assign the "self-host" tag to any self-hosted program I find.
I have fewer than 30k personal websites.
The data are in the repository.
https://github.com/rumca-js/Internet-Places-Database
I still rely on Google, or Kagi, for many things. It is interesting to me to see what my crawler finds next. It is always a surprise to see a new blog or a forgotten forum of sorts.
This is how I discover real new content on the Internet. Certainly not through Google, which can find only the BBC or TechCrunch.

The internet is slipping out of our reach
1 project | news.ycombinator.com | 12 Mar 2024

Google will not be interested in fixing search. It also may not be possible because of AI spam. They would rather invest in DeepMind/Bard/Gemini than fix technology that will be obsolete in a few years.
I have started scanning domains to see how many different places there are on the Internet. Spoiler: not many.
We could try to create curated open databases of links, forums, and places, but in the AI era it will always be a niche.
Having said that, I think that it is a good thing. If it is a niche, it will not be spoiled by normal users expecting simple behavior, or by corporations trying to control the output.
Start your blog.
Start your curated lists of links.
Control your data. Share your data.
Link: https://github.com/rumca-js/Internet-Places-Database

What are some alternatives?

When comparing topzemen and Internet-Places-Database you can also consider the following projects:

pq - a command-line Protobuf parser with Kafka support and JSON output

polychrome.nvim - A colorscheme creation micro-framework for Neovim

photo_id_resizer - Resize photo ID images using face recognition technology

webring - Make yourself a website

Clendar - Minimal Calendar app. Written in SwiftUI.

RSS-Link-Database - Bookmarked archived links

dataplaneapi - HAProxy Data Plane API

notifeed - Watch RSS/Atom feeds and send push notifications/webhooks when new content is detected

whatsapp-web.js - A WhatsApp client library for NodeJS that connects through the WhatsApp Web browser app

webpub - Give me a website, I'll make you an epub.

rtpmidid - RTP MIDI (AppleMIDI) daemon for Linux

clipzoomfx - Side-project for extracting highlights from (mostly sports) videos
