hackernews-personal-blogs VS Internet-Places-Database

Compare hackernews-personal-blogs vs Internet-Places-Database and see what their differences are.

                 hackernews-personal-blogs   Internet-Places-Database
Mentions         14                          18
Stars            331                         29
Growth           -                           -
Activity         5.8                         9.3
Latest commit    3 months ago                6 days ago
Language         Go                          -
License          MIT License                 GNU General Public License v3.0 only
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

hackernews-personal-blogs

Posts with mentions or reviews of hackernews-personal-blogs. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-22.
  • Show HN: OpenOrb, a curated search engine for Atom and RSS feeds
    7 projects | news.ycombinator.com | 22 Apr 2024
    Someone compiled this a while ago which is a pretty good starter list for content discovery: https://github.com/outcoldman/hackernews-personal-blogs

    I've imported most of them into https://app.recessfeed.com/ and found some nice ones to follow through that

  • Show HN: Hacker News Blogroll
    3 projects | news.ycombinator.com | 10 Apr 2024
    I didn't know about your project and used https://blogs.hn

    Also helpful for getting the feeds directly into your RSS reader: https://github.com/outcoldman/hackernews-personal-blogs

    However, I also needed to put some work into it and remove some blogs, because they're written in a foreign language or just not that interesting for me.

  • Ask HN: What blogs do you read?
    2 projects | news.ycombinator.com | 30 Jan 2024
    The list of HN users' personal blogs is quite good:

    https://github.com/outcoldman/hackernews-personal-blogs

  • Show HN: A better way to read blogs
    3 projects | news.ycombinator.com | 6 Sep 2023
    > I was looking at your HN OPML file on GitHub [0] and noticed that the `xmlURL` and `htmlURL` attributes are the same for each entry; the `htmlURL` currently points to the feed rather than the site. Do you happen to have the original HTML URLs available? Would be nice to have both. (Secondarily, I'm guessing some of the `type` attributes should probably be "atom" rather than "rss"?)

    I'm just using the file that someone else made, but I guess they didn't really make the distinction between those URLs in the code, though it shouldn't be too hard to modify: https://github.com/outcoldman/hackernews-personal-blogs/blob...

    It is also true that RSS, Atom, and possibly other feed types are mixed in there. What I did for my site was crawl through all of those feeds and process them one by one: get all of the posts, do some ordering and grouping, and output everything as RSS feeds.

    For example, here's the top 100 user feeds for 2023: https://hn-blogs.kronis.dev/feed-top100.xml

    Those have HTML links for each of the posts, though I'm afraid it's not exactly what you're asking for (the HTML URLs for the sites/feeds themselves), because I don't actually store that anywhere in my case.
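The duplicated-attribute problem described above is easy to check mechanically. A minimal Python sketch using only the standard library (the OPML snippet and the function name are hypothetical illustrations, not taken from the actual repository):

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML snippet reproducing the issue discussed above:
# htmlUrl points at the feed instead of the site.
OPML = """<opml version="2.0">
  <body>
    <outline type="rss" text="Example Blog"
             xmlUrl="https://example.com/feed.xml"
             htmlUrl="https://example.com/feed.xml"/>
  </body>
</opml>"""

def find_duplicate_urls(opml_text):
    """Return the titles of outlines whose htmlUrl merely repeats xmlUrl."""
    root = ET.fromstring(opml_text)
    return [
        o.get("text")
        for o in root.iter("outline")
        if o.get("xmlUrl") and o.get("xmlUrl") == o.get("htmlUrl")
    ]

print(find_duplicate_urls(OPML))  # ['Example Blog']
```

Running this kind of check over the full OPML file would list every entry that still needs a real site URL.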

  • Ask HN: Could you share your personal social handlers (X,Mastodon,Threads) here?
    1 project | news.ycombinator.com | 25 Aug 2023
  • Show HN: List (OPML) of Hacker News Users Personal Blogs
    1 project | /r/patient_hackernews | 7 Jul 2023
    1 project | /r/hackernews | 7 Jul 2023
    1 project | /r/hypeurls | 6 Jul 2023
    6 projects | news.ycombinator.com | 6 Jul 2023
    Added a note https://github.com/outcoldman/hackernews-personal-blogs/tree...

    I checked yours; the issue is that you are escaping the + in application/rss+xml, when it should be just application/rss+xml

    I have updated the code and will re-generate. I am not sure if what you are doing is allowed or not.
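For context on the escaping point: `+` is not a special character in XML, so `application/rss+xml` should appear literally in the `type` attribute; percent-encoding it, as one would for a URL, corrupts the value. A quick standard-library check:

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr

mime = "application/rss+xml"

# '+' is not special in XML; only & < > (and quotes inside attribute
# values) need escaping, so the MIME type passes through unchanged.
assert escape(mime) == "application/rss+xml"
print(quoteattr(mime))  # prints "application/rss+xml" (with the quotes)

# Percent-encoding, as one would for a URL component, is the wrong tool
# here and corrupts the attribute value.
assert quote(mime, safe="") == "application%2Frss%2Bxml"
```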

Internet-Places-Database

Posts with mentions or reviews of Internet-Places-Database. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-06-10.
  • Show HN: Crawl a modern website to a zip, serve the website from the zip
    6 projects | news.ycombinator.com | 10 Jun 2024
    I agree with your points.

    You might be interested in the reddit web scraping community: https://www.reddit.com/r/webscraping/

    My passion project is https://github.com/rumca-js/Django-link-archive

    Currently I use only one thread for scraping; I do not need more, and it gets the job done. I also know too little to experiment further with Python "celery" workers.

    My project can be used for various things, depending on your needs. Recently I have been playing with using it as a 'search engine': I scrape the Internet to find cool stuff. The results are in https://github.com/rumca-js/Internet-Places-Database. Not all domains are interesting, though.

  • So many feed readers, so many behaviors
    4 projects | news.ycombinator.com | 28 May 2024
    https://github.com/rumca-js/Django-link-archive/blob/main/rs...

    I know there are already spiders and metadata-processing packages for Python, but I like having control over the process.

    Old man yelling at cloud. I also hate:

    - sites blocking me with 403 because my user agent is not "mainstream". Why do I have to use undetected Chrome to read some RSS feeds? Why can't I use third-party clients? The content can carry adverts; I just want my own layout and buttons

    - RSS feeds protected by Cloudflare, so tools cannot read them easily

    - WordPress sites not using, or outright blocking, RSS. Some sites could be more open that way, but no: the feeds are closed or removed

    - sites that have a "/blog" location while the main domain is empty, nearly empty, or returns 404. Can I trust such a location?

    - missing HTML metadata. I like YouTube: it lets me scrape metadata while protecting the video content itself, and that is good

    - weird redirects. The domain has no content and does not describe what it is; it just has JavaScript redirects from the main domain to some odd locations within the domain

    - URL shorteners and vanity links. You do not know where you will be taken. I understand they are counting clicks, but they sacrifice my security

    - Google returning links with the syntax "https://www.google.com/url" instead of linking directly. YouTube does the same with "https://www.youtube.com/redirect". To me, this is again a vulnerability

    My ethical web scraper's results are placed in: https://github.com/rumca-js/Internet-Places-Database.

  • Google just updated its algorithm. The Internet will never be the same
    1 project | news.ycombinator.com | 25 May 2024
    I have been collecting personal sites for some time. The format is JSON files, and the links carry the tag 'personal'.

    https://github.com/rumca-js/Internet-Places-Database

  • Microsoft Bing Search, Copilot face downtime
    2 projects | news.ycombinator.com | 23 May 2024
    That is exactly why I scrape the internet. I maintain all the domains I find on GitHub:

    https://github.com/rumca-js/Internet-Places-Database

    I wanted to have all links about Amiga, Commodore, or chiptune.

    It is not a search engine. For now, it is only data.

    Maybe this will help somebody, or somebody will be able to use this data better.

    I have a demo app running on a Raspberry Pi. It may break immediately if too many people access it.

    https://renegat0x0.ddns.net/apps/places

  • 38% of webpages that existed in 2013 are no longer accessible a decade later
    5 projects | news.ycombinator.com | 18 May 2024
    Attention is limited. We cannot see everything on the Internet. We do not have enough time for that.

    There is a lot of valuable and interesting data on the Internet, but it is not visible. A high-quality, low-profile blog that stopped publishing in 2015 will certainly not rank high in Google.

    Media platforms and search engines monetize content. YouTube channels need to churn out new content every week or so to stay relevant and watchable.

    Our society produces content, not quality, not products.

    SEO can be gamed; it is impossible to create an objective index of valuable content. Bad actors will hack the game, spam the results, and destroy quality for profit.

    Google most often connects users with media sites, news sites, and middlemen; it rarely connects users with a product directly. Type "search engine" into a query and you may find not actual search engines but articles like "Best search engines in 2024" or "best SEO tricks to boost your page".

    Google has no incentive to fix this. Search engines are dead tech and will be replaced by chatbots in a few years. People will not search for content; content will be generated on demand.

    Some time ago I created my own repository of domain names: https://github.com/rumca-js/Internet-Places-Database

    I wanted to find "wargames"-related pages. It is nearly impossible to find anything interesting about Warhammer on the normie internet (outside Facebook).

    The second thing is that I cannot find anything Amiga-related.

    This solved my initial problem. I have also found that many interesting pages are gone. I think Google directing our attention toward "content" broke good-quality pages.

    Right now I am using Google less and less, because I use my bookmark manager more and more.

    https://github.com/rumca-js/Django-link-archive

    My solution may not be as elaborate as Common Crawl, but it is enough for me, for now. I am still working on my program. It has been a fun and interesting experience, and I have learned a lot about the Open Graph protocol, schemas, web scraping, and so on. Maybe this will inspire people to be more self-sufficient and more self-hostable.

    In times of walled gardens we need more standards and more open data to keep what remains of the old wild west of the Internet.

  • A list of open source games
    5 projects | news.ycombinator.com | 13 May 2024
    Sorry for spamming, but I also maintain a list.

    My repo maintains domains in JSON format. The tags "video game" and "open source" together provide a list of open source games.

    Another useful combination is the tag "self-host", which provides self-hostable programs, etc.

    Link:

    https://github.com/rumca-js/Internet-Places-Database

    Some games are tagged "video game port" because they are reimplementations of existing games rather than something new.
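Tag combinations like the ones above can be queried with a few lines of Python. A sketch assuming a simplified entry shape (the real repository's JSON layout and field names may differ):

```python
import json

# Hypothetical records in the shape described above: each entry is a
# domain with a list of tags.
entries = json.loads("""[
  {"link": "https://example-game.org", "tags": ["video game", "open source"]},
  {"link": "https://example-port.org", "tags": ["video game", "video game port"]},
  {"link": "https://example-tool.org", "tags": ["self-host"]}
]""")

def with_tags(entries, *required):
    """Keep links of entries carrying every required tag."""
    return [e["link"] for e in entries if set(required) <= set(e["tags"])]

print(with_tags(entries, "video game", "open source"))
# ['https://example-game.org']
```

The same helper answers the "self-host" query by passing that single tag instead.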

  • Google Search results polluted by buggy AI-written code frustrate coders
    1 project | news.ycombinator.com | 1 May 2024
    I started gathering domains to see for myself the state of the Internet

    https://github.com/rumca-js/Internet-Places-Database

    I have many observations.

    One is that I cannot see any useful Amiga links. I had to search for them manually for some time. Some parts of the old internet still exist, but they are buried.

    Second, spam sites are everywhere, and not only AI-generated ones.

    Next, personal sites exist, but they are often boring. 'CV sites' are also a waste of time for me; I wonder how many of them are fake.

    Many sites have poorly set up HTML meta fields, title, and description. How is anybody supposed to find them?

    I prefer a passionate personal site about programming tips to reading content farms. It is difficult to find such sites.

  • Show HN: OpenOrb, a curated search engine for Atom and RSS feeds
    7 projects | news.ycombinator.com | 22 Apr 2024
    You can find many RSS feeds and links in my repository:

    https://github.com/rumca-js/Internet-Places-Database/tree/ma...

    It also contains domain lists that include a tag indicating whether a site is personal or not.

  • We Need to Rewild the Internet
    2 projects | news.ycombinator.com | 16 Apr 2024
    I have been running my personal web crawler since September 2022. I gather internet domains and assign them meta information drawn from various sources. I assign the "personal" tag to any personal website and the "self-host" tag to any self-hosted program I find.

    I have fewer than 30k personal websites.

    Data are in the repository.

    https://github.com/rumca-js/Internet-Places-Database

    I still rely on Google, or Kagi, for many things. It is interesting to see what my crawler finds next; it is always a surprise to discover a new blog or a forgotten forum of sorts.

    This is how I discover genuinely new content on the Internet; certainly not through Google, which seems able to find only the BBC or TechCrunch.

  • The internet is slipping out of our reach
    1 project | news.ycombinator.com | 12 Mar 2024
    Google will not be interested in fixing search, and it may not even be possible because of AI spam. They would rather invest in DeepMind/Bard/Gemini than fix a technology that will be obsolete in a few years.

    I started scanning domains to see how many different places there are on the internet. Spoiler: not many.

    We could try to create curated open databases for links, forums, and places, but in the AI era it will always be a niche.

    Having said that, I think it is a good thing. If it is a niche, it will not be spoiled by ordinary users expecting simple behavior, or by corporations trying to control the output.

    Start your blog.

    Start your curated lists of links.

    Control your data. Share your data.

    Link: https://github.com/rumca-js/Internet-Places-Database

What are some alternatives?

When comparing hackernews-personal-blogs and Internet-Places-Database you can also consider the following projects:

Node RED - Low-code programming for event-driven applications

polychrome.nvim - A colorscheme creation micro-framework for Neovim

Miniflux - Minimalist and opinionated feed reader

webring - Make yourself a website

recess - A content aggregator for keeping up and interacting with siloed content.

RSS-Link-Database - Bookmarked archived links

notifeed - Watch RSS/Atom feeds and send push notifications/webhooks when new content is detected

webpub - Give me a website, I'll make you an epub.

clipzoomfx - Side-project for extracting highlights from (mostly sports) videos

catwiki_p3 - CatWiki (using Python 3)

sunburn.nvim - A Neovim colorscheme emphasizing readability above all else.

oatmeal - Terminal UI to chat with large language models (LLM) using different model backends, and integrations with your favourite editors!
