hackernews-personal-blogs VS Internet-Places-Database

Compare hackernews-personal-blogs vs Internet-Places-Database and see what their differences are.

                 hackernews-personal-blogs   Internet-Places-Database
Mentions         14                          18
Stars            331                         29
Growth           -                           -
Activity         5.8                         9.3
Latest commit    3 months ago                6 days ago
Language         Go                          -
License          MIT License                 GNU General Public License v3.0 only
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

hackernews-personal-blogs

Posts with mentions or reviews of hackernews-personal-blogs. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-22.
  • Show HN: OpenOrb, a curated search engine for Atom and RSS feeds
    7 projects | news.ycombinator.com | 22 Apr 2024
    Someone compiled this a while ago which is a pretty good starter list for content discovery: https://github.com/outcoldman/hackernews-personal-blogs

    I've imported most of them into https://app.recessfeed.com/ and found some nice ones to follow through that

  • Show HN: Hacker News Blogroll
    3 projects | news.ycombinator.com | 10 Apr 2024
    I didn't know about your project and used https://blogs.hn

    Also helpful for getting the feeds directly into your RSS reader: https://github.com/outcoldman/hackernews-personal-blogs

    However, I also needed to put some work into it and remove some blogs, because they're written in a foreign language or just not that interesting for me.

  • Ask HN: What blogs do you read?
    2 projects | news.ycombinator.com | 30 Jan 2024
    The list of HN users' personal blogs is quite good:

    https://github.com/outcoldman/hackernews-personal-blogs

  • Show HN: A better way to read blogs
    3 projects | news.ycombinator.com | 6 Sep 2023
    > I was looking at your HN OPML file on GitHub [0] and noticed that the `xmlURL` and `htmlURL` attributes are the same for each entry; the `htmlURL` currently points to the feed rather than the site. Do you happen to have the original HTML URLs available? Would be nice to have both. (Secondarily, I'm guessing some of the `type` attributes should probably be "atom" rather than "rss"?)

    I'm just using the file that someone else made, but I guess they didn't really make the distinction between those URLs in the code, though it shouldn't be too hard to modify: https://github.com/outcoldman/hackernews-personal-blogs/blob...

    It is also true that RSS, Atom, and possibly other feed types are mixed in there. What I did for my site was crawl through all of those feeds and process them one by one: get all of the posts, do some ordering and grouping, and output everything as RSS feeds.

    For example, here's the top 100 user feeds for 2023: https://hn-blogs.kronis.dev/feed-top100.xml

    Those have HTML links for each of the posts, though I'm afraid it's not exactly what you're asking for (the HTML URLs for the sites/feeds themselves), because I don't actually store that anywhere in my case.
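The duplicated-attribute problem described above is easy to check mechanically. A minimal Python sketch using only the standard library (the OPML snippet and the function name are hypothetical illustrations, not taken from the actual repository):

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML snippet reproducing the issue discussed above:
# htmlUrl points at the feed instead of the site.
OPML = """<opml version="2.0">
  <body>
    <outline type="rss" text="Example Blog"
             xmlUrl="https://example.com/feed.xml"
             htmlUrl="https://example.com/feed.xml"/>
  </body>
</opml>"""

def find_duplicate_urls(opml_text):
    """Return the titles of outlines whose htmlUrl merely repeats xmlUrl."""
    root = ET.fromstring(opml_text)
    return [
        o.get("text")
        for o in root.iter("outline")
        if o.get("xmlUrl") and o.get("xmlUrl") == o.get("htmlUrl")
    ]

print(find_duplicate_urls(OPML))  # ['Example Blog']
```

Running this kind of check over the full OPML file would list every entry that still needs a real site URL.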

  • Ask HN: Could you share your personal social handlers (X,Mastodon,Threads) here?
    1 project | news.ycombinator.com | 25 Aug 2023
  • Show HN: List (OPML) of Hacker News Users Personal Blogs
    1 project | /r/patient_hackernews | 7 Jul 2023
    1 project | /r/hackernews | 7 Jul 2023
    1 project | /r/hypeurls | 6 Jul 2023
    6 projects | news.ycombinator.com | 6 Jul 2023
    Added a note https://github.com/outcoldman/hackernews-personal-blogs/tree...

    I checked yours; the issue is that you are escaping the + in application/rss+xml, when it should be just application/rss+xml

    I have updated the code and will re-generate. I am not sure if what you are doing is allowed or not.
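For context on the escaping point: `+` is not a special character in XML, so `application/rss+xml` should appear literally in the `type` attribute; percent-encoding it, as one would for a URL, corrupts the value. A quick standard-library check:

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr

mime = "application/rss+xml"

# '+' is not special in XML; only & < > (and quotes inside attribute
# values) need escaping, so the MIME type passes through unchanged.
assert escape(mime) == "application/rss+xml"
print(quoteattr(mime))  # prints "application/rss+xml" (with the quotes)

# Percent-encoding, as one would for a URL component, is the wrong tool
# here and corrupts the attribute value.
assert quote(mime, safe="") == "application%2Frss%2Bxml"
```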

Internet-Places-Database

Posts with mentions or reviews of Internet-Places-Database. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-06-10.
  • Show HN: Crawl a modern website to a zip, serve the website from the zip
    6 projects | news.ycombinator.com | 10 Jun 2024
    I agree with your points.

    You might be interested in the reddit web scraping community: https://www.reddit.com/r/webscraping/

    My passion project is https://github.com/rumca-js/Django-link-archive

    Currently I use only one thread for scraping; I do not need more, and it gets the job done. I also know too little to experiment further with Python "celery" workers.

    My project can be used for various things, depending on your needs. Recently I have been playing with using it as a 'search engine': I scrape the Internet to find cool stuff. The results are in https://github.com/rumca-js/Internet-Places-Database. Not all domains are interesting, though.

  • So many feed readers, so many behaviors
    4 projects | news.ycombinator.com | 28 May 2024
    https://github.com/rumca-js/Django-link-archive/blob/main/rs...

    I know there are already spiders and metadata-processing packages for Python, but I like having control over the process.

    Old man yelling at cloud. I also hate:

    - sites blocking me with 403 because my user agent is not "mainstream". Why do I have to use undetected Chrome to read some RSS feeds? Why can't I use third-party clients? The content can carry adverts; I just want my own layout and buttons

    - RSS feeds protected by Cloudflare, so tools cannot read them easily

    - WordPress sites not using, or outright blocking, RSS. Some sites could be more open that way, but no: the feeds are closed or removed

    - sites that have a "/blog" location while the main domain is empty, nearly empty, or returns 404. Can I trust such a location?

    - missing HTML metadata. I like YouTube: it lets me scrape metadata while protecting the video content itself, and that is good

    - weird redirects. The domain has no content and does not describe what it is; it just has JavaScript redirects from the main domain to some odd locations within the domain

    - URL shorteners and vanity links. You do not know where you will be taken. I understand they are counting clicks, but they sacrifice my security

    - Google returning links with the syntax "https://www.google.com/url" instead of linking directly. YouTube does the same with "https://www.youtube.com/redirect". To me, this is again a vulnerability

    My ethical web scraper's results are placed in: https://github.com/rumca-js/Internet-Places-Database.

  • Google just updated its algorithm. The Internet will never be the same
    1 project | news.ycombinator.com | 25 May 2024
    I have been collecting personal sites for some time. The format is JSON files, and the links carry the tag 'personal'.

    https://github.com/rumca-js/Internet-Places-Database

  • Microsoft Bing Search, Copilot face downtime
    2 projects | news.ycombinator.com | 23 May 2024
    That is exactly why I scrape the internet. I maintain all the domains I find on GitHub:

    https://github.com/rumca-js/Internet-Places-Database

    I wanted to have all links about Amiga, Commodore, or chiptune.

    It is not a search engine. For now, it is only data.

    Maybe this will help somebody, or somebody will be able to use this data better.

    I have a demo app running on a Raspberry Pi. It may break immediately if too many people access it.

    https://renegat0x0.ddns.net/apps/places

  • 38% of webpages that existed in 2013 are no longer accessible a decade later
    5 projects | news.ycombinator.com | 18 May 2024
    Attention is limited. We cannot see everything on the Internet. We do not have enough time for that.

    There is a lot of valuable and interesting data on the Internet, but it is not visible. A high-quality, low-profile blog that stopped publishing in 2015 will certainly not rank high in Google.

    Media platforms and search engines monetize content. YouTube channels need to churn out new content every week or so to stay relevant and watchable.

    Our society produces content, not quality, not products.

    SEO can be gamed; it is impossible to create an objective index of valuable content. Bad actors will hack the game, spam the results, and destroy quality for profit.

    Google most often connects users with media sites, news sites, and middlemen; it rarely connects users with a product directly. Type "search engine" into a query and you may find not actual search engines but articles like "Best search engines in 2024" or "best SEO tricks to boost your page".

    Google has no incentive to fix this. Search engines are dead tech and will be replaced by chatbots in a few years. People will not search for content; content will be generated on demand.

    Some time ago I created my own repository of domain names: https://github.com/rumca-js/Internet-Places-Database

    I wanted to find "wargames"-related pages. It is nearly impossible to find anything interesting about Warhammer on the normie internet (outside Facebook).

    The second thing is that I cannot find anything Amiga-related.

    This solved my initial problem. I have also found that many interesting pages are gone. I think Google directing our attention toward "content" broke good-quality pages.

    Right now I am using Google less and less, because I use my bookmark manager more and more.

    https://github.com/rumca-js/Django-link-archive

    My solution may not be as elaborate as Common Crawl, but it is enough for me, for now. I am still working on my program. It has been a fun and interesting experience, and I have learned a lot about the Open Graph protocol, schemas, web scraping, and so on. Maybe this will inspire people to be more self-sufficient and more self-hostable.

    In times of walled gardens we need more standards and more open data to keep what remains of the old wild west of the Internet.

  • A list of open source games
    5 projects | news.ycombinator.com | 13 May 2024
    Sorry for spamming, but I also maintain a list.

    My repo maintains domains in JSON format. The tags "video game" and "open source" together provide a list of open source games.

    Another useful combination is the tag "self-host", which provides self-hostable programs, etc.

    Link:

    https://github.com/rumca-js/Internet-Places-Database

    Some games are tagged "video game port" because they are reimplementations of existing games rather than something new.
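Tag combinations like the ones above can be queried with a few lines of Python. A sketch assuming a simplified entry shape (the real repository's JSON layout and field names may differ):

```python
import json

# Hypothetical records in the shape described above: each entry is a
# domain with a list of tags.
entries = json.loads("""[
  {"link": "https://example-game.org", "tags": ["video game", "open source"]},
  {"link": "https://example-port.org", "tags": ["video game", "video game port"]},
  {"link": "https://example-tool.org", "tags": ["self-host"]}
]""")

def with_tags(entries, *required):
    """Keep links of entries carrying every required tag."""
    return [e["link"] for e in entries if set(required) <= set(e["tags"])]

print(with_tags(entries, "video game", "open source"))
# ['https://example-game.org']
```

The same helper answers the "self-host" query by passing that single tag instead.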

  • Google Search results polluted by buggy AI-written code frustrate coders
    1 project | news.ycombinator.com | 1 May 2024
    I started gathering domains to see for myself the state of the Internet

    https://github.com/rumca-js/Internet-Places-Database

    I have many observations.

    One is that I cannot see any useful Amiga links. I had to search for them manually for some time. Some parts of the old internet still exist, but they are buried.

    Second, spam sites are everywhere, and not only AI-generated ones.

    Next, personal sites exist, but they are often boring. 'CV sites' are also a waste of time for me; I wonder how many of them are fake.

    Many sites have poorly set up HTML meta fields, title, and description. How is anybody supposed to find them?

    I prefer a passionate personal site about programming tips to reading content farms. It is difficult to find such sites.

  • Show HN: OpenOrb, a curated search engine for Atom and RSS feeds
    7 projects | news.ycombinator.com | 22 Apr 2024
    You can find many RSS feeds and links in my repository:

    https://github.com/rumca-js/Internet-Places-Database/tree/ma...

    It also contains domain lists that include a tag indicating whether a site is personal or not.

  • We Need to Rewild the Internet
    2 projects | news.ycombinator.com | 16 Apr 2024
    I have been running my personal web crawler since September 2022. I gather internet domains and assign them meta information drawn from various sources. I assign the "personal" tag to any personal website and the "self-host" tag to any self-hosted program I find.

    I have fewer than 30k personal websites.

    Data are in the repository.

    https://github.com/rumca-js/Internet-Places-Database

    I still rely on Google, or Kagi, for many things. It is interesting to see what my crawler finds next; it is always a surprise to discover a new blog or a forgotten forum of sorts.

    This is how I discover genuinely new content on the Internet; certainly not through Google, which seems able to find only the BBC or TechCrunch.

  • The internet is slipping out of our reach
    1 project | news.ycombinator.com | 12 Mar 2024
    Google will not be interested in fixing search, and it may not even be possible because of AI spam. They would rather invest in DeepMind/Bard/Gemini than fix a technology that will be obsolete in a few years.

    I started scanning domains to see how many different places there are on the internet. Spoiler: not many.

    We could try to create curated open databases for links, forums, and places, but in the AI era it will always be a niche.

    Having said that, I think it is a good thing. If it is a niche, it will not be spoiled by ordinary users expecting simple behavior, or by corporations trying to control the output.

    Start your blog.

    Start your curated lists of links.

    Control your data. Share your data.

    Link: https://github.com/rumca-js/Internet-Places-Database

What are some alternatives?

When comparing hackernews-personal-blogs and Internet-Places-Database you can also consider the following projects:

Node RED - Low-code programming for event-driven applications

polychrome.nvim - A colorscheme creation micro-framework for Neovim

Miniflux - Minimalist and opinionated feed reader

webring - Make yourself a website

recess - A content aggregator for keeping up and interacting with siloed content.

RSS-Link-Database - Bookmarked archived links

notifeed - Watch RSS/Atom feeds and send push notifications/webhooks when new content is detected

webpub - Give me a website, I'll make you an epub.

clipzoomfx - Side-project for extracting highlights from (mostly sports) videos

catwiki_p3 - CatWiki (using Python 3)

sunburn.nvim - A Neovim colorscheme emphasizing readability above all else.

oatmeal - Terminal UI to chat with large language models (LLM) using different model backends, and integrations with your favourite editors!
