Internet-Places-Database
chatgpt-shell
| | Internet-Places-Database | chatgpt-shell |
|---|---|---|
| Mentions | 11 | 25 |
| Stars | 21 | 768 |
| Growth | - | - |
| Activity | 9.3 | 9.1 |
| Last commit | 2 days ago | about 1 month ago |
| Language | | Emacs Lisp |
| License | GNU General Public License v3.0 only | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Internet-Places-Database
-
Google Search results polluted by buggy AI-written code frustrate coders
I started gathering domains to see for myself the state of the Internet
https://github.com/rumca-js/Internet-Places-Database
I have many observations.
One is that I cannot find any useful Amiga links; I had to search for them manually for some time. Some parts of the old internet still exist, but they are buried.
Second is that spam sites are everywhere, and not only AI-generated ones.
Next is that personal sites exist, but they are often boring. Also, "CV sites" are a waste of time for me. I wonder how many of them are fake.
Many sites have poorly set up HTML meta fields (title, description). How is anybody supposed to find them?
I prefer reading a passionate personal site about programming tips over content farms, but it is difficult to find such sites.
-
Show HN: OpenOrb, a curated search engine for Atom and RSS feeds
You can find many RSS feeds and links in my repository:
https://github.com/rumca-js/Internet-Places-Database/tree/ma...
It also contains domain lists with a tag indicating whether each site is personal or not.
-
We Need to Rewild the Internet
I have been running my personal web crawler since September 2022. I gather internet domains and assign meta information to them, drawing on various sources of data. I assign a "personal" tag to any personal website, and a "self-host" tag to any self-hosted program I find.
I have fewer than 30k personal websites.
The data are in the repository.
https://github.com/rumca-js/Internet-Places-Database
I still rely on Google or Kagi for many things. It is interesting to see what my crawler finds next; it is always a surprise to discover a new blog or a forgotten forum.
This is how I discover genuinely new content on the Internet. Certainly not through Google, which seems to find only the BBC or TechCrunch.
-
The internet is slipping out of our reach
Google will not be interested in fixing search. It may not even be possible because of AI spam. They would rather invest in DeepMind/Bard/Gemini than fix a technology that will be obsolete in a few years.
I have started scanning domains to see how many different places there are on the internet. Spoiler: not many.
We could try to create curated open databases for links, forums, and places, but in the AI era it will always be a niche.
Having said that, I think it is a good thing. If it stays a niche, it will not be spoiled by ordinary users expecting simple behavior, or by corporations trying to control the output.
Start your blog.
Start your curated lists of links.
Control your data. Share your data.
Link https://github.com/rumca-js/Internet-Places-Database
-
YaCy, a distributed Web Search Engine, based on a peer-to-peer network
There are already many projects about search:
- https://www.marginalia.nu/
- https://searchmysite.net/
- https://lucene.apache.org/
- Elasticsearch
- https://presearch.com/
- https://stract.com/
- https://wiby.me/
I think all of these projects are fun. I would like to see one of them succeed at reaching a mainstream level of attention.
I have also been gathering link metadata for some time. Maybe I will use it to feed an eventual self-hosted search engine or language model, if I decide to experiment with that.
- domains for seed https://github.com/rumca-js/Internet-Places-Database
- bookmarks seed https://github.com/rumca-js/RSS-Link-Database
- links for year https://github.com/rumca-js/RSS-Link-Database-2024
-
A search engine in 80 lines of Python
I have dabbled a little in this subject myself. Some of my notes:
- some RSS feeds are protected by Cloudflare. It is true, however, that this is not an issue for the majority of blogs. If you want to go further, Selenium is one way to handle Cloudflare-protected links
- sometimes even headless Selenium is not enough, and a full-blown browser driven by Selenium is necessary to fool the protection
- sometimes even that is not enough
- then I started to wonder why some RSS feeds are so well protected by Cloudflare, but who am I to judge?
- sometimes it is beneficial to spoof the user agent. I feel bad about setting my user agent to Chrome, but again, why are RSS feeds so well protected?
- you cannot parse or read the entire Internet, so you always need to think about compromises. For example, in one of my projects I narrowed my searches to domains only. Now I can find most of the common domains and sort them by their "importance"
- RSS links do change. There need to be automated means of disabling feeds to avoid checking inactive domains
- I do not see any configurable timeout for reading a page, but I am not familiar with aiohttp. Some pages might waste your time
- I hate that some RSS feeds are not configured properly. Some sites do not provide a valid meta "link" with "application/rss+xml". Some RSS feeds have generic titles like "Home", or no title at all. Such a waste of opportunity
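The timeout and user-agent points above can be sketched with just the standard library (aiohttp users would reach for `aiohttp.ClientTimeout` instead). This is a minimal illustration, not code from the author's crawler; the function name and defaults are made up:

```python
import urllib.request
import urllib.error

def fetch_feed(url, timeout_s=10, user_agent="Mozilla/5.0"):
    """Fetch a feed URL with an explicit timeout and a custom User-Agent.

    Returns the response body as bytes, or None on any network error,
    so a slow or dead page cannot waste the crawler's time indefinitely.
    """
    try:
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.read()
    except (urllib.error.URLError, ValueError, TimeoutError):
        return None
```

Returning None instead of raising keeps the crawl loop simple: a failed fetch is just another data point for deciding whether a feed is still alive.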
My RSS feed parser, link archiver, and web crawler: https://github.com/rumca-js/Django-link-archive. The file rsshistory/webtools.py could be especially interesting. It is not an advanced piece of programming craft, but it gets the job done.
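The earlier note about automatically disabling inactive feeds could look something like the following. This is a hypothetical sketch; the class, threshold, and method names are illustrative, not taken from the project:

```python
class FeedHealth:
    """Track consecutive fetch failures per feed and disable dead ones."""

    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self.failures = {}      # feed URL -> consecutive failure count
        self.disabled = set()   # feed URLs that are no longer checked

    def record(self, url, success):
        """Record one fetch attempt; a success resets the failure streak."""
        if success:
            self.failures[url] = 0
        else:
            self.failures[url] = self.failures.get(url, 0) + 1
            if self.failures[url] >= self.max_failures:
                self.disabled.add(url)

    def is_active(self, url):
        """Feeds stay active until they fail max_failures times in a row."""
        return url not in self.disabled
```

Counting consecutive failures (rather than total failures) means a feed that is merely flaky recovers on its next successful fetch, while a truly dead domain is eventually dropped from the crawl schedule.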
Additionally, in another project I have collected around 2378 personal sites. I collect domains in https://github.com/rumca-js/Internet-Places-Database/tree/ma... . The files are JSON; all personal sites have the tag "personal".
Most of the things are collected from:
https://nownownow.com/
https://searchmysite.net/
I also wanted to process domains from https://downloads.marginalia.nu/, but have not had time to work out the structure of the files.
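Given JSON files where personal sites carry the tag "personal", filtering them out could be sketched as below. The exact schema is an assumption (a list of entry dicts with a "tags" list); the real files in the repository may be structured differently:

```python
import json

def personal_domains(path):
    """Load a JSON domain file and return the entries tagged 'personal'.

    Assumes the file holds a list of dicts, each with an optional
    'tags' list; entries without tags are treated as non-personal.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [e for e in entries if "personal" in e.get("tags", [])]
```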
-
Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search [pdf]
On the other hand, it is not 1995; time has moved on. I wrote a simple RSS feed reader that also serves as a search engine for bookmarks.
I am able to run it in the attic on a Raspberry Pi. We do not have to rely so heavily on Google.
https://github.com/rumca-js/Django-link-archive
It is true that it does not serve as a Google or Kagi replacement for me, but it is a very nice addition.
With a little bit of determination, I do not have to be so dependent on Google.
Here is also a dump of known domains. Some are personal.
https://github.com/rumca-js/Internet-Places-Database
...and my bookmarks
https://github.com/rumca-js/RSS-Link-Database
A few more years, and Google can go to hell.
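The core of a self-hosted bookmark search engine is small enough to sketch in a few lines. This is a toy illustration, not the Django project's actual code; the field names "title" and "link" are assumptions about the bookmark format:

```python
def search_bookmarks(bookmarks, query):
    """Naive case-insensitive substring search over bookmark dicts.

    Matches the query against each bookmark's 'title' and 'link'
    fields and returns the matching entries.
    """
    q = query.lower()
    return [
        b for b in bookmarks
        if q in b.get("title", "").lower() or q in b.get("link", "").lower()
    ]
```

Even this naive scan is fast enough for tens of thousands of bookmarks on a Raspberry Pi; a real deployment would add an index only if it became slow.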
-
Ask HN: What apps have you created for your own use?
[4] https://github.com/rumca-js/Django-link-archive
These are exported then to github repositories:
[5] https://github.com/rumca-js/RSS-Link-Database - bookmarks
[6] https://github.com/rumca-js/RSS-Link-Database-2023 - 2023 year news headlines
[7] https://github.com/rumca-js/Internet-Places-Database - all domains known to me, and RSS feeds
-
The Small Website Discoverability Crisis
My own repositories:
- bookmarked entries https://github.com/rumca-js/RSS-Link-Database
- mostly domains https://github.com/rumca-js/Internet-Places-Database
- all 'news' from 2023 https://github.com/rumca-js/RSS-Link-Database-2023
I am using my own Django program to capture and manage links https://github.com/rumca-js/Django-link-archive.
- Show HN: List of Internet Domains
chatgpt-shell
-
Devin, the First AI Software Engineer
I think it is a tooling issue. It is in no way obvious how to use LLMs effectively, especially for really good writing results. Tweaking and tinkering can be time-consuming indeed, but lately I use chatgpt-shell [1], and it lends itself well to an iterative approach. One needs to cycle through some styles first, and then decide how to prompt most effectively for better results.
[1] https://github.com/xenodium/chatgpt-shell/blob/bf2d12ed2ed60...
-
Ask HN: What apps have you created for your own use?
- https://xenodium.com/an-ios-journaling-app-powered-by-org-pl... - Lately, I'm having a go at building a privacy-focused, plain-text-based iOS journaling app. I started building it for someone important in my life, but I'm now using it myself.
- https://flathabits.com - After reading Atomic Habits, I wanted a habit tracker, but most had more friction than I wanted, required accounts, had distractions, lock-in, etc., so I built a privacy-focused app with little friction and no lock-in (it saves to plain text).
- https://plainorg.com - There are a gazillion markdown apps on the App Store, but hardly any supporting org markup, so I built one.
- https://xenodium.com/scratch-a-minimal-scratch-area - I wanted a surface where I could just dump text with as few taps as possible.
- https://github.com/xenodium/macosrec - I wanted to take either screenshots or videos of macOS apps from the command line, so I could integrate anywhere.
- https://github.com/xenodium/chatgpt-shell - I'm far down the Emacs rabbit hole, so I prefer Emacs-integrated tools. Built a ChatGPT Emacs shell to see what the hype was all about ;) tl;dr it really does help.
- https://github.com/xenodium/dwim-shell-command - A way to manage and easily apply the gazillion one-liners (and more complex scripts) I've come across. I have close to 100 utils checked in now https://github.com/xenodium/dwim-shell-command#my-toolbox
- https://github.com/xenodium/ob-swiftui - Play around with SwiftUI layouts from the comfort of my preferred editor.
- https://github.com/xenodium/company-org-block - Org block completion.
- https://xenodium.com - I tend to scratch own itches and post my solutions here.
-
More advanced emacs tutorials
Every so often I scratch an itch to improve my workflow and write it up https://xenodium.com.
-
What I Have Changed My Mind About in Software Development
With LSP, the gap between IDEs and text editors is narrowing. While I still prefer Emacs, I'm pragmatic enough to jump onto whatever tool does a better job for a specific task. At times, that is Xcode.
I was also sceptical about ChatGPT and changed my mind like OP. I was less pragmatic on this one and brought ChatGPT over to Emacs https://github.com/xenodium/chatgpt-shell. Pretty happy with the result so far.
-
Edit-mode for point-by-point text proofreading, like EditGPT?
There are a handful of ChatGPT Emacs packages. I happen to have authored chatgpt-shell. For making a synchronous request, you can use chatgpt-shell-post-prompt. For async, use chatgpt-shell-send-to-buffer with a handler.
-
Ask HN: Could you show your personal blog here?
https://xenodium.com will hit 10 years in November. It started as a single org file for personal notes (programming, cooking, Emacs, bookmarks, iOS dev, travel). One day, I decided to export it to HTML and make it accessible to me from anywhere. Sorta just became both notes and blog over time…
While the tone of the posts may have evolved a bit, the blog still serves as personal notes/reference of sorts. The tech behind it hasn't changed a whole lot. It remains a single org file (https://raw.githubusercontent.com/xenodium/xenodium.github.i...) with my own ugly elisp hacks, but hey, it does the job ;-)
-
Use emacs as a ChatGPT app
u/xenodium's chatgpt-shell deserves a mention. It uses an intuitive Comint-shell based interaction and includes support for executable code blocks (in the comint-shell) and for org-babel. It's very polished -- I believe it also includes support for saving and restoring sessions, which gptel is yet to add.
-
Do you also write small guides for yourself to remind you of your own emacs workflows?
Yep. Turn some of them into posts https://xenodium.com
-
Is orgmode really that much better than an equivalent workflow using vim + other tools?
For certain concepts that I don't understand fully, I'm using chatgpt-shell. It is beyond fantastic and almost impossible to describe in a single post. This is just one of my use cases: when I'm writing a comment or a message to a colleague (and yes, of course, I edit just about any text in Emacs), I can select a paragraph and ask chatgpt-shell to improve it. It does, and it also shows me a diff of the changes; that is how I set it up.
-
Twenty Years of Blogging
Mine (https://xenodium.com) will hit 10 years in November. It started as a single org file for personal notes. One day I decided to export it to HTML so my notes would be accessible from anywhere. Sorta just became both notes and blog over time… While the tone of the posts may have evolved over time, they still serve as notes/reference of sorts. The tech behind it hasn't changed a whole lot. It remains a single org file (https://raw.githubusercontent.com/xenodium/xenodium.github.i...).
What are some alternatives?
polychrome.nvim - A colorscheme creation micro-framework for Neovim
E2B - Secure cloud runtime for AI apps & AI agents. Fully open-source.
webring - Make yourself a website
gptel - A simple LLM client for Emacs
RSS-Link-Database - Bookmarked archived links
emacs-chatgpt-jarvis - press F12 to record, use whisper to transcribe and chatgpt to answer
notifeed - Watch RSS/Atom feeds and send push notifications/webhooks when new content is detected
ideas - a hundred ideas for computing - a record of ideas - https://samsquire.github.io/ideas/
webpub - Give me a website, I'll make you an epub.
go-cleanarchitecture - An example Go application demonstrating The Clean Architecture.
clipzoomfx - Side-project for extracting highlights from (mostly sports) videos
splitter - React component for building split views like in VS Code