You know you can shorten the commit hash, right?
For example https://github.com/torvalds/linux/blob/9e02977bfa/kernel/dma...
I don't find this ugly; it's actually quite human-readable. All the information is in the URL itself, so you can still make use of it in the future even if GitHub ever goes away.
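A quick sketch of the idea: since Git (and GitHub) resolve any unambiguous prefix of a commit hash, you can truncate the full 40-character hash when building a blob URL. The hash and path below are hypothetical placeholders, not the actual commit from the link above.

```python
# Git resolves any unambiguous hash prefix; ~10 hex characters is
# usually enough even for a very large repository like the Linux kernel.
full_hash = "0123456789abcdef0123456789abcdef01234567"  # hypothetical hash
short = full_hash[:10]

# Hypothetical repo/path, purely for illustration.
url = f"https://github.com/example/repo/blob/{short}/path/to/file.c"
print(short)  # 0123456789
print(url)
```

The equivalent on the command line is `git rev-parse --short=10 <commit>`, which also guarantees the prefix is unambiguous in your repository.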
Unfortunately, most people just shrug it off with "eh, it's low-maintenance".
To drive the point home, here's the source code for that link shortener: https://github.com/technoweenie/guillotine. It was last updated in 2015. If you don't believe that this is the source code, it was directly linked from its announcement: https://github.blog/2011-11-10-git-io-github-url-shortener/. And if the extreme lag (during the attempt to save the links) is indicative of its backend, it's just running on a single server, which is very likely horribly outdated.
It's kind of tricky to do in the general case; e.g. even Hacker News keeps meaningful semantic information in the id= query parameter.
Because of that, it ultimately needs to be a site-specific database/algorithm, perhaps with a fallback to a default behaviour like simply stripping the most common garbage parameters (_encoding/usg/etc.). I suspect it's possible to use some sort of machine learning to guess the meaningful parts of the URL path/query/fragments, but even for that we'd need some human curation for the training set. I wish we could collaborate on a shared database/library for that; I've sketched some ideas/applications/prior art here: https://beepb00p.xyz/exobrain/projects/cannon.html
I started thinking about it because I have a similar problem in Promnesia (https://github.com/karlicoss/promnesia#readme), a knowledge management tool I'm working on. Ideally I want to normalise URLs so that they address the exact bit of information, and nothing more.
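The fallback behaviour described above can be sketched in a few lines: strip a blocklist of well-known tracking parameters while keeping semantic ones like Hacker News's id=. The parameter list here is a hypothetical example, not what any particular tool actually uses; a real implementation would need per-site rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical blocklist of common tracking/garbage parameters.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "usg", "_encoding", "fbclid", "gclid", "ref",
}

def clean_url(url: str) -> str:
    """Drop known tracking query parameters, keeping semantic ones
    (e.g. Hacker News stores the item in id=)."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(clean_url("https://news.ycombinator.com/item?id=123&utm_source=tw"))
# keeps id=123, drops utm_source
```

This is only the "default cleanup" half of the problem; the hard part, as noted above, is knowing per site which parameters carry meaning, which is where a shared database would help.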
AdGuard's URL tracking filters (https://github.com/AdguardTeam/AdguardFilters/tree/master/Tr...) should cover most of URL cleaning. Even if there are websites it doesn't filter all parameters from, it should be good enough for the most part.
This one seems the worst: https://github.com/zws-im/zws (Shorten URLs with invisible spaces.)
I played around with it and there was some seriously sketchy interstitial stuff going on.