- hacker-news-undocumented: Some of the hidden norms about Hacker News not otherwise covered in the Guidelines and the FAQ.
On that note, here[1] is a resource with some more undocumented HN features.
[1] https://github.com/minimaxir/hacker-news-undocumented
If you care about the number of votes or comments an article gets, you can get it out of Firebase
https://github.com/HackerNews/API
and it is a much more certain thing. I have (i) a model that predicts "will this headline get more than 10 votes?" and (ii) one that predicts "if this headline gets more than 10 votes, does it get a ratio of comments to votes greater than the median (roughly 0.5)?"
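For anyone who wants to build those two labels themselves, here is a rough sketch of pulling one item from the Firebase API and deriving (i) and (ii) from it. The endpoint and the `score` and `descendants` fields are the ones described in the HackerNews/API README; the function names and the fixed 0.5 cutoff (standing in for the median ratio) are my own assumptions, and `score` is HN's points figure, used here as a stand-in for votes.

```python
import json
from urllib.request import urlopen

# Base URL documented in the HackerNews/API README.
API_BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """URL of the JSON record for one item (story, comment, ...)."""
    return f"{API_BASE}/item/{item_id}.json"


def fetch_item(item_id, timeout=10):
    """Fetch one item as a dict. Needs network access."""
    with urlopen(item_url(item_id), timeout=timeout) as resp:
        return json.load(resp)


def labels(item, vote_threshold=10, ratio_threshold=0.5):
    """Return (label_i, label_ii) for a story dict.

    label_i:  did the story get more than `vote_threshold` points?
    label_ii: for stories where label_i holds, is comments/points above
              `ratio_threshold`? None otherwise, since (ii) is only
              defined for stories that cleared the vote threshold.
    The thresholds are assumptions taken from the description above.
    """
    score = item.get("score", 0)
    comments = item.get("descendants", 0)
    label_i = score > vote_threshold
    label_ii = (comments / score > ratio_threshold) if label_i else None
    return label_i, label_ii
```

Something like `labels(fetch_item(8863))` would label the example story used in the API README. For a real dataset you would recompute the median ratio from your own sample rather than hard-coding 0.5.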
The best model I have for (i) is still a bag-of-words model that doesn't try to correct for time-series variations. The AUC is atrocious, maybe around 65%, but I like the model because its high-scoring headlines look like a parody of high-scoring headlines; I think "Richard Stallman has died" could be the best possible headline. (It's silly to think you could get good performance at this, because the model can't see whether the article has a flashy picture or other attractive attributes that would raise the vote rate.) I've made other models with fancier methods, but none performs better or is more entertaining.
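To make "bag of words" and "AUC" concrete, here is a minimal pure-Python sketch. The comment above doesn't say which classifier sits on top of the bag of words, so the logistic regression here is an assumption, and all function names and hyperparameters are hypothetical. It illustrates the general technique, not the actual model.

```python
import math
import re
from collections import defaultdict


def tokenize(headline):
    """Bag of words: a set of lowercase tokens, ignoring order and counts."""
    return set(re.findall(r"[a-z0-9']+", headline.lower()))


def train(headlines, labels, epochs=50, lr=0.1):
    """Fit one weight per word with SGD on the logistic loss.

    Logistic regression is an assumed choice of classifier.
    `labels` are 1 (cleared the vote threshold) or 0 (did not).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(headlines, labels):
            words = tokenize(text)
            z = bias + sum(weights[w] for w in words)
            z = max(-30.0, min(30.0, z))  # keep exp() well inside float range
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y
            bias -= lr * grad
            for w in words:
                weights[w] -= lr * grad
    return dict(weights), bias


def score(model, headline):
    """Raw model score; words never seen in training contribute nothing."""
    weights, bias = model
    return bias + sum(weights.get(w, 0.0) for w in tokenize(headline))


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Read this way, an AUC of 65% means the model ranks a randomly chosen hit above a randomly chosen miss about 65% of the time, against 50% for a coin flip. An honest number needs `auc` computed on held-out headlines, not the training set.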
As for (ii), the most-commented articles tend to be clickbaity, so it would be irresponsible to submit a feed of high-scoring articles that isn't well curated. I am getting an AUC of around 72%, which is what I got with my first recommender.
Hi, I am collecting links from various places, including Hacker News. I have links going back to the start of the year [1]. Maybe someone will find them useful. You should look at files named like [2].
[1] https://github.com/rumca-js/RSS-Link-Database-2023
[2] https.hnrss.orgfrontpage_entries.json