Top 23 News Open-Source Projects
-
newspaper
newspaper3k is a library for news, full-text, and article metadata extraction in Python 3. Advanced docs:
-
Stream-Framework
Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
-
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
-
news-please
news-please - an integrated web crawler and information extractor for news that just works
-
simorgh
The BBC's Open Source Web Application. Contributions welcome! Used on some of our biggest websites, e.g.
-
GNews
A happy and lightweight Python package that provides an API to search for articles on Google News and returns a JSON response.
-
sdupdates
A mega collection of all resources and news related to Stable Diffusion. Focused around AUTOMATIC1111's webui (https://github.com/AUTOMATIC1111/stable-diffusion-webui)
-
Giveme5W1H
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
-
Readflow
readflow is a news-reading (or read-it-later) solution focused on versatility and simplicity.
-
research-threats
Collection of legal threats against good faith Security Researchers; vulnerability disclosure gone wrong. A continuation of work started by @attritionorg
-
hn_summary
Summarizes top stories from Hacker News using a large language model and posts them to a Telegram channel.
I have looked into a getstream.io integration; however, it seems that the Ruby SDK is really treated as a second-class citizen. There are bugs with the documented API (I'm having issues even creating and querying users), usage of the gem is low, and there is an issue open since May that no one has even looked at, which doesn't give me hope for long-term support.
Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
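To give a sense of just one item on that list, URL de-duplication, here is a minimal standard-library sketch. It is not trafilatura's implementation; the function names `normalize_url` and `deduplicate` and the normalization rules (lowercase scheme and host, drop the fragment, strip trailing slashes) are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Return a canonical form: lowercase scheme/host, no fragment, no trailing slash.

    Hypothetical helper; real crawlers usually apply many more rules.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def deduplicate(urls):
    """Keep the first occurrence of each normalized URL, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Even this small piece involves judgment calls (for example, whether query strings that differ only in tracking parameters count as duplicates), which is the commenter's point about the amount of extra code involved.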
Project mention: Does Thai language uses question mark in the end of the sentence to denote an interrogative sentence? | /r/thai | 2023-05-25
But on some articles here I can see question marks.
Project mention: Have you ever developed large personal projects? How did it affect you? | /r/brdev | 2023-06-25
Nextcloud running the News app [1] on your own server, either an old laptop/desktop or an SBC like a Raspberry/Orange/Banana/${fruit} Pi. You'll get total control over whatever you do with the thing, as much 'free' cloud storage as you want and loads of other possible services. It runs fine on a Raspberry Pi 4 or one of the equivalent boards from other manufacturers.
Source: I've been running this since before the Owncloud/Nextcloud split, it works as advertised.
[1] https://apps.nextcloud.com/apps/news
Project mention: Need some help with my personal project (interactive world map with real-time data) | /r/datascience | 2023-05-15
The web crawling part wasn't much of an issue - I am using an existing API (https://pypi.org/project/gnews/) which does what I needed. The issue lies in, well, pretty much the rest of the task described above. I need to create an interactive world map with real-time data (news articles) - more specifically, maintaining the data server, figuring out the data mapping part, etc. Since I pretty much have no experience in this, I would like to ask you guys for some directions. What tool would I need to use and how would I store/load the data? Is it possible to do so without writing some JavaScript code myself?
Great list, maybe double check with https://github.com/ligurio/awesome-openbsd
Project mention: Python script that opens my bookmarks and returns only links posted in the last 14 days | /r/learnpython | 2023-05-07
Another option you could consider would be using a wrapper library around Google News if you struggle with implementing the scraping logic yourself. The downside is that you'll still have to be careful so your IP doesn't get blocked. Make sure you limit the number of requests per second/minute...
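One simple way to follow that advice is to enforce a minimum delay between consecutive requests. The sketch below uses only the standard library; the `Throttle` class, its parameters, and the example rate are hypothetical and not part of any library mentioned on this page.

```python
import time


class Throttle:
    """Allow at most `max_calls` calls per `period` seconds by spacing them evenly."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        """Block until at least `min_interval` has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request, with for example `throttle = Throttle(max_calls=30, period=60)`, keeps the script to roughly one request every two seconds. What rate a given site tolerates is not something this sketch can tell you.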
Project mention: I pwned half of America's fast food chains, simultaneously | news.ycombinator.com | 2024-01-09
Everybody has that goal until they get a knock on their door at 6am: https://github.com/disclose/research-threats
Project mention: Generative AI Market Analysis: People Love to Cum | news.ycombinator.com | 2023-09-19
Interesting, GPT refuses to summarize this content: "I'm sorry, but I can't generate a summary for that content." per https://github.com/jiggy-ai/hn_summary & https://t.me/hn_summary
News related posts
- Ask HN: Comments requesting paywall bypass links
- Feathers Are One of Evolution's Cleverest Inventions
- What will humans do if technology solves everything?
- Building an AI Coach to Tame My Monkey Mind
- One Satellite Signal Rules Modern Life. What If Someone Knocks It Out?
- "Dune" and the Delicate Art of Making Fictional Languages
-
newscatcher VS python-client - a user suggested alternative
2 projects | 9 Feb 2024
Index
What are some of the best open-source News projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | newspaper | 13,703 |
2 | Stream-Framework | 4,718 |
3 | Refinery CMS | 3,890 |
4 | trafilatura | 2,740 |
5 | news-please | 1,925 |
6 | simorgh | 1,243 |
7 | pygooglenews | 1,230 |
8 | circumflex | 1,049 |
9 | burlesco | 879 |
10 | news | 797 |
11 | GNews | 532 |
12 | sdupdates | 512 |
13 | Giveme5W1H | 500 |
14 | awesome-openbsd | 420 |
15 | reldens | 412 |
16 | Readflow | 374 |
17 | marquee-scroller | 320 |
18 | GoogleNews | 306 |
19 | journalist | 286 |
20 | research-threats | 269 |
21 | hn_summary | 240 |
22 | allinfosecnews_sources | 225 |
23 | FLUTTER_NewsApp | 217 |