pygooglenews
citus
Metric | pygooglenews | citus
---|---|---
Mentions | 8 | 61
Stars | 1,234 | 9,840
Growth | - | 1.2%
Activity | 0.0 | 9.4
Last commit | 6 months ago | 9 days ago
Language | Python | C
License | MIT License | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
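The page does not give the formula behind the activity number, so the following is only a rough sketch of how such a score could work. It assumes a recency-weighted commit count with a 30-day half-life and a 0-10 score equal to ten times the fraction of tracked projects that are less active. Both the half-life and the function names are hypothetical.

```python
def weighted_commits(commit_ages_days, half_life_days=30):
    """Sum commits so that recent ones count more than older ones.

    Assumption: exponential decay with a hypothetical 30-day half-life.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(project_value, all_values):
    """Map a project's raw activity onto a 0-10 scale by percentile rank.

    Assumption: score = 10 x fraction of tracked projects that are less active,
    so a score of 9.0 means the project beats 90% of them (top 10%).
    """
    below = sum(1 for value in all_values if value < project_value)
    return round(10 * below / len(all_values), 1)


if __name__ == "__main__":
    # One commit today plus one commit 30 days ago.
    print(weighted_commits([0, 30]))
    # A project more active than 90 of 100 tracked projects.
    print(activity_score(90, list(range(100))))
```

Under these assumptions, a project only reaches 9.0 when nine out of ten tracked projects have a lower recency-weighted commit count.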
pygooglenews
-
Tips for Making a Popular Open-Source Project in 2021 [Ultimate Guide]
I have ~4k stars across 2 Python libraries. Both help fetch live news articles. Links below.
These were my first libraries.
I took the approach of promoting them as any other product. You have to "sell" your code. Even if it's 100% free.
In my opinion, the most important thing is DEMO. Just make a GIF where you showcase what your software does:
* 80% of engineers won't even bother to read the description
No one will spend their precious time trying to get through your code.
[0] https://github.com/kotartemiy/newscatcher Programmatically collect normalized news from (almost) any website.
[1] https://github.com/kotartemiy/pygooglenews If Google News had a Python library
-
NLP beginner dataset for text classification, sentiment analysis and/or NER
I wrote the pygooglenews package for mining news from Google News.
-
68k.news: A Netscape 1.1 makeover of Google News
I'm curious where the data gets fetched from. The author mentions that Mozilla Readability and SimplePie are used.
Readability to parse the content. SimplePie to fetch the data (I assume). Data from RSS feeds?
In case you want to make something similar, I recently wrote a blog post on where you could get news data for free [1]
(self-promo) I'd recommend taking a look at my Python package to mine news data from Google News [2]. Also, in 3 days we're releasing an absolutely free News API [3] that will support ~50-100k top stories per day.
[1] https://blog.newscatcherapi.com/an-ultimate-list-of-open-sou...
[2] https://github.com/kotartemiy/pygooglenews
[3] https://newscatcherapi.com/free-news-api
-
Available data set for news headlines or articles over 2019-2020?
I wrote a Python package to scrape Google News headlines at scale: https://github.com/kotartemiy/pygooglenews
- A Happy and lightweight Python Package that searches Google News and returns a usable JSON response.
-
Write libraries instead of services, where possible
Write libraries AND services, where it makes sense.
I wrote a Python library to scrape Google News [0]
We also have it as a service [1]
Want to know why? Because devs who can't pay won't pay. Businesses that can pay would rather pay for a service (an API in our case) and not have to care about maintaining it.
[0] https://github.com/kotartemiy/pygooglenews
[1] https://newscatcherapi.com/google-news-api
-
Financial news
You can use this Python project to scrape Google News (https://github.com/kotartemiy/pygooglenews). If you look at the code, you can get a feel for how to call Google News directly.
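As a rough idea of what "calling Google News directly" can look like, here is a minimal standard-library sketch that builds a Google News RSS search URL and parses feed items. It is not taken from the pygooglenews source. The `hl`, `gl`, and `ceid` query parameters follow the URLs Google News itself generates and are not a documented API, so treat them as assumptions. The helper names and the sample feed are made up for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

GOOGLE_NEWS_RSS = "https://news.google.com/rss/search"


def build_search_url(query, lang="en", country="US"):
    """Build an RSS search URL (parameter names are observed, not documented)."""
    params = {
        "q": query,
        "hl": f"{lang}-{country}",
        "gl": country,
        "ceid": f"{country}:{lang}",
    }
    return f"{GOOGLE_NEWS_RSS}?{urlencode(params)}"


def parse_feed(xml_text):
    """Turn RSS 2.0 text into a list of plain dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]


# Hypothetical feed content, standing in for a real HTTP response body.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>sample</title>
<item>
  <title>Example headline</title>
  <link>https://example.com/a</link>
  <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
</item>
</channel></rss>"""

if __name__ == "__main__":
    print(build_search_url("tesla earnings"))
    print(parse_feed(SAMPLE_FEED))
```

To use it against the live feed, the URL from `build_search_url` would be fetched with any HTTP client and the response text passed to `parse_feed`.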
-
Interview brownie
https://github.com/kotartemiy/pygooglenews https://pypi.org/project/google-play-scraper/
citus
- SPQR 1.3.0: a production-ready system for horizontal scaling of PostgreSQL
- Citus: PostgreSQL extension that transforms Postgres into a distributed database
-
Figma's Databases team lived to tell the scale
I see they don't mention Citus (https://github.com/citusdata/citus), which is already a fairly mature native Postgres extension. From the details given in the article, it sounds like they just reimplemented it.
I wonder if they were unaware of it or disregarded it for a reason. I am currently in a similar situation to the one described in the blog, trying to shard a massive Postgres DB.
-
PostgreSQL Is Enough
It is possible, if you pay for it. You can do Multi-AZ Clustered Instances in RDS, where you get the benefits of Multi-AZ failover with traffic sharing.
If you can run your own infra – at least on an EC2 level – you can do things like Citus [0] for Postgres, which is about as close to "just add database nodes" as you'll get.
[0]: https://www.citusdata.com/
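To make "just add database nodes" a little more concrete, here is a small sketch that only generates the SQL statements one would typically run on a Citus coordinator: distribute a table, register workers, then rebalance shards. `create_distributed_table`, `citus_add_node`, and `rebalance_table_shards` are Citus functions. The Python helpers, the table and column names, and the hostnames are hypothetical, and nothing here connects to a database.

```python
def sql_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def distribute_table_sql(table, column):
    """Statement that shards `table` across workers by `column`."""
    return f"SELECT create_distributed_table({sql_literal(table)}, {sql_literal(column)});"


def add_node_sql(host, port=5432):
    """Statement that registers a new worker node with the coordinator."""
    return f"SELECT citus_add_node({sql_literal(host)}, {int(port)});"


def rebalance_sql():
    """Statement that moves shards so new workers receive a share of the data."""
    return "SELECT rebalance_table_shards();"


def scale_out_plan(table, column, new_workers):
    """Ordered statements: distribute the table, add workers, then rebalance."""
    plan = [distribute_table_sql(table, column)]
    plan += [add_node_sql(host, port) for host, port in new_workers]
    plan.append(rebalance_sql())
    return plan


if __name__ == "__main__":
    # Hypothetical table, distribution column, and worker hosts.
    for statement in scale_out_plan("events", "tenant_id", [("worker-3", 5432)]):
        print(statement)
```

The statements would be run through `psql` or any PostgreSQL driver against the coordinator. Whether a rebalance is needed, and which column makes a good distribution key, depends on the workload.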
-
Vitess 18
So while searching for something like this for Postgres, I came across Citus. Does anyone know how that stacks up?
https://github.com/citusdata/citus
- In-Depth Guide: Citus Technical Readme
-
Revolutionizing Database Scaling with CitusDB
References: CitusDB
- Squeeze the hell out of the system you have
- Show HN: Hydra 1.0 – open-source column-oriented Postgres
- Schema-based sharding comes to PostgreSQL with Citus
What are some alternatives?
newscatcher - Programmatically collect normalized news from (almost) any website.
Greenplum - Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.
Phoenix - Peace of mind from prototype to production
yugabyte-db - YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
libgit2 - A cross-platform, linkable library implementation of Git that you can use in your application.
vitess - Vitess is a database clustering system for horizontal scaling of MySQL.
fastapi-azure-auth - Easy and secure implementation of Azure Entra ID (previously AD) for your FastAPI APIs 🔒 B2C, single- and multi-tenant support.
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
jOOQ - jOOQ is the best way to write SQL in Java
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
oldweb-today - Browse emulated browsers connected to old web sites in your browser!
stolon - PostgreSQL cloud native High Availability and more.