Top 23 Python open-data Projects
-
CKAN
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
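As a rough illustration of "using" data on a CKAN-powered portal, the sketch below builds a query URL for CKAN's Action API `package_search` endpoint and pulls dataset names out of a decoded response. The base URL (`https://demo.ckan.org`), the helper names, and the sample response are assumptions for illustration only. No network request is made, and individual portals may expose or restrict the API differently.

```python
from urllib.parse import urlencode

# Assumed CKAN site for illustration; substitute the portal you want to query.
BASE_URL = "https://demo.ckan.org"


def package_search_url(base_url: str, query: str, rows: int = 5) -> str:
    """Build a URL for CKAN's package_search action (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response: dict) -> list[str]:
    """Extract dataset names from a decoded package_search response."""
    if not response.get("success"):
        return []
    return [pkg["name"] for pkg in response["result"]["results"]]


# Hand-written sample in the general shape the Action API returns; not real data.
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [{"name": "air-quality"}, {"name": "bus-stops"}],
    },
}

print(package_search_url(BASE_URL, "open data"))
print(dataset_names(sample_response))
```

To query a live portal, the URL could be fetched with any HTTP client and the JSON body passed to `dataset_names`.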
-
Project mention: Hugging Face is looking for reasoning datasets beyond math, science and coding | dev.to | 2025-04-16
OpenThoughts-114k generation code
-
Project mention: New Proofs Probe the Limits of Mathematical Truth | news.ycombinator.com | 2025-02-04
https://opendata.cern.ch/
I've gotten into 3D printing, and load and temperature data of different filaments is always appreciated.
Mixing materials together, microscopic images, etc...
I get a lot of value from YouTubers who simply follow a consistent methodology of endurance or break testing products or materials. Teardowns and documentation of internals, performance statistics, etc...
Channels like CNCKitchen or ProjectFarm are excellent citizen scientists for example.
-
Herbie
Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the University of Utah Pando Archive System. (by blaylockbk)
Project mention: Show HN: Gribstream.com – Historical Weather Forecast API | news.ycombinator.com | 2024-12-20
"GenCast predicts weather and the risks of extreme conditions with state-of-the-art accuracy" (2024) https://deepmind.google/discover/blog/gencast-predicts-weath...
"Probabilistic weather forecasting with machine learning" (2024) ; GenCast paper https://www.nature.com/articles/s41586-024-08252-9
blaylockbk/Herbie: https://github.com/blaylockbk/Herbie :
> Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the Pando Archive System
The Herbie docs mention GFS GraphCast but not yet GenCast? https://herbie.readthedocs.io/en/stable/gallery/noaa_models/...
-
pudl
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
-
upgini
Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs
-
PatZilla
PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
-
open-australian-legal-corpus-creator
The code used to create and update the Open Australian Legal Corpus, the first and only multijurisdictional open corpus of Australian legislative and judicial documents.
-
open-grid-emissions
Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
-
Bus-Departure-Board
A selection of Python programs that retrieve live UK bus and rail open data and output it to an ER-OLEDM032 (256x64) display screen.
-
wikdict-gen
Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project
-
datapusher-plus
Push data into the CKAN Datastore fast & reliably while inferring, calculating & suggesting metadata using Jinja2 Formulas defined in your scheming metadata schema. It pushes real good!
Project mention: Developing a CKAN Handler for MindsDB: Bridging Open Data and Machine Learning | dev.to | 2024-10-16
CKAN serves as a data catalog, organizing metadata and actual data in its databases incorporating Datapusher Plus, powered by the lightning-fast QSV library.
Python open-data related posts
-
Hugging Face is looking for reasoning datasets beyond math, science and coding
-
Open Thoughts: open data curation for reasoning models
-
New Proofs Probe the Limits of Mathematical Truth
-
CKAN – The open source data management system
-
LHC experiments at CERN observe quantum entanglement at the highest energy yet
-
Latest JavaScript News, Updates, and Tutorials
-
Open Source takes center stage at United Nations
-
Index
What are some of the best open-source open-data projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | CKAN | 4,805 |
2 | open-thoughts | 2,053 |
3 | opendata.cern.ch | 715 |
4 | Herbie | 634 |
5 | pudl | 553 |
6 | meteostat-python | 523 |
7 | innovationgraph | 492 |
8 | wetterdienst | 396 |
9 | UCF-SST-CitySim1-Dataset | 388 |
10 | upgini | 337 |
11 | nycdb | 230 |
12 | images | 185 |
13 | Kotori | 117 |
14 | PatZilla | 108 |
15 | open-australian-legal-corpus-creator | 94 |
16 | open-grid-emissions | 85 |
17 | osmand_map_creation | 82 |
18 | dribdat | 75 |
19 | Bus-Departure-Board | 54 |
20 | wikdict-gen | 51 |
21 | at-python | 46 |
22 | dashmap.io | 45 |
23 | datapusher-plus | 39 |