Top 23 Python Data Projects
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
-
knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
-
Mimesis
Mimesis is a powerful Python library that empowers developers to generate massive amounts of synthetic data efficiently.
-
CKAN
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
-
PyPika
PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
glom
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
-
diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
-
meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
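glom, listed above, is pitched as declarative restructuring of nested data. You describe the output shape as a spec instead of writing lookup loops by hand. The stdlib-only sketch below illustrates that idea with hypothetical helpers `get_path` and `restructure`, which resolve dotted paths. It is not glom's implementation. glom's own entry point is `glom(target, spec)`, and its spec language is far richer.

```python
from typing import Any, Mapping


def get_path(data: Any, path: str) -> Any:
    """Follow a dotted path such as "user.emails.0" through dicts and lists.

    Hypothetical helper for illustration; not part of glom.
    """
    current = data
    for segment in path.split("."):
        if isinstance(current, Mapping):
            current = current[segment]
        else:
            # Non-mapping containers (lists, tuples) are indexed by integer.
            current = current[int(segment)]
    return current


def restructure(data: Any, spec: Mapping[str, str]) -> dict:
    """Build a new dict whose values are looked up by the dotted paths in `spec`.

    Hypothetical helper for illustration; not part of glom.
    """
    return {out_key: get_path(data, path) for out_key, path in spec.items()}
```

For example, `restructure(record, {"primary_email": "user.emails.0"})` pulls the first address out of a nested user record. The spec is plain data, so it can be stored, reused, or generated, which is the core appeal of the declarative style.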
Project mention: LlamaIndex: A data framework for your LLM applications | news.ycombinator.com | 2024-04-07
Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12
I'll also give a shout-out to Airbyte (https://airbyte.com/), which I've used with some limited success to integrate Salesforce with a local database. The main draw of Airbyte is that we can self-host the open-source version, rather than pay Fivetran a significant sum to do this for us.
It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.
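Self-hosted connectors like the one described above usually avoid re-copying everything on each run by syncing incrementally. They store a cursor (often a last-modified timestamp) and upsert only records that changed since that cursor. The sketch below shows this one-way pattern with stdlib `sqlite3` standing in for the local database. `fetch_changed` is a hypothetical stand-in for a Salesforce read, and the `accounts`/`sync_state` tables are assumptions. This is not Airbyte's or Fivetran's code.

```python
import sqlite3
from typing import Iterable


def fetch_changed(source: Iterable[dict], since: str) -> list[dict]:
    """Hypothetical source read: return only records modified after the cursor."""
    return [rec for rec in source if rec["updated_at"] > since]


def sync_accounts(conn: sqlite3.Connection, source: Iterable[dict]) -> int:
    """Copy new or changed records into SQLite and advance the stored cursor.

    Returns the number of records copied on this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_state (stream TEXT PRIMARY KEY, last_cursor TEXT)"
    )
    row = conn.execute(
        "SELECT last_cursor FROM sync_state WHERE stream = 'accounts'"
    ).fetchone()
    # ISO-8601 UTC strings in one format compare correctly as text; "" sorts first.
    cursor = row[0] if row else ""

    changed = fetch_changed(source, cursor)
    for rec in changed:
        # Upsert: a record edited since the last run overwrites its old row.
        conn.execute(
            "INSERT INTO accounts (id, name, updated_at) VALUES (:id, :name, :updated_at) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name, updated_at = excluded.updated_at",
            rec,
        )
    if changed:
        conn.execute(
            "INSERT INTO sync_state (stream, last_cursor) VALUES ('accounts', ?) "
            "ON CONFLICT(stream) DO UPDATE SET last_cursor = excluded.last_cursor",
            (max(rec["updated_at"] for rec in changed),),
        )
    conn.commit()
    return len(changed)
```

Using a strict `>` assumes cursor values never tie across runs. Production connectors often re-read the boundary with `>=` and rely on the upsert to deduplicate. The `ON CONFLICT ... DO UPDATE` syntax needs SQLite 3.24 or newer.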
CKAN: The Open Source Data Portal Software
I've looked at https://github.com/pydata/pandas-datareader and it looks good, does anyone have experience?
Project mention: any recommendations for a good query builder library with good support? | /r/learnpython | 2023-07-11
I recently started using Drizzle ORM and I am now looking for something similar in Python. My goal is to be as close to SQL syntax as possible without just passing DML commands as strings; type safety would be cool as well. I saw PyPika, but it has a lot of open issues and no commits for a year. Is there anything similar but more stable?
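The idea behind a builder like PyPika is that Python operators and method chains build SQL objects instead of strings. PyPika's own chain reads `Query.from_(...).select(...).where(...)`. The stdlib-only sketch below shows the mechanism with hypothetical `Table`, `Column`, `Condition`, and `Select` classes. It is not PyPika's API. Comparison operators return condition objects, and values are kept as bound `?` parameters rather than pasted into the SQL text.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Condition:
    """A WHERE fragment plus the values bound to its "?" placeholders."""

    sql: str
    params: tuple[Any, ...]

    def __and__(self, other: Condition) -> Condition:
        return Condition(f"({self.sql} AND {other.sql})", self.params + other.params)


class Column:
    def __init__(self, name: str) -> None:
        self.name = name

    def _compare(self, op: str, value: Any) -> Condition:
        # The value becomes a bound parameter; it is never pasted into the SQL text.
        return Condition(f'"{self.name}" {op} ?', (value,))

    def __eq__(self, value: Any) -> Condition:  # type: ignore[override]
        return self._compare("=", value)

    def __gt__(self, value: Any) -> Condition:
        return self._compare(">", value)

    def __lt__(self, value: Any) -> Condition:
        return self._compare("<", value)


class Table:
    def __init__(self, name: str) -> None:
        self._name = name

    def __getattr__(self, column: str) -> Column:
        # Unknown public attributes are treated as column names: users.age -> Column("age").
        if column.startswith("_"):
            raise AttributeError(column)
        return Column(column)


@dataclass(frozen=True, eq=False)
class Select:
    table: Table
    columns: tuple[Column, ...] = ()
    condition: Optional[Condition] = None

    def select(self, *columns: Column) -> Select:
        return replace(self, columns=self.columns + columns)

    def where(self, condition: Condition) -> Select:
        # Repeated .where() calls are combined with AND.
        combined = condition if self.condition is None else self.condition & condition
        return replace(self, condition=combined)

    def build(self) -> tuple[str, tuple[Any, ...]]:
        cols = ", ".join(f'"{c.name}"' for c in self.columns) or "*"
        sql = f'SELECT {cols} FROM "{self.table._name}"'
        if self.condition is None:
            return sql, ()
        return f"{sql} WHERE {self.condition.sql}", self.condition.params


def from_(table: Table) -> Select:
    return Select(table)
```

With `users = Table("users")`, the call `from_(users).select(users.id, users.email).where(users.age > 18).build()` yields `'SELECT "id", "email" FROM "users" WHERE "age" > ?'` with params `(18,)`, ready for a DB-API call such as `conn.execute(sql, params)`. Each builder step returns a new immutable `Select`, so partial queries can be reused safely.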
Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24
If you actually think this code is better, there's a real library that does this: https://github.com/EntilZha/PyFunctional.
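PyFunctional's style chains transformations on a wrapped sequence, as in `seq(1, 2, 3, 4).map(...).filter(...).reduce(...)`. To show what that chaining amounts to, here is a stdlib-only sketch built around a hypothetical `Seq` class. It is not PyFunctional's implementation. Each step wraps a lazy built-in iterator, and nothing is computed until a terminal call such as `reduce` or `to_list`.

```python
from __future__ import annotations

from functools import reduce
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Seq(Generic[T]):
    """Hypothetical lazy, chainable wrapper around an iterable (single pass)."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = items

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def map(self, fn: Callable[[T], U]) -> Seq[U]:
        # Built-in map() is lazy, so no work happens until a terminal call.
        return Seq(map(fn, self._items))

    def filter(self, pred: Callable[[T], bool]) -> Seq[T]:
        return Seq(filter(pred, self._items))

    def reduce(self, fn: Callable[[T, T], T]) -> T:
        # Terminal operation: consumes the underlying iterator.
        return reduce(fn, self._items)

    def to_list(self) -> list[T]:
        return list(self._items)
```

`Seq([1, 2, 3, 4]).map(lambda x: x * 2).filter(lambda x: x > 4).reduce(lambda a, b: a + b)` gives 14, the same result as the plain-Python `sum(x * 2 for x in [1, 2, 3, 4] if x * 2 > 4)`. Whether the chained form is "better" is exactly the taste question the comment raises.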
Project mention: The Design Philosophy of Great Tables (Software Package) | news.ycombinator.com | 2024-04-04
2. The report you're sending out for display is _expected_ in Excel format. The two main reasons for this are organizational momentum, or wanting to let the receiver conduct additional ad-hoc analysis (Excel is best for this in almost every org).
The way we've sliced this problem space is by improving the interfaces that users can use to export formatting to Excel. You can see some of our (open-core) code here [2]. TL;DR: Mito gives you a spreadsheet-like interface in Jupyter where you can apply formatting as you would in Excel (number formatting, conditional formatting, color formatting), and Mito then automatically generates code that exports this formatting to an Excel file. This is one of our more compelling enterprise features for decision makers who work with non-expert Python programmers, because getting formatting into Excel is a big hassle.
[1] https://trymito.io
[2] https://github.com/mito-ds/mito/blob/dev/mitosheet/mitosheet...
We've made a lot of data tooling things based on LLMs, and are in the process of rebranding and launching our main product.
1. sketch (in-notebook AI for pandas): https://github.com/approximatelabs/sketch
2. datadm (open-source "chat with data", with support for open-source LLMs): https://github.com/approximatelabs/datadm
3. Our main product, julyp: https://julyp.com/ (currently undergoing a very active rebrand and cleanup). It is a "chat with data" style app with a lot of specialized features. I'm also streaming myself using it (and sometimes building it) every weekday on Twitch to solve miscellaneous data problems (https://www.twitch.tv/bluecoconut).
For your next question, about the stack and deploy:
Colour Science is one of the more serious projects I know of, and more or less lets you get as advanced as you want. Used by film professionals among others. https://www.colour-science.org/
How would you define what the perfect color tool is? I would guess like most tools that it depends entirely on the job at hand, and that maybe no one perfect tool can exist. Colour Science might be great at serious color management and perceptual measurements and conversions between standardized color spaces, but not the right tool for a web developer looking for quick & easy way to make an HSV palette generation widget (and not because Colour Science is Python, but because it’s too big and heavy of a hammer).
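For the lightweight end of that spectrum, such as generating an HSV palette for a widget, Python's standard-library `colorsys` module is enough. The sketch below picks evenly spaced hues at a fixed saturation and value and converts them to hex strings. The function name `hsv_palette` and its default saturation/value are illustrative assumptions. Plain HSV is not perceptually uniform, and that gap is where a library like Colour Science earns its weight.

```python
import colorsys


def hsv_palette(n: int, saturation: float = 0.65, value: float = 0.95) -> list[str]:
    """Return n hex colours with evenly spaced hues around the HSV wheel."""
    if n < 1:
        raise ValueError("n must be at least 1")
    colours = []
    for i in range(n):
        hue = i / n  # colorsys expects hue as a fraction in [0, 1), not degrees
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colours.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colours
```

With full saturation and value, `hsv_palette(3, 1.0, 1.0)` returns pure red, green, and blue (`#ff0000`, `#00ff00`, `#0000ff`).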
Project mention: Ask HN: How can I get better at writing production-level Python? | news.ycombinator.com | 2023-07-18
Project mention: Ask HN: Freelancer? Seeking freelancer? (December 2023) | news.ycombinator.com | 2023-12-03
SEEKING FREELANCER | REMOTE | GERMANY
dltHub is looking for freelance help with the following repos:
- https://github.com/dlt-hub/dlt
Project mention: meltano VS cloudquery - a user suggested alternative | libhunt.com/r/meltano | 2023-06-02
Python Data related posts
- LlamaIndex: A data framework for your LLM applications
- Ask HN: What have you built with LLMs?
- GitHub Innovation Graph
- Show HN: Finagg – free and nearly unlimited financial data
- Show HN: Data monitoring and profiling with 1 function call
- Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres
-
A note from our sponsor - WorkOS
workos.com | 26 Apr 2024
Index
What are some of the best open-source Data projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | llama_index | 30,910 |
2 | Prefect | 14,586 |
3 | airbyte | 13,923 |
4 | chinese-xinhua | 10,641 |
5 | akshare | 8,364 |
6 | Mage | 7,001 |
7 | knowledge-repo | 5,432 |
8 | Mimesis | 4,304 |
9 | CKAN | 4,253 |
10 | datasets | 4,175 |
11 | TextRecognitionDataGenerator | 3,038 |
12 | pandas-datareader | 2,819 |
13 | PyPika | 2,371 |
14 | PyFunctional | 2,332 |
15 | mito | 2,215 |
16 | sketch | 2,194 |
17 | mara-pipelines | 2,054 |
18 | Colour | 1,974 |
19 | glom | 1,825 |
20 | diffgram | 1,796 |
21 | dlt | 1,722 |
22 | meltano | 1,587 |
23 | Cubes | 1,490 |