Top 19 Python data-analytic Projects
-
Project mention: Show HN: Pathway – Build Mission Critical ETL and RAG in Python (NATO, F1 Used) | news.ycombinator.com | 2024-06-13
The main factor impacting the RAM requirement of the instance is the size of the data that you feed into it, especially if you need an in-memory index. (If you are curious about peak memory use etc., you can profile Pathway memory use in Grafana: https://github.com/pathwaycom/pathway/tree/main/examples/pro....)
One point to clarify is that "Pathway Community" is self-hosted, and the "8GB RAM - 4 cores" value is just a cap on how much of your own (or cloud) machine the framework will effectively use. Currently, if you would like to get a "free" cloud machine to go with your project, we suggest going for "Pathway Scale" and reaching out through the #Developer Assist link, adding a mention that you are interested in cloud credits. You can also go with third-party hosting providers like http://render.com/, which has a (somewhat modest) free tier for Docker instances, or reasonably priced ones like fly.io (https://fly.io/docs/about/pricing/).
-
-
diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
-
-
bitcoin-etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
-
ethereum-etl-airflow
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
-
-
Project mention: Show HN: SQLFrame – I ran PySpark without Spark on a SQL database | news.ycombinator.com | 2024-05-20
This is cool and in my mind super useful for migrations.
It seems the main benefit of using something like that in daily life is that it's more convenient to generate complex SQL statements (like pivoting a table with a lot of columns).
However, I never really liked the PySpark dataframe api and looking at the code examples, SQL has the same visual complexity.
Snowflake has built something similar (just for Snowflake): Snowpark [1]. Here, one promoted benefit was that you could also inject native Python functions and "extend" the SQL dialect. However, I don't think it really took off.
[1] https://github.com/snowflakedb/snowpark-python
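To make the comment's point about generating complex SQL concrete, here is a minimal sketch of building a pivot query programmatically. It does not use SQLFrame, PySpark, or Snowpark; it uses only plain Python string building and the standard-library `sqlite3` module. The helper `build_pivot_sql`, the `sales` table, and its columns are hypothetical names chosen for illustration.

```python
import sqlite3


def build_pivot_sql(table, index_col, pivot_col, value_col, pivot_values):
    """Build a pivot query using conditional aggregation.

    Hypothetical helper for illustration only. Identifiers and values are
    interpolated directly, so it assumes trusted input.
    """
    cases = ",\n  ".join(
        f"SUM(CASE WHEN {pivot_col} = '{v}' THEN {value_col} ELSE 0 END) AS \"{v}\""
        for v in pivot_values
    )
    return (
        f"SELECT {index_col},\n  {cases}\n"
        f"FROM {table}\nGROUP BY {index_col}\nORDER BY {index_col}"
    )


# Small in-memory table to run the generated statement against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "q1", 10), ("east", "q2", 20), ("west", "q1", 5), ("west", "q1", 7)],
)

sql = build_pivot_sql("sales", "region", "quarter", "amount", ["q1", "q2"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

With many pivot values, the hand-written equivalent grows by one `CASE` expression per column, which is the kind of repetition a dataframe-style API or a small generator like this avoids.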
-
-
swiple
Swiple enables you to easily observe, understand, validate and improve the quality of your data
-
Project mention: Fast Python: High performance techniques for large datasets | news.ycombinator.com | 2024-07-24
-
opteryx
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
-
Just for some respite from the discussion of our soon-to-be AI overlords (LLMs), I'm one of the contributors to an open-source Python package, Xplainable (https://github.com/xplainable/xplainable). Xplainable is a novel (structured) machine learning algorithm that's inherently explainable, as opposed to being a post-hoc explainer (like SHAP or LIME).
-
dpq
dpq is an open-source Python library that makes prompt-based data transformations and feature engineering easy
Project mention: Show HN: Dpq – a small Python library to process data using LLMs | news.ycombinator.com | 2024-04-12
-
SmartPipeline
A framework for rapid development of robust data pipelines following a simple design pattern
-
-
-
-
Flight-Test-Data-Analytics-Module-01
Code to support Module 01 of the Daedalus Aerospace Flight Test Data Analytics course.
Python data-analytics discussion
Python data-analytics related posts
- Fast Python: High performance techniques for large datasets
- Explainable (Structured) Machine Learning Algorithm
- Show HN: Build dashboards in Jupyter Notebook with numeric and chart boxes
- Show HN: Build dashboard boxes with charts and numbers in Jupyter Notebook
- Show HN: Build dashboards in Jupyter Notebook from bloxs
- Bloxs: Display your data as cards in your Python notebook!
- Show HN: Bloxs – display data as cards in your notebook
A note from our sponsor - SaaSHub
www.saashub.com | 4 Dec 2024
Index
What are some of the best open-source data-analytic projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | pathway | 4,357 |
2 | datachain | 2,036 |
3 | diffgram | 1,851 |
4 | isp-data-pollution | 590 |
5 | bitcoin-etl | 410 |
6 | ethereum-etl-airflow | 408 |
7 | traffic | 374 |
8 | snowpark-python | 274 |
9 | bloxs | 217 |
10 | swiple | 80 |
11 | python-performance | 73 |
12 | opteryx | 86 |
13 | xplainable | 57 |
14 | dpq | 24 |
15 | SmartPipeline | 23 |
16 | GreyNSights | 22 |
17 | dictum | 21 |
18 | Webtap.ai | 12 |
19 | Flight-Test-Data-Analytics-Module-01 | 5 |