Python data-analytics

Open-source Python projects categorized as data-analytics

Top 19 Python data-analytics Projects

  • pathway

    Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

    Project mention: Show HN: Pathway – Build Mission Critical ETL and RAG in Python (NATO, F1 Used) | news.ycombinator.com | 2024-06-13

    The main factor impacting the RAM requirement of the instance is the size of the data that you feed into it, especially if you need an in-memory index. (If you are curious about peak memory use etc., you can profile Pathway memory use in Grafana: https://github.com/pathwaycom/pathway/tree/main/examples/pro....)

    One point to clarify is that "Pathway Community" is self-hosted, and the "8GB RAM - 4 cores" value is just a limit on the dimension of your own/cloud machine that the framework will effectively use. Currently, if you would like to get a "free" cloud machine to go with your project, we suggest going for "Pathway Scale" and reaching out through the #Developer Assist link - add a mention that you are interested in cloud credits. You can also go with 3rd party hosting providers like http://render.com/ who have a (somewhat modest) free tier for Docker instances, or reasonably priced ones like fly.io https://fly.io/docs/about/pricing/.

  • datachain

    ETL and Analytics for Multimodal AI Data

    Project mention: DBT for Unstructured Data – DataChain | news.ycombinator.com | 2024-11-04
  • diffgram

    The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

  • isp-data-pollution

    ISP Data Pollution to Protect Private Browsing History with Obfuscation

  • bitcoin-etl

    ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

  • ethereum-etl-airflow

    Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

  • traffic

    A toolbox for processing and analysing air traffic data (by xoolive)

    Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11
  • snowpark-python

    Snowflake Snowpark Python API

    Project mention: Show HN: SQLFrame – I ran PySpark without Spark on a SQL database | news.ycombinator.com | 2024-05-20

    This is cool and in my mind super useful for migrations.

    It seems the main benefit of using something like that in daily life is that it's more convenient to generate complex SQL statements (like pivoting a table with a lot of columns).

    However, I never really liked the PySpark dataframe api and looking at the code examples, SQL has the same visual complexity.

    Snowflake has built something similar (just for Snowflake): Snowpark [1]. Here one promoted benefit was that you could also inject native Python functions and "extend" the SQL dialect. However, I don't think it really took off.

    [1] https://github.com/snowflakedb/snowpark-python

  • bloxs

    Build dashboards in Jupyter Notebook with numeric and chart boxes

  • swiple

    Swiple enables you to easily observe, understand, validate and improve the quality of your data

  • python-performance

    Repository for the book Fast Python - published by Manning

    Project mention: Fast Python: High performance techniques for large datasets | news.ycombinator.com | 2024-07-24
  • opteryx

    🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

  • xplainable

    Real-time explainable machine learning for business optimisation

    Project mention: Explainable (Structured) Machine Learning Algorithm | /r/Python | 2023-12-05

    Just for some respite from the discussion of our soon-to-be AI overlords (LLMs), I'm one of the contributors to an open-source Python package, Xplainable (https://github.com/xplainable/xplainable). Xplainable is a novel (structured) machine learning algorithm that's inherently explainable, as opposed to being a post-hoc explainer (like SHAP or Lime).

  • dpq

    dpq is an open-source python library that makes prompt-based data transformations and feature engineering easy

    Project mention: Show HN: Dpq – a small Python library to process data using LLMs | news.ycombinator.com | 2024-04-12
  • SmartPipeline

    A framework for rapid development of robust data pipelines following a simple design pattern

  • GreyNSights

    Privacy-Preserving Data Analysis using Pandas

  • dictum

    Describe business metrics with YAML, query and visualize in Jupyter with zero SQL

  • Webtap.ai

    AI web scraping python library for efficient and reliable web scraping.

  • Flight-Test-Data-Analytics-Module-01

    Code to support Module 01 of the Daedalus Aerospace Flight Test Data Analytics course.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
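
One comment quoted in the list above (under snowpark-python) observes that the main everyday benefit of dataframe-style tools is that they make complex SQL, such as pivoting a table with many columns, more convenient to generate. The sketch below illustrates that idea in plain Python only. It is not the Snowpark, PySpark, or SQLFrame API: the helper `build_pivot_sql`, the `sales` table, and its columns are hypothetical names chosen for illustration, and the standard-library `sqlite3` module stands in for a real warehouse.

```python
import sqlite3


def build_pivot_sql(table, index_col, pivot_col, value_col, pivot_values):
    """Return a SELECT that turns each value of `pivot_col` into its own column.

    Hypothetical helper for illustration: it emits one SUM(CASE WHEN ...)
    expression per pivot value, which is tedious to write by hand when
    there are many values.
    """
    def quote(name):
        # Double-quote identifiers and escape embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    cases = []
    for value in pivot_values:
        # Single-quote the literal and escape embedded single quotes.
        literal = "'" + str(value).replace("'", "''") + "'"
        cases.append(
            f"SUM(CASE WHEN {quote(pivot_col)} = {literal} "
            f"THEN {quote(value_col)} ELSE 0 END) AS {quote(str(value))}"
        )

    select_list = ",\n  ".join([quote(index_col)] + cases)
    return (
        f"SELECT\n  {select_list}\n"
        f"FROM {quote(table)}\n"
        f"GROUP BY {quote(index_col)}\n"
        f"ORDER BY {quote(index_col)}"
    )


def run_demo():
    """Build the pivot query and run it against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", "Q1", 10), ("east", "Q2", 20), ("west", "Q1", 5), ("west", "Q1", 7)],
    )
    sql = build_pivot_sql("sales", "region", "quarter", "amount", ["Q1", "Q2"])
    rows = conn.execute(sql).fetchall()
    conn.close()
    return sql, rows


if __name__ == "__main__":
    sql, rows = run_demo()
    print(sql)
    print(rows)
```

Calling `run_demo()` returns the generated SQL text together with one row per region. Adding more quarters only means passing a longer `pivot_values` list, which is the convenience the comment refers to.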

Python data-analytics related posts

  • Fast Python: High performance techniques for large datasets

    1 project | news.ycombinator.com | 24 Jul 2024
  • Explainable (Structured) Machine Learning Algorithm

    1 project | /r/Python | 5 Dec 2023
  • Show HN: Build dashboards in Jupyter Notebook with numeric and chart boxes

    1 project | news.ycombinator.com | 12 Sep 2022
  • Show HN: Build dashboard boxes with charts and numbers in Jupyter Notebook

    1 project | news.ycombinator.com | 26 Aug 2022
  • Show HN: Build dashboards in Jupyter Notebook from bloxs

    1 project | news.ycombinator.com | 28 Mar 2022
  • Bloxs: Display your data as cards in your Python notebook!

    1 project | /r/coolgithubprojects | 23 Mar 2022
  • Show HN: Bloxs – display data as cards in your notebook

    1 project | news.ycombinator.com | 23 Mar 2022

Index

What are some of the best open-source data-analytics projects in Python? This list will help you:

#   Project                                Stars
1   pathway                                4,357
2   datachain                              2,036
3   diffgram                               1,851
4   isp-data-pollution                       590
5   bitcoin-etl                              410
6   ethereum-etl-airflow                     408
7   traffic                                  374
8   snowpark-python                          274
9   bloxs                                    217
10  swiple                                    80
11  python-performance                        73
12  opteryx                                   86
13  xplainable                                57
14  dpq                                       24
15  SmartPipeline                             23
16  GreyNSights                               22
17  dictum                                    21
18  Webtap.ai                                 12
19  Flight-Test-Data-Analytics-Module-01       5

Did you know that Python is the 1st most popular programming language based on number of mentions?