Python Data Analysis

Open-source Python projects categorized as Data Analysis

Top 23 Python Data Analysis Projects

  • scikit-learn

    scikit-learn: machine learning in Python

  • Project mention: How to Build a Logistic Regression Model: A Spam-filter Tutorial | dev.to | 2024-05-05

    Online Courses:
      Coursera: "Machine Learning" by Andrew Ng
      edX: "Introduction to Machine Learning" by MIT
    Tutorials:
      Scikit-learn documentation: https://scikit-learn.org/
      Kaggle Learn: https://www.kaggle.com/learn
    Books:
      "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron
      "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

    By understanding the core concepts of logistic regression and its limitations, and by exploring further resources, you'll be well-equipped to navigate the exciting world of machine learning!

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • Project mention: The ultimate guide to creating a secure Python package | dev.to | 2024-05-08

    It's also possible for you to give a package an alias by using the as keyword. For instance, you could use the pandas package as pd like this:
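
    The excerpt above stops before its example. A minimal sketch of the aliasing it describes might look like the following; the DataFrame contents and the variable names `df` and `total_stars` are illustrative assumptions, not taken from the quoted post.

```python
# Import the pandas package under the alias "pd" using the "as" keyword.
import pandas as pd

# From here on, "pd" refers to the pandas module.
# The column names and values are placeholders chosen for illustration.
df = pd.DataFrame(
    {
        "project": ["scikit-learn", "Pandas"],
        "stars": [58415, 42217],
    }
)

# Any pandas functionality is reached through the alias.
total_stars = int(df["stars"].sum())
print(total_stars)
```

    The alias only changes the local name bound to the module; `pd.__name__` is still "pandas".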

  • streamlit

    Streamlit — A faster way to build and share data apps.

  • Project mention: A quick comparison: Streamlit, Dash, Reflex and Rio | dev.to | 2024-05-30
  • gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

  • Project mention: AI enthusiasm #9 - A multilingual chatbot📣🈸 | dev.to | 2024-05-01

    gradio is a package developed to ease the development of app interfaces in Python and other languages (GitHub)

  • best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13

    AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  • Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26
  • pandas-ai

    Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

  • Project mention: PandasAI is great but is there a more general library? | news.ycombinator.com | 2023-08-23
  • pygwalker

    PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

  • Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15
  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

  • mlcourse.ai

    Open Machine Learning Course

  • Project mention: Open Machine Learning Course | news.ycombinator.com | 2023-10-22
  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • akshare

    AKShare is an elegant and simple financial data interface library for Python, built for human beings! 开源财经数据接口库 (by akfamily)

  • pyod

    A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

  • Project mention: A Comprehensive Guide for Building Rag-Based LLM Applications | news.ycombinator.com | 2023-09-13

    This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod

  • imbalanced-learn

    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

  • knowledge-repo

    A next-generation curated knowledge sharing platform for data scientists and other technical professions.

  • Resume-Matcher

    Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.

  • Project mention: Hacktoberfest 2023: The Complete Guide | dev.to | 2023-09-22

    GitHub: https://github.com/srbhr/Resume-Matcher Website: https://www.resumematcher.fyi/ Discord: Resume Matcher's Discord Tech Stack: Python, NextJS, FastAPI, TypeScript

  • plotnine

    A Grammar of Graphics for Python

  • Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • AWS Data Wrangler

    pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

  • Project mention: Read files from s3 using Pandas/s3fs or AWS Data Wrangler? | /r/dataengineering | 2023-12-06

    I had no problem with awswrangler (https://github.com/aws/aws-sdk-pandas) and it supports reading and writing partitions which was really helpful and a few other optimizations that made it a great tool

  • missingno

    Missing data visualization module for Python.

  • running_page

    Make your own running home page

  • Project mention: Ask HN: Comment here about whatever you're passionate about at the moment | news.ycombinator.com | 2023-11-06

    A resource recently shared in HN for running tech lovers https://github.com/yihong0618/running_page

  • igel

    a delightful machine learning tool that allows you to train, test, and use models without writing code

  • sweetviz

    Visualize and compare datasets, target values and associations, with one line of code.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Data Analysis related posts

  • A quick comparison: Streamlit, Dash, Reflex and Rio

    4 projects | dev.to | 30 May 2024
  • The Birth of Parquet

    3 projects | news.ycombinator.com | 8 May 2024
  • The ultimate guide to creating a secure Python package

    4 projects | dev.to | 8 May 2024
  • How to Build a Logistic Regression Model: A Spam-filter Tutorial

    1 project | dev.to | 5 May 2024
  • PDEP-13: The Pandas Logical Type System

    1 project | news.ycombinator.com | 4 May 2024
  • Cold-(Brew) Outreach: Landing my first big client at a coffee shop

    1 project | news.ycombinator.com | 30 Apr 2024
  • Pandas reset_index(): How To Reset Indexes in Pandas

    1 project | dev.to | 27 Apr 2024

Index

What are some of the best open-source Data Analysis projects in Python? This list will help you:

Rank  Project             Stars
1     scikit-learn        58,415
2     Pandas              42,217
3     streamlit           32,377
4     gradio              29,755
5     best-of-ml-python   15,869
6     airbyte             14,379
7     ydata-profiling     12,141
8     pandas-ai           11,268
9     pygwalker           10,362
10    statsmodels         9,621
11    mlcourse.ai         9,483
12    cleanlab            8,876
13    akshare             8,549
14    pyod                8,029
15    imbalanced-learn    6,725
16    knowledge-repo      5,446
17    Resume-Matcher      4,604
18    plotnine            3,861
19    AWS Data Wrangler   3,823
20    missingno           3,828
21    running_page        3,316
22    igel                3,080
23    sweetviz            2,845
