Python Science and Data analysis

Open-source Python projects categorized as Science and Data analysis

Top 23 Python Science and Data analysis Projects

  1. Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: MLOps Lifecycle: Stages, Workflow, and Best Practices | dev.to | 2026-06-02

    Feature transformations should be deterministic: The same input should produce the same output when the same feature definition and configuration are applied. This is what allows training, backtesting, and live inference to remain aligned. Tools such as Pandas, Spark, or feature platforms such as Feast can be used to implement that logic.

  3. NumPy

    The fundamental package for scientific computing with Python.

    Project mention: 16 Python Libraries You Should Know | dev.to | 2026-05-21

    NumPy

  4. NetworkX

    Network Analysis in Python

    Project mention: The AI-Native GraphDB + GraphRAG + Graph Memory Landscape & Market Catalog | dev.to | 2025-10-26

    NetworkX - networkx.org/

  5. pygwalker

    PyGWalker: Turn your dataframe into an interactive UI for visual analysis

  6. SciPy

    SciPy library main repository

    Project mention: Uv is fantastic, but its package management UX is a mess | news.ycombinator.com | 2026-05-21

    SciPy maintainer here. The main issue with the wheels was the Fortran 77 code in SciPy throwing wrenches into the mix. With C/C++, self-compilation should be quite straightforward. We (all Scientific Python packages) really worked hard on that.

    From version 1.19 of SciPy there will be no need for Fortran compilers (because we translated everything to C: https://github.com/scipy/scipy/issues/18566), and then everything becomes much easier on all platforms thanks to the wide availability of C compilers. Together with the Stable API developments in CPython, the wheel clash issues will "hopefully" decrease gradually.

  7. SymPy

    A computer algebra system written in pure Python

    Project mention: Translating Cython to Mojo, a first attempt | news.ycombinator.com | 2025-10-06

    It looks like Narwhals; "Narwhals and scikit-Lego came together to achieve dataframe-agnosticism" https://news.ycombinator.com/item?id=40950813 :

    > Narwhals: https://narwhals-dev.github.io/narwhals/ :

    >> Extremely lightweight compatibility layer between [pandas, Polars, cuDF, Modin]

    > Lancedb/lance works with [Pandas, DuckDB, Polars, Pyarrow,]; https://github.com/lancedb/lance

    SymPy has Solvers for ODEs and PDEs and convex optimization. SymPy also has lambdify to compile from a relatively slow symbolic expression tree to faster 'vectorized' functions

    From https://news.ycombinator.com/item?id=40683777 re: warp :

    > sympy.utilities.lambdify.lambdify() https://github.com/sympy/sympy/blob/main/sympy/utilities/lam... :

    >>> """Convert a SymPy expression into a function that allows for fast numeric evaluation""" [with e.g. the CPython math module, mpmath, NumPy, SciPy, CuPy, JAX, TensorFlow, PyTorch (*), SymPy, numexpr, but not yet cmath]

  8. Dask

    Parallel computing with task scheduling

  9. statsmodels

    Statsmodels: statistical modeling and econometrics in Python

  10. Numba

    NumPy aware dynamic Python compiler using LLVM

    Project mention: Python JIT project was asked to pause development | news.ycombinator.com | 2026-06-06

    Also you can use projects like numba https://numba.pydata.org/

  11. PyMC

    Bayesian Modeling and Probabilistic Programming in Python

    Project mention: Hierarchical Bayesian Regression with PyMC: When Groups Share Strength | dev.to | 2026-04-26

    By the end of this post, you'll build a hierarchical Bayesian regression model in PyMC, compare it against pooled and unpooled alternatives, and see shrinkage in action on synthetic insurance data.

  12. orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Orange: No-code data mining, visualization and machine learning toolbox | news.ycombinator.com | 2025-10-22
  13. astropy

    Astronomy and astrophysics core library

  14. Biopython

    Official git repository for Biopython (originally converted from CVS)

  15. statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

  16. blaze

    NumPy and Pandas interface to Big Data

  17. fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  18. Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  19. bcbio-nextgen

    Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

  20. NIPY

    Workflows and interfaces for neuroimaging packages

  21. Neupy

    NeuPy is a Tensorflow based python library for prototyping and building neural networks

  22. bccb

    Incubator for useful bioinformatics code, primarily in Python and R

  23. Bubbles

    [NOT MAINTAINED] Bubbles – Python ETL framework (by Stiivi)

  24. PyDy

    Multibody dynamics tool kit.
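
The Pandas entry above quotes the point that feature transformations should be deterministic: the same input and configuration should always produce the same output. A minimal sketch of that idea with pandas follows. The column names (`user_id`, `ts`, `amount`), the `FEATURE_CONFIG` dictionary, and the `build_features` helper are all hypothetical and chosen only for illustration; they do not come from the quoted article.

```python
import pandas as pd

# Hypothetical feature configuration; names and values are illustrative only.
FEATURE_CONFIG = {"window": 3, "clip_max": 100.0}


def build_features(df: pd.DataFrame, config: dict) -> pd.DataFrame:
    """Pure function: the output depends only on `df` and `config`."""
    # Sort explicitly so the result does not depend on the input row order.
    ordered = df.sort_values(["user_id", "ts"]).reset_index(drop=True)
    out = ordered[["user_id", "ts"]].copy()
    # Cap extreme values using the configured upper bound.
    out["amount_clipped"] = ordered["amount"].clip(upper=config["clip_max"])
    # Per-user rolling mean over the configured window.
    out["amount_rolling_mean"] = out.groupby("user_id")["amount_clipped"].transform(
        lambda s: s.rolling(config["window"], min_periods=1).mean()
    )
    return out


raw = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 1, 2],
        "ts": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-03", "2024-01-02"]
        ),
        "amount": [10.0, 250.0, 40.0, 30.0, 60.0],
    }
)

first = build_features(raw, FEATURE_CONFIG)
# Same rows in a shuffled order, same configuration.
second = build_features(raw.sample(frac=1.0, random_state=0), FEATURE_CONFIG)

# Raises AssertionError if the two feature tables differ.
pd.testing.assert_frame_equal(first, second)
```

Keeping the transformation a pure function of the data and an explicit configuration object is one way to let training and inference code call the same logic; a feature platform could wrap the same function.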

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
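
The SymPy entry above mentions `lambdify`, which compiles a symbolic expression tree into a faster vectorized numeric function. The sketch below shows one way this might look, assuming NumPy as the numeric backend; the expression itself is an arbitrary example, not one taken from the quoted discussion.

```python
import numpy as np
import sympy as sp

x = sp.symbols("x")

# Build an expression symbolically and differentiate it exactly.
expr = x**2 * sp.sin(x)
derivative = sp.diff(expr, x)

# Compile the symbolic derivative into a NumPy-backed function.
f_prime = sp.lambdify(x, derivative, modules="numpy")

# Evaluate on a whole array at once instead of substituting point by point.
xs = np.linspace(0.0, 2.0, 5)
values = f_prime(xs)

# Hand-written derivative for comparison: 2*x*sin(x) + x**2*cos(x).
expected = 2 * xs * np.sin(xs) + xs**2 * np.cos(xs)
print(np.allclose(values, expected))
```

The symbolic step is done once; the compiled function can then be called repeatedly on large arrays, which is usually much cheaper than calling `subs` on the expression for each point.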

Python Science and Data analysis related posts

  • Python JIT project was asked to pause development

    2 projects | news.ycombinator.com | 6 Jun 2026
  • MLOps Lifecycle: Stages, Workflow, and Best Practices

    4 projects | dev.to | 2 Jun 2026
  • What Training Exists for Security Professionals Learning AI and Data Science?

    5 projects | dev.to | 23 May 2026
  • Uv is fantastic, but its package management UX is a mess

    2 projects | news.ycombinator.com | 21 May 2026
  • 16 Python Libraries You Should Know

    6 projects | dev.to | 21 May 2026
  • Best AI Cybersecurity Training for Security Teams: How to Pick

    5 projects | dev.to | 18 May 2026
  • Introduction to Python for Data Analysis: A Beginner’s Guide

    1 project | dev.to | 15 May 2026

Index

What are some of the best open-source Science and Data analysis projects in Python? The table below lists them by number of GitHub stars.

#   Project         Stars
1   Pandas          48,955
2   NumPy           32,155
3   NetworkX        16,984
4   pygwalker       15,833
5   SciPy           14,744
6   SymPy           14,665
7   Dask            13,848
8   statsmodels     11,455
9   Numba           11,042
10  PyMC            9,630
11  orange          5,629
12  astropy         5,178
13  Biopython       5,064
14  statsforecast   4,806
15  blaze           3,195
16  fugue           2,165
17  Cubes           1,480
18  bcbio-nextgen   1,030
19  NIPY            826
20  Neupy           736
21  bccb            644
22  Bubbles         460
23  PyDy            411
