Python Science and Data analysis

Open-source Python projects categorized as Science and Data analysis

Top 23 Python Science and Data analysis Projects

Science and Data analysis
  1. Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: Top Programming Languages for AI Development in 2025 | dev.to | 2025-04-29

    Libraries for data science and deep learning that are always changing

  3. NumPy

    The fundamental package for scientific computing with Python.

    Project mention: How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python | dev.to | 2025-04-24

    As is the case with most Python libraries, it is open-source and free-to-use, making it easily accessible by anyone willing to learn machine learning, and it is built upon other open-source libraries within Python, like SciPy for advanced scientific operations, NumPy for efficient numerical computations, Matplotlib for data visualization, and Cython for increased efficiency and speed, similar to that of C/C++.

  4. NetworkX

    Network Analysis in Python

    Project mention: Representing Graphs in PostgreSQL | news.ycombinator.com | 2025-02-17

    If you are interested in the subject, also take a look at NetworkDisk[1], which enables users of NetworkX[2] to map graphs to databases.

    [1] https://networkdisk.inria.fr/

    [2] https://networkx.org/
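
    NetworkDisk's own storage schema is not described in the mention above. As a minimal sketch of the general idea only, the block below stores a directed graph as an edge table in an in-memory SQLite database (the standard-library `sqlite3` module standing in for PostgreSQL) and finds every node reachable from a start node with a recursive CTE. The `edges` table and the `reachable` function are hypothetical names.

```python
import sqlite3


def reachable(edges: list[tuple[str, str]], start: str) -> set[str]:
    """Return every node reachable from `start` in a directed graph stored as an edge table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one row per directed edge (src -> dst).
        conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
        conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", edges)
        # Recursive CTE: seed with the start node, then repeatedly follow outgoing edges.
        # UNION (not UNION ALL) discards rows already produced, so cycles terminate.
        rows = conn.execute(
            """
            WITH RECURSIVE reach(node) AS (
                SELECT ?
                UNION
                SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.node
            )
            SELECT node FROM reach
            """,
            (start,),
        ).fetchall()
        return {node for (node,) in rows}
    finally:
        conn.close()


if __name__ == "__main__":
    graph = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
    print(sorted(reachable(graph, "a")))  # ['a', 'b', 'c']
```

    Using `UNION` rather than `UNION ALL` is the design choice that matters here: rows that were already produced are not fed back into the recursion, so the query stops even when the graph contains a cycle.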

  5. pygwalker

    PyGWalker: Turn your dataframe into an interactive UI for visual analysis

    Project mention: The DuckDB Local UI | news.ycombinator.com | 2025-03-12

  6. SciPy

    SciPy library main repository

    Project mention: Why Momentum Works (2017) | news.ycombinator.com | 2025-04-28

    [2] https://github.com/scipy/scipy/blob/main/scipy/optimize/_dcs...

  7. SymPy

    A computer algebra system written in pure Python

    Project mention: Mathics 7.0 – Open-source alternative to Mathematica | news.ycombinator.com | 2024-12-07

    It's an interesting exercise to think about why the performance of Sum[i, {i, 1, 100000}] differs between Mathics and MMA: Mathics just calls down to sympy, which I think just does the sum in Python [1]; Mathematica (likely) pattern-matches and computes the 100000th triangular number directly, since I know Mathematica relies heavily on standard tables of summations/integrals/etc.

    [1] https://github.com/sympy/sympy/blob/master/sympy/concrete/su....
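
    The commenter's comparison can be made concrete with a minimal pure-Python sketch; it calls neither SymPy nor Mathematica, and the function names are illustrative. Adding the terms one by one takes n additions, while the closed form for the n-th triangular number, n(n+1)/2, takes a fixed number of operations whatever n is.

```python
def sum_by_loop(n: int) -> int:
    """Add 1 + 2 + ... + n one term at a time: n additions."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_closed_form(n: int) -> int:
    """Closed form for the n-th triangular number: constant work for any n."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    n = 100_000
    print(sum_by_loop(n), sum_closed_form(n))  # 5000050000 5000050000
```

    Both functions return the same value, but only the loop's running time grows with n, which is the kind of gap the comment attributes to a system that recognizes the summation and applies a known formula.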

  8. Dask

    Parallel computing with task scheduling

    Project mention: Ask HN: What's the right tool for this job? | news.ycombinator.com | 2024-07-20

    From what I've seen, there are sort of two paths. I'll provide a well known example from each.

    1. lang specific distributed task library

    For example, in Python, Celery is a pretty popular task system. If you (the dev) are the one writing all the code and running the workflows, it might work well for you. You build the core code and functions, and it handles the processing and resource stuff with a little config.

    * https://github.com/celery/celery

    Or lower level:

    * https://github.com/dask/dask

    2. DAG Workflow systems

    There are also whole systems for what you're describing. They've gotten especially popular in the MLOps and data engineering world. A common one is Airflow:

    * https://github.com/apache/airflow
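
    The core of the second path, running each task only after the tasks it depends on, can be sketched with the standard library alone. This is a minimal illustration under that assumption, not the API of Celery, Dask, or Airflow: it uses `graphlib.TopologicalSorter` (Python 3.9+), and `run_dag` and the pipeline tasks are hypothetical. Real workflow systems add retries, parallel workers, persistence, and scheduling on top of this ordering.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_dag(
    tasks: dict[str, Callable[[dict[str, Any]], Any]],
    deps: dict[str, set[str]],
) -> dict[str, Any]:
    """Run every task after all tasks it depends on, passing earlier results along.

    tasks: task name -> function receiving the results produced so far.
    deps:  task name -> names of the tasks it depends on (every name must be a key of tasks).
    """
    sorter = TopologicalSorter()
    for name in tasks:
        # add(node, *predecessors): `name` may only run after its predecessors.
        sorter.add(name, *deps.get(name, set()))

    results: dict[str, Any] = {}
    # static_order() lists every dependency before its dependents and
    # raises graphlib.CycleError if the dependencies form a cycle.
    for name in sorter.static_order():
        results[name] = tasks[name](results)
    return results


if __name__ == "__main__":
    # Hypothetical extract -> transform -> load pipeline.
    pipeline = {
        "extract": lambda r: [1, 2, 3],
        "transform": lambda r: [x * 10 for x in r["extract"]],
        "load": lambda r: sum(r["transform"]),
    }
    order = {"transform": {"extract"}, "load": {"transform"}}
    print(run_dag(pipeline, order)["load"])  # 60
```

    Tasks run sequentially here for clarity; a distributed library would hand each task to a worker as soon as all of its predecessors have finished.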

  10. statsmodels

    Statsmodels: statistical modeling and econometrics in Python

    Project mention: The Truth About Linear Regression | news.ycombinator.com | 2024-07-30

    statsmodels is the closest thing in Python to R. statsmodels has mixed model support, but mgcv apparently requires more. It is well above my paygrade, but this seems relevant: https://github.com/statsmodels/statsmodels/issues/8029 (i.e., no out-of-the-box support; you might be able to build an approximation on your own).

  11. Numba

    NumPy aware dynamic Python compiler using LLVM

    Project mention: I Don't Like NumPy | news.ycombinator.com | 2025-05-15

    Have you heard of JIT libraries like Numba (https://github.com/numba/numba)? It doesn't work for all Python code, but it can be helpful for the type of function you gave as an example. There's no need to rewrite anything; just add a decorator to the function. I don't really know how its performance compares to C, for example.
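
    The decorator-only workflow described above can be sketched as follows. `numba.njit` is Numba's no-Python-mode JIT decorator; the fallback to a do-nothing decorator when Numba is not installed is an assumption added here so the snippet also runs as plain (slower) Python. `sum_of_squares` is a hypothetical example of the loop-heavy numeric code that JIT compilers target.

```python
try:
    from numba import njit  # compiles the function to machine code via LLVM on first call
except ImportError:
    # Assumption for this sketch: without Numba, leave the function as ordinary Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Explicit loop that Numba can compile; without Numba it is interpreted."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_of_squares(1_000))  # 332833500
```

    The function body is unchanged either way; only the decorator decides whether it is compiled, which is why the comment says no rewrite is needed.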

  12. PyMC

    Bayesian Modeling and Probabilistic Programming in Python

  13. BigDL

    Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

    Project mention: FlashMoE: DeepSeek-R1 671B and Qwen3MoE 235B with 1~2 Intel B580 GPU in IPEX-LLM | news.ycombinator.com | 2025-05-12

  14. orange

    🍊 📊 💡 Orange: Interactive data analysis

  15. astropy

    Astronomy and astrophysics core library

    Project mention: Vision for Astronoby - Call for contributors and maintainers | dev.to | 2024-12-18

    One could be a project for accuracy. By integrating physical models and drawing inspiration from important existing projects like Skyfield or Astropy, this project could focus on providing the most accurate and performant results possible in Ruby. Contributors could help optimise the code, run benchmarks, and cover as many use cases as possible.

  16. Biopython

    Official git repository for Biopython (originally converted from CVS)

    Project mention: How to Start Contributing to Open Source Software | dev.to | 2024-10-17

    I also like contributing specifically to my field. As a PhD student and possibly future scientist, I have a vested interest in the quality of the software in my field–specifically, structural bioinformatics. I use several tools in this field and often find areas that can be improved, both for myself and others. As an example, consider this minor documentation change I added to the Biopython documentation.

  17. statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

    Project mention: This Week In Python | dev.to | 2025-03-21

    statsforecast – Forecasting with statistical and econometric models

  18. blaze

    NumPy and Pandas interface to Big Data

  19. fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  20. Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  21. bcbio-nextgen

    Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

  22. NIPY

    Workflows and interfaces for neuroimaging packages

  23. Neupy

    NeuPy is a TensorFlow-based Python library for prototyping and building neural networks

  24. bccb

    Incubator for useful bioinformatics code, primarily in Python and R

  25. Bubbles

    [NOT MAINTAINED] Bubbles – Python ETL framework (by Stiivi)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Science and Data analysis related posts

  • FlashMoE: DeepSeek-R1 671B and Qwen3MoE 235B with 1~2 Intel B580 GPU in IPEX-LLM

    1 project | news.ycombinator.com | 12 May 2025
  • Why Momentum Works (2017)

    2 projects | news.ycombinator.com | 28 Apr 2025
  • How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python

    7 projects | dev.to | 24 Apr 2025
  • How to import sample data into a Python notebook on watsonx.ai and other questions…

    1 project | dev.to | 13 Apr 2025
  • Statsforecast: Fast Python forecasting with statistical and econometric models

    1 project | news.ycombinator.com | 20 Mar 2025
  • DeepSeek R1 671B Q4_K_M with 1~2 Arc A770 on Xeon

    3 projects | news.ycombinator.com | 5 Mar 2025
  • MacBook Air M4

    3 projects | news.ycombinator.com | 5 Mar 2025

Index

What are some of the best open-source Science and Data analysis projects in Python? This list will help you:

Rank  Project        GitHub stars
1     Pandas               45,442
2     NumPy                29,483
3     NetworkX             15,736
4     pygwalker            14,796
5     SciPy                13,679
6     SymPy                13,627
7     Dask                 13,203
8     statsmodels          10,663
9     Numba                10,421
10    PyMC                  9,014
11    BigDL                 7,877
12    orange                5,185
13    astropy               4,686
14    Biopython             4,588
15    statsforecast         4,356
16    blaze                 3,197
17    fugue                 2,081
18    Cubes                 1,484
19    bcbio-nextgen         1,005
20    NIPY                    781
21    Neupy                   737
22    bccb                    622
23    Bubbles                 452
