Python Science and Data analysis

Open-source Python projects categorized as Science and Data analysis

Top 23 Python Science and Data analysis Projects

Science and Data analysis
  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: 7 Python Excel Libraries: In-Depth Review for Developers | dev.to | 2024-07-18

    Pandas is a powerful data manipulation and analysis library that provides easy-to-use data structures and data analysis tools. It includes the read_excel function and the DataFrame.to_excel method for reading from and writing to Excel files, and it relies on third-party libraries such as OpenPyXL and xlrd to do so.

  • NumPy

    The fundamental package for scientific computing with Python.

    Project mention: Python: From Beginners to Pro in 30 Mins (Part 1) | dev.to | 2024-07-15

    PyCharm also integrates well with various Python frameworks and tools. It offers excellent support for web development frameworks like Django and Flask and scientific computing libraries like NumPy and Matplotlib.

  • NetworkX

    Network Analysis in Python

    Project mention: Routes to LANL from 186 sites on the Internet | news.ycombinator.com | 2024-03-04

    Built from this data... https://github.com/networkx/networkx/blob/main/examples/grap...

  • SciPy

    SciPy library main repository

    Project mention: What Is a Schur Decomposition? | news.ycombinator.com | 2024-03-04

    I guess it is a rite of passage to rewrite it. I'm doing it for SciPy too together with Propack in [1]. Somebody already mentioned your repo. Thank you for your efforts.

    [1]: https://github.com/scipy/scipy/issues/18566

  • SymPy

    A computer algebra system written in pure Python

    Project mention: Nvidia Warp: A Python framework for high performance GPU simulation and graphics | news.ycombinator.com | 2024-06-14

    From https://news.ycombinator.com/item?id=37686351 :

    >> sympy.utilities.lambdify.lambdify() https://github.com/sympy/sympy/blob/a76b02fcd3a8b7f79b3a88df... :

    >> """Convert a SymPy expression into a function that allows for fast numeric evaluation [e.g. the CPython math module, mpmath, NumPy, SciPy, CuPy, JAX, TensorFlow, SymPy, numexpr,]

  • Dask

    Parallel computing with task scheduling

    Project mention: Ask HN: What's the right tool for this job? | news.ycombinator.com | 2024-07-20

    From what I've seen, there are sort of two paths. I'll provide a well known example from each.

    1. lang specific distributed task library

    For example, in Python, celery is a pretty popular task system. If you (the dev) are the one doing all the code and running the workflows, it might work well for you. You build the core code and functions, and it handles the processing and resource stuff with a little config.

    * https://github.com/celery/celery

    Or lower level:

    * https://github.com/dask/dask

    2. DAG Workflow systems

    There are also whole systems for what you're describing. They've gotten especially popular in the ML ops and data engineering world. A common one is Airflow:

    * https://github.com/apache/airflow

  • pygwalker

    PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

    Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15
  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

  • Numba

    NumPy aware dynamic Python compiler using LLVM

    Project mention: Nvidia Warp: A Python framework for high performance GPU simulation and graphics | news.ycombinator.com | 2024-06-14
  • PyMC

    Bayesian Modeling and Probabilistic Programming in Python

  • BigDL

    Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

    Project mention: LLaMA Now Goes Faster on CPUs | news.ycombinator.com | 2024-03-31

    Any performance benchmark against intel's 'IPEX-LLM'[0] or others?

    [0] - https://github.com/intel-analytics/ipex-llm

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...

  • astropy

    Astronomy and astrophysics core library

    Project mention: Julia 1.10 Released | news.ycombinator.com | 2023-12-27

    Astropy [0] lives at the heart of most work. It has a Python interface, often backed by Fortran and C++ extension modules. If you use Astropy, you're indirectly using libraries like ERFA [6] and cfitsio [7] which are in C/Fortran.

    I personally end up doing a lot of work that uses the HEALPix sky tessellation, so I use healpy [2] as well.

    Openorb is perhaps a good example of a pure-Fortran package that I use quite frequently for orbit propagation [3].

    In C, there's Rebound [4] (for N-body simulations) and ASSIST [5] (which extends Rebound to use JPL's pre-calculated positions of major perturbers, and expands the force model to account for general relativity).

    There are many more, these are just ones that come to mind from frequent usage in the last few months.

    [0] https://www.astropy.org/

  • Biopython

    Official git repository for Biopython (originally converted from CVS)

  • statsforecast

    Lightning ⚡️ fast forecasting with statistical and econometric models.

    Project mention: TimeGPT-1 | news.ycombinator.com | 2023-10-13

    I can't find the TimeGPT-1 model.

    LICENSE Apache-2

    https://github.com/Nixtla/statsforecast/blob/main/LICENSE

    Mentions ARIMA, ETS, CES, and Theta modeling

  • blaze

    NumPy and Pandas interface to Big Data

    Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19

    Unfortunate name overlap with an under-loved PyData project: https://blaze.pydata.org

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

    Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22
  • Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  • bcbio-nextgen

    Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

  • Neupy

    NeuPy is a Tensorflow based python library for prototyping and building neural networks

  • NIPY

    Workflows and interfaces for neuroimaging packages

  • bccb

    Incubator for useful bioinformatics code, primarily in Python and R

  • Bubbles

    [NOT MAINTAINED] Bubbles – Python ETL framework (by Stiivi)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
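
The Pandas entry above mentions read_excel and to_excel for Excel round trips. A minimal sketch of how that might look, assuming openpyxl is installed as the Excel engine. The sheet name and the in-memory buffer are illustrative choices, and the two star counts are taken from the index on this page:

```python
import io

import pandas as pd

# Small sample frame; the star counts come from the index on this page.
df = pd.DataFrame({"project": ["Pandas", "NumPy"], "stars": [42768, 27149]})

# Write to an in-memory buffer rather than a file on disk.
# Assumption: openpyxl is installed; pandas delegates .xlsx handling to it.
buffer = io.BytesIO()
df.to_excel(buffer, index=False, sheet_name="projects", engine="openpyxl")

# Rewind the buffer and read the sheet back into a new DataFrame.
buffer.seek(0)
restored = pd.read_excel(buffer, sheet_name="projects", engine="openpyxl")

print(restored)
```

Passing index=False keeps the row index out of the sheet, so the frame that is read back has the same two columns as the one that was written.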

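The SymPy entry above quotes the lambdify docstring, which describes converting a symbolic expression into a function for fast numeric evaluation. A short sketch of that idea, using the standard math module as the backend. The expression itself is a made-up example, not something taken from the quoted thread:

```python
import math

import sympy as sp

# Hypothetical expression chosen only for illustration.
x = sp.symbols("x")
expr = sp.sin(x) ** 2 + 3 * x

# Build a plain Python function that evaluates expr with the math module.
f = sp.lambdify(x, expr, modules="math")

# Evaluate numerically and compare against a hand-written equivalent.
value = f(0.5)
reference = math.sin(0.5) ** 2 + 3 * 0.5

print(value, reference)
```

Swapping modules="math" for modules="numpy" would, under the same assumptions, give a function that accepts NumPy arrays instead of single floats.
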

Python Science and Data analysis related posts

  • 7 Python Excel Libraries: In-Depth Review for Developers

    3 projects | dev.to | 18 Jul 2024
  • Python: From Beginners to Pro in 30 Mins (Part 1)

    5 projects | dev.to | 15 Jul 2024
  • Pi calculation world record with over 202T digits

    3 projects | news.ycombinator.com | 15 Jul 2024
  • Beating NumPy matrix multiplication in 150 lines of C

    1 project | news.ycombinator.com | 3 Jul 2024
  • NumPy 2.0.0 release – first major release since 2006

    1 project | news.ycombinator.com | 17 Jun 2024
  • NumPy 2.0.0 Release Notes

    1 project | news.ycombinator.com | 16 Jun 2024
  • NumPy 2.0.0

    1 project | news.ycombinator.com | 16 Jun 2024

Index

What are some of the best open-source Science and Data analysis projects in Python? This list will help you:

Rank  Project         Stars
1     Pandas          42,768
2     NumPy           27,149
3     NetworkX        14,496
4     SciPy           12,748
5     SymPy           12,615
6     Dask            12,262
7     pygwalker       10,780
8     statsmodels     9,775
9     Numba           9,654
10    PyMC            8,500
11    BigDL           6,310
12    orange          4,715
13    astropy         4,311
14    Biopython       4,245
15    statsforecast   3,752
16    blaze           3,178
17    fugue           1,920
18    Cubes           1,490
19    bcbio-nextgen   981
20    Neupy           742
21    NIPY            741
22    bccb            603
23    Bubbles         451


Did you know that Python is the most popular programming language based on number of mentions?