Python Science and Data analysis

Open-source Python projects categorized as Science and Data analysis

Top 23 Python Science and Data analysis Projects

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: Deploying a Serverless Dash App with AWS SAM and Lambda | dev.to | 2024-03-04

    Dash is a Python framework that enables you to build interactive frontend applications without writing a single line of JavaScript. Internally and in projects, we like to use it to build quick proofs of concept for data-driven applications because of its nice integration with Plotly and pandas. For this post, I'm going to assume that you're already familiar with Dash and won't explain that part in detail. Instead, we'll focus on what's necessary to make it run serverless.

  • NumPy

    The fundamental package for scientific computing with Python.

    Project mention: NumPy 2.0.0 Beta1 | news.ycombinator.com | 2024-03-18

  • NetworkX

    Network Analysis in Python

    Project mention: Routes to LANL from 186 sites on the Internet | news.ycombinator.com | 2024-03-04

    Built from this data... https://github.com/networkx/networkx/blob/main/examples/grap...

  • SciPy

    SciPy library main repository

    Project mention: What Is a Schur Decomposition? | news.ycombinator.com | 2024-03-04

    I guess it is a rite of passage to rewrite it. I'm doing it for SciPy too together with Propack in [1]. Somebody already mentioned your repo. Thank you for your efforts.

    [1]: https://github.com/scipy/scipy/issues/18566

  • SymPy

    A computer algebra system written in pure Python

    Project mention: SymPy: Symbolic Mathematics in Python | news.ycombinator.com | 2024-02-28

    That's interesting. You should consider yourself lucky to have met Wolfram employees, as they are obviously vastly outnumbered by users of Mathematica.

    I have not met any developers for either of these products but I know that SymPy has a huge list of contributors for a project of its size. See: https://github.com/sympy/sympy/blob/master/AUTHORS

    You may not be hearing about SymPy users because SymPy is not a monolithic product. It is a library. If you know mathematicians big into using Python, they are probably aware of SymPy as it is the main attraction when it comes to symbolic computation in Python.

  • Dask

    Parallel computing with task scheduling

    Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15

  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

    Project mention: statsmodels Release Candidate 0.14.0rc0 tagged | /r/Python | 2023-04-26

  • Numba

    NumPy aware dynamic Python compiler using LLVM

    Project mention: Mojo🔥: Head-to-Head with Python and Numba | dev.to | 2023-09-27

    Around the same time, I discovered Numba and was fascinated by how easily it could bring huge performance improvements to Python code.

  • pygwalker

    PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

    Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15

  • PyMC

    Bayesian Modeling and Probabilistic Programming in Python

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Ask HN: What Underrated Open Source Project Deserves More Recognition? | news.ycombinator.com | 2024-03-07

  • astropy

    Astronomy and astrophysics core library

    Project mention: Julia 1.10 Released | news.ycombinator.com | 2023-12-27

    Astropy [0] lives at the heart of most work. It has a Python interface, often backed by Fortran and C++ extension modules. If you use Astropy, you're indirectly using libraries like ERFA [6] and cfitsio [7] which are in C/Fortran.

    I personally end up doing a lot of work that uses the HEALPix sky tessellation, so I use healpy [2] as well.

    OpenOrb is perhaps a good example of a pure-Fortran package; I use it quite frequently for orbit propagation [3].

    In C, there's Rebound [4] (for N-body simulations) and ASSIST [5] (which extends Rebound to use JPL's pre-calculated positions of major perturbers, and expands the force model to account for general relativity).

    There are many more, these are just ones that come to mind from frequent usage in the last few months.

    [0] https://www.astropy.org/

  • Biopython

    Official git repository for Biopython (originally converted from CVS)

    Project mention: Invitation to a project: Biopython in Spanish | /r/devsarg | 2023-07-23

  • blaze

    NumPy and Pandas interface to Big Data

    Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19

    Unfortunate name overlap with an under-loved PyData project: https://blaze.pydata.org

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

    Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22

  • Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  • bcbio-nextgen

    Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

    Project mention: Deep Sleep May Be the Best Defense Against Alzheimer’s | news.ycombinator.com | 2023-05-22

    Re: WGS, there are a lot of well-established toolchains that are FLOSS (e.g. https://github.com/bcbio/bcbio-nextgen). You could run alignment and variant calling on a beefy workstation; a laptop would potentially work. This is easy to test with publicly available raw data. Another option: the sequencing provider will often run alignment and some default variant calling for you. Annotating and analysing these variants can be done on pretty much any computer, all with open-source software. A SNP chip is even easier to deal with, as the computational requirements are lower.

    Interpreting the results is a more manual process. Really depends on what you are interested in.

  • Neupy

    NeuPy is a TensorFlow-based Python library for prototyping and building neural networks

  • NIPY

    Workflows and interfaces for neuroimaging packages

  • bccb

    Incubator for useful bioinformatics code, primarily in Python and R

  • Bubbles

    [NOT MAINTAINED] Bubbles – Python ETL framework (by Stiivi)

  • PyDy

    Multibody dynamics tool kit.

  • harold

    An open-source systems and controls toolbox for Python3

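The SciPy entry in the list above mentions reimplementing the Schur decomposition. As a minimal sketch (assuming NumPy and SciPy are installed; the matrix values are arbitrary and only illustrative), `scipy.linalg.schur` factors a square matrix A as A = Z T Z^H, where Z is unitary and, in the complex form, T is upper triangular with the eigenvalues of A on its diagonal:

```python
import numpy as np
from scipy.linalg import schur

# Arbitrary non-symmetric test matrix (illustrative values only).
A = np.array([[4.0, 1.0, 2.0],
              [0.5, 3.0, 1.0],
              [1.0, 0.0, 2.0]])

# Complex Schur form: A = Z @ T @ Z^H, with Z unitary and T upper triangular.
T, Z = schur(A, output="complex")

# Check the factorization and the properties of each factor.
print(np.allclose(Z @ T @ Z.conj().T, A))      # Z T Z^H reconstructs A
print(np.allclose(Z.conj().T @ Z, np.eye(3)))  # Z is unitary
print(np.allclose(np.tril(T, k=-1), 0))        # T is upper triangular

# The diagonal of T holds the eigenvalues of A (in no guaranteed order).
print(np.diag(T))
```

With the default `output="real"`, a real input instead yields a real, quasi-upper-triangular T in which complex-conjugate eigenvalue pairs appear as 2×2 diagonal blocks.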

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-18.


Index

What are some of the best open-source Science and Data analysis projects in Python? This index ranks them by GitHub stars:

#   Project        Stars
1   Pandas        41,573
2   NumPy         26,009
3   NetworkX      14,004
4   SciPy         12,300
5   SymPy         11,940
6   Dask          11,885
7   statsmodels    9,412
8   Numba          9,309
9   pygwalker      9,270
10  PyMC           8,072
11  orange         4,551
12  astropy        4,148
13  Biopython      4,098
14  blaze          3,181
15  fugue          1,844
16  Cubes          1,490
17  bcbio-nextgen    968
18  Neupy            738
19  NIPY             726
20  bccb             590
21  Bubbles          448
22  PyDy             346
23  harold           170
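As a small illustration of the labeled data structures that the Pandas entry describes, the star counts from the index table can be loaded into a DataFrame and re-ranked. This is a sketch assuming pandas is installed; the counts are a static snapshot copied from the table, not fetched live.

```python
import pandas as pd

# Star counts copied from the index table (a static snapshot, in table order).
stars = {
    "Pandas": 41573, "NumPy": 26009, "NetworkX": 14004, "SciPy": 12300,
    "SymPy": 11940, "Dask": 11885, "statsmodels": 9412, "Numba": 9309,
    "pygwalker": 9270, "PyMC": 8072, "orange": 4551, "astropy": 4148,
    "Biopython": 4098, "blaze": 3181, "fugue": 1844, "Cubes": 1490,
    "bcbio-nextgen": 968, "Neupy": 738, "NIPY": 726, "bccb": 590,
    "Bubbles": 448, "PyDy": 346, "harold": 170,
}

# Labeled DataFrame: one row per project, indexed by project name.
df = pd.DataFrame({"stars": pd.Series(stars)})
df.index.name = "project"

# Rank by stars (1 = most starred); method="min" gives tied projects the same rank.
df["rank"] = df["stars"].rank(ascending=False, method="min").astype(int)
df = df.sort_values("rank")

print(df.head(3))
print(df["stars"].describe())  # count, mean, std, min, quartiles, max
```

Because the table is already ordered by stars, the computed ranks reproduce its `#` column.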