Top 23 Python Science and Data analysis Projects
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Project mention: Interacting with Amazon S3 using AWS Data Wrangler (awswrangler) SDK for Pandas: A Comprehensive Guide | dev.to | 2023-08-20
AWS Data Wrangler is a Python library that simplifies the process of interacting with various AWS services, built on top of some useful data tools and open-source projects such as Pandas, Apache Arrow and Boto3. It offers streamlined functions to connect to, retrieve, transform, and load data from AWS services, with a strong focus on Amazon S3.
-
NumPy
-
Project mention: org-roam-pygraph: Build a graph of your org-roam collection for use in Python | /r/orgmode | 2023-05-07
org-roam-ui is a great interactive visualization tool, but its main use is visualization. The hope of this library is that it could be part of a larger graph analysis pipeline. The demo provides an example graph visualization, but what you choose to do with the resulting graph certainly isn't limited to that. See for example networkx.
-
Fortran codes have caused many problems for the Python package Scipy, and some of them are now being rewritten in C: e.g., https://github.com/scipy/scipy/pull/19121. Not only does R have many Fortran codes, there are also many R packages using Fortran codes: https://github.com/r-devel/r-svn, https://github.com/cran?q=&type=&language=fortran&sort=. Modern Fortran is a fine language but most legacy Fortran codes use the F77 style. When I update the R package quantreg, which uses many Fortran codes, I get a lot of warning messages. Not sure how the Fortran codes in the R ecosystem will be dealt with in the future, but they recently caused an issue in R due to the lack of compiler support for Fortran: https://blog.r-project.org/2023/08/23/will-r-work-on-64-bit-arm-windows/index.html. Some renowned packages like glmnet already have their Fortran codes rewritten in C/C++: https://cran.r-project.org/web/packages/glmnet/news/news.html
-
bug report opened https://github.com/sympy/sympy/issues/25507
-
Simulations are, at least in my experience, numba’s [0] wheelhouse.
-
Both tools are ripoffs of a data mining framework named Orange 3
-
Project mention: [R] Astronomia ex machina: a history, primer and outlook on neural networks in astronomy | /r/MachineLearning | 2023-05-31
-
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Project mention: Daft: A High-Performance Distributed Dataframe Library for Multimodal Data | news.ycombinator.com | 2023-06-07
Please integrate it with Fugue.
-
bcbio-nextgen
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
Project mention: Deep Sleep May Be the Best Defense Against Alzheimer’s | news.ycombinator.com | 2023-05-22
Re WGS there are a lot of well established tool chains that are FLOSS (eg https://github.com/bcbio/bcbio-nextgen). You could run alignment and variant calling on a beefy workstation. A laptop would potentially work. Easy to test this with publicly available raw data. Another option: The sequencing provider often will run alignment and some default variant calling for you. Annotating and analysing these variants can be done on pretty much any computer, all with open source software. A SNP chip is even easier to deal with as the computational requirements are less.
Interpreting the results is a more manual process. Really depends on what you are interested in.
Python Science and Data analysis related posts
- Fortran codes are causing problems
- Calculating weighted averages with numpy and Python!
- Libraries vs. Frameworks: Which is right for your next web project?
- Interacting with Amazon S3 using AWS Data Wrangler (awswrangler) SDK for Pandas: A Comprehensive Guide
- Solving a simple puzzle using SymPy
- How to Create a Pareto Chart 📐
-
NumPy VS gmpy - a user suggested alternative
2 projects | 2 Aug 2023
Index
What are some of the best open-source Science and Data analysis projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Pandas | 39,797 |
2 | NumPy | 24,540 |
3 | NetworkX | 13,177 |
4 | SciPy | 11,726 |
5 | Dask | 11,398 |
6 | SymPy | 11,317 |
7 | Numba | 8,913 |
8 | statsmodels | 8,866 |
9 | PyMC | 7,783 |
10 | orange | 4,259 |
11 | astropy | 3,921 |
12 | Biopython | 3,754 |
13 | blaze | 3,165 |
14 | fugue | 1,723 |
15 | Cubes | 1,490 |
16 | bcbio-nextgen | 949 |
17 | Neupy | 741 |
18 | NIPY | 701 |
19 | bccb | 575 |
20 | Bubbles | 450 |
21 | PyDy | 333 |
22 | harold | 163 |
23 | signac | 124 |