Top 23 Python Science and Data analysis Projects
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Project mention: Interacting with Amazon S3 using AWS Data Wrangler (awswrangler) SDK for Pandas: A Comprehensive Guide | dev.to | 2023-08-20
AWS Data Wrangler is a Python library that simplifies the process of interacting with various AWS services, built on top of some useful data tools and open-source projects such as Pandas, Apache Arrow and Boto3. It offers streamlined functions to connect to, retrieve, transform, and load data from AWS services, with a strong focus on Amazon S3.
The fundamental package for scientific computing with Python.
Project mention: Calculating weighted averages with numpy and Python! | dev.to | 2023-08-22
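As a rough illustration of the topic named in the mention above (not taken from the linked post), a weighted average in NumPy can be sketched as follows. The scores and weights are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical data: three exam scores and the weight each exam carries.
scores = np.array([80.0, 90.0, 70.0])
weights = np.array([0.2, 0.3, 0.5])

# np.average divides the weighted sum by the sum of the weights,
# so the weights do not have to add up to 1.
weighted_mean = np.average(scores, weights=weights)

# The same computation written out by hand, for comparison.
manual = (scores * weights).sum() / weights.sum()

print(weighted_mean, manual)
```

Passing `weights=` to `np.average` keeps the normalization in one place, which avoids forgetting to divide by the weight total when the weights are raw counts rather than fractions.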
Network Analysis in Python
Project mention: org-roam-pygraph: Build a graph of your org-roam collection for use in Python | /r/orgmode | 2023-05-07
org-roam-ui is a great interactive visualization tool, but its main use is visualization. The hope of this library is that it could be part of a larger graph analysis pipeline. The demo provides an example graph visualization, but what you choose to do with the resulting graph certainly isn't limited to that. See for example networkx.
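To give a sense of what a graph analysis step with networkx might look like, here is a minimal sketch. It does not use org-roam-pygraph; the note names and links are hypothetical stand-ins for a graph of linked notes.

```python
import networkx as nx

# Hypothetical note links as (source note, target note) pairs.
links = [
    ("index", "python"),
    ("index", "emacs"),
    ("python", "numpy"),
    ("emacs", "org-roam"),
    ("org-roam", "python"),
]

# Build a directed graph; nodes are created automatically from the edges.
graph = nx.DiGraph()
graph.add_edges_from(links)

# Count incoming links per note and pick the most linked-to note.
in_degree = dict(graph.in_degree())
most_linked = max(in_degree, key=in_degree.get)

# Find the shortest chain of links from one note to another.
path = nx.shortest_path(graph, "index", "numpy")

print(most_linked, path)
```

Once the notes are in a networkx graph, the same object can be passed to other algorithms in the library (centrality measures, connected components, and so on) without further conversion.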
SciPy library main repository
Project mention: Fortran codes are causing problems | /r/rstats | 2023-09-13
Fortran codes have caused many problems for the Python package Scipy, and some of them are now being rewritten in C: e.g., https://github.com/scipy/scipy/pull/19121. Not only does R have many Fortran codes, there are also many R packages using Fortran codes: https://github.com/r-devel/r-svn, https://github.com/cran?q=&type=&language=fortran&sort=. Modern Fortran is a fine language but most legacy Fortran codes use the F77 style. When I update the R package quantreg, which uses many Fortran codes, I get a lot of warning messages. Not sure how the Fortran codes in the R ecosystem will be dealt with in the future, but they recently caused an issue in R due to the lack of compiler support for Fortran: https://blog.r-project.org/2023/08/23/will-r-work-on-64-bit-arm-windows/index.html. Some renowned packages like glmnet already have their Fortran codes rewritten in C/C++: https://cran.r-project.org/web/packages/glmnet/news/news.html
Parallel computing with task scheduling
Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15
A computer algebra system written in pure Python
NumPy aware dynamic Python compiler using LLVM
Statsmodels: statistical modeling and econometrics in Python
Project mention: statsmodels Release Candidate 0.14.0rc0 tagged | /r/Python | 2023-04-26
Bayesian Modeling in Python
Project mention: PYMC Release: v5.0.0 | news.ycombinator.com | 2022-12-12
🍊 📊 💡 Orange: Interactive data analysis
Project mention: What exactly is AutoGPT? | /r/AutoGPT | 2023-06-12
Both tools are ripoffs of a data mining framework named Orange 3
Astronomy and astrophysics core library
Project mention: [R] Astronomia ex machina: a history, primer and outlook on neural networks in astronomy | /r/MachineLearning | 2023-05-31
Official git repository for Biopython (originally converted from CVS)
Project mention: Invitación a proyecto - Biopython en Español (Project invitation: Biopython in Spanish) | /r/devsarg | 2023-07-23
NumPy and Pandas interface to Big Data
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
Project mention: Deep Sleep May Be the Best Defense Against Alzheimer’s | news.ycombinator.com | 2023-05-22
Re WGS there are a lot of well established tool chains that are FLOSS (eg https://github.com/bcbio/bcbio-nextgen). You could run alignment and variant calling on a beefy workstation. A laptop would potentially work. Easy to test this with publicly available raw data. Another option: The sequencing provider often will run alignment and some default variant calling for you. Annotating and analysing these variants can be done on pretty much any computer, all with open source software. A SNP chip is even easier to deal with as the computational requirements are less.
Interpreting the results is a more manual process. Really depends on what you are interested in.
NeuPy is a Tensorflow based python library for prototyping and building neural networks
Workflows and interfaces for neuroimaging packages
Incubator for useful bioinformatics code, primarily in Python and R
[NOT MAINTAINED] Bubbles – Python ETL framework (by Stiivi)
Multibody dynamics tool kit.
An open-source systems and controls toolbox for Python3
Manage large and heterogeneous data spaces on the file system.
Python Science and Data analysis related posts
Fortran codes are causing problems
2 projects | /r/rstats | 13 Sep 2023
Calculating weighted averages with numpy and Python!
1 project | dev.to | 22 Aug 2023
Libraries vs. Frameworks: Which is right for your next web project?
1 project | dev.to | 21 Aug 2023
Interacting with Amazon S3 using AWS Data Wrangler (awswrangler) SDK for Pandas: A Comprehensive Guide
5 projects | dev.to | 20 Aug 2023
Solving a simple puzzle using SymPy
1 project | news.ycombinator.com | 14 Aug 2023
How to Create a Pareto Chart 📐
2 projects | dev.to | 12 Aug 2023
NumPy VS gmpy - a user suggested alternative
2 projects | 2 Aug 2023
What are some of the best open-source Science and Data analysis projects in Python? This list will help you: