Top 23 Science and Data Analysis Open-Source Projects

Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
VBA is used for writing scripts that automate some process in Excel. VBA performance is, honestly, terribly slow. You're better off learning some programming (Python) and the libraries that let you manipulate, clean, and wrangle data. Look into pandas.
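As a rough illustration of the kind of cleaning the comment above has in mind, here is a minimal pandas sketch. The `raw` table, its column names, and the `clean_sales` helper are hypothetical, made up for this example; a real workflow would more likely load the data from a spreadsheet or CSV file.

```python
import pandas as pd

# Hypothetical messy records, standing in for a spreadsheet export.
raw = pd.DataFrame(
    {
        "region": [" north", "South ", "north", None],
        "sales": ["100", "250", "n/a", "75"],
    }
)


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: tidy region labels and numeric sales."""
    out = df.copy()
    # Trim stray whitespace and normalize case in the text column.
    out["region"] = out["region"].str.strip().str.lower()
    # Convert sales to numbers; unparseable values such as "n/a" become NaN.
    out["sales"] = pd.to_numeric(out["sales"], errors="coerce")
    # Drop rows that are missing either a region or a usable sales figure.
    return out.dropna(subset=["region", "sales"])


cleaned = clean_sales(raw)
# Total sales per region, computed on the cleaned rows only.
totals = cleaned.groupby("region")["sales"].sum()
print(totals)
```

Each step is a single column-wise operation, which is the main difference from looping over cells in a VBA macro.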

NumPy
The fundamental package for scientific computing with Python.
What do you mean by uploads? If you mean additional libraries besides Python, then for control input you need the midi module from pygame, and for audio output you need pyaudio. Other than that, you need numpy. You can install all of these using pip.

PredictionIO
PredictionIO, a machine learning server for developers and ML engineers.

NetworkX
Network Analysis in Python
Project mention: [P] I made Communities: a library of clustering algorithms for network graphs (link in comments)  reddit.com/r/MachineLearning  2021-02-22
It would be nice if communities natively supported both networkx and igraph data structures.

Dask
Parallel computing with task scheduling
Project mention: Too much data to preprocess to work with pandas - is pyspark.sql a feasible alternative?  reddit.com/r/PySpark  2021-02-25
I have to admit I haven't used it myself, but I think dask could fit your workflow. Spark might add a bit too much overhead if you're not used to it and you're not using a distributed system, but of course it would also work.

SciPy
Scipy library main repository
Link: https://github.com/scipy/scipy/releases

SymPy
A computer algebra system written in pure Python
Project mention: Python Math Library made in 3 Days as a 14-year-old - libmaths  reddit.com/r/Python  2021-02-23
Now compare that to SymPy: https://github.com/sympy/sympy/blob/9e8f62e059d83178c1d8a1e19acac5473bdbf1c1/sympy/ntheory/primetest.py#L472L634

Numba
NumPy-aware dynamic Python compiler using LLVM
The first thing I would do is write the code in a non-vectorized fashion to see where I could get rid of any unnecessary copying or allocating. Then you could rewrite the code using a more efficient sequence of vectorized operations, or you could JIT-compile it using a library like numba.
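The three options in the comment above (explicit loop, vectorized NumPy, JIT-compiled loop) can be sketched as follows. The function names and the sum-of-squared-differences task are assumptions chosen for illustration, not taken from the original thread. If numba is not installed, the sketch falls back to running the plain Python loop.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback (assumption for this sketch): no JIT, return the function unchanged.
        return func


def sum_sq_diff_loop(a, b):
    """Explicit loop: no temporary arrays, but slow in plain Python."""
    total = 0.0
    for i in range(a.shape[0]):
        d = a[i] - b[i]
        total += d * d
    return total


def sum_sq_diff_vectorized(a, b):
    """Vectorized: fast, but allocates one temporary array for a - b."""
    d = a - b
    return float(np.dot(d, d))


# The same loop, compiled by numba when it is available.
sum_sq_diff_jit = njit(sum_sq_diff_loop)

a = np.arange(5, dtype=np.float64)  # [0, 1, 2, 3, 4]
b = np.ones(5, dtype=np.float64)    # [1, 1, 1, 1, 1]

print(sum_sq_diff_loop(a, b), sum_sq_diff_vectorized(a, b), sum_sq_diff_jit(a, b))
```

Writing the loop first makes the temporaries visible: the vectorized version trades one extra allocation for speed, while the JIT-compiled loop avoids the allocation entirely.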

statsmodels
Statsmodels: statistical modeling and econometrics in Python
Project mention: [C] I have an MS in Statistics - how can I get better at coding?  reddit.com/r/statistics  2021-01-04

PyMC
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

gonum
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
I'm not sure what exactly you are trying to accomplish, but there is already a numeric package, https://github.com/gonum/gonum, that has asm loops for the common stuff. And there's https://github.com/mmcloughlin/avo, which makes working with assembly less painful.

BigDL
BigDL: Distributed Deep Learning Framework for Apache Spark

Breeze
Breeze is a numerical processing library for Scala.

Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.

blaze
NumPy and Pandas interface to Big Data

astropy
Repository for the Astropy core package

orange
🍊 📊 💡 Orange: Interactive data analysis
Project mention: Computing for SCIENCE, for someone who knows nothing about the subject.  reddit.com/r/ItalyInformatica  2021-02-28

Biopython
Official git repository for Biopython (originally converted from CVS)
You probably mean genetic engineering, which also uses a lot of software tools. The latest iteration, called synthetic biology, also relies heavily on computer-assisted DNA design, cloning and modelling of gene expression networks. You may check out Biopython, the Synthetic Biology Open Language (SBOL), the GBA software, or CUBA for examples of software used in synbio.

Algebird
Abstract Algebra for Scala

Stats
A well-tested and comprehensive Golang statistics library package with no dependencies.

Interactive Parallel Computing with IPython
Interactive Parallel Computing in Python

gonum/plot
A repository for plotting and visualizing data

Index
What are some of the best open-source Science and Data analysis projects? This list will help you:
Rank  Project  Stars

1  Pandas  28,657 
2  NumPy  16,428 
3  PredictionIO  12,500 
4  NetworkX  8,731 
5  Dask  7,965 
6  SciPy  7,955 
7  SymPy  7,866 
8  Numba  6,153 
9  statsmodels  6,056 
10  PyMC  5,594 
11  Zeppelin  5,150 
12  gonum  4,617 
13  BigDL  3,702 
14  Breeze  3,221 
15  Spark Notebook  3,015 
16  blaze  2,928 
17  astropy  2,642 
18  orange  2,637 
19  Biopython  2,619 
20  Algebird  2,038 
21  Stats  1,922 
22  Interactive Parallel Computing with IPython  1,899 
23  gonum/plot  1,828 