Top 23 Science and Data analysis Open-Source Projects
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
VBA is used for writing scripts that automate processes in Excel. VBA performance is incredibly slow and, honestly, terrible. You're better off learning some programming (Python) and libraries that let you manipulate, clean, and wrangle data. Look into pandas.
-
NumPy
The fundamental package for scientific computing with Python.
What do you mean by uploads? If you mean additional libraries besides Python, then for control input you need the midi module from pygame, and for audio output pyaudio. Other than that, you need numpy. You can install these using pip.
-
PredictionIO
PredictionIO, a machine learning server for developers and ML engineers.
-
NetworkX
Network Analysis in Python
Project mention: [P] I made Communities: a library of clustering algorithms for network graphs (link in comments) | reddit.com/r/MachineLearning | 2021-02-22
It would be nice if Communities natively supported both networkx and igraph data structures.
-
Dask
Parallel computing with task scheduling
Project mention: Too much data to preprocess to work with pandas — is pyspark.sql a feasible alternative? | reddit.com/r/PySpark | 2021-02-25
I have to admit I haven't used it myself, but I think dask could fit your workflow. Spark might add a bit too much overhead if you're not used to it and you're not using a distributed system, but of course it would also work.
-
SciPy
Scipy library main repository
Link: https://github.com/scipy/scipy/releases
-
SymPy
A computer algebra system written in pure Python
Project mention: Python Math Library made in 3 Days as a 14 year-old - libmaths | reddit.com/r/Python | 2021-02-23
Now compare that to SymPy: https://github.com/sympy/sympy/blob/9e8f62e059d83178c1d8a1e19acac5473bdbf1c1/sympy/ntheory/primetest.py#L472-L634
-
Numba
NumPy aware dynamic Python compiler using LLVM
The first thing I would do is write the code in a non-vectorized fashion to see where I could get rid of any unnecessary copying or allocating. Then you could rewrite the code using a more efficient sequence of vectorized operations, or you could JIT-compile it using a library like numba.
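The workflow described in that comment might look roughly like the sketch below. The function names and the example computation (sum of squared differences between neighbouring elements) are hypothetical and chosen only for illustration. numba is treated as optional: if it is not installed, the "JIT" variant simply falls back to the plain Python loop.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback when numba is unavailable: return the function unchanged.
        return func


def roughness_loop(x):
    """Plain loop: no temporary arrays, easy to inspect for wasted work."""
    total = 0.0
    for i in range(1, x.shape[0]):
        d = x[i] - x[i - 1]
        total += d * d
    return total


def roughness_vectorized(x):
    """Vectorized: one temporary array from np.diff, then a dot product."""
    d = np.diff(x)
    return float(np.dot(d, d))


# JIT-compile the loop version (a no-op wrapper if numba is missing).
roughness_jit = njit(roughness_loop)

x = np.array([1.0, 4.0, 2.0])
print(roughness_loop(x), roughness_vectorized(x), roughness_jit(x))
```

Writing the loop first makes each allocation explicit; whether the vectorized or the JIT-compiled form ends up faster depends on the array sizes and the operation, so it is worth timing both.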
-
statsmodels
Statsmodels: statistical modeling and econometrics in Python
Project mention: [C] I have an MS in Statistics - how can I get better at coding? | reddit.com/r/statistics | 2021-01-04
-
PyMC
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
-
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
-
gonum
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
I'm not sure what exactly you are trying to accomplish, but there are already numeric packages such as https://github.com/gonum/gonum that have asm loops for the common stuff. And there's https://github.com/mmcloughlin/avo, which makes working with assembly less painful.
-
BigDL
BigDL: Distributed Deep Learning Framework for Apache Spark
-
Breeze
Breeze is a numerical processing library for Scala.
-
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
-
blaze
NumPy and Pandas interface to Big Data
-
astropy
Repository for the Astropy core package
-
orange
🍊 Orange: Interactive data analysis
Project mention: Informatica per la SCIENZA, per un ignorante in materia. | reddit.com/r/ItalyInformatica | 2021-02-28
-
Biopython
Official git repository for Biopython (originally converted from CVS)
You probably mean genetic engineering, which also uses a lot of software tools. The latest iteration, called synthetic biology, also relies heavily on computer-assisted DNA design, cloning and modelling of gene expression networks. You may check out Biopython, the Synthetic Biology Open Language (SBOL), the GBA software, or CUBA for examples of software used in synbio.
-
Algebird
Abstract Algebra for Scala
-
Stats
A well tested and comprehensive Golang statistics library package with no dependencies.
-
Interactive Parallel Computing with IPython
Interactive Parallel Computing in Python
-
gonum/plot
A repository for plotting and visualizing data
Index
What are some of the best open-source Science and Data analysis projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | Pandas | 28,657 |
2 | NumPy | 16,428 |
3 | PredictionIO | 12,500 |
4 | NetworkX | 8,731 |
5 | Dask | 7,965 |
6 | SciPy | 7,955 |
7 | SymPy | 7,866 |
8 | Numba | 6,153 |
9 | statsmodels | 6,056 |
10 | PyMC | 5,594 |
11 | Zeppelin | 5,150 |
12 | gonum | 4,617 |
13 | BigDL | 3,702 |
14 | Breeze | 3,221 |
15 | Spark Notebook | 3,015 |
16 | blaze | 2,928 |
17 | astropy | 2,642 |
18 | orange | 2,637 |
19 | Biopython | 2,619 |
20 | Algebird | 2,038 |
21 | Stats | 1,922 |
22 | Interactive Parallel Computing with IPython | 1,899 |
23 | gonum/plot | 1,828 |