Top 23 Science and Data analysis Open-Source Projects
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
Project mention: Capture tabular data from docho.com | reddit.com/r/learnpython | 2021-04-18
The fundamental package for scientific computing with Python.
Project mention: How to replace an integer with a letter in a dictionary | reddit.com/r/learnpython | 2021-04-20
Network Analysis in Python
Project mention: Is there another way to find all the cliques in a graph (dictionary)? | reddit.com/r/learnpython | 2021-04-06
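For readers curious about the question in that mention, here is a minimal sketch of enumerating maximal cliques when the graph is stored as a plain dictionary. It uses the basic Bron-Kerbosch recursion without pivoting and only the standard library. The function name `maximal_cliques` and the example graph are hypothetical, chosen for illustration. NetworkX offers the same kind of enumeration through `find_cliques`.

```python
def maximal_cliques(graph):
    """Yield each maximal clique of an undirected graph.

    `graph` maps every node to the set of its neighbours
    (an assumed representation, not taken from the linked thread).
    """
    def expand(clique, candidates, excluded):
        # No node can extend the clique and none was skipped: it is maximal.
        if not candidates and not excluded:
            yield sorted(clique)
            return
        for node in list(candidates):
            yield from expand(
                clique | {node},
                candidates & graph[node],
                excluded & graph[node],
            )
            # Move the node from the candidates to the excluded set.
            candidates = candidates - {node}
            excluded = excluded | {node}

    yield from expand(set(), set(graph), set())


# Hypothetical example: a triangle a-b-c plus one extra edge c-d.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = sorted(maximal_cliques(example))
```

Without pivoting the recursion can be slow on dense graphs, so this is best read as an illustration of the idea rather than a tuned implementation.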
Parallel computing with task scheduling
Project mention: Why is Python popular despite being accused of being slow? | reddit.com/r/programming | 2021-04-16
Not everyone has the same "parallelism" needs. I have used mpi4py to distribute scientific computations using numpy over thousands of cores on hundreds of servers with much less effort than doing the same thing in C / C++ and almost no performance penalty (I could batch my data in big enough chunks). Today there are higher level distributed computing packages like dask that are even easier to use.
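The batching idea in that comment can be sketched with the standard library alone. This is an assumption-laden illustration, not the commenter's mpi4py or dask code: a thread pool stands in for the distributed workers, and the names `chunked`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical. The point is only that each worker receives one large chunk, so the per-task overhead is paid once per chunk rather than once per element.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sum_of_squares(chunk):
    """Work done on one chunk; a stand-in for a real numerical kernel."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, chunk_size=1000, workers=4):
    """Split `items` into chunks, process them in a pool, and combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, chunked(items, chunk_size))
        return sum(partials)


total = parallel_sum_of_squares(list(range(10_000)))
```

Threads in CPython do not speed up pure-Python arithmetic, so the sketch shows the chunk, map, and reduce structure only. The same structure carries over to process pools, MPI ranks, or a dask cluster.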
Scipy library main repository
Project mention: That took a wild turn | reddit.com/r/ProgrammerHumor | 2021-04-15
A computer algebra system written in pure Python
Project mention: Is the capitalization of sp.symbols vs sp.Symbol intentional in sympy? | reddit.com/r/learnpython | 2021-04-01
symbols is a function
NumPy aware dynamic Python compiler using LLVM
Project mention: The best description of JS I've ever seen. | reddit.com/r/ProgrammerHumor | 2021-04-21
That's probably because it was using a JIT compiler, probably V8, which is fantastic, don't get me wrong, but it's apples to oranges. Give Python a hand with something like numba and the comparison will probably come out more even.
Statsmodels: statistical modeling and econometrics in Python
Project mention: [C] I have an MS in Statistics - how can I get better at coding? | reddit.com/r/statistics | 2021-01-04
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Aesara
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Project mention: Is there a way to collaborate in real-time for Jupyter Notebooks? | reddit.com/r/learnpython | 2021-03-21
Check out Zeppelin. It's similar to Jupyter and allows real-time editing by multiple users. https://zeppelin.apache.org/
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more.
Project mention: Go+: Go designed for data science | news.ycombinator.com | 2021-03-27
Apart from the Gonum numerical libraries, I haven't found Go libraries specific to data science that compare with the Python ecosystem when searching for some hobby projects.
Interestingly, Prose, a Go library for text processing, yielded better results than NLTK for named-entity extraction in my tests, in terms of accuracy and, obviously, performance.
Perhaps Go is not being applied enough in data science/ML, and for the fields where it is applied (networking) the math in the standard library seems to be sufficient.
BigDL: Distributed Deep Learning Framework for Apache Spark
Project mention: Machine learning on JVM | reddit.com/r/scala | 2021-04-05
Intel BigDL for Spark which again is for Spark.
Breeze is a numerical processing library for Scala.
Project mention: Machine learning on JVM | reddit.com/r/scala | 2021-04-05
I haven't checked in on this project in a long time, but Breeze is something akin to NumPy/SciPy.
Interactive and Reactive Data Science using Scala and Spark.
NumPy and Pandas interface to Big Data
🍊 📊 💡 Orange: Interactive data analysis
Project mention: No-code vs Visual Programming | reddit.com/r/nocode | 2021-03-12
I am using visual programming tools that overlap with the no-code concept, such as KNIME and Orange. To visualize the results, I use connectors with platforms like DataStudio or Google AppSheet.
Official git repository for Biopython (originally converted from CVS)
Project mention: Need help with Biopython examples | reddit.com/r/bioinformatics | 2021-04-16
You can then copy the contents of the file directly (press Ctrl + A and then Ctrl + C on the webpage, then paste into the text editor you are using). You can also download the file using a command line tool, such as wget or curl, if you are familiar with those. For example, if I wanted to download the ls_orchid.gbk file, I would find the raw version as above and simply open a terminal and type:
Repository for the Astropy core package
Project mention: Q&A: Month of April | reddit.com/r/Andromeda321 | 2021-04-06
Cool, that sounds like a great place to start in terms of specialties! :) When in doubt for astronomy and coding, I advise people to know Python and the more the better, because that's really become the default in astronomical software in recent years. Poke around astropy a bit too while you're at it.
Abstract Algebra for Scala
Project mention: Symbolics.jl: A Modern Computer Algebra System for a Modern Language | news.ycombinator.com | 2021-03-05
Hey, I have... I'm a co-author of Algebird, which has many ideas that I'd pull over.
I'm hoping to introduce Clojure's "spec" or "schema" libraries so that the types at play can at least be inspectable inside the system. In a fully typed language, I'd implement the extensible generics as typeclasses.
I suspect it would make it quite a bit tougher (at least in the approach I'm imagining) for folks to write new generic functions, due to many type constructors...
On the other hand, the complexity is there, even if you don't write it down!
It would be a big project, and a worthy effort, to write down types for everything in SICM.
A well tested and comprehensive Golang statistics library package with no dependencies. (by montanaflynn)
Interactive Parallel Computing in Python
A repository for plotting and visualizing data
Project mention: Go matplotlib libary? | reddit.com/r/golang | 2021-04-01
Gonum Plot is alright but definitely not as mature.
Powerful new number types and numeric abstractions for Scala.
What are some of the best open-source Science and Data analysis projects? This list will help you:
| 21 | Interactive Parallel Computing with IPython | 1,939 |