Top 23 Datascience Open-Source Projects
-
Project mention: ⚙️ Data Science Cheat Sheets: A collection of cheat sheets for #DataScience and problem solving. h/t @Sauain | reddit.com/r/policerewired | 2021-10-01
-
-
-
Project mention: AWS Summit 2022 Australia and New Zealand - Day 2, AI/ML Edition | dev.to | 2022-05-20
As a result of their new DS framework (based on Metaflow, a DS framework built at Netflix, and AWS SageMaker Pipelines), they were able to free up their DS resources: software developers were now trained and equipped to tackle their normal DS projects, so that 70% of DS/ML work was now completed by developers. This leaves the meatier, more difficult 30% of problems for the data scientists to tackle.
-
Mimesis
Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages.
-
-
Project mention: Kotlin/Java/Javascript/Scala users do you miss the ability to chain functional operators in Python? | reddit.com/r/learnpython | 2022-06-05
I often end up using [pyfunctional](https://github.com/EntilZha/PyFunctional), which gives the ability to use chained functional operators, but I am still kind of sad this is not a built-in solution in Python (please note I am not involved in the pyfunctional project and I do not know the author).
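To make "chained functional operators" concrete, here is a minimal sketch in plain Python. The `Chain` class and its method names are hypothetical, written only for this illustration; this is not PyFunctional's API, just the general style the comment describes.

```python
from functools import reduce
from typing import Any, Callable, Iterable


class Chain:
    """Hypothetical wrapper (not PyFunctional's API) that makes map/filter/reduce chainable."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = items

    def map(self, fn: Callable[[Any], Any]) -> "Chain":
        # Lazily apply fn to every element and return a new chainable object.
        return Chain(map(fn, self._items))

    def filter(self, pred: Callable[[Any], bool]) -> "Chain":
        # Lazily keep only the elements for which pred returns a truthy value.
        return Chain(filter(pred, self._items))

    def reduce(self, fn: Callable[[Any, Any], Any], initial: Any) -> Any:
        # Terminal operation: fold all elements into a single value.
        return reduce(fn, self._items, initial)

    def to_list(self) -> list:
        # Terminal operation: materialise the pipeline into a list.
        return list(self._items)


squares_of_evens = (
    Chain(range(1, 7))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .to_list()
)
print(squares_of_evens)  # [4, 16, 36]
```

Each non-terminal method returns a new `Chain`, which is what allows the calls to be written left to right instead of nesting `map(filter(...))`.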
-
An-Introduction-to-Statistical-Learning
This repository contains the exercises from the book "An Introduction to Statistical Learning" and their solutions, in Python.
-
-
-
The link to the GitHub repo is ggstatsplot.
-
Project mention: My recommended R packages / functions for behavioral researchers who deal with 2X2 experiments | reddit.com/r/rstats | 2022-05-03
I'd also recommend the whole easystats suite of packages. They make model post-processing much easier. It includes a more intuitive (but more limited, imo) replacement for emmeans: modelbased.
-
-
Project mention: Ask HN: Are there any good Diff tools for Jupyter Notebooks? | news.ycombinator.com | 2022-05-22
I wish for a simple option in VS Code: on close of a Jupyter Notebook, clear its output. Or something that separates the display of the output from the saved file (still an `ipynb` file). See [1].
[1]: https://github.com/microsoft/vscode-jupyter/issues/9514
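One way to get output-free notebooks for diffing is to clear the outputs in the saved file itself. The sketch below assumes the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); `clear_outputs` and the in-memory `example` notebook are hypothetical names used only for illustration.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 style notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via a JSON round-trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the In[n] counter
    return cleaned


# Tiny in-memory notebook, made up for this example.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print('hi')"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        },
    ],
}

stripped = clear_outputs(example)
print(json.dumps(stripped["cells"][1], indent=1))
```

To apply this to a file, load it with `json.load`, pass the result through `clear_outputs`, and write it back with `json.dump`.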
-
code
Compilation of R and Python programming codes on the Data Professor YouTube channel. (by dataprofessor)
Project mention: An "Improvement Prediction" error in R Shiny | reddit.com/r/learnprogramming | 2022-06-04
Below is my code (I referenced DataProfessor's code here while writing it). I'm not really sure what my error is or how to fix it.
-
awesome-conformal-prediction
A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD theses, articles and open-source libraries.
Project mention: awesome-conformal-prediction: A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD theses, articles and open-source libraries. | reddit.com/r/u_TsukiZombina | 2022-01-18
-
> And what's the alternative, Excel?
Take this with a grain of salt from someone who needs data manipulation only occasionally (as opposed to being a full-time number-cruncher, data scientist, statistician, etc.): using krangl[1] for Kotlin has been a great experience.
I was drawn to this library because I use Kotlin in my day job for backend development, but I love how well Kotlin's succinct syntax & features like extension functions lend themselves to data transformation & ETL kinds of use cases.
Also, it doesn't hurt that the JVM has a plethora of libraries available for things like DB access, plotting, etc.
I am sure that Pandas has many features I am unaware of, and for a lot of people the high-ish startup time can be a deterrent, but for most of my day-to-day data munging the combination of jbang, krangl & kravis has been a pretty good fit.
-
socios-brasil
Captures data on the partners of Brazilian companies from the Receita Federal (Brazil's federal revenue service) and exports it to a human-readable format
-
Project mention: DGFraud: NEW Extended Research - star count:462.0 | reddit.com/r/algoprojects | 2022-07-02
-
Project mention: Why Clojure is not widely adopted like mainstream languages? | reddit.com/r/Clojure | 2022-06-06
-
-
objectiv-analytics
Open-source product analytics infrastructure for data teams that want full control. Built for high quality data collection and ready to use for advanced analytics & ML.
Project mention: Analytics tracking with a strict event taxonomy, validation & end-to-end testing | reddit.com/r/Frontend | 2022-06-29
Browse our repo here: https://github.com/objectiv/objectiv-analytics
-
-
wikipedia-mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
Haha, well, the good news is that if you have a Wikipedia mirror (they have instructions here -> https://github.com/pirate/wikipedia-mirror), it's continually syncing with Wikipedia as a whole, so if/when SHTF your copy of Wikipedia will be up to date as of the moment there's power loss 👌. From there, using u/UnsignedMark's concept network, you and others could keep Wikipedia going pending only electricity. On that note, SCADA + Solar Panels go brrr hahahaha
-
Project mention: How to add an OCaml kernel to Jupyter Notebook? | reddit.com/r/ItalyInformatica | 2021-10-17
Datascience related posts
- Analytics tracking with a strict event taxonomy, validation & end-to-end testing
- Introducing Objectiv: Self-hosted product analytics infrastructure
- Open-source product analytics infrastructure
- Open-source product analytics infrastructure. For data teams that want full control. Built for high quality data collection, ready to use for advanced analytics & ML.
- Kotlin/Java/Javascript/Scala users do you miss the ability to chain functional operators in Python?
- Hacker News top posts: May 20, 2022
Index
What are some of the best open-source Datascience projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ds-cheatsheets | 10,661 |
2 | modin | 7,542 |
3 | metaflow | 5,749 |
4 | Mimesis | 3,639 |
5 | datascience | 3,360 |
6 | PyFunctional | 2,039 |
7 | An-Introduction-to-Statistical-Learning | 1,978 |
8 | DataScienceR | 1,788 |
9 | ggstatsplot | 1,529 |
10 | easystats | 753 |
11 | Vegas | 722 |
12 | vscode-jupyter | 654 |
13 | code | 633 |
14 | awesome-conformal-prediction | 541 |
15 | krangl | 533 |
16 | socios-brasil | 512 |
17 | DGFraud | 465 |
18 | tech.ml.dataset | 452 |
19 | mlcraft | 338 |
20 | objectiv-analytics | 307 |
21 | datatableton | 240 |
22 | wikipedia-mirror | 228 |
23 | ocaml-jupyter | 221 |