Datascience

Open-source projects categorized as Datascience

Top 23 Datascience Open-Source Projects

  • ds-cheatsheets

    List of Data Science Cheatsheets to rule the world

    Project mention: ⚙️ Data Science Cheat Sheets: A collection of cheat sheets for #DataScience and problem solving. h/t @Sauain | reddit.com/r/policerewired | 2021-10-01
  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Project mention: Modern Python Performance Considerations | news.ycombinator.com | 2022-05-05
  • metaflow

    🚀 Build and manage real-life data science projects with ease!

    Project mention: AWS Summit 2022 Australia and New Zealand - Day 2, AI/ML Edition | dev.to | 2022-05-20

    As a result of their new DS framework (based on Metaflow, a DS framework built at Netflix, and AWS SageMaker Pipelines), they were able to free up their DS resources: software developers were trained and equipped to tackle routine DS projects, and 70% of DS/ML work is now completed by developers. This leaves the meatier, more difficult 30% of problems for the data scientists to tackle.

  • Mimesis

    Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages.

  • datascience

    Curated list of Python resources for data science.

    Project mention: Datascience Libraries for Python | news.ycombinator.com | 2021-11-13
  • PyFunctional

    Python library for creating data pipelines with chain functional programming

    Project mention: Kotlin/Java/Javascript/Scala users do you miss the ability to chain functional operators in Python? | reddit.com/r/learnpython | 2022-06-05

    I often end up using [pyfunctional](https://github.com/EntilZha/PyFunctional), which provides chained functional operators, but I am still a bit sad this is not a built-in solution in Python (please note I am not involved in the pyfunctional project and I do not know the author).

  • An-Introduction-to-Statistical-Learning

    This repository contains the exercises from the book "An Introduction to Statistical Learning" and their solutions in Python.

  • DataScienceR

    A curated list of R tutorials for Data Science, NLP and Machine Learning

    Project mention: Python vs Matlab vs R | reddit.com/r/GradSchool | 2022-02-12
  • ggstatsplot

    Enhancing `{ggplot2}` plots with statistical analysis 📊🎨📣

    Project mention: Better plots for Statistics display - ggstatsplot | dev.to | 2021-11-05

    The link to the GitHub repo is ggstatsplot.

  • easystats

    🌌 The R easystats-project

    Project mention: My recommended R packages / functions for behavioral researchers who deal with 2X2 experiments | reddit.com/r/rstats | 2022-05-03

    I'd also recommend the whole easystats suite of packages. They make model post-processing much easier. It includes a more intuitive (but more limited, imo) replacement for emmeans: modelbased.

  • Vegas

    The missing Matplotlib for Scala + Spark (by vegas-viz)

  • vscode-jupyter

    VS Code Jupyter extension

    Project mention: Ask HN: Are there any good Diff tools for Jupyter Notebooks? | news.ycombinator.com | 2022-05-22

    I wish VS Code had a simple option to clear a Jupyter Notebook's output when it is closed, or something that separates the displayed output from the saved file (still an `.ipynb` file). See [1].

    [1]: https://github.com/microsoft/vscode-jupyter/issues/9514

  • code

    Compilation of R and Python programming codes on the Data Professor YouTube channel. (by dataprofessor)

    Project mention: An "Improvement Prediction" error in R Shiny | reddit.com/r/learnprogramming | 2022-06-04

    Below is my code (I referenced DataProfessor's code while writing it). I'm not really sure what the error is or how to fix it.

  • awesome-conformal-prediction

    A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD theses, articles and open-source libraries.

    Project mention: awesome-conformal-prediction: A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD theses, articles and open-source libraries. | reddit.com/r/u_TsukiZombina | 2022-01-18
  • krangl

    krangl is a {K}otlin DSL for data w{rangl}ing

    Project mention: 2,900 page Manual about Pandas [pdf] | news.ycombinator.com | 2021-08-07

    > And what's the alternative, Excel?

    Take this with a grain of salt from someone who only needs data manipulation every now and then (as opposed to being a full-time number-cruncher, data scientist, statistician, etc.), but using krangl[1] for Kotlin has been a great experience.

    I was drawn to this library because I use Kotlin in my day job for backend development, and I love how well Kotlin's succinct syntax and features like extension functions lend themselves to data transformation and ETL use cases.

    Also it doesn't hurt that JVM has a plethora of libraries available for things like DB access, plotting, etc.

    I am sure that Pandas has many features I am unaware of, and for a lot of people the high-ish startup time can be a deterrent, but for most of my day-to-day data munging the combination of jbang, krangl and kravis has been a pretty good fit.

    [1] https://github.com/holgerbrandl/krangl

  • socios-brasil

    Captures partner (sócio) data for Brazilian companies from the Receita Federal and exports it to a human-readable format

  • DGFraud

    A Deep Graph-based Toolbox for Fraud Detection

    Project mention: DGFraud: NEW Extended Research - star count:462.0 | reddit.com/r/algoprojects | 2022-07-02
  • tech.ml.dataset

    A Clojure high performance data processing system

    Project mention: Why Clojure is not widely adopted like mainstream languages? | reddit.com/r/Clojure | 2022-06-06
  • mlcraft

    Low-code metrics store, modern open-source alternative to Looker

    Project mention: Self-hosted alternative to Looker | reddit.com/r/selfhosted | 2021-09-27
  • objectiv-analytics

    Open-source product analytics infrastructure for data teams that want full control. Built for high quality data collection and ready to use for advanced analytics & ML.

    Project mention: Analytics tracking with a strict event taxonomy, validation & end-to-end testing | reddit.com/r/Frontend | 2022-06-29

    Browse our repo here: https://github.com/objectiv/objectiv-analytics

  • datatableton

    💯 datatable exercises

  • wikipedia-mirror

    🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

    Project mention: I made the prepper version of the Internet | reddit.com/r/preppers | 2022-03-28

    Haha well good news is that if you have a Wikipedia Mirror (they have instructions here -> https://github.com/pirate/wikipedia-mirror) it’s continually syncing with Wikipedia as a whole so if/when SHTF your copy of Wikipedia will be up to date as of the moment there’s power loss 👌, from there using u/UnsignedMark ’s concept network you / others could keep Wikipedia going pending only electricity. On that note SCADA + Solar Panels go brrr hahahaha

  • ocaml-jupyter

    An OCaml kernel for Jupyter (IPython) notebook

    Project mention: How do I add an OCaml kernel to Jupyter Notebook? | reddit.com/r/ItalyInformatica | 2021-10-17
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-07-02.
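
The PyFunctional mention above contrasts Python's nested built-in calls with chained functional operators. As a minimal sketch of the chaining idea only (this is not PyFunctional's implementation or API; `Chain` and its methods are hypothetical names, so see the PyFunctional README for the real interface), a small wrapper can return a new wrapper from each step so the pipeline reads left to right:

```python
from functools import reduce
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Chain(Generic[T]):
    """Hypothetical minimal chainable wrapper over an iterable.

    Steps are lazy (built on map/filter iterators), so a Chain built from
    a one-shot iterator can only be consumed once.
    """

    def __init__(self, items: Iterable[T]) -> None:
        self._items = items

    def map(self, fn: Callable[[T], U]) -> "Chain[U]":
        # Wrap the lazy map iterator so further calls can be chained.
        return Chain(map(fn, self._items))

    def filter(self, pred: Callable[[T], bool]) -> "Chain[T]":
        return Chain(filter(pred, self._items))

    def reduce(self, fn: Callable[[U, T], U], initial: U) -> U:
        # Terminal step: fold everything into a single value.
        return reduce(fn, self._items, initial)

    def to_list(self) -> list[T]:
        return list(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)


nums = range(1, 11)

# Built-in style: read inside-out.
nested = sum(map(lambda x: x * x, filter(lambda x: x % 2 == 0, nums)))

# Chained style: read left to right.
chained = (
    Chain(nums)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .reduce(lambda acc, x: acc + x, 0)
)

print(nested, chained)  # 220 220
```

Both expressions sum the squares of the even numbers 2 to 10; the chained form simply lists the steps in the order they run, which is the readability benefit the Reddit poster is after.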

What are some of the best open-source Datascience projects? This list will help you:

Rank Project Stars
1 ds-cheatsheets 10,661
2 modin 7,542
3 metaflow 5,749
4 Mimesis 3,639
5 datascience 3,360
6 PyFunctional 2,039
7 An-Introduction-to-Statistical-Learning 1,978
8 DataScienceR 1,788
9 ggstatsplot 1,529
10 easystats 753
11 Vegas 722
12 vscode-jupyter 654
13 code 633
14 awesome-conformal-prediction 541
15 krangl 533
16 socios-brasil 512
17 DGFraud 465
18 tech.ml.dataset 452
19 mlcraft 338
20 objectiv-analytics 307
21 datatableton 240
22 wikipedia-mirror 228
23 ocaml-jupyter 221
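
The vscode-jupyter mention in the list asks for a way to clear a notebook's output so that saved `.ipynb` files diff cleanly. This is not a VS Code feature, only a minimal sketch that assumes the standard nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); `strip_outputs` and `strip_file` are hypothetical helper names. It uses only the Python standard library:

```python
import json
import sys
from typing import Any


def strip_outputs(notebook: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an nbformat-4 notebook with code-cell outputs cleared.

    Assumes the nbformat 4 layout: notebook["cells"], where code cells
    have "outputs" and "execution_count" keys.
    """
    # Round-trip through JSON to get a deep copy without mutating the input.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # reset the "In [n]" counter
    return cleaned


def strip_file(path: str) -> None:
    """Rewrite an .ipynb file in place with its outputs removed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 mirrors the layout Jupyter typically writes.
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")


if __name__ == "__main__":
    # Usage: python strip_outputs.py a.ipynb b.ipynb ...
    for notebook_path in sys.argv[1:]:
        strip_file(notebook_path)
```

A script like this (file name hypothetical) could run from a Git pre-commit hook; established tools such as nbstripout cover the same need with more edge cases handled.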