Datascience

Open-source projects categorized as Datascience

Top 23 Datascience Open-Source Projects

  • ds-cheatsheets

    List of Data Science Cheatsheets to rule the world

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

  • Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

    This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.

    Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress toward this area!

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

  • Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15
  • Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

  • Project mention: +10 Resources to Empower Women in Technology | dev.to | 2024-03-06

    I’ve been working in tech for more than five years. I started as a Data Scientist, and now I’m exploring and loving the DevRel 🥑 role for Taipy. Needless to say, evolving in the tech scene has been a ride full of ups, downs, and everything in between.

  • metaflow

    🚀 Build and manage real-life ML, AI, and data science projects with ease!

  • Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05
  • machine_learning_complete

    A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

  • Mimesis

    Mimesis is a powerful Python library that empowers developers to generate massive amounts of synthetic data efficiently.

  • panel

    Panel: The powerful data exploration & web app framework for Python (by holoviz)

  • Project mention: This Week In Python | dev.to | 2024-04-12

    panel – data exploration & web app framework for Python

  • OpenMetadata

    Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

  • Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25

    In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.

  • datascience

    Curated list of Python resources for data science.

  • sql-translator

    SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.

  • Project mention: Storybook GPT | dev.to | 2023-05-08

    I started to see more and more applications that use the OpenAI API and I wanted to try it out. One of these apps is this one made by Kate.

  • awesome-conformal-prediction

    A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.

  • Project mention: Dive Deep into Conformal Prediction with This Ultimate Resource Compilation | news.ycombinator.com | 2024-04-15
  • PyFunctional

    Python library for creating data pipelines with chain functional programming

  • Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24

    If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.

  • An-Introduction-to-Statistical-Learning

    This repository contains the exercises from the book "An Introduction to Statistical Learning" and their solutions in Python.

  • Fast-F1

    FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry

  • Project mention: Consume Live Timing/Telemetry From API During Race | /r/F1Technical | 2023-05-28

    F1 broadcasts its live timing via the SignalR protocol. The endpoint itself is unauthenticated. You can look at FastF1’s implementation of the SignalR client, and the endpoints it connects to, in the code documentation for the FastF1 SignalR client.

  • DataScienceR

    A curated list of R tutorials for Data Science, NLP and Machine Learning

  • ggstatsplot

    Enhancing {ggplot2} plots with statistical analysis 📊📣

  • openllmetry

    Open-source observability for your LLM application, based on OpenTelemetry

  • Project mention: Show HN: You don't need to adopt new tools for LLM observability | news.ycombinator.com | 2024-02-14

    So why should it be different when the app you're building happened to be using LLMs?

    So today we're open-sourcing OpenLLMetry-JS. It's an open protocol and SDK, based on OpenTelemetry, that provides traces and metrics for LLM JS/TS applications and can be connected to any of the 15+ tools that already support OpenTelemetry. Here's the repo: https://github.com/traceloop/openllmetry-js

    A few months ago we launched the python flavor here (https://news.ycombinator.com/item?id=37843907) and we've now built a compatible one for Node.js.

    Would love to hear your thoughts and opinions!

    Check it out -

    Docs: https://www.traceloop.com/docs/openllmetry/getting-started-t...

    Github:

  • vscode-jupyter

    VS Code Jupyter extension

  • Project mention: Multiple Notepad++ Flaws Let Attackers Execute Arbitrary Code | news.ycombinator.com | 2023-09-04

    https://github.com/microsoft/vscode/issues/4490

    It looks like there are a number of vscode extensions for recording macros:

    - https://www.google.com/search?q=vscode+macro+recorder

    - https://marketplace.visualstudio.com/search?term=Macro&targe...

    - the macro-commander README explains its JSON-based macro language. YAML might be easier to maintain than JSON. https://github.com/jeff-hykin/macro-commander#what-are-some-...

    For teams with multiple editors, you can specify workflow automation scripts with shell scripts or ci container/cmd YAML, and/or pre-commit.yml instead of with an IDE-specific tool.

    Isn't there native real-time collaboration functionality in vscode/vscodium that would be useful for a native macro recording feature? (Edit) Live Share can't be installed in vscodium. https://github.com/VSCodium/vscodium/issues/128

    Support for jupyter-collaboration Y.js CRDT could be added to vscode-jupyter and/or a more generic extension: "Support for real-time collaboration in the extension?" https://github.com/microsoft/vscode-jupyter/discussions/1293...

    jupyterlab/jupyter-collaboration:

  • CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

  • easystats

    🌌 The R easystats-project

  • code

    Compilation of R and Python programming codes on the Data Professor YouTube channel. (by dataprofessor)

  • streamlit-geospatial

    A multi-page streamlit app for geospatial

  • Project mention: how i can create a timelapse of a specfic region | /r/remotesensing | 2023-07-05
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Datascience projects? This list will help you:

Rank  Project                                  Stars
1     ds-cheatsheets                           12,570
2     ludwig                                   10,801
3     modin                                    9,476
4     Taipy                                    8,371
5     metaflow                                 7,586
6     machine_learning_complete                4,501
7     Mimesis                                  4,304
8     panel                                    4,192
9     OpenMetadata                             4,100
10    datascience                              4,071
11    sql-translator                           3,966
12    awesome-conformal-prediction             3,381
13    PyFunctional                             2,332
14    An-Introduction-to-Statistical-Learning  2,285
15    Fast-F1                                  2,178
16    DataScienceR                             1,959
17    ggstatsplot                              1,919
18    openllmetry                              1,224
19    vscode-jupyter                           1,219
20    CleverCSV                                1,213
21    easystats                                1,019
22    code                                     870
23    streamlit-geospatial                     803
