Top 23 Datascience Open-Source Projects
-
machine_learning_complete
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
-
Mimesis
Mimesis is a powerful Python library that empowers developers to generate massive amounts of synthetic data efficiently.
-
OpenMetadata
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
-
sql-translator
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
-
awesome-conformal-prediction
A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.
-
An-Introduction-to-Statistical-Learning
This repository contains the exercises from the book "An Introduction to Statistical Learning" and their solutions in Python.
-
Fast-F1
FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data, and telemetry.
-
CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
-
code
Compilation of R and Python code from the Data Professor YouTube channel. (by dataprofessor)
Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07
This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.
Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?
Would love to see more progress in this area!
I’ve been working in tech for more than five years. I started as a Data Scientist, and now I’m exploring and loving the DevRel 🥑 role for Taipy. Needless to say, evolving in the tech scene has been a ride full of ups, downs, and everything in between.
panel – data exploration & web app framework for Python
Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25
In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.
I started to see more and more applications that use the OpenAI API and I wanted to try it out. One of these apps is this one made by Kate.
Project mention: Dive Deep into Conformal Prediction with This Ultimate Resource Compilation | news.ycombinator.com | 2024-04-15
Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24
If you actually think this code is better, there's a real library that does this: https://github.com/EntilZha/PyFunctional.
F1 broadcasts its live timing via the SignalR protocol. The endpoint itself is unauthenticated. You can look at FastF1's implementation of the SignalR client, and the endpoints it connects to, in the code documentation: FastF1 SignalR client.
Project mention: Show HN: You don't need to adopt new tools for LLM observability | news.ycombinator.com | 2024-02-14
So why should it be different when the app you're building happened to be using LLMs?
So today we're open-sourcing OpenLLMetry-JS. It's an open protocol and SDK, based on OpenTelemetry, that provides traces and metrics for LLM JS/TS applications and can be connected to any of the 15+ tools that already support OpenTelemetry. Here's the repo: https://github.com/traceloop/openllmetry-js
A few months ago we launched the python flavor here (https://news.ycombinator.com/item?id=37843907) and we've now built a compatible one for Node.js.
Would love to hear your thoughts and opinions!
Check it out -
Docs: https://www.traceloop.com/docs/openllmetry/getting-started-t...
Github:
Project mention: Multiple Notepad++ Flaws Let Attackers Execute Arbitrary Code | news.ycombinator.com | 2023-09-04
https://github.com/microsoft/vscode/issues/4490
It looks like there are a number of vscode extensions for recording macros:
- https://www.google.com/search?q=vscode+macro+recorder
- https://marketplace.visualstudio.com/search?term=Macro&targe...
- the macro-commander README explains its JSON-based macro language. YAML might be easier to maintain than JSON. https://github.com/jeff-hykin/macro-commander#what-are-some-...
For teams with multiple editors, you can specify workflow automation scripts with shell scripts or ci container/cmd YAML, and/or pre-commit.yml instead of with an IDE-specific tool.
Isn't there native real-time collaboration functionality in vscode/vscodium that would be useful for a native macro recording feature? (Edit) Live Share can't be installed in vscodium. https://github.com/VSCodium/vscodium/issues/128
Support for jupyter-collaboration Y.js CRDT could be added to vscode-jupyter and/or a more generic extension: "Support for real-time collaboration in the extension?" https://github.com/microsoft/vscode-jupyter/discussions/1293...
jupyterlab/jupyter-collaboration:
Datascience related posts
- Dive Deep into Conformal Prediction with This Ultimate Resource Compilation
- +10 Resources to Empower Women in Technology
- Show HN: Building data and AI apps, an alternative to Streamlit
- Our open-source project for building AI / Data full-stack apps got funded! 🎉 🎉
- Plotting 1,000,000 points on a webpage using only Python
- Forecasts need to have error bars
- Taipy for Data and AI algos web apps building
-
A note from our sponsor - InfluxDB
www.influxdata.com | 25 Apr 2024
Index
What are some of the best open-source Datascience projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ds-cheatsheets | 12,570 |
2 | ludwig | 10,801 |
3 | modin | 9,476 |
4 | Taipy | 8,371 |
5 | metaflow | 7,586 |
6 | machine_learning_complete | 4,501 |
7 | Mimesis | 4,304 |
8 | panel | 4,192 |
9 | OpenMetadata | 4,100 |
10 | datascience | 4,071 |
11 | sql-translator | 3,966 |
12 | awesome-conformal-prediction | 3,381 |
13 | PyFunctional | 2,332 |
14 | An-Introduction-to-Statistical-Learning | 2,285 |
15 | Fast-F1 | 2,178 |
16 | DataScienceR | 1,959 |
17 | ggstatsplot | 1,919 |
18 | openllmetry | 1,224 |
19 | vscode-jupyter | 1,219 |
20 | CleverCSV | 1,213 |
21 | easystats | 1,019 |
22 | code | 870 |
23 | streamlit-geospatial | 803 |