Top 23 Python Data Science Projects
Deep Learning for humans
Project mention: Can someone explain how keras code gets into the Tensorflow package? | /r/tensorflow | 2023-07-24
I'm guessing the "real" keras code is coming from the keras repository. Is that a correct assumption? How does that version of Keras get there? If I wanted to write my own activation layer next to ELU, where exactly would I do that?
scikit-learn: machine learning in Python
Project mention: Transformers as Support Vector Machines | news.ycombinator.com | 2023-09-03
It looks like you've been the victim of some misinformation. As Dr_Birdbrain said, an SVM is a convex problem with unique global optimum. sklearn.SVC relies on libsvm which initializes the weights to 0. The random state is only used to shuffle the data to make probability estimates with Platt scaling. Of the random_state parameter, the sklearn documentation for SVC says:
Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary.
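The claim quoted above can be checked with a small experiment. The following is a minimal sketch, not taken from the original thread: it assumes scikit-learn and NumPy are installed, uses a synthetic dataset, and relies on a hypothetical helper named fit_scores. It fits the same SVC twice with different random_state values (leaving probability at its default of False) and compares the decision values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def fit_scores(seed):
    """Fit an SVC with the given random_state and return its decision values.

    The dataset is generated with a fixed seed, so only the SVC's own
    random_state differs between calls. (Hypothetical helper for illustration.)
    """
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    # probability is left at its default (False), so per the quoted docs
    # random_state should have no effect on the fitted model.
    clf = SVC(kernel="rbf", random_state=seed)
    clf.fit(X, y)
    return clf.decision_function(X)


scores_a = fit_scores(1)
scores_b = fit_scores(2)

# If random_state is ignored when probability is False, the two fits agree.
print(np.allclose(scores_a, scores_b))
```

If the two runs agree, that is consistent with the documentation's statement that random_state only matters for the probability estimates.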
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Project mention: Interacting with Amazon S3 using AWS Data Wrangler (awswrangler) SDK for Pandas: A Comprehensive Guide | dev.to | 2023-08-20
AWS Data Wrangler is a Python library that simplifies the process of interacting with various AWS services, built on top of some useful data tools and open-source projects such as Pandas, Apache Arrow and Boto3. It offers streamlined functions to connect to, retrieve, transform, and load data from AWS services, with a strong focus on Amazon S3.
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Project mention: Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Custom Models | news.ycombinator.com | 2023-08-11
Training times for GSM8k are mentioned here: https://github.com/ray-project/ray/tree/master/doc/source/te...
Streamlit — A faster way to build and share data apps.
Project mention: Show HN: Zero-dependency Java framework out of beta | news.ycombinator.com | 2023-09-25
The 'batteries included' space is definitely a market. For example https://streamlit.io is wildly popular with data teams for quickly making a pre-styled, usable enough web UI to put on top of some model, with controls that are automatically reactive. Those ppl have zero interest in fiddling with modular systems or spending time optimizing and scaling web apps.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: Retrieval Augmented Generation (RAG): How To Get AI Models Learn Your Data & Give You Answers | dev.to | 2023-09-18
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Deep learning framework to train, deploy, and ship AI products Lightning fast.
Project mention: Best practice for saving logits/activation values of model in PyTorch Lightning | /r/deeplearning | 2023-07-19
I've been wondering on what is the recommended method of saving logits/activations using PyTorch Lightning. I've looked at Callbacks, Loggers and ModelHooks but none of the use-cases seem to be for this kind of activity (even if I were to create my own custom variants of each utility). The ModelCheckpoint Callback in its utility makes me feel like custom Callbacks would be the way to go but I'm not quite sure. This closed GitHub issue does address my issue to some extent.
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Project mention: Tutorials on creating primitive ML algorithms from scratch? | /r/learnmachinelearning | 2023-01-24
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Project mention: Gradio sharable link expires too soon ( 30 mins to 1 hour, instead of lasting 72 hours ) | /r/StableDiffusion | 2023-06-10
I found an issue on the Gradio GitHub, but it looks like it's closed, so I am not sure whether it's still a common issue or whether only I am facing it due to certain settings or the absence of a fix. ( https://github.com/gradio-app/gradio/issues/3060 )
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Project mention: which book to chose for deep learning :lan Goodfellow or francois chollet | /r/learnmachinelearning | 2023-04-07
matplotlib: plotting with Python
Project mention: Tkinter, PyGame windows too large on Mac | /r/learnpython | 2023-06-29
as suggested here.
Best Practices on Recommendation Systems
Project mention: My kernel dies when I fit my LightFm model from Microsoft Recommenders | /r/Jupyter | 2023-06-16
Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
Project mention: The new pdbp (Pdb+) Python debugger! | dev.to | 2023-08-02
If you’re already using ipython, this isn’t a problem because you’ll already need to download most of these dependencies anyway. But if you’re not using ipython… you’ll still need to download those dependencies.
Topic Modelling for Humans
Project mention: Aggregating news from different sources | /r/learnprogramming | 2023-07-08
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Project mention: Ask HN: How to get back into AI? | news.ycombinator.com | 2022-12-10
For Python, here's a nice compilation: https://github.com/ml-tooling/best-of-ml-python/blob/main/RE...
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Project mention: Filter Pruning for PyTorch | /r/deeplearning | 2023-04-13
The easiest way to build, run, and monitor data pipelines at scale.
Project mention: self hosted Alternative to easycron.com? | /r/selfhosted | 2022-12-30
🦉 Data Version Control | Git for Data & Models | ML Experiments Management
Project mention: Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations | dev.to | 2023-06-06
DVC (Data Version Control):
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Project mention: Data exploration is not dead | news.ycombinator.com | 2023-06-24
Statistical data visualization in Python
Project mention: Best Portfolio Projects for Data Science | dev.to | 2023-09-19
Low-code framework for building custom LLMs, neural networks, and other AI models
Project mention: Python projects with best practices on Github? | /r/Python | 2023-02-14
Two random examples I found from 30 seconds of googling: Here’s Netflix using it in their crisis management tool, and here’s Uber using it in their deep learning framework.
Python Data Science related posts
Facebook Prophet: library for generating forecasts from any time series data
7 projects | news.ycombinator.com | 26 Sep 2023
Orange: Open-source machine learning and data visualization
1 project | news.ycombinator.com | 25 Sep 2023
Nomad: Run any code on an EC2 instance, instantly
1 project | news.ycombinator.com | 24 Sep 2023
plotly-resampler: NEW Data - star count:800.0
1 project | /r/algoprojects | 23 Sep 2023
plotly-resampler: NEW Data - star count:800.0
1 project | /r/algoprojects | 22 Sep 2023
Prism: the easiest way to create robust data workflows. Accessible via CLI
1 project | /r/coolgithubprojects | 21 Sep 2023
Stop LLM/GenAI hallucination fast: Serverless Kendra RAG with GO
2 projects | dev.to | 20 Sep 2023
What are some of the best open-source Data Science projects in Python? This list will help you: