Top 23 Python Data Science Projects
Deep Learning for humans
Project mention: [Project] I'm trying to implement StyleGAN2 in Keras to better understand its structure and just AAAAAAAA | reddit.com/r/learnmachinelearning | 2021-05-17
scikit-learn: machine learning in Python
Project mention: Is there a way to map cluster centers back to a dataframe? | reddit.com/r/learnpython | 2021-05-19
To avoid the issue with convergence (and the discrepancy between the labels_ and cluster_centers_), you can set tol=0, though this can of course lead to issues if convergence is a problem. There was an issue about it here. Assuming it's converged, then the order is fine.
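As a minimal sketch of the mapping discussed in that thread (not the poster's own code), the fitted labels_ can index cluster_centers_ to give every row its center. The synthetic data, the column names x, y, cluster, center_x, and center_y, and the parameter values other than tol=0 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2)),
])
df = pd.DataFrame(points, columns=["x", "y"])

# tol=0 mirrors the suggestion in the comment above.
km = KMeans(n_clusters=2, n_init=10, tol=0, random_state=0).fit(df[["x", "y"]])

# labels_[i] is the index into cluster_centers_ for row i,
# so fancy indexing yields one center per row.
df["cluster"] = km.labels_
row_centers = km.cluster_centers_[km.labels_]
df["center_x"] = row_centers[:, 0]
df["center_y"] = row_centers[:, 1]
```

Because labels_ follows the row order of the input, assigning it directly as a column keeps each row aligned with its cluster.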
Apache Superset is a Data Visualization and Data Exploration Platform
Project mention: Jupyter notebooks for dashboarding? | reddit.com/r/BusinessIntelligence | 2021-06-13
Give Apache Superset a try.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Project mention: Beginner in Python for Data Science | reddit.com/r/learnpython | 2020-12-27
data science ipython notebooks
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: Resume Advice Thread - June 08, 2021 | reddit.com/r/cscareerquestions | 2021-06-08
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Project mention: Ray 1.4.0 | news.ycombinator.com | 2021-06-08
Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
Project mention: A resource for looking 'under the hood' of multi-processing in python | reddit.com/r/learnpython | 2021-04-19
I stand corrected... Apparently, it does matter, if you are on Windows... https://github.com/ipython/ipython/issues/4698#issuecomment-30605308
Streamlit — The fastest way to build data apps in Python
Project mention: Jupyter notebooks for dashboarding? | reddit.com/r/BusinessIntelligence | 2021-06-13
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Project mention: [P] An introduction to PyKale https://github.com/pykale/pykale, a PyTorch library that provides a unified pipeline-based API for knowledge-aware multimodal learning and transfer learning on graphs, images, texts, and videos to accelerate interdisciplinary research. Welcome feedback/contribution! | reddit.com/r/MachineLearning | 2021-04-25
If you want a good example for reference, take a look at PyTorch Lightning's readme (https://github.com/PyTorchLightning/pytorch-lightning). It answers the 3 questions of "what is this", "why should I care", and "how do I use it" almost instantly.
Topic Modelling for Humans
Project mention: The Levenshtein Distance in Production | news.ycombinator.com | 2021-06-06
> Problem statement: the Levenshtein distance is a string metric for measuring the difference between two sequences
Another variant is "I have a bunch of words (a dictionary) and one query word, and want to find all words from the dictionary that are close to the query word".
This leads to an interesting class of problems, because you can do clever things where you precompute search structures (Levenshtein automata) from the dictionary. The similarity queries then run (much) faster – in production, performance matters.
We recently merged a PR like that into Gensim.
This gave a ~1,500x speed-up compared to naively comparing all pairwise strings with Levenshtein distance. That is the difference between the training step running for years (= unusable) and running for minutes.
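For reference, the naive baseline that the comment compares against can be sketched as follows. This is only the brute-force approach (a standard dynamic-programming Levenshtein distance applied to every dictionary word), not the Levenshtein-automaton code merged into Gensim. The function names levenshtein and close_words are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def close_words(query: str, dictionary: list[str], max_distance: int = 1) -> list[str]:
    """Naive search: compare the query against every dictionary word."""
    return [word for word in dictionary if levenshtein(query, word) <= max_distance]


matches = close_words("cat", ["cat", "cart", "dog", "bat"], max_distance=1)
```

Each query costs one full distance computation per dictionary word, which is the work that a precomputed search structure avoids.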
Best Practices on Recommendation Systems
Project mention: Opinion on choice of model - Recommender System | reddit.com/r/datascience | 2021-04-10
Then I tried to find some more advanced models and I found this really good list, and in there I found the Microsoft one. So that's where we are now, with a bunch of different models and not much documentation or many tutorials out there.
An open-source NLP research library, built on PyTorch.
Project mention: C4 dataset released (800GB Common Crawl-derived text; T5 training data) | reddit.com/r/mlscaling | 2021-03-16
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 175 universities.
Project mention: I created a way to learn machine learning through Jupyter | reddit.com/r/learnmachinelearning | 2021-04-30
There are actually some online books and courses built on Jupyter Notebook ([Dive into Deep Learning](https://github.com/d2l-ai/d2l-en), for example). However, yours is more detailed and could really help beginners.
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Project mention: [D] Efficient ways of choosing number of layers/neurons in a neural network | reddit.com/r/statistics | 2021-04-20
optuna, hyperopt, nni, plenty of less-known tools too.
Deep learning library featuring a higher-level API for TensorFlow.
Project mention: Base ball | dev.to | 2021-03-20
Both teams in a game are given individual ID values and turned into vectors. Relevant data such as the home and away teams, home runs, RBIs, and walks are all taken into account and passed through layers. There's no need to reinvent the wheel here: a multitude of libraries enable a coder to implement machine learning theories efficiently. In this case we will be using a library called TFLearn, with documentation available at http://tflearn.org. The program will output the home and away teams as well as their respective score predictions.
Statistical data visualization using matplotlib
Project mention: [OC] Visualizing the impact of dice choice on outcome | reddit.com/r/DnD | 2021-05-30
Seaborn (https://seaborn.pydata.org/) is a plotting library, a bit more user-friendly and prettier out of the box than raw matplotlib. sns is just an alias (import seaborn as sns).
🦉 Data Version Control | Git for Data & Models | ML Experiments Management
Project mention: [Project] DVC Studio – Git-Based ML Experiments Management | reddit.com/r/MachineLearning | 2021-06-02
Hey everyone, our team is working on open-source tools for data scientists: https://dvc.org and https://cml.dev. These two products help ML teams track ML experiments and run training in the cloud using a Git and GitOps approach.
The easiest way to automate your data
Project mention: Hi, how can I do pipeline automation? | reddit.com/r/learnpython | 2021-04-18
If you are just starting out or new to automation, I would look at plain Python scripts executed with cron on Linux/Mac or with Windows Task Scheduler on Windows. You'll need some bash knowledge (Linux/Mac) or DOS/batch knowledge (Windows) for that. Then graduate to using frameworks. Since you didn't specify what types of jobs you want to automate, for general-purpose needs I would look at a class of frameworks called task orchestration frameworks or workflow management libraries. I would highly recommend dagster, as it comes with a native scheduler, so you would be free from having to use cron or Windows Task Scheduler. Other options include prefect, but if you want its other features, like its scheduler and web GUI, you'll have to mess with Docker. That's what's nice about dagster: it all works out of the box without any need for non-Python dependencies.
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Project mention: Are there any speech recognition modules so I can write one and do not have to rely on google and the likes? | reddit.com/r/learnmachinelearning | 2021-04-18
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Project mention: How does everyone share their models etc. across teams for re-use effectively? | reddit.com/r/datascience | 2021-05-22
What are some of the best open-source Data Science projects in Python? This list will help you: