Top 23 Python Data Science Projects
scikit-learn: machine learning in Python
Project mention: Scikit-Learn Version 1.0 | news.ycombinator.com | 2021-09-14
Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2.
Apache Superset is a Data Visualization and Data Exploration Platform
Project mention: Does anyone have experience with live dashboards? | reddit.com/r/datascience | 2021-10-15
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Project mention: Beginner in Python for Data Science | reddit.com/r/learnpython | 2020-12-27
data science ipython notebooks
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: I put together a tutorial and overview on how to use DeepSpeech to do Speech Recognition in Python | reddit.com/r/Python | 2021-10-14
It definitely could, with the real-time speech recognition example shown in the tutorial. But you'd likely need some sort of NLU (natural language understanding) running after the transcription is performed, to parse what was spoken into a command that you can use to run some business logic. There are some good open-source libraries for this too, like https://spacy.io/
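To make the transcription-then-NLU step concrete, here is a minimal sketch. It is a toy regex matcher standing in for a real NLU library such as spaCy, and it does not use spaCy's API. The intent names, patterns and handler strings are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical intent table: each regex over the transcript maps to a command name.
# A real NLU pipeline (spaCy, for example) would replace this hand-written table.
INTENT_PATTERNS = {
    "set_timer": re.compile(r"\bset (?:a )?timer for (?P<minutes>\d+) minutes?\b"),
    "play_music": re.compile(r"\bplay (?P<artist>[a-z ]+?)(?: music)?$"),
}


@dataclass
class Command:
    name: str
    slots: dict


def parse_command(transcript: str) -> Optional[Command]:
    """Map a transcribed utterance to a command, or None if no intent matches."""
    text = transcript.lower().strip()
    for name, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            return Command(name, match.groupdict())
    return None


def run_business_logic(command: Command) -> str:
    # Placeholder dispatch; real handlers would call into your application code.
    if command.name == "set_timer":
        return f"timer set: {int(command.slots['minutes'])} min"
    if command.name == "play_music":
        return f"playing: {command.slots['artist']}"
    return "unknown command"
```

A table like this breaks down quickly when users phrase things differently. That is the point where a trained NLU pipeline becomes worth adding.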
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Project mention: How to deploy a rllib-trained model? | reddit.com/r/reinforcementlearning | 2021-10-16
Streamlit: The fastest way to build data apps in Python
Project mention: Sharing results with other people | reddit.com/r/JupyterNotebooks | 2021-10-11
This might be a good place to use something like Streamlit (https://streamlit.io/). Turn your notebook into a script and have them upload the file (or enter a network path). Run your code over the file and return the result. Streamlit can run on your local machine, on another machine on your network, or Streamlit can host it.
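Here is a minimal sketch of that workflow. It assumes the notebook's logic can be wrapped in a function that takes the uploaded file's bytes and returns new bytes. `process_csv` is hypothetical and only trims cells and drops blank rows. `st.file_uploader` and `st.download_button` are Streamlit calls. Streamlit is imported inside `run_app` so the processing function can be tested without Streamlit installed.

```python
import csv
import io


def process_csv(data: bytes) -> bytes:
    """Hypothetical stand-in for the notebook logic: trim every cell, drop blank rows."""
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cleaned = [cell.strip() for cell in row]
        if any(cleaned):  # skip rows that are empty after trimming
            writer.writerow(cleaned)
    return out.getvalue().encode("utf-8")


def run_app() -> None:
    """Streamlit front end: upload a file, run the code over it, offer the result."""
    import streamlit as st  # lazy import keeps process_csv usable without Streamlit

    uploaded = st.file_uploader("Upload a CSV file", type="csv")
    if uploaded is not None:
        result = process_csv(uploaded.getvalue())
        st.download_button(
            "Download result", data=result, file_name="result.csv", mime="text/csv"
        )
```

Suppose this is saved as `app.py` with a final `run_app()` call at the bottom. You would then start it with `streamlit run app.py`. Keeping the processing in a plain function means the notebook can still call it directly.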
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Project mention: [P] An introduction to PyKale https://github.com/pykale/pykale, a PyTorch library that provides a unified pipeline-based API for knowledge-aware multimodal learning and transfer learning on graphs, images, texts, and videos to accelerate interdisciplinary research. Welcome feedback/contribution! | reddit.com/r/MachineLearning | 2021-04-25
If you want a good example for reference, take a look at PyTorch Lightning's README (https://github.com/PyTorchLightning/pytorch-lightning). It answers the three questions "what is this?", "why should I care?", and "how do I use it?" almost instantly.
Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
Project mention: A resource for looking 'under the hood' of multi-processing in python | reddit.com/r/learnpython | 2021-04-19
I stand corrected. Apparently it does matter if you are on Windows: https://github.com/ipython/ipython/issues/4698#issuecomment-30605308
Topic Modelling for Humans
Project mention: The unthinking application of this regex-efficiency check wasted our attention | news.ycombinator.com | 2021-09-30
Best Practices on Recommendation Systems
Project mention: Opinion on choice of model - Recommender System | reddit.com/r/datascience | 2021-04-10
Then I tried to find some more advanced models, found this really good list, and in it found the Microsoft one. So that's where we are now: a bunch of different models, but little documentation and few tutorials out there.
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.
Project mention: I created a way to learn machine learning through Jupyter | reddit.com/r/learnmachinelearning | 2021-04-30
There are actually some online books and courses built on Jupyter Notebook (the [Dive into Deep Learning book](https://github.com/d2l-ai/d2l-en), for example). However, yours is more detailed and could really help beginners.
An open-source NLP research library, built on PyTorch.
Project mention: Any allennlp users in this sub? | reddit.com/r/LanguageTechnology | 2021-10-08
https://github.com/allenai/allennlp/discussions looks active
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyperparameter tuning.
Project mention: Automated Machine Learning (AutoML) - 9 Different Ways with Microsoft AI | dev.to | 2021-10-04
For a complete tutorial, navigate to this Jupyter Notebook: https://github.com/microsoft/nni/blob/master/examples/notebooks/tabular_data_classification_in_AML.ipynb
Deep learning library featuring a higher-level API for TensorFlow.
Project mention: Base ball | dev.to | 2021-03-20
Both teams in a game are given individual ID values, which are made into vectors. Relevant data such as the home and away teams, home runs, RBIs, and walks are all taken into account and passed through the layers. There's no need to reinvent the wheel here: a multitude of libraries let a coder implement machine learning models efficiently. In this case we will be using a library called TFLearn, with documentation available at http://tflearn.org. The program will output the home and away teams as well as their respective score predictions.
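The post doesn't show how the team IDs become vectors. One common approach is one-hot encoding, and the sketch below assumes it. The team codes and the choice of stats are hypothetical, and no TFLearn API is used. The resulting list is simply what an input layer would receive.

```python
from typing import Sequence

# Hypothetical team-to-ID mapping; each ID becomes a one-hot vector
# (an assumption about how the post turns IDs into vectors).
TEAM_IDS = {"NYY": 0, "BOS": 1, "LAD": 2, "SFG": 3}


def one_hot(team: str) -> list[float]:
    """Vector of zeros with a single 1.0 at the team's ID position."""
    vec = [0.0] * len(TEAM_IDS)
    vec[TEAM_IDS[team]] = 1.0
    return vec


def game_features(home: str, away: str, stats: Sequence[float]) -> list[float]:
    """Concatenate the home one-hot, the away one-hot, and numeric stats (e.g. HR, RBI, BB)."""
    return one_hot(home) + one_hot(away) + [float(s) for s in stats]
```

With four teams and three stats, every game becomes an 11-element vector. The first layer's input width has to match that length. Learned embeddings are an alternative to one-hot encoding when there are many teams.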
Statistical data visualization in Python
Project mention: Series Intro: Data Visualization With Svelte and D3 | dev.to | 2021-10-08
Curran Kelleher's Data Visualization With React and D3 is probably the most comprehensive course on D3 on the Internet. For some time I have been curious about D3 and data visualization in general. I have worked with Python libraries like Matplotlib, Seaborn and Plotly. They are fantastic for most kinds of exploratory and scientific plots, but they have limitations on the web, particularly in terms of interactivity. D3, for the uninitiated, is the library that powers the bulk of the interactive, SVG-based plots on the web. It also acts as the underlying low-level library for many higher-level libraries, including Vega (for a more comprehensive list, look here).
🦉 Data Version Control | Git for Data & Models | ML Experiments Management
Project mention: [D] How do you ensure reproducibility? | reddit.com/r/MachineLearning | 2021-09-24
You'll want to add some reproducibility at the data layer, and several libraries exist, such as dvc (https://github.com/iterative/dvc, https://dvc.org/).
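Tools like DVC track data files by content hash rather than storing them in Git. The sketch below shows only that underlying idea, using the standard library. It is not DVC's implementation or file format, and the JSON manifest layout is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_files: list[Path], manifest: Path) -> None:
    """Record each data file's hash alongside the experiment (hypothetical format)."""
    entries = {str(p): file_md5(p) for p in data_files}
    manifest.write_text(json.dumps(entries, indent=2, sort_keys=True))


def changed_files(manifest: Path) -> list[str]:
    """Return recorded paths that are missing or whose contents no longer match."""
    entries = json.loads(manifest.read_text())
    return [
        p for p, h in entries.items()
        if not Path(p).exists() or file_md5(Path(p)) != h
    ]
```

Committing the small manifest to Git, instead of the data itself, ties each run to the exact data it used. `changed_files` then flags any drift before a rerun.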
The easiest way to automate your data
Project mention: My first Hacktoberfest !!✨ | dev.to | 2021-10-15
A collection of machine learning examples and tutorials.
Project mention: How to save an attention model for deployment/exposing to an API? | reddit.com/r/deeplearning | 2021-08-17
I've been following a course that teaches how to build an attention model for neural machine translation. This is the file inside the repo. I know I'll have to use certain functions to process the textual input into encodings and tokens, but those functions use certain instances from the model, and I don't know whether I should keep them or not. If anyone can take a look and help me out, it'd be really appreciated.
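One common answer assumes the model was trained on integer token IDs. In that case the weights alone are not enough: the preprocessing state used during training (vocabulary, maximum length) must be saved with them. That lets the serving code encode requests exactly as training did. The sketch below uses a hypothetical whitespace tokenizer and JSON file. The course's actual encoding functions aren't shown in the question.

```python
import json
from pathlib import Path

PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences: list[str]) -> dict[str, int]:
    """Assign an integer ID to every lowercase whitespace token seen in training."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))  # new tokens get the next free ID
    return vocab


def encode(sentence: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Tokenize, map to IDs (unknown tokens -> UNK), then truncate or pad to max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def save_preprocessing(vocab: dict[str, int], max_len: int, path: Path) -> None:
    """Persist the tokenizer state next to the saved model weights."""
    path.write_text(json.dumps({"vocab": vocab, "max_len": max_len}))


def load_preprocessing(path: Path) -> tuple[dict[str, int], int]:
    """Reload the tokenizer state at serving time, without any training objects."""
    state = json.loads(path.read_text())
    return state["vocab"], state["max_len"]
```

At serving time the API would load the weights and this file, then call `encode` on each request. The training-time objects no longer need to be kept around.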
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Project mention: Awesome list of ML | reddit.com/r/programming | 2021-09-16
A curated list of data science blogs
Project mention: ⚙️ Data Science Blogs: A vast collection of #blogs about #DataScience. h/t @Sauain | reddit.com/r/policerewired | 2021-09-13
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
What are some of the best open-source data science projects in Python? This list should help you find them.