Top 23 Python Data Science Projects
Deep Learning for humans
Project mention: [D] Batch normalization before or after activation function | reddit.com/r/MachineLearning | 2021-02-23
scikit-learn: machine learning in Python
Project mention: Using TinyML to identify farts | dev.to | 2021-02-22
The model in question is trained using scikit-learn, a Python machine learning library. The audio data is loaded into NumPy arrays and split into training and testing data; the model is trained on the training data, then evaluated on the testing data to give an idea of its accuracy.
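The workflow described above (load features, split, train, test for accuracy) can be sketched without any dependencies as follows. This is a minimal illustration, not the article's code: the synthetic two-feature data stands in for real audio features, and the nearest-centroid classifier and the helper names (`split_train_test`, `fit_centroids`, `predict`, `accuracy`) are hypothetical stand-ins for scikit-learn's `train_test_split` and an estimator's `fit` and `score` methods.

```python
import random


def split_train_test(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle the indices, then split samples and labels into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def fit_centroids(samples, labels):
    """Train: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Predict the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


# Synthetic data (an assumption): two well-separated clusters of 40 points each.
rng = random.Random(42)
samples = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(40)]
samples += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(40)]
labels = [0] * 40 + [1] * 40

X_train, X_test, y_train, y_test = split_train_test(samples, labels)
model = fit_centroids(X_train, y_train)
test_accuracy = accuracy(model, X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

Scoring on held-out data rather than on the training data is what makes the accuracy figure a fair estimate of how the model behaves on unseen input.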
Apache Superset is a Data Visualization and Data Exploration Platform
Project mention: Publishing dashboards for clients (advice and suggestions plz) | reddit.com/r/BusinessIntelligence | 2021-02-23
Many people use Apache Superset this way, in the 'embedded' way: superset.apache.org. Since it's open source, you can customize it extensively.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Project mention: Resources for learning Python from scratch specifically for data ingestion | reddit.com/r/learnpython | 2021-02-13
data science ipython notebooks
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: Ask HN: What is your production ML stack like? (2021) | news.ycombinator.com | 2021-02-08
Here's the ML stack I have been using for my last project:
- Doing NLP with spaCy (https://spacy.io/), as I consider it to be the most production-ready framework for NLP
- Annotating datasets with Prodigy (https://prodi.gy/), a paid tool made by the spaCy team
- Deploying the trained spaCy models onto NLP Cloud (https://nlpcloud.io)
- Using the models through the NLP Cloud API in production to enrich my Django application
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Project mention: How to get my multi-agents more collaborative? | reddit.com/r/reinforcementlearning | 2021-02-15
QMIX is indeed a great paper. I'm planning on using it with RLlib on my env; however, it takes some work to adapt it and to understand the subtleties ;) (such as the agent groups: https://github.com/ray-project/ray/blob/936cb5929c455102d5638ff5d59c80c4ae94770f/rllib/env/multi_agent_env.py#L82)
Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
Project mention: Question About Embedding Html Audio Tags In | reddit.com/r/IPython | 2021-02-17
I've duplicated your error, and it appears to only happen with .wav files. It seems to be a Firefox issue.
If you want a web-based dashboard, then Dash is the way to go.
Streamlit — The fastest way to build data apps in Python
Project mention: Which GUI framework do you/would you use for which purposes and why? | reddit.com/r/Python | 2021-02-13
Streamlit (oriented toward data science)
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Project mention: DDP with model parallelism with multi host multi GPU system | reddit.com/r/pytorch | 2021-02-07
Topic Modelling for Humans
Project mention: Koan: A word2vec negative sampling implementation with correct CBOW update | news.ycombinator.com | 2021-01-02
Apparently it did: https://github.com/RaRe-Technologies/gensim/issues/1873
An open-source NLP research library, built on PyTorch.
Project mention: AllenNLP v2.0.0 | news.ycombinator.com | 2021-01-27
Deep learning library featuring a higher-level API for TensorFlow.
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Project mention: How we were able to achieve hyper-parameter tuning (HPT) for deep learning workflows at 1.5x faster in our clusters and 3x cheaper on AWS | reddit.com/r/learnmachinelearning | 2021-02-23
To tackle the problem of long and expensive HPT workflows, our team at Petuum collaborated with Microsoft to integrate AdaptDL with Neural Network Intelligence (NNI). AdaptDL is an open-source tool in the CASL (Composable, Automatic, and Scalable Learning) ecosystem. It offers adaptive resource management for distributed clusters and reduces the cost of deep learning workloads ranging from a few training/tuning trials to thousands. NNI, from the Microsoft open-source community, is a toolkit for automatic machine learning (AutoML) and hyper-parameter tuning.
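To make the notion of a "tuning trial" concrete, here is a minimal, dependency-free sketch of random-search hyper-parameter tuning. It does not use the NNI or AdaptDL APIs: the toy objective, the log-uniform search range, and the function names `train_and_score` and `random_search` are assumptions chosen purely for illustration.

```python
import random


def train_and_score(learning_rate, steps=50):
    """Toy 'training run': gradient descent on f(w) = (w - 3)^2.

    Returns the final loss, which plays the role of a validation metric.
    """
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)
        w -= learning_rate * grad
    return (w - 3) ** 2


def random_search(n_trials=20, seed=0):
    """Run n_trials independent trials and keep the best (learning_rate, loss)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # Sample the learning rate log-uniformly between 1e-4 and 1.
        learning_rate = 10 ** rng.uniform(-4, 0)
        loss = train_and_score(learning_rate)
        if best is None or loss < best[1]:
            best = (learning_rate, loss)
    return best


best_lr, best_loss = random_search()
print(f"best learning rate: {best_lr:.4g}, final loss: {best_loss:.3g}")
```

Each trial is a full training run, so the cost grows linearly with the number of trials; that is the cost that cluster-level scheduling of trials aims to reduce.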
Statistical data visualization using matplotlib
🦉Data Version Control | Git for Data & Models
Project mention: SnowFS – a fast, scalable version control file storage for graphic files | news.ycombinator.com | 2021-02-20
Very interesting. I'd like to learn more about how it works. How does this compare to DVC, for instance?
I'll throw in a shameless plug for my tool in this area, Dud. Dud is to DVC what Flask is to Django.
Are the mentioned benchmarks published somewhere?
The easiest way to automate your data
Project mention: [D] Software stack to replicate Azure ML / Google Auto ML on premise | reddit.com/r/MachineLearning | 2021-02-03
Update: So far I have started using Prefect (http://prefect.io). With this I can work on my local computer and submit code to Azure Blob Storage and the Prefect server, after which an agent (worker) runs the code. Logging/metrics are not implemented yet; I might use MLflow for this (http://mlflow.org). Furthermore, there is still a dependency on a cloud solution to store your Flows (programs) in order to run them on agents.
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
(JMLR'19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Project mention: PyOD: ~50 anomaly detection algorithms in one framework. | reddit.com/r/algotrading | 2021-01-25
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Project mention: best-of-python: A ranked list of awesome Python libraries and tools | reddit.com/r/Python | 2021-01-14
Here ya go: https://github.com/ml-tooling/best-of-ml-python/pull/47
Build and manage real-life data science projects with ease.
Project mention: Netflix's Metaflow: Reproducible machine learning pipelines | news.ycombinator.com | 2020-12-21
Has anyone done a comparison of ML pipelines from a DevOps-centric perspective?
For example, Metaflow doesn't support Kubernetes today - https://github.com/Netflix/metaflow/issues/16
So ultimately the scale-up story in most of these management tools is iffy.
I previously asked about Kubeflow here - https://news.ycombinator.com/item?id=24808090 . It seems people think it's pretty "horrendous". Most of these tools seem to assume a very specialised DevOps team who will work around the ML tool, rather than the ML tool making this easy.
Always know what to expect from your data.
Project mention: For those using Airflow for your ELT/Orchestration, How are you performing your EL? | reddit.com/r/dataengineering | 2021-01-30
(T) : https://github.com/fishtown-analytics/dbt + https://github.com/great-expectations/great_expectations + https://github.com/dagster-io/dagster
What are some of the best open-source Data Science projects in Python? This list will help you: