|1 day ago||6 days ago|
|Apache License 2.0||OSI Approved|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Top Github repo trends in 2021
47 projects | dev.to | 12 Jan 2022
No surprises here: deep learning is the most popular subcategory, with the Hugging Face Transformers repo, YOLOv5, TensorFlow and DeepMind's AlphaFold all in the mix. The only proper infrastructure-ey repos on the list are Meilisearch and ClickHouse, which is a tad surprising given all the hype data infrastructure receives in VC-world, but again, it is probably just a question of the size of the end-user populations and whether data scientists spend tons of time on GitHub vs. web developers…
5% of 666 Python repos had comma typo bugs (inc V8, TensorFlow and PyTorch)
20 projects | news.ycombinator.com | 7 Jan 2022
Also, all but 1 of the issues they found relate to test code. It seems people are a little less careful with test code than with functional code.
Also, in terms of mistakes, codereviewdoctor linked to the same issue twice in their blog (https://github.com/tensorflow/tensorflow/issues/53636) and raised the PR against the wrong project (https://github.com/tensorflow/tensorflow/pull/53637). I guess TensorFlow vendors Keras, so it's an easy mistake.
How we found and helped fix 24 bugs in 24 hours (in Tensorflow, Sentry, V8, PyTorch, Hue, and more)
6 projects | dev.to | 5 Jan 2022
Political systems can be designed to allow more to happen with less political capital being wasted
1 project | reddit.com/r/PoliticalPhilosophy | 3 Jan 2022
1 project | reddit.com/r/machinelearningmemes | 30 Dec 2021
Source in github
Someone had a lot to say in a TensorFlow GitHub issue
1 project | news.ycombinator.com | 29 Dec 2021
On the official tensorflow repo. They closed it without a fix.
2 projects | reddit.com/r/ProgrammerHumor | 28 Dec 2021
Edit (link): https://github.com/tensorflow/tensorflow/issues/53549
An honest guy...
1 project | reddit.com/r/ProgrammerHumor | 27 Dec 2021
scikit-learn test case results?
1 project | reddit.com/r/scikit_learn | 5 Jan 2022
How do you reduce information leakage and bias when going from descriptive analytics to prescriptive analytics?
1 project | reddit.com/r/datascience | 30 Dec 2021
I'd say the first questions you need to ask yourself are "Why do I want to do statistical tests?" and "What kind of statistical tests do I want to do?". Most of them rely on a bunch of assumptions, and just winging it will produce a number that will be reported and used but is terribly wrong. Funnily enough, scikit-learn does not directly give you p-values for this very reason and advises you to run the same regression in statsmodels.
Learning python, what next?
1 project | reddit.com/r/LearnToCode | 29 Dec 2021
Machine learning and statistical analysis? http://scikit-learn.org
Identifying trolls and bots on Reddit with machine learning (Part 2)
5 projects | reddit.com/r/Republica_Argentina | 17 Dec 2021
Our next step is to create a new machine learning model based on this list. We'll use Python's excellent scikit-learn framework to build our model. We'll store our training data in two data frames: one for the set of features to train on and the second with the desired class labels. We'll then split our dataset into 70% training data and 30% test data.
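The 70/30 split described in that post could look roughly like the sketch below. It is not the post's actual code: the feature columns, label values, and toy rows are hypothetical, chosen only to show two data frames being passed to scikit-learn's `train_test_split`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: column names and values are illustrative only.
features = pd.DataFrame({
    "comment_length": [12, 250, 8, 96, 40, 310, 15, 77, 5, 180],
    "account_age_days": [3, 900, 1, 400, 30, 1200, 2, 365, 1, 720],
})
labels = pd.Series(
    ["bot", "normal", "troll", "normal", "bot",
     "normal", "troll", "normal", "bot", "normal"],
    name="label",
)

# test_size=0.3 holds out 30% of the rows; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0
)

print(len(X_train), "training rows,", len(X_test), "test rows")
```

The features and labels are split together, so each training row keeps its matching class label.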
Will I be able to switch into a hardware job if my first job is in data science?
1 project | reddit.com/r/ElectricalEngineering | 7 Dec 2021
I can't tell you whether you'd like data science or machine learning, but I can tell you I took a class in it last year. It was an applied ML class targeting power systems engineers. ML is extremely statistics and probability heavy. I personally found the theory to be very dry, but the application to be rather enjoyable. We used scikit-learn, which is an interesting Python package targeting academic data science and machine learning. https://scikit-learn.org/
Old guy programmer here, need to brush up on Python quickly!
13 projects | reddit.com/r/Python | 6 Dec 2021
scikit-learn for classical machine learning,
Data Science toolset summary from 2021
13 projects | dev.to | 13 Nov 2021
Scikit-learn - It is one of the most widely used frameworks for Python based Data science tasks. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Link - https://scikit-learn.org/
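As a small illustration of that NumPy interoperability, the sketch below runs k-means, one of the algorithms named above, directly on a NumPy array. The data is synthetic and made up for this example (two well-separated blobs), and the parameter values are arbitrary choices rather than recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): two blobs far apart in 2-D.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(20, 2)),
])

# Fit k-means with two clusters; the estimator accepts the NumPy array as-is.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Cluster assignments and centers come back as NumPy arrays as well.
labels = model.labels_
centers = model.cluster_centers_
print(labels.shape, centers.shape)
```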
Intel Extension for Scikit-Learn
4 projects | news.ycombinator.com | 1 Nov 2021
Some work is currently being done to improve scikit-learn's computational primitives and enhance its overall performance natively.
You can have a look at this exploratory PR: https://github.com/scikit-learn/scikit-learn/pull/20254
This other PR is a clear revamp of this previous one:
Scikit-Learn Version 1.0
11 projects | news.ycombinator.com | 14 Sep 2021
Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2.
Top 10 Python Libraries for Machine Learning
14 projects | dev.to | 9 Sep 2021
Website: https://scikit-learn.org/
GitHub Repository: https://github.com/scikit-learn/scikit-learn
Developed By: SkLearn.org
Primary Purpose: Predictive Data Analysis and Data Modeling
What are some alternatives?
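PaddlePaddle - PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle, 飞桨: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)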
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Keras - Deep Learning for humans
Surprise - A Python scikit for building and analyzing recommender systems
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
gensim - Topic Modelling for Humans
LightFM - A Python implementation of LightFM, a hybrid recommendation algorithm.
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration