|4 days ago||6 months ago|
|BSD 3-clause "New" or "Revised" License||MIT License|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scikit-learn test case results?
1 project | reddit.com/r/scikit_learn | 5 Jan 2022
How do you reduce information leakage and bias when going from descriptive analytics to prescriptive analytics?
1 project | reddit.com/r/datascience | 30 Dec 2021
I'd say the first questions you need to ask yourself are "Why do I want to do statistical tests?" and "What kind of statistical tests do I want to do?". Most of them rely on a bunch of assumptions, and just winging it will produce a number that gets reported and used but is terribly wrong. Funnily enough, scikit-learn does not directly give you p-values for this very reason and advises you to run the same regression in statsmodels.
Learning python, what next?
1 project | reddit.com/r/LearnToCode | 29 Dec 2021
Machine learning and statistical analysis? http://scikit-learn.org
Identifying trolls and bots on Reddit with machine learning (Part 2)
5 projects | reddit.com/r/Republica_Argentina | 17 Dec 2021
Our next step is to create a new machine learning model based on this list. We’ll use Python’s excellent scikit-learn framework to build our model. We’ll store our training data in two data frames: one for the set of features to train on and the second with the desired class labels. We’ll then split our dataset into 70% training data and 30% test data.
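The split described above could look roughly like the sketch below. The feature columns, the label values, the random data, and the choice of a random forest classifier are all assumptions made for illustration; the excerpt does not say which features or model the original post used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical feature frame: one row per account.
features = pd.DataFrame({
    "comment_count": rng.integers(0, 500, size=100),
    "account_age_days": rng.integers(1, 3000, size=100),
})

# Hypothetical label frame with the desired class for each row.
labels = pd.DataFrame({
    "label": rng.choice(["normal", "bot", "troll"], size=100),
})

# Hold out 30% of the rows for testing; train on the remaining 70%.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels["label"], test_size=0.3, random_state=42
)

# The model choice is an assumption; any scikit-learn classifier fits here.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows (near chance here, since the labels are random).
accuracy = model.score(X_test, y_test)
print(len(X_train), len(X_test), accuracy)
```

Fixing `random_state` makes the split reproducible, which helps when comparing models on the same held-out rows.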
Will I be able to switch into a hardware job if my first job is in data science?
1 project | reddit.com/r/ElectricalEngineering | 7 Dec 2021
I can't tell you whether you'd like data science or machine learning, but I can tell you I took a class in it last year. It was an applied ML class targeting power systems engineers. ML is extremely statistics and probability heavy. I personally found the theory to be very dry, but the application to be rather enjoyable. We used scikit-learn, which is an interesting Python package targeting academic data science and machine learning. https://scikit-learn.org/
Old guy programmer here, need to brush up on Python quickly!
13 projects | reddit.com/r/Python | 6 Dec 2021
scikit-learn for classical machine learning,
Data Science toolset summary from 2021
13 projects | dev.to | 13 Nov 2021
Scikit-learn - One of the most widely used frameworks for Python-based data science tasks. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Link - https://scikit-learn.org/
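To give a feel for the NumPy interoperability mentioned above, here is a small sketch that runs k-means on a NumPy array. The two synthetic blobs and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs of 2-D points (made up for this example).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10, 10], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Estimators accept plain NumPy arrays; fit_predict returns one cluster id per row.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

print(cluster_ids[:5], cluster_ids[-5:])
print(kmeans.cluster_centers_)
```

Other estimators follow the same fit and predict pattern, so swapping in `DBSCAN` or a classifier changes little of the surrounding code.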
Intel Extension for Scikit-Learn
4 projects | news.ycombinator.com | 1 Nov 2021
Some work is currently under way to improve scikit-learn's computational primitives and enhance its overall performance natively.
You can have a look at this exploratory PR: https://github.com/scikit-learn/scikit-learn/pull/20254
This other PR is a clear revamp of this previous one:
Scikit-Learn Version 1.0
11 projects | news.ycombinator.com | 14 Sep 2021
Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2
Top 10 Python Libraries for Machine Learning
14 projects | dev.to | 9 Sep 2021
Website: https://scikit-learn.org/
GitHub Repository: https://github.com/scikit-learn/scikit-learn
Developed By: SkLearn.org
Primary Purpose: Predictive Data Analysis and Data Modeling
Beginner questions about NER model evaluation.
1 project | reddit.com/r/LanguageTechnology | 12 Mar 2021
The standard way to evaluate NER (or any other sequence labelling problem) is to use the conlleval script (https://www.clips.uantwerpen.be/conll2000/chunking/output.html) or the seqeval package in Python (https://github.com/chakki-works/seqeval). Either way, you need a list of predicted labels and a list of gold labels (see the code example in the link; it should be trivial to convert your output to the same data format).
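To make the "list of predicted labels and list of gold labels" format concrete, here is a simplified pure-Python sketch of entity-level scoring over BIO tags. It is not the conlleval script or seqeval itself; the helper names `extract_entities` and `entity_f1` are hypothetical, and the handling of a stray `I-` tag (treated as the start of a new span) is an assumption that may differ from those tools.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from one BIO-tagged sentence."""
    entities = set()
    start, ent_type = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and ent_type == tag[2:]
        # Close the open span unless this tag continues it.
        if ent_type is not None and not continues:
            entities.add((ent_type, start, i - 1))
            start, ent_type = None, None
        if tag.startswith("B-"):
            start, ent_type = i, tag[2:]
        elif tag.startswith("I-") and ent_type is None:
            # Stray I- without a matching B-: leniently start a new span.
            start, ent_type = i, tag[2:]
    return entities


def entity_f1(gold_sents, pred_sents):
    """Compute precision, recall and F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# One sentence per inner list; both lists must be aligned token by token.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(precision, recall, f1)
```

Here the prediction finds the person span but misses the location, so precision is 1.0 and recall is 0.5. For reported results, the established tools named above are the safer choice.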
What are some alternatives?
Keras - Deep Learning for humans
Surprise - A Python scikit for building and analyzing recommender systems
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
tensorflow - An Open Source Machine Learning Framework for Everyone
gensim - Topic Modelling for Humans
TFLearn - Deep learning library featuring a higher-level API for TensorFlow.
MLflow - Open source platform for the machine learning lifecycle
SciKit-Learn Laboratory - SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow