|about 7 hours ago||3 days ago|
|BSD 3-clause "New" or "Revised" License||GNU General Public License v3.0 or later|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Data Science toolset summary from 2021
13 projects | dev.to | 13 Nov 2021
Scikit-learn - One of the most widely used frameworks for Python-based data science tasks. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Link - https://scikit-learn.org/
Intel Extension for Scikit-Learn
4 projects | news.ycombinator.com | 1 Nov 2021
Some work is currently being done to improve scikit-learn's computational primitives and so enhance its overall performance natively.
You can have a look at this exploratory PR: https://github.com/scikit-learn/scikit-learn/pull/20254
This other PR is a clear revamp of this previous one:
Scikit-Learn Version 1.0
11 projects | news.ycombinator.com | 14 Sep 2021
Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2.
Top 10 Python Libraries for Machine Learning
14 projects | dev.to | 9 Sep 2021
Website: https://scikit-learn.org/
GitHub Repository: https://github.com/scikit-learn/scikit-learn
Developed By: SkLearn.org
Primary Purpose: Predictive Data Analysis and Data Modeling
where is binary_metric function in sklearn package
1 project | reddit.com/r/learnmachinelearning | 20 Aug 2021
There is a function named binary_metric in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/metrics/_base.py
Use Scikit-Learn and Runflow
2 projects | dev.to | 6 Jul 2021
If you're not familiar with Scikit-learn and Runflow,
Confused as to what exactly a piece of code does
1 project | reddit.com/r/learnmachinelearning | 18 Jun 2021
Well, you can start at https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/model_selection/_validation.py, or maybe someone will guide you later.
What Makes Python Libraries So Important For Data Science Learning?
3 projects | reddit.com/r/u_Snoo36930 | 16 Jun 2021
Next comes the complexity of drawing the maximum possible number of valuable insights. Using different Python libraries such as Scikit-Learn, PyTorch, Pandas, etc., complications of data analysis can be solved within a minute. And the complexity associated with visualisation gets handled by other data visualisation libraries like Matplotlib, PyTorch, etc.
Is there a way to map cluster centers back to a dataframe?
1 project | reddit.com/r/learnpython | 19 May 2021
To avoid the issue with convergence (and the discrepancy between the labels_ and cluster_centers_), you can set tol=0, though this can of course lead to issues if convergence is a problem. There was an issue about it here. Assuming it's converged, then the order is fine.
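The mapping discussed above can be sketched without any library: each label is an index into the list of centers, so looking a row's center up is a plain index operation. This is a minimal sketch, not the poster's code. The function `attach_centers`, the sample rows, and the `center_` column prefix are hypothetical, and `labels` and `centers` are hand-written stand-ins for a fitted model's `labels_` and `cluster_centers_`.

```python
def attach_centers(rows, labels, centers, feature_names):
    """Return copies of rows with each row's cluster label and center added.

    rows          -- list of dicts, one per sample (stand-in for a dataframe)
    labels        -- cluster index per row (stand-in for labels_)
    centers       -- one coordinate list per cluster (stand-in for cluster_centers_)
    feature_names -- names of the coordinates, in the order used by centers
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    enriched_rows = []
    for row, label in zip(rows, labels):
        enriched = dict(row)  # copy, so the input rows are left untouched
        enriched["cluster"] = label
        # Label k refers to centers[k]; this relies on the two being in sync.
        for name, value in zip(feature_names, centers[label]):
            enriched[f"center_{name}"] = value
        enriched_rows.append(enriched)
    return enriched_rows


# Hypothetical data: three samples, two clusters.
rows = [{"x": 1.0, "y": 1.2}, {"x": 9.8, "y": 10.1}, {"x": 1.1, "y": 0.9}]
labels = [0, 1, 0]
centers = [[1.05, 1.05], [9.8, 10.1]]

result = attach_centers(rows, labels, centers, ["x", "y"])
for row in result:
    print(row)
```

With NumPy arrays from a fitted model, the same lookup is the indexing expression `model.cluster_centers_[model.labels_]`, which yields one center row per sample.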
Any from scratch Hamming Loss implementations?
1 project | reddit.com/r/LearnML | 10 May 2021
The source code for the function you refer to is quite straightforward anyway. The definition of count_nonzero() is here.
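For reference, a from-scratch version can be very short. The sketch below assumes the usual definition of Hamming loss, the fraction of label positions where the prediction differs from the truth, and counts mismatches directly rather than calling `count_nonzero()`. The name `hamming_loss` and the list-based input format are choices made for this illustration, not a copy of the library source.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label positions where y_pred differs from y_true.

    Accepts either flat lists of labels (single-label case) or lists of
    equal-length rows (multilabel indicator case).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")
    mismatches = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        # Treat a scalar label as a one-element row.
        if not isinstance(true_row, (list, tuple)):
            true_row, pred_row = [true_row], [pred_row]
        if len(true_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        mismatches += sum(1 for t, p in zip(true_row, pred_row) if t != p)
        total += len(true_row)
    if total == 0:
        raise ValueError("cannot compute Hamming loss of empty input")
    return mismatches / total


# Single-label case: one of four predictions is wrong.
print(hamming_loss([1, 2, 3, 4], [2, 2, 3, 4]))  # 0.25

# Multilabel case: three of four label positions are wrong.
print(hamming_loss([[0, 1], [1, 1]], [[0, 0], [0, 0]]))  # 0.75
```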
Good way to create a web scraper for multiple different sites
1 project | reddit.com/r/webdev | 18 Nov 2021
working on the parser of a website - in order to get this running on x pages
1 project | reddit.com/r/learnpython | 18 Nov 2021
Use scrapy to do this.
[OC] Which programming language is required to land a data job at Meta (Facebook)
3 projects | reddit.com/r/dataisbeautiful | 17 Nov 2021
Download Files with Scrapy Crawl Spider - Tutorial and Source Code
2 projects | reddit.com/r/webscraping | 29 Oct 2021
I assume that you have at least a working knowledge of Python, though. This tutorial also assumes that you have, at the very least, played around with Scrapy.
Wanting to build a web scraper with no prior coding knowledge. Where do I start as fast as possible?
1 project | reddit.com/r/webscraping | 23 Oct 2021
Check out https://scrapy.org/; it's a Python framework for web scraping. Then look at the https://youtube.com/c/JohnWatsonRooney channel to learn the syntax. Finally, go to https://www.zyte.com/scrapy-cloud/ to deploy your crawler to the cloud!
Why on earth is the Modpack button on curseforge always swapped so much? Are the devs having an argument or something?
1 project | reddit.com/r/feedthebeast | 30 Sep 2021
5 best tools for scraping Google Maps
2 projects | reddit.com/r/u_octoparsefr | 29 Sep 2021
Self teaching how to code:
2 projects | reddit.com/r/legaltech | 24 Sep 2021
For full production-ready web scraping with Python, https://scrapy.org/ is the thing. Not for beginners, though, IMO.
How to Crawl the Web with Scrapy
7 projects | news.ycombinator.com | 13 Sep 2021
While I agree that Scrapy is a great tool for beginner tutorials and easy entry into scraping, it's becoming difficult to use it in real world scenarios because almost all the large players now employ some anti-bot or anti-scraping protection.
The prime example is Cloudflare. You simply can't convince Cloudflare you're a human with Scrapy alone. Scrapy has only experimental support for HTTP2 and does not support proxies over HTTP2 (https://github.com/scrapy/scrapy/issues/5213). Yet all browsers use HTTP2 now, which means all normal users use HTTP2... You get the point.
What we use now is Got Scraping (https://github.com/apify/got-scraping). It's a special-purpose extension of Got (an HTTP client with 18 million weekly downloads) that masks its HTTP communication as if it were coming from a real browser. Of course, this will not get you as far as Puppeteer or Playwright (headless browsers), but it improved our scraping tremendously. If you need a full crawling library, see the Apify SDK (https://sdk.apify.com), which uses Got Scraping under the hood.
Building a Random Movie Picker Using Python and Selenium
1 project | dev.to | 12 Sep 2021
A framework such as Scrapy would provide the data quicker, but a big reason I'm building this is to help Dany learn Python.
What are some alternatives?
requests-html - Pythonic HTML Parsing for Humans™
Keras - Deep Learning for humans
pyspider - A Powerful Spider(Web Crawler) System in Python.
Surprise - A Python scikit for building and analyzing recommender systems
MechanicalSoup - A Python library for automating interaction with websites.
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
tensorflow - An Open Source Machine Learning Framework for Everyone
gensim - Topic Modelling for Humans
Grab - Web Scraping Framework
portia - Visual scraping for Scrapy