12 projects | dev.to | 9 Jan 2022
Feedback Request: Utilities for web scraping
2 projects | reddit.com/r/Python | 30 Dec 2021
You can have a look at https://scrapy.org/
Top 13 Web scraping tools in 2022
1 project | reddit.com/r/u_digitally_rajat | 28 Dec 2021
Scrapy is another tool on our list of the best web scraping tools: a collaborative open-source framework for extracting data from websites, aimed at Python programmers who want to build scalable web crawlers.
How is ArchiveBox?
4 projects | reddit.com/r/selfhosted | 27 Dec 2021
If you need more advanced recursive spider/crawling ability beyond --depth=1, check out Browsertrix, Photon, or Scrapy and pipe the outputted URLs into ArchiveBox.
Do research web crawler programs exist?
1 project | reddit.com/r/AskProgramming | 17 Dec 2021
I would use Scrapy if I were to write this, in case you want to take a crack at it. Selenium is another option I would consider using. UIPath is a no-code option if you want to go that route.
Old guy programmer here, need to brush up on Python quickly!
13 projects | reddit.com/r/Python | 6 Dec 2021
scrapy for reading and processing data on websites
Web scraping data with pagination?
1 project | reddit.com/r/learnpython | 2 Dec 2021
What you want to use instead is a web crawling framework like scrapy, which provides methods and classes to deal with all the common web scraping requirements. It handles pagination, supports callbacks so you can use different parsers for different subsites, provides link extractors to find and follow URLs, and offers asynchronous request handling, logging, automatic request throttling, file exports for your results, and much more. In short, it was written to provide all the tools you need to make writing web scrapers as comfortable and easy as possible.
What are the best 5 Web Scraping API/Tool to scrape data?
1 project | reddit.com/r/u_ScrapperExpert | 1 Dec 2021
- Scrapfly (probably the best web scraping API on the market; if you liked ScraperAPI you should test them, you will see the difference)
- Scrapy (web scraping framework from Zyte)
- Browserless (automation)
5 ways to keep your skills fresh after finishing a coding bootcamp
5 projects | dev.to | 28 Nov 2021
One way to improve your projects and coding skills is to try new models and libraries. For example, if you did classification with logistic regression, try also with random forest; if you used Tensorflow, now try Keras; if you scraped a website with BeautifulSoup, now do it with Scrapy. You get the point.
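As a concrete starting point for that BeautifulSoup-to-Scrapy exercise, here is the BeautifulSoup half over a made-up inline snippet, so it runs without touching the network:

```python
# Minimal BeautifulSoup extraction; the HTML is invented for illustration.
from bs4 import BeautifulSoup

html = """
<ul>
  <li><a href="/post/1">First post</a></li>
  <li><a href="/post/2">Second post</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
print(links)  # [('First post', '/post/1'), ('Second post', '/post/2')]
```

Porting this to Scrapy mostly means replacing `find_all` with `response.css` selectors inside a spider's `parse` method; the extraction logic transfers almost one-to-one.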
Good way to create a web scraper for multiple different sites
1 project | reddit.com/r/webdev | 18 Nov 2021
How to make all https traffic in program go through a specific proxy?
1 project | reddit.com/r/learnpython | 24 Dec 2021
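No answer is quoted for this one, but with the requests library the usual approach is a per-session `proxies` mapping; the proxy URL below is a placeholder, not a real endpoint:

```python
# Route all traffic from one requests Session through a single proxy.
# "http://127.0.0.1:8080" is a placeholder -- substitute your proxy's URL.
import requests

session = requests.Session()
session.proxies.update({
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
})

# Every request made through this session now goes via the proxy:
# session.get("https://example.com")
```

Alternatively, requests honors the `HTTP_PROXY` and `HTTPS_PROXY` environment variables by default, which covers other libraries built on it as well.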
Requests_html not working?
1 project | reddit.com/r/learnpython | 7 Nov 2021
Quite possible. If you look at requests-html source code, it is simply one single python file that acts as a wrapper around a bunch of other packages, like requests, chromium, parse, lxml, etc., plus a couple convenience functions. So it could easily be some sort of bad dependency resolution.
Web Scraping in a professional setting: Selenium vs. BeautifulSoup
2 projects | reddit.com/r/Python | 26 Oct 2021
What I do is try requests_html first before reaching for Selenium. requests_html is usually enough if I don't need to interact with browser widgets or if the authentication isn't too difficult to reverse engineer.
Requests html: Directly downloading pyppeteer chrome, not by script run
1 project | reddit.com/r/learnpython | 18 Aug 2021
This issue is asking for the same thing. Seems like they've implemented a simple fix in this Pull Request, but it looks like it never made it to the master branch. Maybe you can extend the class and make the necessary changes if you know what you're doing; otherwise you're out of luck.
The best Python libraries
11 projects | reddit.com/r/Python | 19 May 2021
I'm not sure what is left to do, it is essentially a lightweight wrapper that consolidates a bunch of other libraries (like parse, requests, chromium, etc). The whole package is basically one file requests_html.py.
Read greyed element in HTML while scraping
1 project | reddit.com/r/learnpython | 28 Mar 2021
Alternatively, requests-html may be able to take the place of both, as it supports rendering HTML after executing JS.
Which one do you prefer in web scraping? BeautifulSoup or LXML?
1 project | reddit.com/r/learnpython | 9 Jan 2021
Hands down requests-html
What are some alternatives?
pyspider - A Powerful Spider(Web Crawler) System in Python.
MechanicalSoup - A Python library for automating interaction with websites.
Grab - Web Scraping Framework
portia - Visual scraping for Scrapy
feedparser - Parse feeds in Python
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)
colly - Elegant Scraper and Crawler Framework for Golang
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more