requests-html
awesome-web-scraping
Metric | requests-html | awesome-web-scraping
---|---|---
Mentions | 2 | 6
Stars | 266 | 6,308
Growth | - | -
Activity | 0.0 | 5.1
Last commit | almost 2 years ago | 17 days ago
Language | Makefile |
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
requests-html
- Which string to lower case method do you use?
Example: requests-html which has a rather exhaustive README.md, but their dedicated page is not that helpful, if I remember correctly, and currently the domain is suspended.
- Problem reaching a link hidden deeply in the html
You can get around this by using requests_html to render the full page before trying to reach this URL (Selenium works too but is even heavier).
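The suggestion above can be sketched roughly as follows. This is a minimal sketch, assuming the link only shows up once the page's JavaScript has run. `fetch_rendered_html`, `LinkCollector`, `find_links`, and the sample markup are hypothetical names introduced for illustration; `HTMLSession`, `render()`, and `html.html` come from requests_html, which is imported lazily because `render()` downloads Chromium on first use.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, timeout=20):
    """Return the page's HTML after JavaScript has run (needs requests_html)."""
    # Imported here so the parsing helpers below work without the dependency.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # render() executes the page's scripts in headless Chromium;
    # the first call downloads Chromium, which can take a while.
    response.html.render(timeout=timeout)
    return response.html.html


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, however deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_links(html, needle):
    """Return hrefs containing `needle`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if needle in link]


# Hypothetical markup standing in for a rendered page.
SAMPLE_HTML = """
<div><section><ul>
  <li><span><a href="/about">About</a></span></li>
  <li><div><p><a href="/files/report.pdf">Report</a></p></div></li>
</ul></section></div>
"""

print(find_links(SAMPLE_HTML, "report"))
```

In real use the two steps would be chained, for example `find_links(fetch_rendered_html(url), "report")`. If a plain set of every link on the page is enough, requests_html also exposes `response.html.absolute_links`.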
awesome-web-scraping
- Ask HN: LinkedIn sent me a cease and desist for my Chrome extension. Help?
>I can scrape linkedin with a python script. That doesn't mean linkedin can shut down python.
Well said!
Also, what about copy-and-paste? The last time I checked, data could be highlighted in the browser, copied, and pasted...
Does that mean that LinkedIn can shut down the copy-and-paste capability of your browser and/or operating system?
What about "Save Page As..." functionality (the ability of a browser to save a page offline?)
Can LinkedIn shut down "Save Page As..." ?
Also, what about the Print Screen (take a screen snapshot) capabilities of your operating system?
Can LinkedIn shut down that?
Finally, there's literally oodles of software that can be used for web scraping; what follows below is just one non-canonical list:
https://github.com/lorien/awesome-web-scraping
Is LinkedIn going to shut down all of that, at the same time?
Anyway, an excellent point about Python!
- Awesome-web-scraping – List of libraries, tools and APIs for web scraping
- How does webscraping a website work and putting the data into my website?
Because at least for the scraping part, there are open-source and paid services that will probably get you the data today if you need it (unless you're targeting some really hard-to-scrape websites). But if you are keen on learning it yourself, just scroll down this subreddit; you will find many guides that users have shared over the years...
- Russian Flag in Readme
E.g. how would a Ukrainian dev feel having his project showcased in this list, under the Russian flag?
[0] https://github.com/lorien/awesome-web-scraping/issues/136
- A central repository for scraping scripts
What are some alternatives?
requests-html - Pythonic HTML Parsing for Humans™
proxy-list - A list of free, public, forward proxy servers. UPDATED DAILY!
croncert-config - configuration and github actions for concertcloud.live (fka croncert.ch), a website that shows you concerts in various cities
Proxyman - Modern. Native. Delightful Web Debugging Proxy for macOS, iOS, and Android ⚡️
html2rss - 📰 Build RSS 2.0 feeds from websites (and JSON APIs) with a few CSS selectors.
awesome-micropython - A curated list of awesome MicroPython libraries, frameworks, software and resources.
TabNine - AI Code Completions
Awesome-Warez - All your base are belong to us!
syntax-highlighter - Syntax Highlighter extension for Visual Studio Code (VSCode). Based on Tree-sitter.
cookiecutter-poetry-pypackage - Cookiecutter template for poetry managed python package
bookmarks - :bookmark: :star: Collection of public dev bookmarks, shared with :heart: from www.codever.dev
2captcha-java - Java library for easy integration with the API of 2captcha captcha solving service to bypass recaptcha, hcaptcha, funcaptcha, geetest and solve any other captchas.