requests-html
Pythonic HTML Parsing for Humans™ (by psf)
feedparser
Parse feeds in Python (by kurtmckee)
| | requests-html | feedparser |
|---|---|---|
| Mentions | 14 | 7 |
| Stars | 13,806 | 2,096 |
| Growth (stars, month over month) | 0.2% | 1.9% |
| Activity | 0.0 | 7.6 |
| Last commit | about 1 year ago | 23 days ago |
| Language | Python | Python |
| License | MIT License | BSD 2-clause "Simplified" License |
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
requests-html
Posts with mentions or reviews of requests-html.
We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-02-13.
- will requests-html library work as selenium
- 8 Most Popular Python HTML Web Scraping Packages with Benchmarks
requests-html
- How to batch scrape Wall Street Journal (WSJ)'s Financial Ratios Data?
Ya, thanks for the advice. When using the requests_html library, I am trying to slow things down using response.html.render(timeout=1000), but it raises a RuntimeError instead on Google Colab: https://github.com/psf/requests-html/issues/517.
- Note, the first time you ever run the render() method, it will download Chromium into your home directory (e.g. ~/.pyppeteer/). This only happens once.
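The posts above lean on requests-html's render() call for JavaScript-heavy pages. A minimal sketch of that pattern, with a placeholder URL and illustrative timeout/sleep values:

```python
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://example.com")  # placeholder URL; substitute the real page

# The first render() call downloads Chromium into ~/.pyppeteer/ (one time only).
# Raising timeout and adding sleep gives slow, script-heavy pages time to finish.
r.html.render(timeout=20, sleep=2)

title = r.html.find("title", first=True)
print(title.text if title else "no <title> found")

session.close()
```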
- Data scraping tools
For dynamic js, prefer requests-html with xpath selection.
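A short sketch of the XPath selection mentioned here, again with a placeholder URL; the render() call is only needed when the content is injected by JavaScript:

```python
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://example.com")  # placeholder URL
r.html.render()                         # only needed for JavaScript-injected content

# XPath selection on the rendered DOM; selecting @href returns the href strings.
links = r.html.xpath("//a/@href")
print(links)
```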
- Which string-to-lower-case method do you use?
Example: requests-html which has a rather exhaustive README.md, but their dedicated page is not that helpful, if I remember correctly, and currently the domain is suspended.
- Top python libraries/frameworks that you suggest everyone
When it comes to web scraping, the usual recommendations are beautifulsoup, lxml, or selenium. But I highly recommend people check out requests-html as well. It's a library that sits at a happy medium: roughly as easy to use as beautifulsoup, but also good enough for dynamic, JavaScript-rendered data where a full browser emulator like selenium would be overkill.
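To illustrate that middle ground, a minimal sketch of the static, beautifulsoup-style side of requests-html (placeholder URL; the earlier render() example covers the JavaScript side):

```python
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://example.com")  # placeholder URL

# CSS-selector parsing, much like beautifulsoup, with no browser involved.
for a in r.html.find("a"):
    print(a.text, a.attrs.get("href"))
```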
- How to make all https traffic in program go through a specific proxy?
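Regarding the proxy question above: HTMLSession is built on requests, so it accepts requests-style proxy settings for the HTTP layer. A sketch with a hypothetical proxy address; note that the headless browser used by render() may need separate proxy configuration:

```python
from requests_html import HTMLSession

session = HTMLSession()

# requests-style proxy configuration; 127.0.0.1:8080 is a hypothetical address.
session.proxies = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}

r = session.get("https://example.com")  # placeholder URL
print(r.status_code)
```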
- Requests_html not working?
Quite possible. If you look at the requests-html source code, it is simply a single Python file that acts as a wrapper around a bunch of other packages, like requests, chromium, parse, lxml, etc., plus a couple of convenience functions. So it could easily be some sort of bad dependency resolution.
- Web Scraping in a professional setting: Selenium vs. BeautifulSoup
What I do is try to see if I can use requests_html first before trying selenium. requests_html is usually enough if I don't need to interact with browser widgets or if the authentication isn't too difficult to reverse engineer.
feedparser
Posts with mentions or reviews of feedparser.
We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-11-12.
- What I Wish Someone Told Me About Postgres
I am using the feedparser library in Python (https://github.com/kurtmckee/feedparser/), which basically takes an RSS URL and standardizes it to a reasonable extent. But I have noticed that different websites still get parsed slightly differently. For example, https://beincrypto.com/feed/ has a long description (containing actual HTML), while https://www.coindesk.com/arc/outboundfeeds/rss/ completely cuts the description out. I have about 50 such websites and they all have slight variations. So you are saying that in addition to storing the parsed data (title, summary, content, author, pubdate, link, guid) that I currently store, I should also add an XML column and store the raw feed from each URL until I get a good hang of how each site differs?
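A minimal sketch of the approach discussed above: fetch the raw XML yourself, keep it, and still run it through feedparser. The URL is one of the feeds mentioned in the post, and the field names follow feedparser's usual entry attributes:

```python
import feedparser
import requests

url = "https://beincrypto.com/feed/"          # one of the feeds mentioned above
raw_xml = requests.get(url, timeout=30).text  # keep the raw document for later comparison

feed = feedparser.parse(raw_xml)              # feedparser accepts a URL, file, or string
for entry in feed.entries:
    row = {
        "title": entry.get("title"),
        "summary": entry.get("summary"),      # may contain raw HTML on some feeds
        "link": entry.get("link"),
        "published": entry.get("published"),
        "guid": entry.get("id"),
        "raw_xml": raw_xml,                   # the extra column discussed above
    }
    print(row["title"])
```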
- RSS can be used to distribute all sorts of information
There is JSON Feed¹ already. One of the spec writers is behind micro.blog, which is the first place I saw it (and also one of the few places I've seen it). I don't think it is a bad idea, and it doesn't take all that long to implement.
I have long hoped it would pick up with the JSON-ify-everything crowd, just so I'd never see a non-Atom feed again. Then we perhaps wouldn't need so much of the magic that is wrapped up in packages like feedparser² to deal with all the brokenness of RSS in the wild.
¹ https://www.jsonfeed.org/
² https://github.com/kurtmckee/feedparser
- Help! trying to use scraping for my dissertation but I am clueless
What sites did you try? Have you looked into RSS yet? Many sites have RSS feeds you can use with something like https://github.com/kurtmckee/feedparser. For example, nytimes.com feeds: https://www.nytimes.com/rss
- Newb learning GitHub & Python. Projects?
feedparser
- Python Library to scrape RSS-Feeds from waybackmachine?
You can explore FeedParser too
- looking for a project
feedparser is a Python package for receiving and parsing RSS/Atom newsfeeds. The maintainer is active but really needs much more support.
- An absolute newbie's question
The simplest thing I know of for monitoring YouTube channels is the RSS feed that each channel has. The format is https://www.youtube.com/feeds/videos.xml?channel_id=[CHANNEL_ID]. If you don't know RSS, take a look at the wiki. To read RSS feeds in Python you have feedparser (and surely many more).
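A small sketch of that YouTube-feed approach with feedparser (the channel ID is a placeholder to substitute):

```python
import feedparser

channel_id = "CHANNEL_ID_HERE"  # placeholder: substitute the real channel ID
url = f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"

# Parse the channel's feed and show the five most recent uploads.
feed = feedparser.parse(url)
for entry in feed.entries[:5]:
    print(entry.title, entry.link)
```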
What are some alternatives?
When comparing requests-html and feedparser you can also consider the following projects:
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
pyspider - A Powerful Spider(Web Crawler) System in Python.
MechanicalSoup - A Python library for automating interaction with websites.
RoboBrowser