The most well-known one (and actually the only one I know) is Scrapy. I won't go into too much detail, but among other things it offers:
lxml is an XML parser, but it also supports HTML parsing. It's blazing fast and supports XPath. I don't think it is as beginner-friendly, though its documentation is detailed. It copes less well with heavily broken HTML documents, and its encoding detection isn't as good as BS4's.
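To make the XPath point concrete, here is a minimal sketch of HTML parsing with lxml. The markup, the `langs` id, and the `extract_links` helper are made up for illustration; only `lxml.html.fromstring`, `xpath`, `text_content`, and `get` come from lxml itself. The sample deliberately leaves its `<li>` tags unclosed, which is mild breakage that the parser recovers from.

```python
from lxml import html

# Hypothetical sample markup; the <li> elements are intentionally left unclosed.
SAMPLE_HTML = """
<html><body>
  <ul id="langs">
    <li><a href="/python">Python</a>
    <li><a href="/rust">Rust</a>
  </ul>
</body></html>
"""


def extract_links(markup):
    """Return (link text, href) pairs for the anchors in the sample list."""
    tree = html.fromstring(markup)
    # XPath: every <a> directly inside an <li> of the list with id="langs".
    anchors = tree.xpath('//ul[@id="langs"]/li/a')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


links = extract_links(SAMPLE_HTML)
print(links)
```

For heavily broken pages or uncertain encodings, the trade-off described above still applies, so a more forgiving parser may be the safer choice there.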
Related posts
- Scrapy: A Fast and Powerful Scraping and Web Crawling Framework
- Implementing case sensitive headers in Scrapy (not through `_caseMappings`)
- Tips for projects using web scraping
- Best tools to use for web scraping ??
- I'm using python to scrape web page content and extract keywords, how can I make it faster to process?