Top 22 Python Web Crawling Projects
Project mention: Scrapy: A Fast and Powerful Scraping and Web Crawling Framework | news.ycombinator.com | 2024-02-16
MechanicalSoup is a Python library for web scraping that combines the simplicity of Requests with the convenience of BeautifulSoup. It's particularly useful for interacting with web forms, such as login pages.
Project mention: RSS can be used to distribute all sorts of information | news.ycombinator.com | 2023-11-20
There is JSON Feed¹ already. One of the spec writers is behind micro.blog, which is the first place I saw it (and also one of the few places I've seen it). I don't think it is a bad idea, and it doesn't take all that long to implement it.
I have long hoped it would pick up with the JSON-ify everything crowd, just so I'd never see a non-Atom feed again. We perhaps wouldn't need sooo much of the magic that is wrapped up in packages like feedparser² to deal with all the brokenness of RSS in the wild then.
In this blog post we'll cover how to make direct requests to serpapi.com/search.json without using SerpApi's google-search-results Python client.
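A direct request of that kind could look roughly like the sketch below, which uses only the standard library. The helper names `build_search_url` and `search` are made up for this example, `YOUR_API_KEY` is a placeholder, and the query is arbitrary; `engine`, `q`, and `api_key` are passed as ordinary query-string parameters.

```python
import json
import urllib.parse
import urllib.request

SEARCH_ENDPOINT = "https://serpapi.com/search.json"


def build_search_url(query, api_key, engine="google"):
    """Build the request URL with the search parameters in the query string."""
    params = {"engine": engine, "q": query, "api_key": api_key}
    return f"{SEARCH_ENDPOINT}?{urllib.parse.urlencode(params)}"


def search(query, api_key, engine="google", timeout=30):
    """Send the GET request and decode the JSON body (needs a valid key)."""
    url = build_search_url(query, api_key, engine)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access or real key.
url = build_search_url("web crawling", "YOUR_API_KEY")
print(url)
# https://serpapi.com/search.json?engine=google&q=web+crawling&api_key=YOUR_API_KEY
```

Calling `search("web crawling", api_key)` with a real key would return the parsed JSON response as a Python dictionary.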
FastImage
Python library that finds the size / type of an image given its URI by fetching as little as needed (by bmuller)
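The idea of "fetching as little as needed" can be sketched with the standard library alone. This is not FastImage's API or implementation, just an illustration of the technique for one format: a PNG stores its width and height in the first 24 bytes, so a ranged request for those bytes is enough. The function names `png_size` and `fetch_png_size` are hypothetical, and the sketch assumes the server honours the `Range` header.

```python
import struct
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are big-endian 32-bit integers at offsets 16 and 20.
    return struct.unpack(">II", header[16:24])


def fetch_png_size(url, timeout=10):
    """Request only the first 24 bytes of a remote PNG and read its size."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-23"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return png_size(response.read(24))


# Offline check with a hand-built header for a 640x480 image.
sample = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
print(png_size(sample))  # (640, 480)
```

Other formats such as JPEG and GIF keep their dimensions at different offsets, so a general-purpose tool needs a small parser per format.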
Mariner
This is a mirror of the GitLab repository. Open your issues and pull requests there. (by radek-sprta)
Python Web Crawling related posts
- How to scrape a website with Python (Beginner tutorial)
- Scrapy: A Fast and Powerful Scraping and Web Crawling Framework
- Seven Python Projects to Elevate Your Coding Skills
- What is SERP? Meaning, Use Cases and Approaches
- Help! trying to use scraping for my dissertation but I am clueless
- Turning webpages into pdf
- Implementing case sensitive headers in Scrapy (not through `_caseMappings`)
Index
What are some of the best open-source Web Crawling projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Scrapy | 50,763 |
2 | pyspider | 16,310 |
3 | requests-html | 13,574 |
4 | portia | 9,159 |
5 | MechanicalSoup | 4,545 |
6 | RoboBrowser | 3,689 |
7 | Grab | 2,353 |
8 | gain | 2,031 |
9 | feedparser | 1,823 |
10 | PSpider | 1,811 |
11 | cola | 1,485 |
12 | Sukhoi | 878 |
13 | google-search-results-python | 514 |
14 | MSpider | 345 |
15 | spidy Web Crawler | 322 |
16 | Crawley | 182 |
17 | brownant | 157 |
18 | Demiurge | 110 |
19 | Pomp | 60 |
20 | FastImage | 28 |
21 | microwler | 13 |
22 | Mariner | 2 |