w3lib vs furl
| | w3lib | furl |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 382 | 2,574 |
| Growth | 0.3% | - |
| Activity | 6.7 | 0.0 |
| Latest commit | about 1 month ago | about 1 year ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | The Unlicense |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
w3lib
Parsing URLs in Python
A great initiative!
We need a better URL parser in Scrapy, for similar reasons: speed and WHATWG standard compliance (i.e., behaving the same way web browsers do) are the main concerns.
It's possible to get closer to WHATWG behavior by using urllib plus some hacks. This is the approach taken by https://github.com/scrapy/w3lib, which Scrapy currently uses, but it is still not fully compliant.
Also, surprisingly, on some crawls URL parsing can take about as much CPU time as HTML parsing.
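The "urllib plus some hacks" approach can be illustrated with a minimal sketch. This is not w3lib's actual code; the helper `whatwg_ish_split` and its constants are hypothetical. It patches one concrete divergence: for "special" schemes such as http and https, the WHATWG URL Standard treats a backslash like a forward slash before the query, whereas `urllib.parse.urlsplit` treats it as an ordinary character, so `http://example.com\path` ends up with all of `example.com\path` as its netloc.

```python
from urllib.parse import SplitResult, urlsplit

# Schemes the WHATWG URL Standard calls "special"; in these, "\" acts like "/".
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}
# C0 control characters (U+0000..U+001F) plus space, trimmed from both ends by browsers.
C0_CONTROL_OR_SPACE = "".join(chr(i) for i in range(0x21))


def whatwg_ish_split(url: str) -> SplitResult:
    """Hypothetical helper: apply a few WHATWG-style fixups, then defer to urllib."""
    url = url.strip(C0_CONTROL_OR_SPACE)
    # Browsers drop ASCII tab and newline characters anywhere in the input.
    for ch in "\t\n\r":
        url = url.replace(ch, "")
    scheme, sep, rest = url.partition(":")
    if sep and scheme.lower() in SPECIAL_SCHEMES:
        # Backslashes are separators only before the query or fragment begins.
        cut = min((i for i in (rest.find("?"), rest.find("#")) if i != -1), default=len(rest))
        url = scheme + sep + rest[:cut].replace("\\", "/") + rest[cut:]
    return urlsplit(url)


# Plain urllib keeps the backslash inside the network location...
print(urlsplit("http://example.com\\path"))
# ...while the patched version splits it the way a browser would.
print(whatwg_ish_split("http://example.com\\path"))
```

Each fixup like this closes one gap, but a full WHATWG parser also covers host parsing, IDNA, per-component percent-encode sets, and IPv4 shorthand forms, which is why layering hacks on urllib only gets partway to compliance.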
Ada / can_ada look very promising!
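To check the CPU-cost observation on your own workload, a rough timing harness along these lines may help. It is a sketch, not a rigorous benchmark: `make_urls` and `time_parsing` are hypothetical names, the synthetic URLs stand in for links extracted from a real crawl, and timing your HTML parser over the same pages in the same way gives the comparison. The `cache_clear` check guards against the parser being memoized (recent CPython versions may wrap `urlsplit` in `functools.lru_cache`), which would make repeated runs over the same URLs look artificially fast.

```python
import time
from urllib.parse import urlsplit


def make_urls(n):
    """Synthetic, all-distinct URLs (hypothetical stand-ins for links from a real crawl)."""
    return [
        f"https://example.com/category/{i % 97}/item-{i}?page={i % 10}&ref=feed#top"
        for i in range(n)
    ]


def time_parsing(parse, urls):
    """Return the seconds spent calling `parse` once on every URL in `urls`."""
    # Drop any memoized results so we time real parsing work, not cache hits.
    clear = getattr(parse, "cache_clear", None)
    if clear is not None:
        clear()
    start = time.perf_counter()
    for url in urls:
        parse(url)
    return time.perf_counter() - start


if __name__ == "__main__":
    urls = make_urls(100_000)
    seconds = time_parsing(urlsplit, urls)
    print(f"{len(urls)} URLs in {seconds:.3f} s ({len(urls) / seconds:,.0f} URLs/s)")
```

Passing a different callable as `parse` lets you compare urllib against another parser over the same inputs.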
furl
What are some alternatives?
purl - A simple, immutable URL class with a clean API for interrogation and manipulation.
short_url - Python implementation for generating Tiny URL- and bit.ly-like URLs.
webargs - A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.
yarl - Yet another URL library
pyshorteners - Generating short urls with python has never been easier
courlan - Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters
url_cleaner - A package for removing tracking parameters from URLs. This package supports automatically updating filtering rules from Adguard.
OAuthLib - A generic, spec-compliant, thorough implementation of the OAuth request-signing logic
dottorrent - High-level Python 3 library for creating .torrent files
cleanurl - Remove clutter from URLs and return a canonicalized version
winapps - Python library for managing installed applications on Windows