w3lib
Python library of web-related functions (by scrapy)
url
Python bindings to the Rust url crate (by crate-py)
Metric | w3lib | url
---|---|---
Mentions | 1 | 1
Stars | 384 | 4
Growth | 0.8% | -
Activity | 6.4 | 8.4
Last commit | 3 days ago | 7 days ago
Language | Python | Python
License | BSD 3-clause "New" or "Revised" License | MIT License
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
w3lib
Posts with mentions or reviews of w3lib.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2024-03-16.
- Parsing URLs in Python
A great initiative!
We need a better URL parser in Scrapy, for similar reasons. Speed and WHATWG standard compliance (i.e. do the same as web browsers) are the main things.
It's possible to get closer to WHATWG behavior by using urllib and some hacks. This is what https://github.com/scrapy/w3lib does, which Scrapy currently uses. But it's still not quite compliant.
Also, surprisingly, on some crawls URL parsing can take about as much CPU time as HTML parsing.
Ada / can_ada look very promising!
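To make the compliance gap mentioned in the comment above concrete, here is a minimal sketch using only the standard library's `urllib.parse.urlsplit`. The helper name `describe` and the sample URL are illustrative assumptions, not part of w3lib or Scrapy. A WHATWG-style parser, like the one in a browser, would lowercase the host and collapse the `..` segment so the path becomes `/b`; `urlsplit` leaves both as written.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Return the pieces urlsplit extracts, without any extra normalization.

    `describe` is a hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # urlsplit lowercases the scheme
        "netloc": parts.netloc,      # kept exactly as written in the input
        "hostname": parts.hostname,  # the hostname property is lowercased
        "path": parts.path,          # dot segments are NOT resolved
    }


result = describe("HTTP://Example.COM/a/../b")
for key, value in result.items():
    print(f"{key}: {value}")
```

Closing gaps like these by hand, one normalization step at a time, is presumably the kind of "hacks" the comment refers to, and each extra pass over the URL adds to the CPU cost.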
url
Posts with mentions or reviews of url.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2024-03-16.
- Parsing URLs in Python
Nice.
I'll also throw in that I recently wrote bindings to Mozilla's Servo URL library.
Those live at https://github.com/crate-py/url
They're not complete yet (only the parsing bits are exposed, not URL modification), but I too was frustrated with the state of URL parsing.