Parsing URLs in Python

can_ada

Mentions: 2 | Stars: 122 | Activity: 6.9 | Language: C++

Python bindings for Ada, a fast and spec-compliant URL parser.

I apologize for the misjudgment. I just followed the link to can_ada and saw really minimal tests, e.g. https://github.com/TkTech/can_ada/blob/main/tests/test_parsi...
I didn't understand that can_ada is not where the parser is developed.

furl

Mentions: 1 | Stars: 2,574 | Activity: 0.0 | Language: Python

🌐 URL parsing and manipulation made easy.

yarl

Mentions: 2 | Stars: 1,229 | Activity: 9.4 | Language: Python

Yet another URL library

universal_pathlib

Mentions: 1 | Stars: 181 | Activity: 7.8 | Language: Python

pathlib API extended to use fsspec backends

You might be interested in https://github.com/fsspec/universal_pathlib

ada

Mentions: 6 | Stars: 1,194 | Activity: 9.2 | Language: C++

WHATWG-compliant and fast URL parser written in modern C++

...
can_ada is just the Python bindings, largely generated via pybind11.
The actual project is at https://github.com/ada-url/ada

w3lib

Mentions: 1 | Stars: 381 | Activity: 6.7 | Language: Python

Python library of web-related functions

A great initiative!
We need a better URL parser in Scrapy, for similar reasons: speed and WHATWG standard compliance (i.e. doing the same as web browsers) are the main things.
It's possible to get closer to WHATWG behavior by using urllib and some hacks. This is what https://github.com/scrapy/w3lib does, and it is what Scrapy currently uses. But it's still not quite compliant.
Also, surprisingly, on some crawls URL parsing can take about as much CPU time as HTML parsing.
Ada / can_ada look very promising!
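
To illustrate the gap the comment above describes, here is a minimal sketch using only the standard library's urllib.parse. The `describe` helper and the sample URL are illustrative assumptions, not code from w3lib or Scrapy. A WHATWG-style parser, as used by browsers, would be expected to lowercase the host and resolve the `..` segment, giving http://example.com/b; `urlsplit` leaves the raw netloc and path untouched, so the caller has to normalize them.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Return the pieces urlsplit() reports for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # urlsplit lowercases the scheme
        "netloc": parts.netloc,      # kept exactly as written
        "hostname": parts.hostname,  # the .hostname property is lowercased
        "path": parts.path,          # dot segments are NOT resolved
    }


info = describe("HTTP://EXAMPLE.com/a/../b")
print(info)
```

The unresolved path and mixed-case netloc are the kind of differences that the "hacks" on top of urllib have to paper over.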

url

Mentions: 1 | Stars: 4 | Activity: 8.4 | Language: Python

Python bindings to the Rust url crate (by crate-py)

Nice.
I'll also throw in that I recently wrote bindings to Mozilla's Servo URL library.
Those live at https://github.com/crate-py/url
They're not complete yet (only the parsing bits are exposed, not URL modification), but I too was frustrated with the state of URL parsing.
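
For readers unsure what separates "parsing" from "URL modification", here is a rough sketch of the distinction using the standard library rather than the bindings mentioned above, whose API is not shown in this thread. The `with_query_param` helper and the example URL are hypothetical names chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Return a copy of url with one extra query parameter appended."""
    parts = urlsplit(url)                      # parsing: string -> components
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, value))
    # modification: swap in the new query and serialize back to a string
    return urlunsplit(parts._replace(query=urlencode(query)))


updated = with_query_param("https://example.com/search?q=url", "page", "2")
print(updated)
```

A parse-only binding covers the first step (reading components); the second step, producing a changed URL, is the part described as not yet exposed.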

rust-url

Mentions: 2 | Stars: 1,225 | Activity: 7.5 | Language: Rust

URL parser for Rust

IMO that URL crate is not especially high quality. I barely work with URLs and I quickly found an embarrassingly trivial bug:
https://github.com/servo/rust-url/issues/864#issuecomment-16...

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.