ada
w3lib
Metric | ada | w3lib
---|---|---
Mentions | 6 | 1
Stars | 1,213 | 382
Growth | 11.5% | 0.3%
Activity | 9.2 | 6.7
Last commit | 5 days ago | 20 days ago
Language | C++ | Python
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ada
- Parsing URLs in Python
...
can_ada is just the Python bindings, largely generated via pybind11.
The actual project is at https://github.com/ada-url/ada
- Whatwg-compliant and fast URL parser written in modern C++
- ARM vs. Intel on Amazon’s Cloud: A URL Parsing Benchmark
When I see the word "benchmark" and don't see a methodology I get a little wary.
In this case the author ran a custom benchmark from one of their projects. https://github.com/ada-url/ada/blob/main/benchmarks/wpt_benc...
To be clear I'm not questioning the benchmark's accuracy or author's bona fides, but that post was a little short for my taste.
- Benchmarking Ada url parser with Servo URL
- Node.js is moving to a new, faster URL parser called Ada written in modern c++
For those who don't want to go to Twitter first: https://github.com/ada-url/ada/releases/tag/v1.0.0
- Ada: Fast WHATWG-compliant URL parser
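The methodology concern raised in the benchmark thread above can be made concrete with a small timing harness. What follows is a minimal sketch, not the ada project's benchmark: it times Python's standard `urllib.parse.urlsplit` on a handful of made-up URLs. `SAMPLE_URLS`, `parse_all`, and `benchmark` are hypothetical names, and the repeat and loop counts are arbitrary assumptions.

```python
import statistics
import timeit
from urllib.parse import clear_cache, urlsplit

# Hypothetical inputs; a real benchmark would use a large, published corpus.
SAMPLE_URLS = [
    "https://example.com/",
    "http://example.com:8080/a/b/c?x=1&y=2#frag",
    "https://user:pw@sub.example.org/path/../other",
    "https://example.net/search?q=url+parsing",
]


def parse_all(urls):
    """Parse every URL once, starting from an empty parser cache."""
    # urlsplit memoizes results, so without this call repeated passes would
    # mostly measure cache lookups. clear_cache() is an undocumented helper
    # in urllib.parse, and its own cost is included in the timing.
    clear_cache()
    for url in urls:
        urlsplit(url)


def benchmark(urls, repeats=5, loops=1000):
    """Return (best, median) nanoseconds per URL over `repeats` timed runs."""
    timings = timeit.repeat(lambda: parse_all(urls), repeat=repeats, number=loops)
    per_url_ns = [t / (loops * len(urls)) * 1e9 for t in timings]
    return min(per_url_ns), statistics.median(per_url_ns)


if __name__ == "__main__":
    best, median = benchmark(SAMPLE_URLS)
    print(f"best: {best:.0f} ns/URL, median: {median:.0f} ns/URL")
```

Reporting the input corpus, the repeat count, and whether caches were cleared is the kind of detail the commenter found missing; the numbers themselves will vary by machine and Python version.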
w3lib
- Parsing URLs in Python
A great initiative!
We need a better URL parser in Scrapy, for similar reasons. Speed and WHATWG standard compliance (i.e. do the same as web browsers) are the main things.
It's possible to get closer to WHATWG behavior by using urllib and some hacks. This is what https://github.com/scrapy/w3lib does, which Scrapy currently uses. But it's still not quite compliant.
Also, surprisingly, on some crawls URL parsing can take CPU amounts similar to HTML parsing.
Ada / can_ada look very promising!
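As an illustration of the kind of urllib-based hacks the comment above describes, here is a minimal sketch using only the standard library. It is not w3lib's implementation: `normalize_url` and `resolve_dot_segments` are hypothetical helpers, and they cover just a few WHATWG behaviors for http and https URLs (lowercasing the host, dropping default ports, resolving dot segments, and defaulting an empty path to `/`).

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the two schemes this sketch handles (an assumption:
# the WHATWG URL Standard defines more "special" schemes than these).
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments in an absolute path."""
    output = []
    for segment in path.split("/")[1:]:  # skip the empty piece before the leading "/"
        if segment == "..":
            if output:
                output.pop()
        elif segment != ".":
            output.append(segment)
    # A trailing "." or ".." refers to a directory, so keep a trailing slash.
    if path.endswith(("/.", "/..")):
        output.append("")
    return "/" + "/".join(output)


def normalize_url(url: str) -> str:
    """Hypothetical helper: nudge urllib's result toward browser behavior."""
    cleaned = url.strip()
    parts = urlsplit(cleaned)
    scheme = parts.scheme  # urlsplit already lowercases the scheme
    if scheme not in DEFAULT_PORTS:
        return cleaned  # leave other schemes untouched

    host = parts.hostname or ""  # .hostname is lowercased by urllib
    if ":" in host:  # IPv6 literal: restore the brackets urllib strips
        host = f"[{host}]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS[scheme]:
        netloc = f"{host}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = f"{userinfo}@{netloc}"

    path = resolve_dot_segments(parts.path or "/")
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize_url("HTTP://Example.COM:80/a/./b/../c"))  # http://example.com/a/c
```

Even with these steps the result is still not fully compliant: percent-encoded dots (`%2e`), backslashes, and internationalized hostnames are left alone, which is the gap a dedicated WHATWG parser such as Ada is meant to close.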
What are some alternatives?
swift - The Swift Programming Language
tomlplusplus - Header-only TOML config file parser and serializer for C++17.
OpenTimelineIO - Open Source API and interchange format for editorial timeline information.
Lyra - A simple to use, composable, command line parser for C++ 11 and beyond
newrelic-php-agent - The New Relic PHP Agent
Diagon - Interactive ASCII art diagram generators. :star2: