gazpacho
html5lib
Metric | gazpacho | html5lib
---|---|---
Mentions | 1 | 3
Stars | 730 | 1,095
Growth | - | 0.9%
Activity | 3.2 | 4.1
Last commit | 5 months ago | about 2 months ago
Language | Python | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
gazpacho
-
Ask HN: What are some tools / libraries you built yourself?
I've been working on gazpacho [1] for the last two years.
It's a general-purpose web scraping library for Python that replaces BeautifulSoup + requests for most projects.
Just surpassed ~2K downloads every week!
[1] https://github.com/maxhumber/gazpacho
html5lib
-
Bleach 6.0.0 Release and Deprecation
Yes. This is really interesting.
Sounds like html5lib has been asking for funding, but it doesn't look like there's much progress. https://github.com/html5lib/html5lib-python/issues/361
-
Pydantic Factories
Neither did html5lib.
-
Why are circular dependencies even a thing?
An easier example: Sphinx is a documentation generator for Python programs (for example, it creates API docs from source-code comments). Sphinx depends on html5lib, which in turn depends on six. Want to guess what six uses to generate its API docs? ;) So if you want the API docs of six, you first have to install six without them to get a working Sphinx install, and then rebuild six, this time including the API docs.
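The cycle described in this comment can be sketched with Python's standard-library `graphlib` module (Python 3.9+). This is only an illustration: the graph is a simplified assumption based on the comment, not the packages' real dependency metadata, and `six-docs` and `install_order` are hypothetical names introduced for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical, simplified dependency graph based on the comment above:
# each key needs the packages in its set to be installed first.
# "six-docs" is an illustrative name for the step that builds six's API docs.
deps = {
    "sphinx": {"html5lib"},
    "html5lib": {"six"},
    "six": set(),
    "six-docs": {"six", "sphinx"},
}


def install_order(graph):
    """Return one valid install order, or raise CycleError if none exists."""
    return list(TopologicalSorter(graph).static_order())


# Treating the docs as a separate, later build step gives a valid order.
order = install_order(deps)

# Folding the docs build into six itself makes six depend on sphinx,
# which closes the loop six -> sphinx -> html5lib -> six.
cyclic = dict(deps, six={"sphinx"})
try:
    install_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
```

Splitting the docs build into its own step mirrors the workaround in the comment: install six without docs first, then build the docs once Sphinx is available.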
What are some alternatives?
selectolax - Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
lxml - The lxml XML toolkit for Python
xhtml2pdf - A library for converting HTML into PDFs using ReportLab
xmltodict - Python module that makes working with XML feel like you are working with JSON
bleach - Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
untangle - Converts XML to Python objects
pyquery - A jquery-like library for python
cssutils
xmldataset - xmldataset: xml parsing made easy 🗃️