bleach
selectolax
Metric | bleach | selectolax
---|---|---
Mentions | 5 | 5
Stars | 2,506 | 814
Growth | 0.5% | -
Activity | 4.6 | 5.8
Last commit | 4 months ago | 6 days ago
Language | Python | Cython
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
bleach
- I wrote a markdown to html converter
I don't know a golang library for it, but https://github.com/mozilla/bleach is a Python lib that escapes all the nasty JavaScript inputs.
- Serialize Django Data for JavaScript
This is an excellent point; I should have addressed safety in my article. I'll point out that in my use case, I'm using `safe` on data I create and not any user-generated data.
You should never use `safe` on user data unless you use something like bleach (https://github.com/mozilla/bleach) to sanitize the data. Even then, you should use caution.
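As a rough illustration of the advice above, here is a minimal sketch of sanitizing user-supplied text before it is marked as safe. It assumes bleach is installed and uses `bleach.clean` with its default allowlist; the helper name `sanitize_user_html` and the sample comment are hypothetical. If bleach is not available, the sketch falls back to the standard library's `html.escape`, which escapes every tag and is a stand-in, not equivalent to bleach.

```python
import html

try:
    import bleach  # third-party package: pip install bleach
except ImportError:
    # Assumption for this sketch: degrade to plain escaping when bleach is absent.
    bleach = None


def sanitize_user_html(text: str) -> str:
    """Hypothetical helper: make untrusted text safe to embed in HTML."""
    if bleach is not None:
        # bleach.clean escapes any tag that is not on its allowlist,
        # so a <script> element is neutralized while allowed tags pass through.
        return bleach.clean(text)
    # Fallback: escape all markup; no tags are allowed through.
    return html.escape(text)


# Hypothetical user-generated input containing a script tag.
comment = "hello <script>alert(1)</script>"
print(sanitize_user_html(comment))
```

Only the sanitized result should ever be passed through a filter such as `safe`, and, as the quoted comment notes, caution is still warranted even then.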
selectolax
- 8 Most Popular Python HTML Web Scraping Packages with Benchmarks
selectolax
- The State of Web Scraping in 2021
Lazyweb link: https://github.com/rushter/selectolax
although I don't follow the need to have what appears to be two completely separate HTML parsing C libraries as dependencies; seeing this in the readme for Modest gives me the shivers because lxml has _seen some shit_
> Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.
although its other dep seems much more cognizant about the HTML5 standard, for whatever that's worth: https://github.com/lexbor/lexbor#lexbor
---
> It looks like the author of the article just googled some libraries for each language and didn't research the topic
Heh, oh, new to the Internet, are you? :-D
What are some alternatives?
lxml - The lxml XML toolkit for Python
MarkupSafe - Safely add untrusted strings to HTML/XML markup.
xhtml2pdf - A library for converting HTML into PDFs using ReportLab
html5lib - Standards-compliant library for parsing and serializing HTML documents and fragments in Python
lexbor - Lexbor is development of an open source HTML Renderer library. https://lexbor.com
pyquery - A jquery-like library for python
gazpacho - 🥫 The simple, fast, and modern web scraping library
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)
xmltodict - Python module that makes working with XML feel like you are working with JSON
cssutils