Python HTML Manipulation

Open-source Python projects categorized as HTML Manipulation

Top 10 Python HTML Manipulation Projects

  • GitHub repo xmltodict

    Python module that makes working with XML feel like you are working with JSON

    Project mention: Like JQ, but for HTML | news.ycombinator.com | 2021-09-07

    xmlstarlet is really nothing like jq, as a language. But yes, I use it because it is the best commandline xml processor I'd found. That's the only similarity to jq.

    Is this the yq? https://kislyuk.github.io/yq/ It does contain an 'xq', as a literal wrapper for jq, piping output into it after transcoding XML to JSON using xmltodict https://github.com/martinblech/xmltodict (which explodes xml into separate JSON data structures).

    This is a bash one-liner! But TBF it really is a 'jq for xml'. I think it would be horrible for some things, but you could also do a lot of useful things painlessly.

  • GitHub repo bleach

    Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

    Project mention: mutation XSS via allowed math or svg; p or br; and style, title, noscript, script, textarea, noframes, iframe, | reddit.com/r/websecurityresearch | 2021-02-05

  • GitHub repo pyquery

    A jquery-like library for python

  • GitHub repo lxml

    The lxml XML toolkit for Python

    Project mention: How do i go about building a vidoe conferencing app? | reddit.com/r/rust | 2021-08-20

    Generally, I'm already using Python to glue together things like OpenCV or libxml, which do the heavy-lifting, and taking advantage of how things like Qt's QImage release Python's Global Interpreter Lock, allowing me to load and process images on a background thread, so the Python code itself is usually already I/O-bound, but yes. If the Python code would become a bottleneck, it helps with that too.

  • GitHub repo xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

  • GitHub repo html5lib

    Standards-compliant library for parsing and serializing HTML documents and fragments in Python

    Project mention: Why are circular dependencies even a thing? | reddit.com/r/linuxquestions | 2021-09-25

    Easier example... Sphinx is a documentation generator for Python programs (creating docs for the API of programs from source-code comments, for example). Sphinx depends on html5lib, which itself depends on six... want to make a guess what six uses to generate its API docs? ;) So if you want the API docs of six, you will have to first install it without them to be able to get a working Sphinx install, then redo the six install including the building of the API docs.

  • GitHub repo gazpacho

    🥫 The simple, fast, and modern web scraping library

    Project mention: Ask HN: What are some tools / libraries you built yourself? | news.ycombinator.com | 2021-05-16

    I've been working on gazpacho [1] for the last two years.

    It's a general purpose web scraping library for Python that replaces BeautifulSoup + requests for most projects.

    Just surpassed ~2K downloads every week!

    [1] https://github.com/maxhumber/gazpacho

  • GitHub repo untangle

    Converts XML to Python objects

  • GitHub repo MarkupSafe

    Safely add untrusted strings to HTML/XML markup.

  • GitHub repo xmldataset

    xmldataset: xml parsing made easy 🗃️
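
The xmltodict mention above describes transcoding XML to JSON so that a tool like jq can process it. As a rough sketch of that idea using only the Python standard library (this is not xmltodict's implementation or API; the function names and the "@" and "#text" key conventions are assumptions made for this example), a conversion could look like:

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an element into plain dicts, lists, and strings."""
    # Attributes are stored under "@name" keys (a convention assumed here).
    node = {"@" + name: value for name, value in elem.attrib.items()}

    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated child tag is collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value

    text = (elem.text or "").strip()
    if not node:
        # Leaf element with no attributes: return its text, or None if empty.
        return text or None
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return an equivalent JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})


doc = '<feed><item id="1">a</item><item id="2">b</item></feed>'
print(xml_to_json(doc))
```

The resulting JSON string could then be piped into a JSON tool, which is the general shape of the 'xq' wrapper described in the quoted comment.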

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-09-25.
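
To illustrate what "safely add untrusted strings to HTML/XML markup" (the MarkupSafe entry above) means in practice, here is a minimal sketch built on the standard library's html.escape. It does not use MarkupSafe itself, and the render_comment helper is a hypothetical name introduced only for this example:

```python
from html import escape


def render_comment(author, body):
    """Build an HTML fragment, escaping both untrusted values first."""
    # escape() replaces &, <, > and (by default) quote characters with
    # HTML entities, so user input cannot introduce new tags or attributes.
    return "<p><b>{}</b>: {}</p>".format(escape(author), escape(body))


fragment = render_comment("Eve", '<script>alert("x")</script>')
print(fragment)
```

Escaping neutralizes all markup in the input. An allowed-list sanitizer such as bleach addresses a different case, where some markup is meant to survive.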

Index

What are some of the best open-source HTML Manipulation projects in Python? This list will help you:

#    Project       Stars
1    xmltodict     4,558
2    bleach        2,231
3    pyquery       2,045
4    lxml          1,955
5    xhtml2pdf     1,835
6    html5lib        935
7    gazpacho        599
8    untangle        527
9    MarkupSafe      427
10   xmldataset       71