Python HTML Manipulation

Open-source Python projects categorized as HTML Manipulation

Top 10 Python HTML Manipulation Projects

  • xmltodict

    Python module that makes working with XML feel like you are working with JSON

    Project mention: XML to CSV or JSON using Cloud Function | reddit.com/r/googlecloud | 2022-12-14

    Your Cloud Function would be written in Node.js, Python, Go, Java, C#, Ruby, or PHP; pick the one you're most comfortable with. It would get the name and bucket of the newly uploaded XML file as an input parameter. It would then load the file and call a library that makes the conversion. Example libraries: xml-js (for Node), xmltodict (for Python).

  • bleach

    Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

    Project mention: I wrote a markdown to html converter | reddit.com/r/golang | 2023-02-01

    I don't know a golang library for it but https://github.com/mozilla/bleach is a python lib that escapes all the nasty javascript inputs.

  • lxml

    The lxml XML toolkit for Python

    Project mention: 8 Most Popular Python HTML Web Scraping Packages with Benchmarks | dev.to | 2023-02-01

    lxml

  • pyquery

    A jquery-like library for python

  • xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    Project mention: What is an example of a fully finished python software product on github? | reddit.com/r/learnprogramming | 2022-07-04

    Judging by your previous comments, you might get what you're looking for from xhtml2pdf. (Don't be put off by the 0.2.7 version number; xhtml2pdf has been in active development for over a decade and is a stable library.) It's generally used as a library in Python projects, but it does have a stand-alone command-line interface, and the library in general is thoroughly documented so you don't need to rely just on reading the code to figure out what's going on.

  • html5lib

    Standards-compliant library for parsing and serializing HTML documents and fragments in Python

    Project mention: Bleach 6.0.0 Release and Deprecation | news.ycombinator.com | 2023-01-27

    Yes. This is really interesting.

    Sounds like html5lib has been asking for funding, but doesn't look like there's much progress. https://github.com/html5lib/html5lib-python/issues/361

  • gazpacho

    🥫 The simple, fast, and modern web scraping library

  • untangle

    Converts XML to Python objects

  • MarkupSafe

    Safely add untrusted strings to HTML/XML markup.

    Project mention: Check50 not working due to an import error. | reddit.com/r/cs50 | 2022-03-10

    After a quick search I found that there may have been a breaking change in that package (https://github.com/pallets/markupsafe/issues/284), but I haven't read about other people doing CS50 getting the same error.

  • xmldataset

    xmldataset: xml parsing made easy 🗃️
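
The xmltodict mention above describes a function that loads an uploaded XML file and hands it to a conversion library. As a rough illustration of that conversion step only, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for xmltodict, so it runs without extra installs. The function names (element_to_dict, xml_to_json) and the sample document are hypothetical, attributes are ignored, and the output shape is not guaranteed to match what xmltodict produces.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an Element into plain dicts, lists, and strings."""
    children = list(element)
    if not children:
        # Leaf node: keep only its text content (attributes are ignored here).
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    """Parse an XML string and return a JSON string keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})


# Hypothetical sample input standing in for the uploaded file's contents.
sample = "<order><id>7</id><item>pen</item><item>ink</item></order>"
converted = xml_to_json(sample)
```

In a real Cloud Function the XML text would come from the uploaded bucket object rather than a literal string, and a dedicated library such as xmltodict would normally replace the hand-written recursion.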

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-01.

Index

What are some of the best open-source HTML Manipulation projects in Python? This list will help you:

#   Project      Stars
1   xmltodict    5,032
2   bleach       2,475
3   lxml         2,274
4   pyquery      2,176
5   xhtml2pdf    1,965
6   html5lib     1,009
7   gazpacho     694
8   untangle     569
9   MarkupSafe   520
10  xmldataset   75