incubator-gluten vs PyMuPDF
Metric | incubator-gluten | PyMuPDF
---|---|---
Mentions | 3 | 5
Stars | 1,002 | 4,103
Growth | 3.0% | 5.3%
Activity | 9.9 | 9.8
Latest commit | about 4 hours ago | 8 days ago
Language | Scala | Python
License | Apache License 2.0 | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
incubator-gluten
- A glimpse into the future of data processing infrastructure.
  When I first learned about the Gluten project from Intel, I thought Databricks was going to be in trouble.
- FLaNK Stack for 04 December 2023
- Blaze: Fast query execution engine for Apache Spark
  Interesting, looks like it is just DataFusion engine for Spark. There is a similar project: https://github.com/oap-project/gluten - it brings ClickHouse as an engine to Spark.
PyMuPDF
- FLaNK Stack for 04 December 2023
- Converting markdown to pdf in Python
  This method is based on the use of the libraries markdown-it-py (conversion from markdown to html) and [PyMuPDF](https://github.com/pymupdf/PyMuPDF) (conversion from html to pdf). A small Python class links them together.
- Show HN: I am building a new Python library to read/write PDF files
  I think you might mean PyMuPDF (https://github.com/pymupdf/PyMuPDF), a Python library built on top of the MuPDF C library (https://mupdf.com/).
  PyMuPDF and MuPDF are both available under dual open source AGPL and commercial licenses. They have been around for many years and are under continual development.
  [Disclaimer, I work for Artifex, who wrote MuPDF and recently acquired PyMuPDF.]
- M1 Mac: myuPDF install (wheel?)
- legacy install error: PyMuPDF?
What are some alternatives?
incubator-gluten alternatives
- LearningSparkV2 - This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
- opaque-sql - An encrypted data analytics platform
- blaze - Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
- blaze - NumPy and Pandas interface to Big Data
- Jupyter Scala - A Scala kernel for Jupyter
- kyuubi - Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

PyMuPDF alternatives
- PyPDF2 - A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
- ReportLab
- pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
- borb - borb is a library for reading, creating and manipulating PDF files in python.
- PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
- pdfquery - A fast and friendly PDF scraping library.