|Latest commit|13 days ago|about 2 months ago|
|License|MIT License|MIT License|
We haven't tracked posts mentioning pymorphy2 yet.
Tracking mentions began in Dec 2020.
Fun with File Formats
6 projects | news.ycombinator.com | 13 Dec 2021
There are two problems that led to the decision to accept only public-domain information: licensing and provenance.
"Licensing" is hard. The "Open Specifications Promise", which covers a bunch of Microsoft-designed file formats, is merely a covenant not to sue.
"Provenance" is tricky. For example, much of the knowledge of the Apple iWork formats was derived by reverse-engineering the source programs and extracting protobuf definitions. Many open source projects have freely copied from each other, making detailed analysis tricky.
What are some alternatives?
- PyPDF2 - A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
- WeasyPrint - The awesome document factory.
- csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
- PDFMiner - Python PDF parser (not actively maintained). Check out pdfminer.six.
- Python-Markdown - A Python implementation of John Gruber's Markdown with extension support.
- pdftabextract - A set of tools for extracting tables from PDF files, helping with data mining on (OCR-processed) scanned documents.
- vcspull - Synchronize projects via a YAML/JSON manifest. Built using `libvcs`.
- mistletoe - A fast, extensible, and spec-compliant Markdown parser in pure Python.
- Mistune - A fast yet powerful Python Markdown parser with renderers and plugins.