pymorphy2
tablib
| | pymorphy2 | tablib |
|---|---|---|
| | 1 | 2 |
| Stars | 1,098 | 4,524 |
| Growth | 0.7% | 0.9% |
| Activity | 0.0 | 7.0 |
| | 2 days ago | 14 days ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tablib
- Is this possible with Python?
Other than pandas, you can also use tablib. I personally find tablib slightly easier to use, though it doesn't have as many features. For what you need, tablib might be the best fit.
- Fun with File Formats
There are two problems leading to the decision of only accepting public domain info: licensing and provenance.
"Licensing" is hard. The "Open Specifications Promise" [1], which covers a bunch of Microsoft-designed file formats, is merely a covenant not to sue.
"Provenance" is tricky. For example, much of the knowledge of the Apple iWork formats was derived by reverse-engineering the source programs and extracting protobuf definitions. Many open source projects have freely copied from each other, making detailed analysis tricky [2].
[1] https://en.wikipedia.org/wiki/Microsoft_Open_Specification_P...
What are some alternatives?
PyPDF2 - A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Kaitai Struct - Declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
WeasyPrint - The awesome document factory
tika-docker - Convenience Docker images for Apache Tika Server
csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
feather - Fast, interoperable binary data frame storage for Python, R, and more, powered by Apache Arrow
PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
file - Read-only mirror of the file CVS repository, updated every half hour. NOTE: do not make pull requests here or comment on commits; submit them the usual way, to the bug tracker or the mailing list. The maintainers do not track this git mirror.
ReportLab
DistorteD - Ruby multimedia toolkit with deep Jekyll integration 🧪
Python-Markdown - A Python implementation of John Gruber’s Markdown with Extension support.
fuzzywuzzy - Fuzzy String Matching in Python