tablib
tika-docker
 | tablib | tika-docker |
---|---|---|
Mentions | 2 | 20 |
Stars | 4,509 | 98 |
Growth | 1.3% | - |
Activity | 7.0 | 5.3 |
Last commit | 6 days ago | 5 months ago |
Language | Python | Shell |
License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tablib
- Fun with File Formats
Two problems led to the decision to accept only public-domain information: licensing and provenance.
"Licensing" is hard. The "Open Specifications Promise" [1], which covers a bunch of Microsoft-designed file formats, is merely a covenant not to sue.
"Provenance" is tricky. For example, much of the knowledge of the Apple iWork formats was derived by reverse-engineering the source programs and extracting protobuf definitions. Many open source projects have freely copied from each other, making detailed analysis tricky [2].
[1] https://en.wikipedia.org/wiki/Microsoft_Open_Specification_P...
tika-docker
- 40 Containers & Counting...
https://tika.apache.org - Metadata from things.
- Hosted app to manage server inventory
- Best FOSS (ideally Docker) that can split PDF files?
- Document Parsing - an unsolved problem?
At my previous job we had the same problem, which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding.
- Any SW recommendation to index any kind of file in an External Drive?
See https://tika.apache.org/ for more.
- Hey y'all back again w/ the personal, self-hosted search engine
For document content I've heard good things about Apache Tika. Spyglass could leverage it via the REST API.
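As a rough illustration of what calling Tika over its REST API might look like, here is a minimal Python sketch using only the standard library. It assumes a Tika Server (for example, a tika-docker container) is listening on `http://localhost:9998`, and that the `/tika` endpoint accepts a `PUT` with the raw file bytes and returns extracted text, which is how the Tika Server documentation describes it as far as we know. The helper names `build_tika_request` and `extract_text` are hypothetical and are not part of Tika or any binding.

```python
import urllib.request

# Assumption: a Tika Server is reachable here (9998 is its usual default port).
TIKA_URL = "http://localhost:9998"


def build_tika_request(
    data: bytes,
    endpoint: str = "/tika",
    accept: str = "text/plain",
    base_url: str = TIKA_URL,
) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying raw document bytes."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=data,
        method="PUT",
        headers={
            # Ask for plain text back; "/meta" with "application/json"
            # is assumed to return metadata instead.
            "Accept": accept,
            # Generic type so the server detects the real format itself.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(path: str, base_url: str = TIKA_URL) -> str:
    """Send a local file to the server and return the extracted text."""
    with open(path, "rb") as fh:
        request = build_tika_request(fh.read(), base_url=base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

With a server running, `extract_text("report.pdf")` (a hypothetical file name) would return the document body as a string. Keeping request construction separate from sending makes it easy to inspect or adapt the request, for instance to target `/meta` for metadata.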
- Ask HN: How to extract information from multiple (unstructured text) documents?
- Apache Tika (https://tika.apache.org/)
- KnowledgeCanvas-0.5.3: Grid view sorting, filtering, pagination, and customization!
The short answer: Apache Tika!
- Fun with File Formats
- Selfhosted File Management Solution? - tags, searching, etc
I installed FileRun recently and that might get you close. It's fast and the search is pretty good, as it can integrate Apache Tika. I like the OnlyOffice integration as well. It's closed-source, which isn't great for me, but you get 3 accounts without having to pay.
What are some alternatives?
pymorphy2 - Morphological analyzer / inflection engine for Russian and Ukrainian languages.
Kaitai Struct - Declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
sist2 - Lightning-fast file system indexer and search tool
spyglass - A personal search engine: Create a searchable library from your personal documents, interests, and more!
yew - Rust / Wasm framework for creating reliable and efficient web applications
spacedrive - Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
self-hosted_docker_setups - A collection of my docker-compose files used to setup self-hosted services on Raspberry Pi 4 running 64-bit Raspberry Pi OS
Nginx Proxy Manager - Docker container for managing Nginx proxy hosts with a simple, powerful interface
server - self-hosted tag-based time tracking
vaultwarden - Unofficial Bitwarden compatible server written in Rust, formerly known as bitwarden_rs
tauri - Build smaller, faster, and more secure desktop applications with a web frontend.