feather
tika-docker
Metric | feather | tika-docker
---|---|---
Mentions | 3 | 14
Stars | 2,633 | 66
Growth | - | -
Activity | 0.0 | 8.3
Latest commit | about 1 year ago | 21 days ago
Language | JavaScript | Shell
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
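The activity rating is described only qualitatively here: recent commits weigh more, and the result is a relative rank. The following is a hypothetical sketch of one way such a rating could be computed, not the site's documented formula. It assumes an exponential decay with a 90-day half-life (`HALF_LIFE_DAYS` is an illustrative choice). Each project's recency-weighted commit score is then ranked against the other tracked projects and scaled to 0 to 10, so a 9.0 means the project outscores about 90% of the others.

```python
from datetime import date, timedelta

# Hypothetical parameter: the real half-life (if any) is not published.
HALF_LIFE_DAYS = 90.0


def weighted_commits(commit_dates, today):
    """Sum commits with exponential decay so recent commits count more.

    A commit made today counts 1.0; one made HALF_LIFE_DAYS ago counts 0.5.
    """
    total = 0.0
    for d in commit_dates:
        age_days = (today - d).days
        total += 0.5 ** (age_days / HALF_LIFE_DAYS)
    return total


def activity_scores(projects, today):
    """Rank projects by weighted commits and scale the rank to 0..10.

    `projects` maps a project name to a list of commit dates. A project's
    rating is 10 * (projects with a strictly lower score) / (N - 1), so the
    most active project gets 10.0 and the least active gets 0.0.
    """
    scores = {name: weighted_commits(dates, today) for name, dates in projects.items()}
    n = len(scores)
    if n == 1:
        return {name: 10.0 for name in scores}
    ratings = {}
    for name, score in scores.items():
        below = sum(1 for other in scores.values() if other < score)
        ratings[name] = round(10.0 * below / (n - 1), 1)
    return ratings


if __name__ == "__main__":
    today = date(2022, 6, 1)
    projects = {
        "stale": [today - timedelta(days=400)],
        "steady": [today - timedelta(days=30), today - timedelta(days=60)],
        "busy": [today - timedelta(days=k) for k in range(1, 6)],
    }
    print(activity_scores(projects, today))
```

Ranking rather than reporting the raw weighted score keeps the number relative, which matches the description: a project's rating can drop even if its own commit rate is unchanged, simply because other tracked projects became more active.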
feather
- Fun with File Formats
- Vineyard: An open-source in-memory data manager
It'd be interesting to know how this compares with alternative solutions.
I might not understand the benefit proposition correctly, and I'm not specifically into Python for data work, but I immediately thought of things like feather[1], fst[2], disk.frame[3] and even DuckDB[4].
Some of these are on disk rather than in memory, but I'd still be interested in performance and use case comparisons.
[1] https://github.com/wesm/feather
tika-docker
- Hosted app to manage server inventory
- Best FOSS (ideally Docker) that can split PDF files?
- Document Parsing - an unsolved problem?
At my previous job we had the same problem which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding.
- Any SW recommendation to index any kind of file in an External Drive?
See https://tika.apache.org/ for more.
- Hey y'all back again w/ the personal, self-hosted search engine
For document content I've heard good things about Apache Tika. Spyglass could leverage it via the REST API.
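As a minimal sketch of calling Tika over REST from another tool, the code below PUTs a file's bytes to the Tika server's `/tika` endpoint with `Accept: text/plain`, which asks for the extracted plain text. It assumes a server from the `apache/tika` Docker image is running and reachable at `http://localhost:9998` (for example, started with `docker run -d -p 9998:9998 apache/tika`). The host, port, timeout, and helper names `build_tika_request` and `extract_text` are illustrative assumptions, and decoding the response as UTF-8 is also assumed.

```python
import urllib.request

# Assumed endpoint: Tika server's default port is 9998; adjust for your deployment.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika server to return plain extracted text."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send a file's raw bytes to Tika server and return the extracted text.

    Tika detects the file type itself, so no Content-Type header is sent.
    """
    with open(path, "rb") as f:
        req = build_tika_request(f.read(), url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the server responds with UTF-8 encoded text.
        return resp.read().decode("utf-8")
```

With a server running, `extract_text("report.pdf")` (a hypothetical file) returns the document's text. Running Tika as a separate HTTP service like this keeps the JVM out of the calling application, which is the appeal for a non-Java project such as Spyglass.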
- Ask HN: How to extract information from multiple (unstructured text) documents?
- Apache Tika (https://tika.apache.org/)
- KnowledgeCanvas-0.5.3: Grid view sorting, filtering, pagination, and customization!
The short answer: Apache Tika!
- Fun with File Formats
- Selfhosted File Management Solution? - tags, searching, etc
I installed FileRun recently and that might get you close. It's fast and the search is pretty good as it can integrate Apache Tika, I like the OnlyOffice integration as well. It's closed-source, which isn't great for me, but you get 3 accounts without having to pay.
- Encoding detection
Any native or FFI-callable library like the Java tools such as Apache Tika? A quick DuckDuckGo search didn't turn up anything for me. Tika has served me well in the past, but I have no idea what I'd use with CL.
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
sist2 - Lightning-fast file system indexer and search tool
yew - Rust / Wasm framework for building client web apps
vaultwarden - Unofficial Bitwarden compatible server written in Rust, formerly known as bitwarden_rs
spyglass - A personal search engine, crawl & index websites/files you want with a simple set of rules
tauri - Build smaller, faster, and more secure desktop applications with a web frontend.
raspbian-nspawn-64 - Bootable RPi4 / RPi3 image with 64-bit kernel, 32-bit Raspbian Buster host OS, 64-bit Debian Buster guest OS in nspawn container
self-hosted_docker_setups - A collection of my docker-compose files used to setup self-hosted services on Raspberry Pi 4 running 64-bit Raspberry Pi OS
server - self-hosted tag-based time tracking
spacedrive - Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
tablib - Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
libvineyard - vineyard (v6d): an in-memory immutable data manager. [Moved to: https://github.com/alibaba/v6d]