| 13 days ago | 26 days ago |
| GNU General Public License v3.0 or later | Apache License 2.0 |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
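The page does not say whether growth is absolute or relative. Assuming it is a percentage relative to last month's star count, the calculation looks like this. `mom_star_growth` is a hypothetical helper name.

```python
def mom_star_growth(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month star growth as a percentage (hypothetical helper).

    Assumes growth is relative to last month's count, so going from
    1,000 to 1,100 stars counts as 10% growth.
    """
    if stars_month_ago <= 0:
        # Relative growth is undefined for a project with no stars a month ago.
        raise ValueError("stars_month_ago must be positive")
    # Multiply before dividing so whole-number percentages stay exact.
    return (stars_now - stars_month_ago) * 100 / stars_month_ago


print(mom_star_growth(1100, 1000))  # 10.0
```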
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 means a project is among the top 10% most actively developed projects that we track.
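The exact formula is not published. The two properties described above (recent commits weigh more, and 9.0 means top 10%) can still be illustrated with a sketch. Everything here is an assumption: each commit's weight decays exponentially with a hypothetical 30-day half-life, and the 0 to 10 score is ten times the fraction of tracked projects with a strictly lower raw score.

```python
import bisect
import math
from datetime import datetime, timedelta, timezone

# Hypothetical parameter: a commit's weight halves every 30 days.
HALF_LIFE_DAYS = 30.0


def raw_activity(commit_times, now, half_life_days=HALF_LIFE_DAYS):
    """Sum of exponentially decayed commit weights; a commit made at `now` counts 1.0."""
    rate = math.log(2) / half_life_days
    return sum(
        math.exp(-rate * (now - t).total_seconds() / 86400) for t in commit_times
    )


def activity_scores(raw_by_project):
    """Map raw scores to a 0-10 scale by percentile rank among tracked projects.

    A project scores 9.0 when 90% of tracked projects have a strictly lower
    raw score, i.e. when it sits in the top 10%.
    """
    ranked = sorted(raw_by_project.values())
    n = len(ranked)
    return {
        # bisect_left counts how many projects score strictly lower.
        name: round(10 * bisect.bisect_left(ranked, raw) / n, 1)
        for name, raw in raw_by_project.items()
    }


now = datetime(2022, 12, 1, tzinfo=timezone.utc)
# One commit today plus one 30 days ago: about 1.0 + 0.5 = 1.5 (up to float rounding).
print(raw_activity([now, now - timedelta(days=30)], now))
```

With ten projects whose raw scores are 0 through 9, the busiest one scores 9.0 and the least active 0.0. Ranking by percentile rather than by raw commit counts keeps the scale stable as the set of tracked projects grows.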
Hosted app to manage server inventory
2 projects | reddit.com/r/datacurator | 1 Dec 2022
Best FOSS (ideally Docker) that can split PDF files ?
4 projects | reddit.com/r/opensource | 29 Oct 2022
Document Parsing - an unsolved problem?
5 projects | reddit.com/r/LanguageTechnology | 19 Jul 2022
At my previous job we had the same problem, which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding.
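For the Python route, the third-party `tika` package (tika-python) provides `parser.from_file`, which returns a dict whose `"content"` and `"metadata"` keys hold the extracted text and metadata. This sketch assumes that package is installed and a Java runtime is available, since by default it starts or connects to a local Tika server. `extract_with_tika` and `clean_content` are hypothetical helper names, and the whitespace cleanup is an illustrative choice rather than something Tika does itself.

```python
import re


def clean_content(content):
    """Normalize extracted text: None becomes "", runs of blank lines collapse to one."""
    if not content:
        # Files with no extractable text (e.g. scanned images without OCR) may yield None.
        return ""
    return re.sub(r"\n\s*\n+", "\n\n", content.strip())


def extract_with_tika(path):
    """Return (text, metadata) for one file via the tika-python binding."""
    # Imported lazily so clean_content works even without tika-python installed.
    from tika import parser  # pip install tika; needs a Java runtime

    parsed = parser.from_file(path)
    return clean_content(parsed.get("content")), parsed.get("metadata") or {}


# Usage (needs tika-python and Java):
# text, meta = extract_with_tika("report.pdf")
# print(meta.get("Content-Type"), text[:200])
```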
Any SW recommendation to index any kind of file in a External Drive?
3 projects | reddit.com/r/datacurator | 1 May 2022
See https://tika.apache.org/ for more.
Hey y'all back again w/ the personal, self-hosted search engine
6 projects | reddit.com/r/selfhosted | 25 Apr 2022
For document content I've heard good things about Apache Tika. Spyglass could leverage it via the REST API.
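Going through the REST API avoids any language binding: a client can `PUT` a document's bytes to the Tika server's `/tika` endpoint and request plain text with an `Accept: text/plain` header. Below is a minimal standard-library sketch in Python; it assumes a Tika server on `localhost` at its default port 9998, and `build_tika_request` and `extract_text` are hypothetical names.

```python
import urllib.request

# Assumed location of a locally running Tika server (9998 is Tika's default port).
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika to return the document's text as plain text."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, url: str = TIKA_URL, timeout: float = 60.0) -> str:
    """Send one file to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Usage (needs a running Tika server, e.g. the tika-server jar):
# print(extract_text("notes.docx")[:200])
```

Keeping request construction separate from the network call makes the request easy to inspect or reuse with another HTTP client.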
Ask HN: How to extract information from multiple (unstructured text) documents?
2 projects | news.ycombinator.com | 13 Apr 2022
- Apache Tika (https://tika.apache.org/)
KnowledgeCanvas-0.5.3: Grid view sorting, filtering, pagination, and customization!
2 projects | reddit.com/r/opensource | 15 Mar 2022
The short answer: Apache Tika!
Fun with File Formats
6 projects | news.ycombinator.com | 13 Dec 2021
Selfhosted File Management Solution? - tags, searching, etc
5 projects | reddit.com/r/selfhosted | 4 Dec 2021
I installed FileRun recently and it might get you close. It's fast, and the search is pretty good since it can integrate Apache Tika. I like the OnlyOffice integration as well. It's closed-source, which isn't great for me, but you get 3 accounts without having to pay.
5 projects | reddit.com/r/Common_Lisp | 24 Nov 2021
Is there anything native or FFI-callable, like Java tools such as Apache Tika? A quick DuckDuckGo search didn't turn up anything for me. Tika has served me well in the past, but I have no idea what I'd use with CL.
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
sist2 - Lightning-fast file system indexer and search tool
vaultwarden - Unofficial Bitwarden compatible server written in Rust, formerly known as bitwarden_rs
yew - Rust / Wasm framework for building client web apps
raspbian-nspawn-64 - Bootable RPi4 / RPi3 image with 64-bit kernel, 32-bit Raspbian Buster host OS, 64-bit Debian Buster guest OS in nspawn container
tauri - Build smaller, faster, and more secure desktop applications with a web frontend.
spyglass - A personal search engine, crawl & index websites/files you want with a simple set of rules
self-hosted_docker_setups - A collection of my docker-compose files used to setup self-hosted services on Raspberry Pi 4 running 64-bit Raspberry Pi OS
spacedrive - Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
server - self-hosted tag-based time tracking
tablib - Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
Nginx Proxy Manager - Docker container for managing Nginx proxy hosts with a simple, powerful interface