tika-docker
Healthchecks
Metric | tika-docker | Healthchecks
---|---|---
Mentions | 20 | 208
Stars | 103 | 7,322
Growth | 4.9% | 1.6%
Activity | 4.1 | 9.7
Last commit | about 1 month ago | 4 days ago
Language | Shell | Python
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tika-docker
- Text Extraction from Documents
- Apache Tika – Extract text and metadata from doc types (the backbone of RAG)
- Demystifying Text Data with the Unstructured Python Library
  If you accept running Java, Apache Tika is extremely good at parsing content (https://tika.apache.org/).
- Help with a Search Engine
- How do you manage and find large amount of files?
  Apache Tika can spit out text from lots of formats. I've used it with grep (or rg) to do small-scale searches of local folders. Tika does a really good job at OCR for finding whether text is in a file.
- 40 Containers & Counting...
  https://tika.apache.org Metadata from things.
- Hosted app to manage server inventory
- Best FOSS (ideally Docker) that can split PDF files ?
- OK, ElasticSearch works, text files are indexed. How about images? Can images be indexed in NextCloud and fulltextsearched?
- Document Parsing - an unsolved problem?
  At my previous job we had the same problem, which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding.
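
The "Tika plus grep" approach mentioned in the posts above can be sketched in Python. This is a minimal sketch under stated assumptions, not the commenters' actual setup: it assumes a Tika server (for example, one started from the tika-docker image) is reachable at `http://localhost:9998`, and the helper names `extract_text` and `search_folder` are hypothetical. It relies only on Tika server's `PUT /tika` endpoint, which returns plain text when asked with `Accept: text/plain`.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterator

# Assumption: a Tika server is listening on its default port on this machine.
TIKA_URL = "http://localhost:9998/tika"


def extract_text(data: bytes, url: str = TIKA_URL) -> str:
    """Send raw document bytes to a Tika server and return the extracted text."""
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")


def search_folder(
    folder: str,
    needle: str,
    extract: Callable[[bytes], str] = extract_text,
) -> Iterator[Path]:
    """Yield files under `folder` whose extracted text contains `needle`.

    The match is case-insensitive, roughly like `grep -i` on Tika's output.
    """
    needle = needle.lower()
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = extract(path.read_bytes())
        except OSError:
            # Unreadable file or HTTP/network error: skip it, keep scanning.
            continue
        if needle in text.lower():
            yield path
```

Calling `list(search_folder("/path/to/docs", "invoice"))` would then list the matching files. The `extract` parameter is there so the extraction step can be swapped out, for instance for a local stub in tests or for the Python binding the last post mentions.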
Healthchecks
- Show HN: I built a self-hosted status page and monitoring tool for my projects
  Hey mate, I'm using https://healthchecks.io/ for heartbeat monitoring my crons. It's been working flawlessly for quite some time now. The UI is super clean and easy to navigate. It's also free up to 20 monitored jobs. Note - I'm not in any way related to that project.
- Webhooks suck, but here are alternatives
  In fact, your platform (https://healthchecks.io/) is a prime example of where running customer wasm would be really excellent.
  Instead of sending webhooks out to customer configured URLs, you could run a Wasm environment to execute customer code. Offhand, a good use case here is to do further inspection of the event before it gets sent off to some other system - maybe there are cases where you send false-positives and needlessly trigger external system alerts. The customer Wasm could do more introspection on the healthcheck event and make a more informed decision about how to proceed.
- What do you use for external monitoring?
  I use healthchecks.io and have been very happy.
- Show HN: OnlineOrNot – Cron Job Monitoring
  Is there anything different from https://healthchecks.io/ --- a service I've been using for free for a couple years now?
- Prioritize IPv4 over IPv6 in dual stack
  Because of this block on the router, and the fact that IPv6 connections are by default preferred over IPv4, many things on the system now cannot access the internet. The only things that can access the internet are for accessing servers that ONLY support IPv4 like my mail.smpt2go or my uptime monitoring scripts for healthchecks.io.
- Ask HN: How do you monitor your systemd services?
- Show HN: Peeng – like Pingdom, but the other way around and simpler
- Detecting and alerting for power failures
  I use https://healthchecks.io/ and highly recommend it.
- Managing re-occurring tasks - Daily/weekly/monthly
  We use a heartbeat system. Basically the monitoring continuously sends an alert to a healthcheck system. If that heartbeat fails, PagerDuty sends an alert to the oncall.
- Uptime site monitor - notification solutions for home while sleeping
  I like healthchecks.io
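
Several of the posts above describe heartbeat monitoring of cron jobs: the job pings a check URL when it runs, and an alert fires if the ping stops arriving. The following is a minimal sketch of that pattern, assuming a healthchecks.io-style ping URL with `/start` and `/fail` suffixes. The UUID in `PING_URL` is a placeholder, and the helper names `http_ping` and `run_with_heartbeat` are hypothetical, not part of any client library.

```python
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")

# Placeholder: replace with the ping URL shown for your own check.
PING_URL = "https://hc-ping.com/your-uuid-here"


def http_ping(url: str) -> None:
    """Best-effort GET request; a monitoring outage must not break the job."""
    try:
        urllib.request.urlopen(url, timeout=10).close()
    except OSError:
        pass


def run_with_heartbeat(
    job: Callable[[], T],
    ping_url: str = PING_URL,
    ping: Callable[[str], None] = http_ping,
) -> T:
    """Run `job`, signalling start, then success or failure, to the check URL."""
    ping(ping_url + "/start")
    try:
        result = job()
    except Exception:
        # Report the failure, then let the original error propagate.
        ping(ping_url + "/fail")
        raise
    ping(ping_url)  # a plain ping marks the run as successful
    return result
```

A cron script would wrap its main function, for example `run_with_heartbeat(nightly_backup)`, where `nightly_backup` is whatever callable does the real work. If the job crashes, or the machine is off and never pings at all, the monitoring side notices the missing success signal.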
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
uptime-kuma - A fancy self-hosted monitoring tool
sist2 - Lightning-fast file system indexer and search tool
cadvisor - Analyzes resource usage and performance characteristics of running containers.
spyglass - A personal search engine: Create a searchable library from your personal documents, interests, and more!
gatus - ⛑ Automated developer-oriented status page
yew - Rust / Wasm framework for creating reliable and efficient web applications
Netdata - The open-source observability platform everyone needs
spacedrive - Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
Sentry - Developer-first error tracking and performance monitoring
self-hosted_docker_setups - A collection of my docker-compose files used to setup self-hosted services on Raspberry Pi 4 running 64-bit Raspberry Pi OS
borgmatic - Simple, configuration-driven backup software for servers and workstations