Metric | tika-docker | all-in-one
---|---|---
Mentions | 20 | 192
Stars | 103 | 4,142
Growth | 4.9% | 6.2%
Activity | 4.1 | 9.9
Latest commit | about 1 month ago | 2 days ago
Language | Shell | PHP
License | Apache License 2.0 | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
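The page does not state how Growth is computed. As a rough sketch, assuming the usual definition of month-over-month growth (the change in stars divided by the previous month's count), the arithmetic looks like this. The function name and the star counts are hypothetical and are not taken from the table above.

```python
def month_over_month_growth(previous: int, current: int) -> float:
    """Return growth as a percentage of the previous month's count.

    Assumption: growth = (current - previous) / previous. The site's
    exact formula is not given, so treat this as an illustration only.
    """
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


# Hypothetical star counts: 200 last month, 210 this month.
print(f"{month_over_month_growth(200, 210):.1f}%")
```

Under that assumption, a project that goes from 200 to 210 stars in a month has grown by 5%.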
tika-docker
- Text Extraction from Documents
- Apache Tika – Extract text and metadata from doc types (the backbone of RAG)
- Demystifying Text Data with the Unstructured Python Library
  If you accept running Java, Apache Tika is extremely good at parsing content (https://tika.apache.org/)
- Help with a Search Engine
- How do you manage and find large amount of files?
  Apache Tika can spit out text from lots of formats. I've used it with grep (or rg) for small-scale searching of local folders. Tika also does a really good job at OCR for finding out whether text is in a file.
- 40 Containers & Counting...
  https://tika.apache.org Metadata from things.
- Hosted app to manage server inventory
- Best FOSS (ideally Docker) that can split PDF files?
- OK, ElasticSearch works, text files are indexed. How about images? Can images be indexed in NextCloud and fulltextsearched?
- Document Parsing - an unsolved problem?
  At my previous job we had the same problem which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding.
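Several of the posts above describe the same workflow: send a file to Tika, get plain text back, then search that text. A minimal sketch of that idea follows. It assumes a Tika server is already running on `localhost:9998` (for example from the tika-docker image) and talks to its `/tika` endpoint over HTTP with the standard library, rather than using the Python binding one post mentions. The names `TIKA_URL`, `extract_text_with_tika`, `search_files`, and `search_folder` are illustrative, not part of any library.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, Iterator, Tuple

# Assumption: a Tika server is listening on this address.
TIKA_URL = "http://localhost:9998/tika"


def extract_text_with_tika(path: Path, url: str = TIKA_URL) -> str:
    """PUT the file to the Tika server and return the extracted plain text."""
    request = urllib.request.Request(
        url,
        data=path.read_bytes(),
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")


def search_files(
    paths: Iterable[Path],
    needle: str,
    extract: Callable[[Path], str] = extract_text_with_tika,
) -> Iterator[Tuple[Path, str]]:
    """Yield (path, line) for every extracted line that contains needle.

    The match is case-insensitive, similar to `grep -i`.
    """
    wanted = needle.lower()
    for path in paths:
        for line in extract(path).splitlines():
            if wanted in line.lower():
                yield path, line.strip()


def search_folder(folder: Path, needle: str) -> Iterator[Tuple[Path, str]]:
    """Search every regular file below folder, recursively."""
    files = (p for p in folder.rglob("*") if p.is_file())
    return search_files(files, needle)
```

With a server running, `list(search_folder(Path("docs"), "invoice"))` would return the matching lines per file. The `extract` parameter is there so the search logic can be exercised without a server by passing in any function that maps a path to text.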
all-in-one
- 15 open-source tools to elevate your software design workflow
- Nextcloud install
  If you aren't super technical, please, please go for the all-in-one. The manual Docker image is super complicated for an inexperienced user. I'm very much at home in Linux and on the command line, and I wouldn't even CONSIDER doing it the manual way. The AIO is hard enough, and orders of magnitude simpler. Don't even think about docker compose or all that stuff: go to https://github.com/nextcloud/all-in-one/ and follow that...
- Local-only instance and ACME challenge
  Newbie to NC here, hosting at home. Reading through the local-only guide and have a couple of questions, if you don't mind:
- Nextcloud AIO Behind NGINX Proxy Manager
  I followed https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
- PfSense HaProxy sample config
- NextCloud Docker
  Or this one? https://github.com/nextcloud/all-in-one
- NC AIO - Local instance - OpenWrt
  I can't figure out the steps to get things running locally (https://github.com/nextcloud/all-in-one/blob/main/local-instance.md). I'm trying the Docker AIO for Windows, and I think I'm on step 2. I've pasted the steps from the link over. I hope for some guidance here :)
- More AIO image troubles

      version: "3.8"
      services:
        nextcloud:
          image: nextcloud/all-in-one:latest
          restart: always
          container_name: nextcloud-aio-mastercontainer # This line is not allowed to be changed as otherwise AIO will not work correctly
          volumes:
            - nextcloud_aio_mastercontainer:/mnt/docker-aio-config # This line is not allowed to be changed as otherwise the built-in backup solution will not work
            - /var/run/docker.sock:/var/run/docker.sock:ro # May be changed on macOS, Windows or docker rootless. See the applicable documentation. If adjusting, don't forget to also set 'WATCHTOWER_DOCKER_SOCKET_PATH'!
          ports:
            #- 38983:80 # Can be removed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
            - 38984:8080
            #- 38985:8443 # Can be removed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
          environment: # Is needed when using any of the options below
            - APACHE_PORT=11000 # Is needed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
            - APACHE_IP_BINDING=0.0.0.0 # Should be set when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else) that is running on the same host. See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
            - NEXTCLOUD_DATADIR=/storage/nextcloud/data # Allows to set the host directory for Nextcloud's datadir. ⚠️⚠️⚠️ Warning: do not set or adjust this value after the initial Nextcloud installation is done! See https://github.com/nextcloud/all-in-one#how-to-change-the-default-location-of-nextclouds-datadir
            - WATCHTOWER_DOCKER_SOCKET_PATH=/var/run/docker.sock # Needs to be specified if the docker socket on the host is not located in the default '/var/run/docker.sock'. Otherwise mastercontainer updates will fail. For macos it needs to be '/var/run/docker.sock'
            - SKIP_DOMAIN_VALIDATION=true
      volumes:
        nextcloud_aio_mastercontainer:
          name: nextcloud_aio_mastercontainer # This line is not allowed to be changed as otherwise the built-in backup solution will not work
- NextCloud AIO in Portainer on OpenMediaVault - Installation Issues

      services:
        nextcloud:
          image: nextcloud/all-in-one:latest
          restart: always
          container_name: nextcloud-aio-mastercontainer # This line is not allowed to be changed as otherwise AIO will not work correctly
          volumes:
            - nextcloud_aio_mastercontainer:/mnt/docker-aio-config # This line is not allowed to be changed as otherwise the built-in backup solution will not work
            - /var/run/docker.sock:/var/run/docker.sock:ro # May be changed on macOS, Windows or docker rootless. See the applicable documentation. If adjusting, don't forget to also set 'WATCHTOWER_DOCKER_SOCKET_PATH'!
          ports:
            - 81:81 # Can be removed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
            - 8080:8080
            - 8443:8443 # Can be removed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
          environment: # Is needed when using any of the options below
            # - AIO_DISABLE_BACKUP_SECTION=false # Setting this to true allows to hide the backup section in the AIO interface. See https://github.com/nextcloud/all-in-one#how-to-disable-the-backup-section
            - APACHE_PORT=11000 # Is needed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
            - APACHE_IP_BINDING=0.0.0.0 # Should be set when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else) that is running on the same host. See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
            # - BORG_RETENTION_POLICY=--keep-within=7d --keep-weekly=4 --keep-monthly=6 # Allows to adjust borgs retention policy. See https://github.com/nextcloud/all-in-one#how-to-adjust-borgs-retention-policy
            # - COLLABORA_SECCOMP_DISABLED=false # Setting this to true allows to disable Collabora's Seccomp feature. See https://github.com/nextcloud/all-in-one#how-to-disable-collaboras-seccomp-feature
            # - NEXTCLOUD_DATADIR=/mnt/ncdata # Allows to set the host directory for Nextcloud's datadir. ⚠️⚠️⚠️ Warning: do not set or adjust this value after the initial Nextcloud installation is done! See https://github.com/nextcloud/all-in-one#how-to-change-the-default-location-of-nextclouds-datadir
            # - NEXTCLOUD_MOUNT=/mnt/ # Allows the Nextcloud container to access the chosen directory on the host. See https://github.com/nextcloud/all-in-one#how-to-allow-the-nextcloud-container-to-access-directories-on-the-host
            - NEXTCLOUD_UPLOAD_LIMIT=500G # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-upload-limit-for-nextcloud
            - NEXTCLOUD_MAX_TIME=10800 # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-max-execution-time-for-nextcloud
            - NEXTCLOUD_MEMORY_LIMIT=1536M # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-php-memory-limit-for-nextcloud
            # - NEXTCLOUD_TRUSTED_CACERTS_DIR=/path/to/my/cacerts # CA certificates in this directory will be trusted by the OS of the Nextcloud container (Useful e.g. for LDAPS) See https://github.com/nextcloud/all-in-one#how-to-trust-user-defined-certification-authorities-ca
            # - NEXTCLOUD_STARTUP_APPS=deck twofactor_totp tasks calendar contacts notes # Allows to modify the Nextcloud apps that are installed on starting AIO the first time. See https://github.com/nextcloud/all-in-one#how-to-change-the-nextcloud-apps-that-are-installed-on-the-first-startup
            # - NEXTCLOUD_ADDITIONAL_APKS=imagemagick # This allows to add additional packages to the Nextcloud container permanently. Default is imagemagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-os-packages-permanently-to-the-nextcloud-container
            # - NEXTCLOUD_ADDITIONAL_PHP_EXTENSIONS=imagick # This allows to add additional php extensions to the Nextcloud container permanently. Default is imagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-php-extensions-permanently-to-the-nextcloud-container
            # - NEXTCLOUD_ENABLE_DRI_DEVICE=true # This allows to enable the /dev/dri device in the Nextcloud container. ⚠️⚠️⚠️ Warning: this only works if the '/dev/dri' device is present on the host! If it should not exist on your host, don't set this to true as otherwise the Nextcloud container will fail to start! See https://github.com/nextcloud/all-in-one#how-to-enable-hardware-transcoding-for-nextcloud
            # - TALK_PORT=3478 # This allows to adjust the port that the talk container is using. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-talk-port
            # - WATCHTOWER_DOCKER_SOCKET_PATH=/var/run/docker.sock # Needs to be specified if the docker socket on the host is not located in the default '/var/run/docker.sock'. Otherwise mastercontainer updates will fail. For macos it needs to be '/var/run/docker.sock'
            - SKIP_DOMAIN_VALIDATION=true
          # networks: # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file
            # - nextcloud-aio # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file

        # # Optional: Caddy reverse proxy. See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
        # # You can find further examples here: https://github.com/nextcloud/all-in-one/discussions/588
        # caddy:
        #   image: caddy:alpine
        #   restart: always
        #   container_name: caddy
        #   volumes:
        #     - ./Caddyfile:/etc/caddy/Caddyfile
        #     - ./certs:/certs
        #     - ./config:/config
        #     - ./data:/data
        #     - ./sites:/srv
        #   network_mode: "host"

      volumes:
        nextcloud_aio_mastercontainer:
          name: nextcloud_aio_mastercontainer # This line is not allowed to be changed as otherwise the built-in backup solution will not work

      # # Optional: If you need ipv6, follow step 1 and 2 of https://github.com/nextcloud/all-in-one/blob/main/docker-ipv6-support.md first and then uncomment the below config in order to activate ipv6 for the internal nextcloud-aio network.
      # # Please make sure to uncomment also the networking lines of the mastercontainer above in order to actually create the network with docker-compose
      # networks:
      #   nextcloud-aio:
      #     name: nextcloud-aio # This line is not allowed to be changed as otherwise the created network will not be used by the other containers of AIO
      #     driver: bridge
      #     enable_ipv6: true
      #     ipam:
      #       driver: default
      #       config:
      #         - subnet: fd12:3456:789a:2::/64 # IPv6 subnet to use
- Help with local server setup
  It is basically all images and services pre-configured to host a single Nextcloud instance. Check this page https://github.com/nextcloud/all-in-one
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
docker - ⛴ Docker image of Nextcloud
sist2 - Lightning-fast file system indexer and search tool
NextCloudPi - 📦 Build code for NextcloudPi: Raspberry Pi, Odroid, Rock64, curl installer...
spyglass - A personal search engine: Create a searchable library from your personal documents, interests, and more!
portainer_templates - Portainer Version 2 Template and Self-Hosting Cookbook. A Series of Tools, Tutorials/Instructions, and Links to help you create your very own Self-Hosting System and Lab Sandbox!
yew - Rust / Wasm framework for creating reliable and efficient web applications
nextcloud-snap - ☁️📦 Nextcloud packaged as a snap
spacedrive - Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
docker-swag - Nginx webserver and reverse proxy with php support and a built-in Certbot (Let's Encrypt) client. It also contains fail2ban for intrusion prevention.
self-hosted_docker_setups - A collection of my docker-compose files used to setup self-hosted services on Raspberry Pi 4 running 64-bit Raspberry Pi OS
Nextcloud - ☁️ Nextcloud server, a safe home for all your data