DistorteD
tika-docker
Metric | DistorteD | tika-docker
---|---|---
Mentions | 9 | 14
Stars | 13 | 66
Growth | - | -
Activity | 7.1 | 8.3
Last commit | 17 days ago | 18 days ago
Language | Ruby | Shell
License | GNU Affero General Public License v3.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DistorteD
-
News for Ruby 3.2.0
Here's one that sounds like exactly the sort of example you had in mind: https://github.com/okeeblow/DistorteD/tree/NEW%E2%80%85SENSA...
Disclaimer: mine :)
-
Ruby adds a new core Data class to represent immutable value objects
This is the use-case for me. Here's an actual example of a Struct I will probably convert to Data in the file-identification library I've been working on. Right now they just have their `#to_a` overridden to disable some of their annoying automatic Enumerable behavior: https://github.com/okeeblow/DistorteD/blob/dd2a99285072982d3...
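A minimal sketch of the difference described in this comment, assuming Ruby 3.2 or newer. `FormatStruct`, `GuardedStruct`, and `FormatData` are hypothetical names rather than the types in the linked library, and the `#to_a` override is only one plausible way to switch off the array coercion.

```ruby
# Hypothetical value objects for illustration; Data requires Ruby 3.2+.

# A Struct mixes in Enumerable and defines #to_a, so Kernel#Array
# flattens it into its member values.
FormatStruct = Struct.new(:media, :subtype)
plain = FormatStruct.new("image", "png")
flattened = Array(plain) # ["image", "png"]

# One possible workaround: override #to_a so the object stays whole.
GuardedStruct = Struct.new(:media, :subtype) do
  def to_a
    [self]
  end
end
guarded = GuardedStruct.new("image", "png")
kept_whole = Array(guarded) # one-element Array holding the struct itself

# Data defines no #to_a, does not include Enumerable, and has no setters.
FormatData = Data.define(:media, :subtype)
value = FormatData.new(media: "image", subtype: "png")
wrapped = Array(value)               # one-element Array holding the object
changed = value.with(subtype: "gif") # new object; value itself is unchanged
```

With `Data`, the override becomes unnecessary because the coercion methods are never defined in the first place.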
-
Fun with File Formats
In addition to this resource and UK's equivalent (PRONOM/DROID, also mentioned in the linked post), I've found ArchiveTeam's wiki to be very useful for obscure file format details: http://fileformats.archiveteam.org/
The `shared-mime-info` database from freedesktop-dot-org is probably more worthy of contribution than these government-backed databases, at least in terms of number-of-end-users. New type definitions in their database will improve the entire Linux/BSD ecosystem (both desktop and server!) because it's consumed not only by fd.o's own `update-mime-database` utility but by many language-specific type-identification libraries too https://gitlab.freedesktop.org/xdg/shared-mime-info/-/blob/m...
…including (shameless plug) the new Ractor-based Ruby type library I've been working on in the wake of the `mimemagic` drama earlier this year: https://github.com/okeeblow/DistorteD/tree/NEW%E2%80%85SENSA...
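To illustrate what a type-identification library does with such a database, here is a minimal sketch of signature ("magic number") matching. The three formats and their leading bytes are well-known, but the `SIGNATURES` table and `sniff_type` helper are hypothetical and far simpler than the rules in `shared-mime-info` or the linked library.

```ruby
# Illustrative subset of well-known file signatures. Real libraries load
# their rules from a database instead of hard-coding them.
SIGNATURES = [
  ["\x89PNG\r\n\x1A\n".b, "image/png"],
  ["GIF87a".b,            "image/gif"],
  ["GIF89a".b,            "image/gif"],
  ["%PDF-".b,             "application/pdf"],
].freeze

# Returns the media type whose signature prefixes the given bytes, or nil.
def sniff_type(bytes)
  head = bytes.b # compare as raw bytes, ignoring the string's encoding
  match = SIGNATURES.find { |signature, _type| head.start_with?(signature) }
  match&.last
end
```

In practice only the first few bytes of a file are needed, for example `sniff_type(File.binread(path, 16))` for some `path`.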
-
Building a Personal Website in 2021
I considered switching to Hugo a while back but ended up sticking with Jekyll for the extensibility, like a few others have said. I'm particularly interested in image thumbnailing and format conversion since so many of my posts are image-heavy. I often found that to be my biggest and most frequent barrier to writing since plain text only goes so far on the modern web.
In Hugo, every solution I've seen uses a custom shortcode or custom Markdown image template-rendering hook along with Hugo's built-in image resizing. Many posts even suggest converting image resources to different formats with an external tool. It does seem like the image handling situation in Hugo is improving since it just gained WebP processing support in addition to JPEG: https://gohugo.io/news/0.83.0-relnotes/
Jekyll plugins offer way more power than a shortcode ('include' in Jekyll-speak) can, like how the author of the OP is using Jekyll-Picture-Tag. I've been working on my own similar plugin to handle converting and embedding my site's images, videos, SVGs, PDFs, text files, fonts and weird retro computer formats, etc. For example I can embed an SVG using standard Markdown syntax like `` and get a tag with the SVG plus rasterized JPG+WebP+AVIF+whatever at multiple sizes all totally seamlessly: https://github.com/okeeblow/DistorteD
Very happy the existence of Hugo lit a fire under the Jekyll team to work on speed though :)
-
Zola, A fast static site generator in a single binary
> It works at first, but you end up wanting to design your own custom SSG once you run up against something that goes against your mental model of how things should work.
There is a middle ground. I hit this point in Jekyll when I wanted Insanely Great image thumbnailing that no extant Jekyll plugin could provide, ended up writing my own tool to do that, but didn't want to duplicate the rest of Jekyll's functionality too. It's kiiinda hacky and I probably should propose the interface changes upstream if I keep doing this, but a very light monkey-patch lets my tool pretend to be a Jekyll::StaticFile that just happens to write out many separate files: https://github.com/okeeblow/DistorteD/blob/master/DistorteD-...
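The general idea of "one file object that writes many files" can be sketched without Jekyll at all. `MultiOutputFile` below is a hypothetical stand-in, not Jekyll's class and not the linked monkey-patch: it only assumes that the site generator calls a `write(dest)` method once per file object.

```ruby
require "fileutils"

# Hypothetical stand-in, NOT Jekyll's real class: it mimics an object with a
# write(dest) method that a site generator calls once per file.
class MultiOutputFile
  # variations maps an output filename to the bytes to write for it.
  def initialize(variations)
    @variations = variations
  end

  # Called once by the generator, but emits every variation into dest.
  def write(dest)
    FileUtils.mkdir_p(dest)
    @variations.each do |filename, bytes|
      File.binwrite(File.join(dest, filename), bytes)
    end
    true
  end
end
```

Because the generator only relies on the `write` method, the object can fan out to as many thumbnails or formats as it likes behind that single call.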
tika-docker
- Hosted app to manage server inventory
- Best FOSS (ideally Docker) that can split PDF files?
-
Document Parsing - an unsolved problem?
At my previous job we had the same problem which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding.
-
Any SW recommendation to index any kind of file in a External Drive?
See https://tika.apache.org/ for more.
-
Hey y'all back again w/ the personal, self-hosted search engine
For document content I've heard good things about Apache Tika. Spyglass could leverage it via the REST API.
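A rough sketch of what calling a Tika server over REST might look like, assuming a container listening on `localhost:9998` and the `PUT /tika` text-extraction endpoint. `TIKA_URL`, `build_tika_request`, and `extract_text` are hypothetical names chosen for this example.

```ruby
require "net/http"
require "uri"

# Assumed address of a locally running Tika server container.
TIKA_URL = "http://localhost:9998"

# Builds (but does not send) a PUT /tika request asking for plain text.
def build_tika_request(document_bytes, base_url = TIKA_URL)
  request = Net::HTTP::Put.new(URI.join(base_url, "/tika"))
  request["Accept"] = "text/plain"
  request.body = document_bytes
  request
end

# Sends the request and returns the response body; needs a reachable server.
def extract_text(document_bytes, base_url = TIKA_URL)
  uri = URI(base_url)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_tika_request(document_bytes, base_url))
  end
  response.body
end
```

Keeping request construction separate from sending makes the request easy to inspect without a running server; `extract_text(File.binread(path))` would perform the actual call for some `path`.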
-
Ask HN: How to extract information from multiple (unstructured text) documents?
- Apache Tika (https://tika.apache.org/)
-
KnowledgeCanvas-0.5.3: Grid view sorting, filtering, pagination, and customization!
The short answer: Apache Tika!
- Fun with File Formats
-
Selfhosted File Management Solution? - tags, searching, etc
I installed FileRun recently and that might get you close. It's fast and the search is pretty good as it can integrate Apache Tika, I like the OnlyOffice integration as well. It's closed-source, which isn't great for me, but you get 3 accounts without having to pay.
-
Encoding detection
Any native or FFI-callable thing like the Java tools such as Apache Tika? A quick DuckDuckGo search didn't turn up anything for me. Tika has served me well in the past, but I have no idea what I'd use with CL.
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
sist2 - Lightning-fast file system indexer and search tool
vaultwarden - Unofficial Bitwarden compatible server written in Rust, formerly known as bitwarden_rs
yew - Rust / Wasm framework for building client web apps
spyglass - A personal search engine, crawl & index websites/files you want with a simple set of rules
tauri - Build smaller, faster, and more secure desktop applications with a web frontend.
raspbian-nspawn-64 - Bootable RPi4 / RPi3 image with 64-bit kernel, 32-bit Raspbian Buster host OS, 64-bit Debian Buster guest OS in nspawn container
self-hosted_docker_setups - A collection of my docker-compose files used to setup self-hosted services on Raspberry Pi 4 running 64-bit Raspberry Pi OS
SteamKit - SteamKit2 is a .NET library designed to interoperate with Valve's Steam network. It aims to provide a simple, yet extensible, interface to perform various actions on the network.
moonsharp - Enhanced MoonSharp for improved Tabletop Simulator mod development
server - self-hosted tag-based time tracking
spacedrive - Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.