obsidian-omnisearch vs OCRmyPDF

Metric	obsidian-omnisearch	OCRmyPDF
Mentions	17	77
Stars	997	12,067
Growth	-	2.2%
Activity	8.9	9.5
Latest Commit	20 days ago	9 days ago
Language	TypeScript	Python
License	GNU General Public License v3.0 only	Mozilla Public License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

obsidian-omnisearch

Posts with mentions or reviews of obsidian-omnisearch. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-04.

"Your account will be permanently closed" -- my reasons for leaving Evernote as a loyal user since 2011
8 projects | /r/Evernote | 4 Sep 2023

Mobile Document Scanning using QuickScan iOS app and OCR search with Omnisearch and Text Extractor: I was a power user of the Scannable app by Evernote for capturing scans of receipts and documents, so moving on from this was going to be tough. But QuickScan has the same functionality (OCR scanning) and has quick outputs to where my scans are stored in my Obsidian folder. Using Omnisearch, searching my scans feels just as intuitive and snappy as what Evernote used to feel like for me.
Obsidian-Copilot: A Prototype Assistant for Writing and Thinking
5 projects | news.ycombinator.com | 13 Jun 2023

In the past I have used Omnisearch which I have found to be an improvement.
https://github.com/scambier/obsidian-omnisearch
Tip: Use an Obsidian folder to store your ChatGPT threads
1 project | /r/ObsidianMD | 27 May 2023

Combine this with my favorite Obsidian search plugin Omnisearch and you end up making this bunch of random chat threads useful - now I can link and tag across, and source them for new ideas.
Using Github to write my notes has helped me retain knowledge immensely.
14 projects | /r/learnprogramming | 9 Mar 2023

The Omnisearch plugin might be what you need. No AI but weighted results depending on where your query words are found (filename, titles, frequency...). It works well for me, it's my primary way to find notes.
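
The idea of weighting results by where the query words appear can be illustrated with a minimal sketch. This is not Omnisearch's actual code: the `Note` type, the field names, and the weight values below are all hypothetical, chosen only to show how a filename hit can count for more than a body hit.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: the post does not state the plugin's real values.
WEIGHTS = {"filename": 3.0, "headings": 2.0, "body": 1.0}


@dataclass
class Note:
    filename: str
    headings: str
    body: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(note: Note, query: str) -> float:
    """Sum weight * occurrence count over every field and query word."""
    words = tokenize(query)
    total = 0.0
    for field_name, weight in WEIGHTS.items():
        tokens = tokenize(getattr(note, field_name))
        for word in words:
            total += weight * tokens.count(word)
    return total


def search(notes: list[Note], query: str) -> list[Note]:
    """Return matching notes, best score first; drop notes with no hits."""
    scored = [(score(note, query), note) for note in notes]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in hits]


notes = [
    Note("groceries.md", "Shopping", "buy milk and ocr the receipt"),
    Note("ocr.md", "OCR setup", "install tesseract"),
]
results = search(notes, "ocr")
```

With these assumed weights, a note whose filename and heading both contain the query word ranks above a note that only mentions it once in the body.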
Why do you think Obsidian is better than the alternatives?
1 project | /r/ObsidianMD | 5 Mar 2023

The tag system works well for GTD workflows and organization in general. Default search isn't the best but the Omnisearch plugin fixes that.
Search & Omnisearch frustrations - prioritizing exact matches over fuzzy search?
1 project | /r/ObsidianMD | 1 Dec 2022

Also - and speaking about plugins in general - the best way to get an issue resolved is to ask it on the GitHub page. If the plugin is maintained, its developer will usually gladly help you solve your problem :) https://github.com/scambier/obsidian-omnisearch/issues
Is there a way to search for a word or phrase just in the current note?
1 project | /r/ObsidianMD | 27 Nov 2022

I think Obsidian Omnisearch can help you with that.
Perfect note taking and information organizing solution - does it exist ?
2 projects | /r/apple | 22 Nov 2022

The Omnisearch plug-in for Obsidian does search in PDFs and images via OCR.
Digitalizing 10 years of handwritten notes -- how would you go about doing it?
1 project | /r/ObsidianMD | 20 Nov 2022
PDF notes in Obsidian with Zotero
1 project | /r/ObsidianMD | 16 Oct 2022

In my opinion it is absolutely possible. The developer of the Omnisearch plugin is now working on PDF indexing - https://github.com/scambier/obsidian-omnisearch/releases/tag/1.6.5-beta.3.

OCRmyPDF

Posts with mentions or reviews of OCRmyPDF. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-14.

TextSnatcher: Copy text from images, for the Linux Desktop
7 projects | news.ycombinator.com | 14 Mar 2024

Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and it is absolutely brilliant.
FLaNK Stack Weekly 19 Feb 2024
50 projects | dev.to | 19 Feb 2024
Calibre – New in Calibre 7.0
11 projects | news.ycombinator.com | 18 Nov 2023

I recommend running any such PDFs through OCRmyPDF.
https://github.com/ocrmypdf/OCRmyPDF
A better document viewer
1 project | /r/linux4noobs | 13 Sep 2023

If by "like a photocopy" you mean the file contains images of text rather than text, the macOS viewer presumably does OCR on the images. I don't know if there's a Linux document viewer with that capability built in, but a quick search turned up the standalone tool OCRmyPDF.
Is there a (CLI) tool that dynamically adjusts the contrast and brightness of scanned text documents?
3 projects | /r/de_EDV | 27 Jun 2023
OCR for a full pdf on Neoreader
1 project | /r/Onyx_Boox | 25 Jun 2023

For anyone interested, I solved the problem by first running the files through OCR with the free and open-source software ocrmypdf, available here
ELI5: why is PDF such a widespread text format, instead of a format that's actually easier to edit?
1 project | /r/explainlikeimfive | 3 Jun 2023

ocrmypdf is nice for stuff like that.
Donut: OCR-Free Document Understanding Transformer
4 projects | news.ycombinator.com | 29 May 2023
massive crop and OCR newspaper
3 projects | /r/macapps | 16 May 2023

Use ImageMagick to convert them to PDF and ocrmypdf to straighten and OCR them. See this explanation.
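
The two steps could be scripted roughly as follows. This is a sketch under assumptions, not the poster's exact commands: the file names are hypothetical, `magick` is assumed to be the ImageMagick 7 entry point (older installs use `convert`), and `--deskew` is the OCRmyPDF option that straightens pages before OCR.

```python
import subprocess


def build_commands(images, merged_pdf, ocr_pdf):
    """Build the two command lines without running them."""
    # Step 1: ImageMagick combines the page images into one PDF.
    # Assumes ImageMagick 7 (`magick`); use `convert` on older versions.
    convert_cmd = ["magick", *[str(p) for p in images], str(merged_pdf)]
    # Step 2: OCRmyPDF deskews (straightens) each page and adds a text layer.
    ocr_cmd = ["ocrmypdf", "--deskew", str(merged_pdf), str(ocr_pdf)]
    return convert_cmd, ocr_cmd


def run_pipeline(images, merged_pdf, ocr_pdf):
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in build_commands(images, merged_pdf, ocr_pdf):
        subprocess.run(cmd, check=True)


# Hypothetical file names, for illustration only.
convert_cmd, ocr_cmd = build_commands(
    ["page1.jpg", "page2.jpg"], "scan.pdf", "scan_ocr.pdf"
)
```

Calling `run_pipeline(["page1.jpg", "page2.jpg"], "scan.pdf", "scan_ocr.pdf")` would execute both tools, provided they are installed and on the PATH.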
OCR pdf and just keep the OCR text
1 project | /r/AskTechnology | 6 May 2023

Fair enough, maybe this will work for you. It should separate the text from the image anyway, and if you have Adobe Acrobat it should be able to delete the background too with the edit function. It may already be able to do that if you haven't tried it.

What are some alternatives?

When comparing obsidian-omnisearch and OCRmyPDF you can also consider the following projects:

obsidian-switcher-plus - Enhanced Quick Switcher plugin for Obsidian.md

PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

obsidian-customizable-sidebar - This Plugin allows you to add every Command to Obsidian's Sidebar Ribbon and add Custom Icons.

pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

cMenu-Plugin - An Obsidian.md plugin that adds a minimal text editor modal for a smoother writing/editing experience ✍🏽.

tesserocr - A Python wrapper for the tesseract-ocr API

ObsidianCustomFrames - An Obsidian plugin that turns web apps into panes using iframes with custom styling. Also comes with presets for Google Keep, Todoist and more.

Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents

remotely-save - Yet another unofficial Obsidian plugin allowing users to synchronize notes between local device and the cloud service. Supports S3, Dropbox, OneDrive, webdav.

invoice2data - Extract structured data from PDF invoices

minisearch - Tiny and powerful JavaScript full-text search engine for browser and Node

pdfminer.six - Community maintained fork of pdfminer - we fathom PDF