textract-cli vs Jina AI examples
| | textract-cli | Jina AI examples |
|---|---|---|
| Mentions | 2 | 22 |
| Stars | 6 | 403 |
| Growth | - | - |
| Activity | 3.3 | 9.6 |
| Latest commit | 7 months ago | over 2 years ago |
| Language | Python | Python |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
textract-cli
-
Show HN: Search PDFs with Transformers and Python Notebook
That's really neat: https://github.com/mbafford/textract-cli/blob/master/textrac... - my tool had to involve an S3 bucket too just because Textract won't let you upload a PDF directly to it without storing it in S3 first.
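The S3 detour exists because Textract's asynchronous text-detection API (`StartDocumentTextDetection`) reads multi-page documents only from an S3 location, not from uploaded bytes. Below is a minimal sketch of that flow with boto3. It is not textract-cli's actual code: the bucket and key are placeholders you supply, it assumes boto3 is installed and AWS credentials are configured, and it polls instead of using the SNS completion notification.

```python
"""Sketch: OCR a multi-page PDF with Amazon Textract via S3 (assumes boto3 + AWS credentials)."""
import time


def lines_from_blocks(blocks):
    """Return the text of every LINE block, in the order Textract returned them."""
    return [b["Text"] for b in blocks if b.get("BlockType") == "LINE"]


def textract_pdf(path, bucket, key, poll_seconds=5):
    # Imported lazily so the pure helper above can be used without boto3 installed.
    import boto3

    # The asynchronous API reads the document from S3, so upload it first.
    boto3.client("s3").upload_file(path, bucket, key)

    textract = boto3.client("textract")
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job is no longer IN_PROGRESS.
    while True:
        resp = textract.get_document_text_detection(JobId=job_id)
        if resp["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if resp["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {resp['JobStatus']}")

    # Results are paginated: follow NextToken until every block is collected.
    blocks = list(resp["Blocks"])
    while "NextToken" in resp:
        resp = textract.get_document_text_detection(JobId=job_id, NextToken=resp["NextToken"])
        blocks.extend(resp["Blocks"])
    return lines_from_blocks(blocks)
```

Keeping the block-parsing step in its own function means it can be checked offline against a saved response, without touching AWS.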
Jina AI examples
-
Show HN: Search PDFs with Transformers and Python Notebook
- Modern PDFs - if you want to extract text and images, the PDFSegmenter used in my example will work. If you need tables too, it might take some additional jiggery-pokery, but it's definitely doable. I know other people using the same framework (Jina) who've accomplished it.
- Exact word search - pretty simple. I've focused on more advanced (semantic) search because "color" vs "colour" is same same but different. It's also pretty easy, since I'm just using predefined building blocks rather than integrating things manually.
- Cross-platform frontend - I've seen a lyrics search frontend [0] and I've built stuff in Streamlit before. Jina offers RESTful/gRPC/WebSockets gateways, so it can't be too tough.
- Lightweight? I mean, how lightweight do you want it? C? Bash? Assembly? I've found Python good for text parsing.
- Long-term: The notebook I wrote has a few dependencies (each of which has its own), but compared to others they're relatively lightweight.
- Gluing code: I've been using pre-existing building blocks. Writing new Executors (i.e. building blocks) is relatively straightforward, and scaling them up with shards, replicas, etc. is just a parameter away.
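To make the "exact word search is pretty simple" point concrete, here is a minimal inverted-index sketch in plain Python. It is illustrative only, not how Jina or the PDF notebook implements search, and the function names are made up. It also shows the "color" vs "colour" problem: an exact match on one spelling never finds the other.

```python
"""Sketch: exact word search with an inverted index (illustrative, not Jina's implementation)."""
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase, then keep runs of ASCII letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def exact_search(index, query):
    """Return ids of documents that contain every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = ["The colour of the sky", "Pick a color scheme", "Sky blue color"]
index = build_index(docs)
```

With these three documents, `exact_search(index, "color")` finds documents 1 and 2 but misses document 0, which spells it "colour". Closing that gap is what embedding-based (semantic) search buys you.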
I'm more into the search side than the PDF stuff. I've experienced the PDF side through bitter suffering and torment. Not a fun format to work with (unless you're into sadomasochism).
[0] https://github.com/jina-ai/examples/tree/master/multires-lyr...
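The "scaling them up with shards, replicas" remark in the list above is easier to picture with a toy model. The sketch below is a hypothetical, single-process illustration of the general idea (shards partition the indexed data, and a query fans out to every shard; replicas are copies of a shard that share the query load). The `ShardedIndex` class is made up for illustration and is not Jina's implementation or API.

```python
"""Toy model of shards vs replicas (hypothetical; not Jina's internals)."""
import itertools
import zlib


class ShardedIndex:
    def __init__(self, shards, replicas):
        self.num_shards = shards
        # shards[s][r] is replica r of shard s; each replica is just a dict here.
        self.shards = [[{} for _ in range(replicas)] for _ in range(shards)]
        # One round-robin cycle per shard picks which replica serves the next query.
        self._next_replica = [itertools.cycle(range(replicas)) for _ in range(shards)]

    def shard_for(self, key):
        # Stable hash (crc32) so the same key always lands on the same shard.
        return zlib.crc32(key.encode()) % self.num_shards

    def index(self, key, text):
        # A write goes to every replica of the one shard that owns the key.
        for replica in self.shards[self.shard_for(key)]:
            replica[key] = text

    def search(self, query):
        """Scatter the query to one replica of every shard, then gather the hits."""
        hits = []
        for s in range(self.num_shards):
            replica = self.shards[s][next(self._next_replica[s])]
            hits.extend(k for k, text in replica.items() if query in text)
        return sorted(hits)
```

More shards means each worker holds less data; more replicas means more copies to answer queries in parallel.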
-
Getting started with Jina AI
Semantic Wikipedia Search
- Do what Google does: build a semantic search app powered by Jina AI's open-source neural search framework.
- A semantic search app powered by Jina AI's open-source neural search framework. Using this, you can index and search song lyrics using state-of-the-art machine learning language models.
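In apps like these, a language model turns each document and each query into a vector, and results are ranked by how close the vectors are. The sketch below shows only that ranking step, using cosine similarity over hand-made 3-dimensional vectors; the vectors and document names are invented for illustration, and a real app would get embeddings from a model and let the framework handle encoding and indexing.

```python
"""Sketch: ranking documents by cosine similarity of embeddings (toy vectors, hypothetical)."""
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_id, score) pairs for the k documents most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made "embeddings"; a real app would get these from a language model.
doc_vecs = {
    "lyrics about heartbreak": [0.9, 0.1, 0.0],
    "lyrics about summer love": [0.6, 0.7, 0.1],
    "wikipedia: volcano": [0.0, 0.1, 0.95],
}
query_vec = [0.85, 0.2, 0.05]  # stands in for the embedding of "a sad breakup song"
```

Here `top_k(query_vec, doc_vecs)` ranks the heartbreak lyrics first and the summer-love lyrics second, even though the query shares no words with either title.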
-
[P] A week ago, I came across this super cool project to build cross-modal search. I will now share more details about the project
I was looking for search-engine projects, and for a way to build a tool that could search across various types of data, when I came across this GitHub project: https://github.com/jina-ai/jina/blob/master/.github/pages/hello-world.md#-multimodal-document-search. Encouraged by its thorough, step-by-step instructions on how to build a search service that uses diverse modal features to provide accurate results, I worked through the documentation until I reached the latest version, here: https://github.com/jina-ai/examples/tree/master/cross-modal-search.
- Build your own Google Image search, powered by open-source deep learning
-
[P] Open-source Neural Search framework to implement semantic search & multimedia search. Just released 2.0, seeking your feedback.
There are already some examples of music search, PDF search and video search that show proofs of concept of its capabilities for those use cases. You can discuss your specific use case in detail with the Jina community on Slack.
-
I was wrong! A big thank you to r/python members 🙏
Thank you so much for the appreciation and for sharing your use cases. Check out the examples for chatbots and financial analysis: https://github.com/jina-ai/examples
-
PDF search - Another project I built using Jina (AI search framework)
git clone --depth 1 --filter=blob:none --sparse https://github.com/jina-ai/examples && cd examples && git sparse-checkout set multimodal-search-pdf
- Alternative to Google Images - Open-Source image search engine
What are some alternatives?
pdfminer.six - Community maintained fork of pdfminer - we fathom PDF
finetuner - 🎯 Task-oriented embedding tuning for BERT, CLIP, etc.
jina - ☁️ Build multimodal AI applications with cloud-native stack
jina-hub - An open-registry for hosting Jina executors via container images
jina-financial-qa-search
jina-app-store-example - App store search example, using Jina as backend and Streamlit as frontend [Moved to: https://github.com/jina-ai/example-app-store]
jina-meme-search-example - Meme search engine built with Jina neural search framework. Search with captions or image files to find matching memes. [Moved to: https://github.com/jina-ai/example-meme-search]
docs - Jina V1 Official Documentation
Ansika - Hassle-Free Engineer Onboarding
videos - This is my video documentation. Here you'll find code-snippets, technical documentation, templates, command reference, and whatever is needed for all my YouTube Videos.