-
ComfyUI
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
I bet you could get this working in https://github.com/comfyanonymous/ComfyUI
I have done some other LLava stuff in it
-
CodeRabbit
CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
-
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
checkout https://github.com/mindee/doctr or https://github.com/VikParuchuri/surya for something practical
multimodal llm would of course blow it all out the water, so some llama3-like model is probably SOTA in terms of what you can run yourself. something like https://huggingface.co/blog/idefics2
-
checkout https://github.com/mindee/doctr or https://github.com/VikParuchuri/surya for something practical
multimodal llm would of course blow it all out the water, so some llama3-like model is probably SOTA in terms of what you can run yourself. something like https://huggingface.co/blog/idefics2
-
I also wrote a Swift CLI that wraps over the Vision framework: https://github.com/nexuist/seev
Text extraction is included (including the ability to specify custom words not found in the dictionary) but there are also utilities for face detection, classification, etc.
-
Has anyone tried Kosmos [0] ? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.
[0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives