seeV
unilm
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
seeV
-
A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images?
I also wrote a Swift CLI that wraps over the Vision framework: https://github.com/nexuist/seev
Text extraction is included (including the ability to specify custom words not found in the dictionary) but there are also utilities for face detection, classification, etc.
unilm
-
A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images?
Has anyone tried Kosmos [0] ? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.
[0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5
- Kosmos-2.5: A Multimodal Literate Model
- 1-Bit LLMs Could Solve AI's Energy Demands
- GPUs Go Brrr
- The Era of 1-Bit LLMs: Training_Tips, Code And_FAQ [pdf]
- The Era of 1-Bit LLMs: Training Tips, Code and FAQ
-
The Era of 1-bit LLMs: ternary parameters for cost-effective computing
+1 On this, the real proof would have been testing both models side-by-side.
It seems that it may be published on GitHub [1] according to HuggingFace [2].
[1] https://github.com/microsoft/unilm/tree/master/bitnet
[2] https://huggingface.co/papers/2402.17764
- I'm an Old Fart and AI Makes Me Sad
-
On building a semantic search engine
e5-mistral is essentially a distillation from gpt-4 to a smaller model. You can see here https://github.com/microsoft/unilm/blob/16da2f193b9c1dab0a69...
they actually have custom prompts for each dataset being tested.
Question would be, if you haven't seen the task before, what is a good prompt to prepend for your task?
IMO e5-mistral is overfit to MTEB
-
Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide
Layout LM v1, v2 and v3 models [ Github ] DocBERT [ Github ]
What are some alternatives?
magic - Scanner for decks of cards with bar codes printed on card edges
involution - [CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator
LlamaChat - Chat with your favourite LLaMA models in a native macOS app
NExT-GPT - Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
doctr - docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
OpenLLM - Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.