Background: I have been working on a project at my current company to perform OCR on mobile screenshots. Due to quality and deployment issues, I could not go with Tesseract. I then found EasyOCR, which seemed reasonable in initial testing; however, further investigation revealed that it does not generalize well. See the issue I created here.
The Project - Model: The primary architecture is a CNN followed by a transformer encoder and decoder. At first I used my own implementation of self-attention, but since it did not converge, I switched to the x-transformers implementation by lucidrains, as it incorporates improvements from many papers. The objective is simple: the CNN converts images into a high-level representation, which is fed to the transformer encoder so information can flow between positions. Finally, a transformer decoder decodes the text character by character, trained with an autoregressive loss. Even after two weeks of trying different things, training did not converge within the first hour - the usual mark I use to check whether a model is learning at all.
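The pipeline described above (CNN features → transformer encoder → autoregressive character decoder) can be sketched roughly as follows. This is a minimal illustration using plain `torch.nn` layers rather than the x-transformers library the post actually uses; the channel sizes, vocabulary size, and layer counts are illustrative assumptions, not the author's settings.

```python
import torch
import torch.nn as nn

class OCRModel(nn.Module):
    """Sketch of a CNN + transformer encoder/decoder OCR model."""

    def __init__(self, vocab_size=100, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # CNN backbone: image -> downsampled high-level feature map
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        # images: (B, 1, H, W); tgt_tokens: (B, T) shifted-right characters
        feats = self.cnn(images)                   # (B, C, H', W')
        memory = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C) token sequence
        memory = self.encoder(memory)
        tgt = self.embed(tgt_tokens)
        # causal mask: each position may only attend to earlier characters
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                      # (B, T, vocab) logits

model = OCRModel()
imgs = torch.randn(2, 1, 32, 128)            # dummy screenshot crops
tokens = torch.randint(0, 100, (2, 16))      # dummy character ids
logits = model(imgs, tokens)
# autoregressive (next-character) cross-entropy loss
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 100), tokens.reshape(-1)
)
```

In practice the decoder input would be the target text shifted right (with a start token) and the loss computed against the unshifted targets; the dummy tensors here only show the shapes flowing through the model.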