[Discussion] Entity extraction + table extraction from documents (image-based, various layouts and quality)

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • shutter

    Stochastic image generator for annotated synthetic datasets (by Rainelz)

  • I mean 100% synthetic data. When you generate random layouts, you know exactly where you are placing the objects you want your model to detect (e.g. keys, values). If there is no common layout, I'd get creative and try to cover the most common ones! I have a repo I used to generate fake documents; it's no longer maintained, but you may find it useful.

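The comment's key point is that a generator knows where it placed every key and value, so the ground-truth annotations come for free. Below is a minimal sketch of that idea in plain Python. It assumes a very simple layout (each key sits on one row with its value to the right, on a blank page), and `Box` and `random_layout` are hypothetical names used for illustration only. They are not shutter's API.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    label: str  # "key" or "value"
    x0: int
    y0: int
    x1: int
    y1: int

    def overlaps(self, other: "Box") -> bool:
        # Boxes that merely touch at an edge are not counted as overlapping.
        return not (self.x1 <= other.x0 or other.x1 <= self.x0
                    or self.y1 <= other.y0 or other.y1 <= self.y0)


def random_layout(width: int, height: int, n_pairs: int,
                  seed: Optional[int] = None, max_tries: int = 1000) -> List[Box]:
    """Place up to n_pairs key/value boxes at random, non-overlapping positions.

    Assumption: a key and its value share a row, with the value to the right.
    Because the generator picks every position itself, the returned boxes
    are exact labels for a key/value detector.
    """
    rng = random.Random(seed)
    boxes: List[Box] = []
    tries = 0
    while len(boxes) < 2 * n_pairs and tries < max_tries:
        tries += 1
        key_w = rng.randint(40, 120)    # random key width
        val_w = rng.randint(60, 200)    # random value width
        h = rng.randint(12, 30)         # shared row height
        gap = rng.randint(5, 20)        # horizontal gap between key and value
        total_w = key_w + gap + val_w
        if total_w > width or h > height:
            continue  # this pair cannot fit on the page at all
        x = rng.randint(0, width - total_w)
        y = rng.randint(0, height - h)
        key = Box("key", x, y, x + key_w, y + h)
        value = Box("value", x + key_w + gap, y, x + total_w, y + h)
        # Reject the candidate pair if it collides with anything already placed.
        if any(b.overlaps(key) or b.overlaps(value) for b in boxes):
            continue
        boxes.extend([key, value])
    return boxes


if __name__ == "__main__":
    for box in random_layout(800, 1000, n_pairs=3, seed=42):
        print(box)
```

In a fuller pipeline, each layout would then be rendered into an image (fake text, fonts, noise, and scan-like degradation to match the "various quality" inputs), and the boxes would be saved alongside it as the training annotations.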