Show HN: Beyond text splitting – improved file parsing for LLM's

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • open-parse

    Improved file parsing for LLM’s

  • surya

    OCR, layout analysis, reading order, line detection in 90+ languages

  • This looks great! You might be interested in surya - https://github.com/VikParuchuri/surya (I'm the author). It does OCR (much more accurate than tesseract), layout analysis, and text detection.

    The OCR is slow on CPU (working on it), but on GPU it is faster than CPU-only tesseract.

    Happy to discuss more, feel free to email me (in profile).

  • unitable

    UniTable: Towards a Unified Table Foundation Model

  • Is this the unitable you mentioned? https://github.com/poloclub/unitable

  • deepdoctection

    A Repo For Document AI

  • https://github.com/deepdoctection/deepdoctection

    Have you tried this?

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • DeepDoctection: Document extraction and analysis using deep learning models

    1 project | /r/programming | 27 Apr 2023
  • DeepDoctection: Document extraction and analysis using deep learning models

    1 project | /r/patient_hackernews | 26 Apr 2023
  • DeepDoctection: Document extraction and analysis using deep learning models

    1 project | /r/hackernews | 26 Apr 2023
  • DeepDoctection

    1 project | /r/hypeurls | 26 Apr 2023
  • Show HN: Surya – OCR and line detection in 93 languages

    1 project | news.ycombinator.com | 13 Feb 2024