-
parsee-datasets
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
To test this, we created 3 different datasets, all based on the same random sample of 1,156 annual reports for the year 2023 from publicly listed US companies.
The resulting (fully labeled) datasets contain a combined total of 10,404 rows, 37,536,847 tokens and 1,156 images, and can be found on GitHub and Hugging Face: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...
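As a quick sanity check on these totals, the per-report averages can be derived directly from the figures quoted here. This is only a minimal sketch: the variable names are illustrative, and it assumes the combined totals are spread across all 1,156 reports.

```python
# Totals as quoted in the post; variable names are illustrative only.
reports = 1_156
rows = 10_404
tokens = 37_536_847
images = 1_156

# Simple averages per annual report (assumes totals cover all reports).
rows_per_report = rows / reports
tokens_per_report = tokens / reports
images_per_report = images / reports

print(f"rows per report:   {rows_per_report:.1f}")
print(f"tokens per report: {tokens_per_report:,.0f}")
print(f"images per report: {images_per_report:.1f}")
```

The row count divides evenly into 9 rows per report and the image count into 1 image per report. The post does not spell out how the rows are split across the 3 datasets.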
For our study, we evaluated 8 state-of-the-art (M)LLMs on a subset of 100 reports, with some interesting results.