-
llmwhisperer-table-extraction
Extract tables from PDFs using LLMWhisperer and extract structured information from those tables using Langchain
The source code for the project can be found here on Github. To successfully run the extraction script, you’ll need 2 API keys. One for LLMWhisperer and the other for OpenAI APIs. Please be sure to read the Github project’s README to fully understand OS and other dependency requirements. You can sign up for LLMWhisperer, get your API key, and process up to 100 pages per day free of charge.
-
InfluxDB
InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.
-
Pydantic: Use Pydantic to declare your data model. This output parser allows users to specify an arbitrary Pydantic Model and query LLMs for outputs that conform to that schema.
-
With API deployments you can expose an API to which you send a PDF or an image and get back structured data in JSON format. Or with an ETL deployment, you can just put files into a Google Drive, Amazon S3 bucket or choose from a variety of sources and the platform will run extractions and store the extracted data into a database or a warehouse like Snowflake automatically. Unstract is an Open Source software and is available at https://github.com/Zipstack/unstract.