Grobid Alternatives

Similar projects and alternatives to grobid

pandoc
Mentions: 420 | Stars: 32,449 | Activity: 9.8 | Language: Haskell
Universal markup converter

txtai
Mentions: 356 | Stars: 7,033 | Activity: 9.3 | Language: Python
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Zettlr
Mentions: 116 | Stars: 9,640 | Activity: 9.9 | Language: TypeScript
Your One-Stop Publication Workbench

zeal
Mentions: 100 | Stars: 11,083 | Activity: 7.9 | Language: C++
Offline documentation browser inspired by Dash

apexcharts.js
Mentions: 34 | Stars: 13,858 | Activity: 9.3 | Language: JavaScript
📊 Interactive JavaScript Charts built on SVG

paperai
Mentions: 19 | Stars: 1,198 | Activity: 5.9 | Language: Python
📄 🤖 Semantic search and workflows for medical/scientific papers

Parsr
Mentions: 7 | Stars: 5,656 | Activity: 4.6 | Language: JavaScript
Transforms PDF, Documents and Images into Enriched Structured Data

paperetl
Mentions: 12 | Stars: 316 | Activity: 6.3 | Language: Python
📄 ⚙️ ETL processes for medical and scientific papers

angle-grinder
Mentions: 10 | Stars: 3,363 | Activity: 3.7 | Language: Rust
Slice and dice logs on the command line

CERMINE
Mentions: 1 | Stars: 476 | Activity: 0.0 | Language: Java
Content ExtRactor and MINEr

science-parse
Mentions: 1 | Stars: 571 | Activity: 0.0 | Language: Java
Science Parse parses scientific papers (in PDF form) and returns them in structured form.

Smile
Mentions: 9 | Stars: 5,925 | Activity: 9.8 | Language: Java
Statistical Machine Intelligence & Learning Engine

llmsherpa
Mentions: 6 | Stars: 943 | Activity: 6.6 | Language: Jupyter Notebook
Developer APIs to Accelerate LLM Projects

examples
Mentions: 6 | Stars: 381 | Activity: 6.8 | Language: Jupyter Notebook
Analyze the unstructured data with Towhee, such as reverse image search, reverse video search, audio classification, question and answer systems, molecular search, etc. (by towhee-io)

datahub
Mentions: 34 | Stars: 9,230 | Activity: 9.9 | Language: Java
The Metadata Platform for your Data Stack

Tribuo
Mentions: 15 | Stars: 1,226 | Activity: 4.8 | Language: Java
Tribuo - A Java machine learning library

aleph
Mentions: 4 | Stars: 1,952 | Activity: 9.4 | Language: JavaScript
Search and browse documents and data; find the people and companies you look for. (by alephdata)

s2orc-doc2json
Mentions: 1 | Stars: 304 | Activity: 2.3 | Language: Python
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)

Deep Java Library (DJL)
Mentions: 13 | Stars: 3,853 | Activity: 9.5 | Language: Java
An Engine-Agnostic Deep Learning Framework in Java

nlm-ingestor
Mentions: 3 | Stars: 810 | Activity: 7.1 | Language: Python
This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better grobid alternative or higher similarity.

grobid reviews and mentions

Posts with mentions or reviews of grobid. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-05-06.

FLaNK-AIM Weekly 06 May 2024
45 projects | dev.to | 6 May 2024
Show HN: Open-source Rule-based PDF parser for RAG
9 projects | news.ycombinator.com | 23 Jan 2024
How to ingest image based PDFs into private GPT model?
2 projects | /r/aipromptprogramming | 27 Jun 2023
🥪 Best Sites For ebooks, articles, research papers etc..🥪
4 projects | /r/RockMods | 17 May 2023
Grobid – ML software for extracting information from scholarly documents
1 project | news.ycombinator.com | 21 Apr 2023
How to create a web app that turns academic papers into text documents
1 project | /r/webdev | 16 Jan 2023

Interesting concept. Grobid tries to do the same https://github.com/kermitt2/grobid
Extract research paper's references
1 project | /r/LanguageTechnology | 1 Jan 2023

I would suggest using grobid - a pipeline for extracting scientific PDFs into a common XML format which can be easily parsed. Grobid has quite a nice mature REST API that I've used in some of my own projects. It parses references and matches them to their DOI using the CrossRef API with a reported 95% F1 score. This should make your job pretty simple as far as I can tell - all you'd need to do is run your papers through grobid and then build a citation graph by comparing document DOIs.
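
The citation-graph idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not Grobid's own tooling: it assumes each paper has already been run through Grobid and that the resulting TEI XML carries the paper's DOI in the header and each reference's DOI in an `idno` element with `type="DOI"` inside `listBibl`. The helper names and the `sample_tei` stand-in documents (with made-up DOIs) are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by the XML; the element layout below is an assumption.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _normalize(doi):
    """DOIs are case-insensitive, so compare them trimmed and lowercased."""
    return doi.strip().lower()


def document_doi(root):
    """Return the DOI found in the TEI header, or None if there is none."""
    header = root.find("tei:teiHeader", TEI_NS)
    if header is None:
        return None
    node = header.find(".//tei:idno[@type='DOI']", TEI_NS)
    return _normalize(node.text) if node is not None and node.text else None


def cited_dois(root):
    """Collect the DOIs of all bibliography entries that have one."""
    dois = set()
    for ref in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        node = ref.find(".//tei:idno[@type='DOI']", TEI_NS)
        if node is not None and node.text:
            dois.add(_normalize(node.text))
    return dois


def build_citation_graph(tei_documents):
    """Map each document DOI to the DOIs it cites that are also in the corpus."""
    parsed = {}
    for tei_xml in tei_documents:
        root = ET.fromstring(tei_xml)
        doi = document_doi(root)
        if doi is not None:
            parsed[doi] = cited_dois(root)
    corpus = set(parsed)
    return {doi: sorted(refs & corpus) for doi, refs in parsed.items()}


def sample_tei(doi, cited):
    """Hypothetical, heavily simplified stand-in for real TEI output."""
    refs = "".join(
        f'<biblStruct><analytic><idno type="DOI">{c}</idno></analytic></biblStruct>'
        for c in cited
    )
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader><fileDesc><sourceDesc><biblStruct>"
        f'<idno type="DOI">{doi}</idno>'
        "</biblStruct></sourceDesc></fileDesc></teiHeader>"
        f"<text><back><listBibl>{refs}</listBibl></back></text>"
        "</TEI>"
    )


if __name__ == "__main__":
    docs = [
        sample_tei("10.5555/A", ["10.5555/b", "10.5555/zzz"]),
        sample_tei("10.5555/b", ["10.5555/a"]),
    ]
    print(build_citation_graph(docs))
```

References without a matched DOI are simply skipped here, so the graph is only as complete as the DOI matching; edges to papers outside the corpus are dropped by the set intersection.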
Free/open-source alternatives to Connected Papers...?
2 projects | /r/opensource | 12 Aug 2022
Seeking Advice: How to extract Abstract from scientific journals (.pdfs) 10k+.
5 projects | /r/LanguageTechnology | 3 Jun 2022

Just use science-parse or GROBID. They have been designed for that exact reason.
Project to rebuild papers with plaintext markup languages
7 projects | /r/Open_Science | 25 Sep 2021

I ended up using Grobid, which converts the PDF to a very detailed XML format. It is not a word-processing format, though, but one designed specifically for representing scientific documents. I don't know whether it would, for example, contain tags for bold or italicized text. The tool works really well, but since you probably cannot use the output XML directly, it will need some postprocessing, which should be relatively simple with XML parsing libraries.
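
As a rough illustration of the postprocessing step this comment describes, the sketch below uses Python's standard `xml.etree.ElementTree` to pull the title, abstract, and section paragraphs out of a TEI document as plain text. The element paths (`titleStmt/title`, `profileDesc/abstract`, `text/body/div`) are assumptions about the output layout, and `SAMPLE_TEI` is a made-up, heavily simplified document rather than real Grobid output.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified TEI document used only for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>A Sample Paper</title></titleStmt></fileDesc>
    <profileDesc><abstract><p>We study   an example.</p></abstract></profileDesc>
  </teiHeader>
  <text><body>
    <div><head>Introduction</head>
      <p>Prior work <ref type="bibr">[1]</ref> exists.</p>
    </div>
  </body></text>
</TEI>"""


def _text(element):
    """Join all nested text of an element and collapse runs of whitespace."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def tei_to_plaintext(tei_xml):
    """Return title, abstract, and per-section paragraphs as plain strings."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    body = root.find("tei:text/tei:body", TEI_NS)

    sections = []
    if body is not None:
        for div in body.iterfind("tei:div", TEI_NS):
            sections.append({
                "heading": _text(div.find("tei:head", TEI_NS)),
                "paragraphs": [_text(p) for p in div.iterfind("tei:p", TEI_NS)],
            })
    return {"title": _text(title), "abstract": _text(abstract), "sections": sections}


if __name__ == "__main__":
    print(tei_to_plaintext(SAMPLE_TEI))
```

Using `itertext()` flattens inline markup such as citation references into the surrounding sentence, which is usually what a plain-text export wants; any inline formatting tags would be lost the same way.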

Stats

Basic grobid repo stats

Stars: 3,075
Activity: 9.2
Last Commit: 6 days ago

kermitt2/grobid is an open-source project licensed under the Apache License 2.0, which is an OSI-approved license.

The primary programming language of grobid is Java.

Popular Comparisons