Top 23 information-extraction Open-Source Projects

PaddleNLP

2 11,386 9.8 Python

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
MITIE

0 2,892 0.0 C++

MITIE: library and tools for information extraction
DeepKE

2 2,891 9.4 Python

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Project mention: Would this method work to increase the memory of the model? Saving summaries generated by a 2nd model and injecting them depending on the current topic. | /r/LocalLLaMA | 2023-06-09

InvoiceNet

4 2,382 3.9 Python

Deep neural network to extract intelligent information from invoice documents.
kor

8 1,501 7.4 Python

LLM(😽)

Project mention: Pydentic in prompt engineering | /r/LangChain | 2023-11-29

Check out kor

awesome-document-understanding

4 1,108 4.5

A curated list of resources for Document Understanding (DU) topic
007-TheBond

4 1,030 6.6 Python

This Script will help you to gather information about your victim or friend.
ail-framework

2 495 9.6 Python

AIL framework - Analysis Information Leak framework

Project mention: Ask HN: Show me your half baked project | news.ycombinator.com | 2023-10-12

First time coming across this, looks very cool! Definitely some ideas there that I'd like to implement for osintbuddy. Another project I'm going to be taking some ideas from is: https://github.com/ail-project/ail-framework - a modular framework to analyse potential information leaks

RomBuster

1 422 6.4 Python

RomBuster is a router exploitation tool that discloses a network router's admin password.
medaCy

1 412 0.0 Python

🏥 Medical Text Mining and Information Extraction with spaCy
MedCAT

1 407 8.5 Python

Medical Concept Annotation Tool
awesome-bioie

1 300 2.1

🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)

Project mention: Snomed CT Entity Linking Challenge | news.ycombinator.com | 2023-12-22

> The objective of this competition is to link spans of text in clinical notes with specific topics in the SNOMED CT clinical terminology. Participants will train models based on real-world doctor's notes which have been de-identified and annotated with SNOMED CT concepts by medically trained professionals. This is the largest publicly available dataset of labelled clinical notes, and you can be one of the first to use it!
NER: Named Entity Recognition: https://en.wikipedia.org/wiki/Named-entity_recognition
awesome-medical-coding-nlp: https://github.com/acadTags/Awesome-medical-coding-NLP
awesome-ehr-deep-learning: https://github.com/hurcy/awesome-ehr-deeplearning
awesome-ner: https://github.com/smiyawaki0820/awesome-ner
awesome-bioie > Research groups: https://github.com/caufieldjh/awesome-bioie#groups-active-in...
SNOMED-CT as RDF: https://sphn-semantic-framework.readthedocs.io/en/latest/ext...

GoLLIE

1 204 9.6 Python

Guideline following Large Language Model for Information Extraction

Project mention: A LLM trained to follow annotation guidelines, for information extraction tasks | news.ycombinator.com | 2023-10-30

awesome-hungarian-nlp

3 205 3.2

A curated list of NLP resources for Hungarian
huspacy

3 147 8.6 Python

HuSpaCy: industrial-strength Hungarian natural language processing
htmldate

1 106 7.6 Python

Fast and robust date extraction from web pages, with Python or on the command-line
minie

1 88 0.0 Java

An open information extraction system that provides compact extractions
targetedSummarization

3 86 1.8 Python

TextReducer - A Tool for Summarization and Information Extraction
stargather

3 67 1.8 Go

A fast GitHub stargazers information gathering tool
odinson

3 66 4.5 Scala

Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.

Project mention: [D] Finetuning for text extraction (e.g. scientific sources) | /r/MachineLearning | 2023-06-11

Odinson

KIE_invoice_minimal

1 49 0.0 Python

Key information extraction from invoice document with Graph Convolution Network
IRCP

1 44 8.7 Python

A robust information gathering tool for large scale reconnaissance on Internet Relay Chat servers 🛰️ (by internet-relay-chat)

Project mention: IRCP: A robust information gathering tool for large scale reconnaissance on Internet Relay Chat servers | /r/netsec | 2023-06-07

AdaKGC

1 16 7.7 Python

[EMNLP 2023 (Findings)] Schema-adaptable Knowledge Graph Construction

Project mention: Schema-adaptable Knowledge Graph Construction | /r/BotNewsPreprints | 2023-05-16

Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema. As a result, such approaches fall short when applied to dynamic scenarios or domains, whereas a new type of knowledge emerges. This necessitates a system that can handle evolving schema automatically to extract information for KGC. To address this need, we propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training. We first split and convert existing datasets based on three principles to build a benchmark, i.e., horizontal schema expansion, vertical schema expansion, and hybrid schema expansion; then investigate the schema-adaptable performance of several well-known approaches such as Text2Event, TANL, UIE and GPT-3. We further propose a simple yet effective baseline dubbed AdaKGC, which contains schema-enriched prefix instructor and schema-conditioned dynamic decoding to better handle evolving schema. Comprehensive experimental results illustrate that AdaKGC can outperform baselines but still have room for improvement. We hope the proposed work can deliver benefits to the community. Code and datasets will be available in https://github.com/zjunlp/AdaKGC.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

information-extraction related posts

Pydentic in prompt engineering
1 project | /r/LangChain | 29 Nov 2023
27-Jun-2023
1 project | /r/dailyainews | 29 Jun 2023
Guidance on creating a very lightweight model that does one task very well
2 projects | /r/LocalLLaMA | 26 Jun 2023
Kor: Extract structured data using LLMs
1 project | /r/hypeurls | 26 Jun 2023
Kor: Extract structured data using LLMs
1 project | news.ycombinator.com | 26 Jun 2023
Google Local Results AI Parser
1 project | news.ycombinator.com | 24 Jun 2023
Ruby gem to parse structured data from Google Local Search Results
1 project | news.ycombinator.com | 22 Jun 2023

Index

What are some of the best open-source information-extraction projects? This list will help you:

#	Project	Stars
1	PaddleNLP	11,386
2	MITIE	2,892
3	DeepKE	2,891
4	InvoiceNet	2,382
5	kor	1,501
6	awesome-document-understanding	1,108
7	007-TheBond	1,030
8	ail-framework	495
9	RomBuster	422
10	medaCy	412
11	MedCAT	407
12	awesome-bioie	300
13	GoLLIE	204
14	awesome-hungarian-nlp	205
15	huspacy	147
16	htmldate	106
17	minie	88
18	targetedSummarization	86
19	stargather	67
20	odinson	66
21	KIE_invoice_minimal	49
22	IRCP	44
23	AdaKGC	16