Python Annotations

Open-source Python projects categorized as Annotations

Top 23 Python Annotation Projects

  • labelImg

    LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.

    Project mention: labelImg: NEW Data - star count:20925.0 | /r/algoprojects | 2023-11-21
  • label-studio

    Label Studio is a multi-type data labeling and annotation tool with standardized output format

    Project mention: FLaNK Stack Weekly for 14 Aug 2023 | 2023-08-14

  • labelme

    Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).

    Project mention: labelme VS anylabeling - a user suggested alternative | 2023-04-15
  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • diffgram

    The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

  • entity-recognition-datasets

    A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

    Project mention: Recent English newswire NER datasets? | /r/LanguageTechnology | 2023-08-27

    There is of course the list at, but all of the recent English datasets cover other domains of English, such as the music NER, space NER, etc. All interesting things, but not 2020s English newswire.

  • refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

    Project mention: [P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions. | /r/MachineLearning | 2023-03-03

    You definitely forgot :)

  • projects

    🪐 End-to-end NLP workflows from prototype to production (by explosion)

    Project mention: Identify custom labels as well as existing labels with Spacy v3 | /r/LanguageTechnology | 2023-03-12

    When I was doing the same task, I used their `spacy project` command-line interface and extended their `ner_drugs` project, made things pretty easy.

  • eggnog-mapper

    Fast genome-wide functional annotation through orthology assignment

  • mypy_boto3_builder

    Type annotations builder for boto3 compatible with VSCode, PyCharm, Emacs, Sublime Text, pyright and mypy.

  • Encord Active

    Open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.

    Project mention: We tried injecting hallucinogenics into vision models | 2023-11-30
  • bbox-visualizer

    Make drawing and labeling bounding boxes easy as cake

  • remarks

    Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG

    Project mention: How is the PDF reading experience after 3.4 update? | /r/RemarkableTablet | 2023-06-08

    In reMarkable's stock output, highlighted text is not textured with PDF annotations, and so its highlights are not readable by any PDF client. You would still need to use third-party software for that. The only two I know of are RCU and remarks.

  • spectree

    API spec validator and OpenAPI document generator for Python web frameworks.

    Project mention: Flask is Great! | /r/flask | 2023-02-04

    See Spectree for 1-4 for Flask, Flask also allows async if not see Quart and Quart-Schema. 6. It is not faster than Flask for production apps - only micro benchmarks.

  • pandas-stubs

    Pandas type stubs. Helps you type-check your code. (by VirtusLab)

  • kobuddy

    Kobo database backup and parser: extracts notes, highlights, reading progress and more

    Project mention: Does Kobo have... | /r/kobo | 2023-02-21

    You can also install on your computer which will do backups for you instead of Calibre.

  • gum

    Repository for the Georgetown University Multilayer Corpus (GUM) (by amir-zeldes)

    Project mention: Évariste Galois | 2023-11-27

    CC BY SA 3.0:

    I didn't know about that project, that's really cool! I'd be curious to know whether the person who devised this scheme was aware of structured meaning representations (UCCA, AMR, ...), and if so, why they chose to create a new meaning representation. Maybe the goals of the project and/or the constraints of Wikidata necessitated this.

    Anyway, GUM (and its sister corpus EWT) does have a lot of parsed permissively-licensed text, so whoever's in charge should definitely consider using them. (Amir, the maintainer, is also super friendly and would respond to an email.)

  • infer-types

    A CLI tool to automatically add type annotations into Python code. Must have tool for annotating existing code.

    Project mention: infer-types: A CLI tool to automatically add type annotations into Python code. Must have tool for annotating existing code. | /r/coding | 2023-02-07
  • pdf-highlights

    Export your PDF highlights to markdown files.

  • YOLO-Coco-Dataset-Custom-Classes-Extractor

    Get specific classes from the Coco Dataset with annotations for the Yolo Object Detection model for building custom object detection models.

  • cocojson

    Utility scripts for COCO json annotation format

  • valdec

    Decorator for validating function arguments and result

  • astypes

    Python library to infer types for AST nodes. Make the most powerful Python linters and formatters!

    Project mention: astypes: Python library to infer types for AST nodes. Make the most powerful Python linters and formatters! | /r/coding | 2022-12-12

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-11-30.

What are some of the best open-source Annotation projects in Python? This list will help you:

Rank  Project                                     Stars
1     labelImg                                    21,043
2     label-studio                                14,871
3     labelme                                     11,436
4     cleanlab                                    7,268
5     diffgram                                    1,757
6     entity-recognition-datasets                 1,410
7     refinery                                    1,279
8     projects                                    1,183
9     eggnog-mapper                               489
10    mypy_boto3_builder                          437
11    Encord Active                               378
12    bbox-visualizer                             358
13    remarks                                     313
14    spectree                                    294
15    pandas-stubs                                117
16    kobuddy                                     116
17    gum                                         77
18    infer-types                                 66
19    pdf-highlights                              23
20    YOLO-Coco-Dataset-Custom-Classes-Extractor  23
21    cocojson                                    17
22    valdec                                      12
23    astypes                                     6