corpus-tools

Open-source projects categorized as corpus-tools

Top 4 corpus-tool Open-Source Projects

  • trafilatura

    Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

    Project mention: Claude is now available in Europe | news.ycombinator.com | 2024-05-14

  • bitextor

    Bitextor generates translation memories from multilingual websites

  • ua-gec

    UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

  • simplemma

    Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source corpus-tool projects? This list will help you:

#  Project      Stars
1  trafilatura  2,977
2  bitextor       282
3  ua-gec         255
4  simplemma      128
