Go Text processing

Open-source Go projects categorized as Text processing

Top 23 Go Text processing Projects

  • micro-editor

    A modern and intuitive terminal-based text editor

    Project mention: Batteries Included with Emacs | news.ycombinator.com | 2021-11-25

  • GoQuery

    A little like that j-thing (jQuery), only in Go.

    Project mention: Building Golang crawler with Docker | reddit.com/r/golang | 2021-03-12

    RUN go get github.com/PuerkitoBio/goquery

  • blackfriday

    Blackfriday: a markdown processor for Go

    Project mention: Compounding Competence | dev.to | 2021-04-11

    On the backend, when generating the emails: for this, I chose Blackfriday, a popular Go markdown library.

  • sh

    A shell parser, formatter, and interpreter with bash support; includes shfmt (by mvdan)

    Project mention: Code formatter, linters, etc. Recommendations? | reddit.com/r/bash | 2021-09-29

    There is shellcheck, and shellharden, which is a strict version of it. There are similar tools here, some of which also help with your editor. You can also use a Docker version of shfmt. See here for a quick tutorial on shfmt.

  • toml

    TOML parser for Golang with reflection. (by BurntSushi)

    Project mention: Rust Moderation Team Resigns | news.ycombinator.com | 2021-11-22

    He's also a prominent contributor to the Go ecosystem.

    https://github.com/BurntSushi/toml

  • go-humanize

    Go Humans! (formatters for units to human friendly sizes)
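
    To make "human friendly sizes" concrete, here is a minimal sketch that formats a byte count with SI units, where each step is a factor of 1000. The `humanBytes` function is a hypothetical helper written for this example, not go-humanize's API, and the one-decimal rounding is an assumption; the library's own output may differ.

```go
package main

import "fmt"

// humanBytes is a hypothetical helper (not part of go-humanize) that renders
// a byte count using SI units, where each step is a factor of 1000.
func humanBytes(n uint64) string {
	const unit = 1000
	if n < unit {
		return fmt.Sprintf("%d B", n)
	}
	units := []string{"kB", "MB", "GB", "TB", "PB", "EB"}
	value := float64(n)
	i := -1
	// Divide until the value drops below 1000 or we run out of suffixes.
	for value >= unit && i < len(units)-1 {
		value /= unit
		i++
	}
	return fmt.Sprintf("%.1f %s", value, units[i])
}

func main() {
	for _, n := range []uint64{512, 1500, 82854982, 3000000000} {
		fmt.Println(n, "->", humanBytes(n))
	}
}
```

    A binary (1024-based, KiB/MiB) variant would only change the `unit` constant and the suffix list.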

  • bluemonday

    bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

    Project mention: HTML Sanitizer API | news.ycombinator.com | 2021-05-06

    My thoughts as a maintainer of an HTML sanitizer: https://github.com/microcosm-cc/bluemonday

    1. Sanitizing is not difficult; defining the policy/config is difficult, as your needs are not someone else's. At first glance, this proposal needs a lot more work to cover people's needs. It's good enough, but it will have a lot of edges and will need to evolve.

    2. If you allow a blocklist, then people will use it by default, as it's easier to say "I don't want …" than it is to say "I only accept …".

    3. Even if you sanitize something, you should keep the raw input: store it alongside the sanitized version (in fact, the sanitized version is merely a cached copy of the raw input having been sanitized). The reason is that you will have issues you need to debug (and can't without the input), and you will have round-trip edits you should support. Edits are not round-trippable when everything you return differs from the input; do not punish a user who pasted HTML thinking it was safe by then not allowing them to edit it out because you threw everything away. Additionally, if you ever want to report on the input, e.g. top-K values, and you've modified the input without keeping the raw version, you can never do this.

    4. Provide a sane default. Most engineers simply do not know what is safe or not. I ship a policy in bluemonday for user-generated content; it is safe by default and good enough for most people, and because of the way the API is structured it can be taken and extended, so it can serve as a foundation policy for other scenarios.

    As for the proposal in general, specifying a standard for a sanitization API has merit. But it mostly has merit if it specifies a standard for defining sanitization policies/configuration, allowing them to be portable across different languages and systems.

    The one I wrote is very heavily inspired by https://github.com/owasp/java-html-sanitizer, the OWASP project's sanitizer maintained by Mike Samuel. When I did my research before writing the Go one, this was far and away the best way to construct the policy/config, and I already saw that this perspective was more valuable than whether it's a token-based parser (GIGO but low memory) or a DOM builder (more memory)... no one cares about the internals; they care about expressing what safe means to them.
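
    Points 3 and 4 above describe a storage pattern more than an algorithm: persist the raw input, treat the sanitized form as a cache derived from it, and start from a safe default policy. The sketch below illustrates that pattern with only the standard library. The `Post` type, `Sanitizer` type and `NewPost`/`Resanitize` functions are hypothetical, and `html.EscapeString` stands in for a real policy-driven sanitizer: it escapes every tag instead of allowing a safe subset, so it is deliberately stricter than something like bluemonday's `UGCPolicy()`.

```go
package main

import (
	"fmt"
	"html"
)

// Sanitizer turns untrusted input into HTML that is safe to render.
type Sanitizer func(raw string) string

// defaultSanitizer is a conservative stand-in: it escapes all markup, so
// nothing the user typed can execute as HTML. A real deployment would use
// an allowlist-based policy instead.
var defaultSanitizer Sanitizer = html.EscapeString

// Post is a hypothetical record that keeps the raw input next to its
// sanitized rendering.
type Post struct {
	Raw       string // exactly what the user submitted; used for editing and debugging
	Sanitized string // derived cache; can be regenerated from Raw at any time
}

// NewPost stores the raw input and derives the sanitized copy with the given policy.
func NewPost(raw string, sanitize Sanitizer) Post {
	return Post{Raw: raw, Sanitized: sanitize(raw)}
}

// Resanitize rebuilds the cached copy from Raw, e.g. after the policy changes.
func (p *Post) Resanitize(sanitize Sanitizer) {
	p.Sanitized = sanitize(p.Raw)
}

func main() {
	p := NewPost(`<b>hi</b><script>alert(1)</script>`, defaultSanitizer)
	fmt.Println("raw:      ", p.Raw)
	fmt.Println("sanitized:", p.Sanitized)
}
```

    Because `Raw` is never overwritten, a stricter or looser policy can be applied later without losing what the user actually submitted.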

  • gofeed

    Parse RSS, Atom and JSON feeds in Go

    Project mention: Automate the README for your GitHub profile with Go and GitHub Actions | dev.to | 2021-04-25
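
    For a sense of what the RSS part of feed parsing involves, here is a minimal sketch that decodes only an RSS 2.0 channel title and its items' titles and links with the standard `encoding/xml` package. The `rssFeed`/`rssItem` types and `parseRSS` function are hypothetical and ignore everything else gofeed covers, such as Atom, JSON feeds, extensions and date parsing.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssFeed is a hypothetical, minimal model of an RSS 2.0 document.
type rssFeed struct {
	Channel struct {
		Title string    `xml:"title"`
		Items []rssItem `xml:"item"`
	} `xml:"channel"`
}

type rssItem struct {
	Title string `xml:"title"`
	Link  string `xml:"link"`
}

// parseRSS decodes the channel title and items from raw RSS 2.0 XML.
func parseRSS(data []byte) (rssFeed, error) {
	var f rssFeed
	err := xml.Unmarshal(data, &f)
	return f, err
}

const sample = `<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>`

func main() {
	f, err := parseRSS([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Channel.Title)
	for _, it := range f.Channel.Items {
		fmt.Println("-", it.Title, it.Link)
	}
}
```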

  • xurls

    Extract URLs from text

  • commonregex

    🍫 A collection of common regular expressions for Go (by mingrammer)

  • slug

    URL-friendly slugify with multiple languages support.
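
    The core of slugification can be sketched in a few lines: lowercase the input, keep letters and digits, and collapse every other run of characters into a single hyphen. The `slugify` function below is hypothetical and simpler than the slug package: it keeps non-ASCII letters as they are instead of transliterating them, which is the part a multi-language slugifier typically adds.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify is a hypothetical helper: lowercase letters and digits are kept,
// and any run of other characters becomes a single '-'. No leading or
// trailing hyphens are emitted.
func slugify(s string) string {
	var b strings.Builder
	pendingDash := false
	for _, r := range strings.ToLower(s) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			// Only emit a separator between two kept characters.
			if pendingDash && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingDash = false
			b.WriteRune(r)
		} else {
			pendingDash = true
		}
	}
	return b.String()
}

func main() {
	fmt.Println(slugify("Hello, World! Go 1.21"))
	fmt.Println(slugify("  --Über café--  "))
}
```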

  • whatlanggo

    Natural language detection library for Go

    Project mention: Announcing Lingua 1.0.0: The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike | reddit.com/r/golang | 2021-06-21

    So far, the only other comprehensive open source library in the Go ecosystem for this task is Whatlanggo. Unfortunately, it has two major drawbacks:

  • mxj

    Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

    Project mention: Golang json to xml, xml to json | reddit.com/r/golang | 2021-03-17

    Hello, has anyone had experience converting XML to JSON and JSON to XML without structs? I have found some libs like github.com/clbanning/mxj, but it loses sequences. Of course, I could modify the XML to remove seq to pass validation etc. Ideally it should work like this: https://www.utilities-online.info/xmltojson#.W1cSCNIzZPY

  • Dataflow kit

    Extract structured data from web sites. Web scraping.

  • Koazee

    A StreamLike, Immutable, Lazy Loading and smart Golang Library to deal with slices.

  • gographviz

    Parses the Graphviz DOT language in golang

  • xpath

    XPath package for Golang, supports HTML, XML, JSON document query.

  • htmlquery

    htmlquery is a Go XPath package for querying HTML.

    Project mention: XPath package for HTML Query, No third-party library dependencies | reddit.com/r/golang | 2020-12-30

  • go-runewidth

    wcwidth for golang
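
    wcwidth-style libraries exist because none of Go's built-in length measures gives the number of terminal columns a string occupies. The sketch below contrasts the two standard measures: `len` counts UTF-8 bytes and `utf8.RuneCountInString` counts code points. The `measure` function is a hypothetical helper for this example. Neither number accounts for wide East Asian characters, which typically occupy two columns, or for zero-width combining marks.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// measure returns the byte length and the code-point count of s.
// Neither value is the on-screen column width.
func measure(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "e\u0301" is 'e' followed by a combining acute accent.
	for _, s := range []string{"hello", "日本語", "e\u0301"} {
		b, r := measure(s)
		fmt.Printf("%q: bytes=%d runes=%d\n", s, b, r)
	}
}
```

    On a typical terminal "日本語" occupies 6 columns and "e\u0301" renders as a single "é" in 1 column, so neither count equals the display width; go-runewidth's `StringWidth` is intended for that measurement.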

  • omniparser

    omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

    Project mention: A (streaming) text parser that supports many formats like EDI, JSON, fixed-length, CSV, XML etc. | reddit.com/r/golang | 2021-03-28

  • gotext

    Go (Golang) GNU gettext utilities package

  • go-edlib

    📚 String comparison and edit distance algorithms library, featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc.
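
    Levenshtein distance, the first algorithm in that list, counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal dynamic-programming sketch that compares runes, so a multi-byte character counts as one edit; the `levenshtein` and `min3` functions are written for this example and are not go-edlib's API.

```go
package main

import "fmt"

// levenshtein returns the edit distance between a and b, comparing runes.
// It keeps only two rows of the DP table, so extra memory is O(len(b)).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from "" to the first j runes of b
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // distance from the first i runes of a to ""
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution (or match)
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // the row just filled becomes prev
	}
	return prev[len(rb)]
}

func min3(x, y, z int) int {
	m := x
	if y < m {
		m = y
	}
	if z < m {
		m = z
	}
	return m
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	fmt.Println(levenshtein("héllo", "hello"))
}
```

    Time is proportional to len(a)*len(b); keeping two rows instead of the full table is a common space optimization, not something specific to go-edlib.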

  • html-to-markdown

    ⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules. (by JohannesKaufmann)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-25.

Index

What are some of the best open-source Text processing projects in Go? This list will help you:

 #  Project             Stars
 1  micro-editor       18,380
 2  GoQuery            10,844
 3  blackfriday         4,824
 4  sh                  4,280
 5  toml                3,679
 6  go-humanize         2,921
 7  bluemonday          2,124
 8  gofeed              1,746
 9  xurls                 842
10  commonregex           782
11  slug                  749
12  whatlanggo            512
13  mxj                   489
14  Dataflow kit          480
15  Koazee                476
16  gographviz            454
17  xpath                 435
18  htmlquery             423
19  go-runewidth          395
20  omniparser            393
21  gotext                309
22  go-edlib              287
23  html-to-markdown      282