Go Text processing

Open-source Go projects categorized as Text processing

Top 23 Go Text processing Projects

  • GitHub repo micro-editor

    A modern and intuitive terminal-based text editor

    Project mention: Which text editor is best suitable for new beginner in Linux? | reddit.com/r/linux4noobs | 2021-06-14

    Command Line editor = nano is hugely common but I'd advise getting micro as it's far better than nano, IMO. https://micro-editor.github.io/

  • GitHub repo GoQuery

    A little like that j-thing, only in Go.

    Project mention: Building Golang crawler with Docker | reddit.com/r/golang | 2021-03-12

    RUN go get github.com/PuerkitoBio/goquery

  • GitHub repo blackfriday

    Blackfriday: a markdown processor for Go

    Project mention: Compounding Competence | dev.to | 2021-04-11

    On the backend when generating the emails: For this, I chose a popular Go markdown library BlackFriday.

  • GitHub repo sh

    A shell parser, formatter, and interpreter with bash support; includes shfmt (by mvdan)

    Project mention: Config to edit bash scripts with fancy LSP features, linting and formatting | reddit.com/r/vim | 2021-06-17

    Does anybody have such a config? Maybe you could share your experience? I use coc.nvim. My eyes fell on these 3 tools: bash-language-server, shellcheck, and shfmt. The first one is a language server and it has a coc extension, coc-sh. The others do not, so I am not sure which vim plugin I should use to hook them up: besides diagnostic-languageserver there are syntastic and neomake.

  • GitHub repo toml

    TOML parser for Golang with reflection. (by BurntSushi)

    Project mention: GOPROXY alternative for non go modules | reddit.com/r/golang | 2021-04-06

    There are packages such as https://github.com/BurntSushi/toml which is not a go module, how should I serve it in an airlocked network? For go modules I'm using athens is there something similar to it for non go modules?

  • GitHub repo go-humanize

    Go Humans! (formatters for units to human friendly sizes)

  • GitHub repo bluemonday

    bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

    Project mention: HTML Sanitizer API | news.ycombinator.com | 2021-05-06

    My thoughts as a maintainer of a HTML sanitizer https://github.com/microcosm-cc/bluemonday

    1. Sanitizing is not difficult, defining the policy/config is difficult as your need is not someone else's. First glance of this proposal is that this needs a lot more work to cover people's needs. It's good enough, but will have a lot of edges and will need to evolve.

    2. If you allow a blocklist then people will use that by default as it's easier to say "I don't want " than it is to say "I only accept

    3. Even if you sanitize something you should keep the raw input... you should store the raw input alongside the sanitized (in fact the sanitized is merely a cached version of the raw input having been sanitized). The reason for this is you will have issues you need to debug (and can't without the input) and you will have round-trip edits you should support (but it's not round-trippable when everything you return is different from the input, do not punish a user who pasted HTML thinking it was safe by then not allowing them to edit it out because you threw everything away). Additionally if you want to ever report on the input, i.e. topK values, and you've modified the input and not kept raw, then you can never do this.

    4. Provide a sane default. Most engineers simply do not know what is safe or not. I ship a policy in bluemonday for user generated content... it is safe by default and good enough for most people, and it can be taken and extended due to the way the API is structured so can cover other scenarios as a foundation policy.

    I think the proposal in general: specify a standard for a sanitization API has merit. But mostly it has merit if it specifies a standard for defining sanitization policies/configuration, allowing them to be portable across different languages and systems.

    The one I wrote is very heavily inspired by https://github.com/owasp/java-html-sanitizer which is the OWASP project one maintained by Mike Samuel. When I did my research before writing the Go one, this was far and away the best way to construct the policy/config and I already saw that this perspective was more valuable than whether it's a token based parser (GIGO but low memory) or a DOM builder (more memory)... no-one cares about the internals, they care about expressing what safe means to them.
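
Point 3 of the comment above (store the raw input and treat the sanitized text as a cache derived from it) can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `Comment` type, its fields, and the `sanitize` helper are hypothetical names, and `html.EscapeString` from the standard library stands in for a real policy-based sanitizer such as bluemonday, whose API is not shown here.

```go
package main

import (
	"fmt"
	"html"
)

// Comment keeps the user's raw input next to a sanitized copy derived from it.
// The type and field names are hypothetical, chosen for this illustration.
type Comment struct {
	Raw       string // exactly what the user submitted; the source of truth
	Sanitized string // cached result of sanitizing Raw; the only form to render
}

// sanitize is a stand-in for a real policy-based sanitizer. It escapes all
// markup rather than applying an allowlist, so it is stricter than a typical
// user-generated-content policy.
func sanitize(raw string) string {
	return html.EscapeString(raw)
}

// NewComment stores the raw input and caches its sanitized form.
func NewComment(raw string) Comment {
	return Comment{Raw: raw, Sanitized: sanitize(raw)}
}

// Edit replaces the raw input and rebuilds the cache, so edits round-trip:
// the user edits what they originally typed, not the sanitized output.
func (c *Comment) Edit(raw string) {
	c.Raw = raw
	c.Sanitized = sanitize(raw)
}

func main() {
	c := NewComment(`<b>hi</b><script>alert(1)</script>`)
	fmt.Println(c.Sanitized) // render this
	fmt.Println(c.Raw)       // show this in the edit form, or use it for debugging
}
```

Because the sanitized text is always recomputed from `Raw`, changing the policy later only requires re-running `sanitize` over the stored raw values.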

  • GitHub repo gofeed

    Parse RSS, Atom and JSON feeds in Go

    Project mention: Automate the README for your GitHub profile with Go and GitHub Actions | dev.to | 2021-04-25

  • GitHub repo xurls

    Extract urls from text

  • GitHub repo commonregex

    🍫 A collection of common regular expressions for Go (by mingrammer)

  • GitHub repo slug

    URL-friendly slugify with multiple languages support.

  • GitHub repo whatlanggo

    Natural language detection library for Go

  • GitHub repo mxj

    Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

    Project mention: Golang json to xml, xml to json | reddit.com/r/golang | 2021-03-17

    Hello, maybe someone has experience converting XML to JSON and JSON to XML without structs? I have found some libs like github.com/clbanning/mxj but it loses sequences. Of course I could modify the XML to remove seq to pass validation etc. Ideally it should work like this: https://www.utilities-online.info/xmltojson#.W1cSCNIzZPY

  • GitHub repo Dataflow kit

    Extract structured data from web sites. Web sites scraping.

  • GitHub repo Koazee

    A StreamLike, Immutable, Lazy Loading and smart Golang Library to deal with slices.

  • GitHub repo gographviz

    Parses the Graphviz DOT language in golang

  • GitHub repo xpath

    XPath package for Golang, supports HTML, XML, JSON document query.

  • GitHub repo htmlquery

    htmlquery is golang XPath package for HTML query.

    Project mention: XPath package for HTML Query, No third-party library dependencies | reddit.com/r/golang | 2020-12-30

  • GitHub repo go-runewidth

    wcwidth for golang

  • GitHub repo omniparser

    omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

    Project mention: A (streaming) text parser supports many formats like EDI, JSON, fixed-length, csv, XML etc. | reddit.com/r/golang | 2021-03-28

  • GitHub repo gotext

    Go (Golang) GNU gettext utilities package

  • GitHub repo gotabulate

    Gotabulate - Easily pretty-print your tabular data with Go

  • GitHub repo go-edlib

    Golang string comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...
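
For readers unfamiliar with the edit-distance algorithms named in the go-edlib entry, here is a minimal standard-library sketch of Levenshtein distance. It does not use go-edlib and makes no claim about how that library implements the algorithm; the function names `levenshtein` and `min3` are chosen for this illustration only.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
// It keeps only two rows of the dynamic-programming table at a time.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a to b[:j]
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // distance from a[:i] to the empty prefix of b
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // delete a rune from a
				curr[j-1]+1,    // insert a rune into a
				prev[j-1]+cost, // substitute, or keep a matching rune
			)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // prints 3
}
```

Working on `[]rune` rather than bytes means a multi-byte character counts as one edit, which is usually what text comparison expects.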

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-06-17.

Index

What are some of the best open-source Text processing projects in Go? This list will help you:

 #  Project        Stars
 1  micro-editor   17,115
 2  GoQuery        10,238
 3  blackfriday     4,731
 4  sh              3,807
 5  toml            3,480
 6  go-humanize     2,676
 7  bluemonday      1,899
 8  gofeed          1,647
 9  xurls             793
10  commonregex       735
11  slug              655
12  whatlanggo        495
13  mxj               469
14  Dataflow kit      465
15  Koazee            462
16  gographviz        433
17  xpath             397
18  htmlquery         368
19  go-runewidth      366
20  omniparser        351
21  gotext            300
22  gotabulate        266
23  go-edlib          266