article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts (by scrapinghub)
go-trafilatura
go-trafilatura is a Go port of the trafilatura Python library. (by markusmobius)
Metric | article-extraction-benchmark | go-trafilatura
---|---|---
Mentions | 1 | 1
Stars | 242 | 32
Growth | 5.8% | -
Activity | 0.0 | 7.9
Last commit | almost 3 years ago | 10 months ago
Language | Python | HTML
License | MIT License | GNU General Public License v3.0 only
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
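The site does not publish the exact formulas behind these numbers, so the following is only a minimal sketch of how such metrics could be computed. The function names, the 90-day half-life, and the percentile-to-rating mapping are all assumptions made for illustration; the one constraint taken from the text is that a rating of 9.0 should correspond to the top 10% of tracked projects.

```python
from datetime import date


def month_over_month_growth(stars_prev: int, stars_now: int) -> float:
    """Percentage change in stars between two monthly snapshots."""
    if stars_prev <= 0:
        raise ValueError("previous star count must be positive")
    return (stars_now - stars_prev) * 100.0 / stars_prev


def weighted_commit_score(commit_dates, today: date, half_life_days: float = 90.0) -> float:
    """Sum of per-commit weights, so recent commits count more than old ones.

    Each commit's weight halves every `half_life_days` (an assumed value).
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_rating(score: float, all_scores) -> float:
    """Map a score to a 0-10 scale by the share of tracked projects scoring lower.

    A rating of 9.0 means 90% of projects score lower, i.e. top 10%.
    """
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)
```

Under these assumptions, a project's rating depends on how its weighted commit score ranks among all tracked projects rather than on the raw commit count, which is why a repository with no recent commits can end up at 0.0 regardless of its star count.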
article-extraction-benchmark
Posts with mentions or reviews of article-extraction-benchmark.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2022-03-30.
go-trafilatura
Posts with mentions or reviews of go-trafilatura.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2022-03-30.
What are some alternatives?
When comparing article-extraction-benchmark and go-trafilatura you can also consider the following projects:
unclutter - A modern reader mode and article library for your browser.
go-domdistiller - Go-DomDistiller is a Go port of the DOM Distiller library which implements Reader mode in Chrome for Android and Desktop. It has no dependencies on Chromium and is meant to run as a command line program or on a server.
dom-distiller - Distills the DOM
go-dateparser - Go parser for human-readable dates, ported from the dateparser Python package
arc90-readability - A copy of the original Arc90 repo with links to many of the current ports.
go-htmldate - CLI and Go package for extracting the publication date of web pages.
soup-strainer - A reimplementation of the Readability/Decruft algorithm using BeautifulSoup and html5lib