tidy-html5
hn-search
Metric | tidy-html5 | hn-search
---|---|---
Mentions | 9 | 1,625
Stars | 2,663 | 524
Growth | 0.3% | 0.2%
Activity | 0.0 | 2.9
Last commit | 8 days ago | 6 months ago
Language | C | TypeScript
License | - | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tidy-html5
- Show HN: I made a tool to clean and convert any webpage to Markdown
- Localize HTML Tidy (README.md)
- libtidy, compilation errors
  So I included the tidy libraries in my project.
- Searching for the *old* W3C XHTML/CSS validator or something of equivalent functionality
  Maybe look into HTML Tidy. Its job is to clean up HTML and convert legacy code to modern form, so it knows about DTDs. You might be able to pass it some options to get what you want.
- Converting an IETM delivered in HTML to XML S1000D 4.0.
  I've always used tidy for HTML/XML formatting jobs.
- Expand one very long HTML line (>30k characters) as multi-line formatted indented HTML?
  Personally, I use a command that switches the file type to html and then formats it with tidy. It assumes you're pasting into a new buffer.
- Unminify HTML in terminal
  I use tidy.
- Inspecting the Clipboard (on Linux)
  So I installed HTML Tidy.
- The most underused browser feature
  Prune instructs the parser to remove any elements within the extracted article block that look superfluous. This can result in false positives, so we tend to disable it when we've gone to the trouble of creating site-specific extraction rules.
  Tidy determines whether the source HTML should first be cleaned up with HTML Tidy (https://github.com/htacg/tidy-html5). If you're parsing the source HTML with an HTML5 parser, as we are now, it shouldn't be necessary any more (I think we actually ignore it now). We used it more when we relied on libxml parsing, which often trips up on modern HTML.
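Several of the mentions above describe running HTML Tidy over minified or single-line HTML to get indented output. The following is a minimal sketch of that workflow, assuming the `tidy` command-line binary is installed and on `PATH`. The helper names `build_tidy_command` and `pretty_print_html` are hypothetical and not part of tidy-html5. The flags used (`-indent`, `-quiet`, `-wrap`, `--tidy-mark`) are regular tidy options, but the particular combination is only one reasonable choice.

```python
import shutil
import subprocess


def build_tidy_command(wrap: int = 0) -> list[str]:
    """Build a tidy invocation that reads HTML on stdin and writes indented HTML to stdout.

    wrap=0 asks tidy not to wrap long lines; --tidy-mark no omits the generator meta tag.
    """
    return ["tidy", "-indent", "-quiet", "-wrap", str(wrap), "--tidy-mark", "no"]


def pretty_print_html(html: str) -> str:
    """Pipe html through the tidy binary (hypothetical helper, not a libtidy API)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("the tidy binary was not found on PATH")
    result = subprocess.run(
        build_tidy_command(),
        input=html,
        capture_output=True,
        text=True,
        check=False,  # tidy exits with 1 on warnings, so do not treat nonzero as fatal
    )
    # tidy exit codes: 0 = clean, 1 = warnings, 2 = errors.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

Calling `pretty_print_html("<p>one<p>two")` would return the document reindented by whatever tidy version is installed, so the exact output depends on that version.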
hn-search
- Validating app for manufacturers enhancing process reliability and efficiency
  I was looking for it in the guidelines. There are a couple of conventions for postings. Consider a few prior examples: [https://hn.algolia.com/?q=show+hn]
- Show HN: Hacker Search – A semantic search engine for Hacker News
  yeah there are only three stories coming up from the site search
  https://hn.algolia.com/?q=postgres+clustering
  only one is semantically correct, the others pick up the wrong version of clustering (i.e. k-means instead of multi-master writes)
  but yeah if one doesn't test the hard cases, how does one know it preserves semantics :D
- Longevity of Recordable CDs, DVDs and Blu-Rays
- The Scientific Method Part 5: Illusions, Delusions, and Dreams
  Like dismissing the work of Feyerabend or Wittgenstein without seemingly having read either:
  https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=tr...
- Any Google Analytics Alternatives?
  https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
- Russian GRU was behind the attack in Vrbětice, NCOZ confirms
  If it's not [flagged], there's no flagging and hence also no flagging ring. baybal2 has been banned on and off for years now: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
- Gary Killdall, creator of CP/M, wrote Pixar's original 3D renderer [pdf]
  The submitted title was "Gary Killdall, creator of CP/M, wrote Pixar's original 3D renderer".
  Submitters: If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
  (From https://news.ycombinator.com/newsguidelines.html: "Please use the original title, unless it is misleading or linkbait; don't editorialize.")
- Nearsightedness is at epidemic levels – and the problem begins in childhood
  Vision therapy for myopia helps some people, but not everyone, likely due to genetic and neuroplasticity differences, https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu.... Nevertheless, many of the principles are useful for children whose eyes and brains are still developing.
- Tesla driver arrested for homicide after running over motorcyclist on Autopilot
  I'm a huge Tesla skeptic, but Tesla and Musk are lightning rods for tabloid-style garbage that doesn't belong on HN, so it doesn't surprise me that we often see negative Tesla content flagged to death. Meanwhile we also see plenty of content that hits the front page and stays there [0].
  Do you have examples of professional, interesting Tesla content that got flagged?
  [0] More than half of the past year's most popular Tesla articles were negative: https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...
- The Man Who Killed Google Search
  It's April 23rd, 2024, and I am still looking for a good, reliable, honest and simple search engine.
  All I want to do is search.
  No AI.
  No ads.
  No shopping.
  Please don't "Answer my question." I enjoy doing my own original research, thanks.
  I'm entirely willing - wanting even - to pay for it.
  Currently Kagi has my $, but I'm saddened and frustrated that they're not even focused on Search, they're focused on AI[1] and t-shirts.
  Amazingly, in 2024, there is still a market opportunity for a good search engine.
  It can't really just be me, can it?
  [1]: https://hn.algolia.com/?query=%22kagi%22+%22ai%22
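Many of the comments above link to searches on hn.algolia.com using query-string parameters such as `q`, `query`, `dateRange`, `page`, and `prefix`. As a rough sketch, links of that shape can be assembled with Python's standard library. The helper name `build_hn_search_url` is hypothetical. The parameter names are taken only from the links quoted above, and which of them the site actually honors is an assumption, not something verified here.

```python
from urllib.parse import urlencode

# Base URL as it appears in the links quoted in the comments above.
HN_SEARCH_BASE = "https://hn.algolia.com/"


def build_hn_search_url(**params: str) -> str:
    """Return a search URL with the given query-string parameters, percent-encoded.

    urlencode encodes spaces as '+' and double quotes as '%22',
    which matches the form of the links quoted above.
    """
    return f"{HN_SEARCH_BASE}?{urlencode(params)}"


print(build_hn_search_url(q="postgres clustering"))
# https://hn.algolia.com/?q=postgres+clustering
print(build_hn_search_url(query='"kagi" "ai"'))
# https://hn.algolia.com/?query=%22kagi%22+%22ai%22
```

Keyword arguments keep their order, so `build_hn_search_url(dateRange="pastYear", page="0", prefix="true")` produces the parameters in the same sequence as the truncated links above.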
What are some alternatives?
parser - 📜 Extract meaningful content from the chaos of a web page
duckduckgo-locales - Translation files for https://duckduckgo.com
readability.php - PHP port of Mozilla's Readability.js
v - Simple, fast, safe, compiled language for developing maintainable software. Compiles itself in <1s with zero library dependencies. Supports automatic C => V translation. https://vlang.io
readability - Readability is a library written in Go (golang) to parse, analyze and convert HTML pages into readable content. Originally an Arc90 Experiment, it is now incorporated into Safari’s Reader View.
toltec - Community-maintained repository of free software for the reMarkable tablet.
readability - A standalone version of the readability lib
SponsorBlock - Skip YouTube video sponsors (browser extension)
yq - Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
ftr-site-config - Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
milkdown - 🍼 Plugin driven WYSIWYG markdown editor framework.