Content Parser – Extract Markdown, HTML or text from content-heavy websites

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • readability

    A standalone version of the readability lib

  • [Readability](https://github.com/mozilla/readability) to strip down the page's HTML to a bare minimum.

  • to-markdown

    🛏 An HTML to Markdown converter written in JavaScript

  • puppeteer

    Node.js API for Chrome

  • [Puppeteer](https://github.com/puppeteer/puppeteer) to download the page.

    It costs me only several cents to parse an entire page, and I think OP can make some money out of this if they get the pricing right.

    Some unsolicited feedback on the API:

  • clippy

    Opensource commandline webclipper. (by benprew)

  • I wrote something similar so I could save recipes and web pages for reading offline. If you save as HTML, it inlines the images, so you end up with a single file. In Markdown, it just creates a link.

    It also uses turndown and readability.

    It's pretty finicky (readability doesn't always identify the correct content or misses pieces of the content). If you want to charge for it, you'd have to fix some of those edge cases.

    Also, I don't think the value of this product is in turning web pages into Markdown; there are many free web clippers and archive sites that do this already. I see this as more of an "extra" in a product, like how Evernote has a web clipper built into its note-taking product.

    Also, it's cool to see other people care about a stripped down web reading experience too!

    https://github.com/benprew/clippy
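
The single-file HTML behaviour described in the comment above (images inlined rather than linked) can be illustrated with a short sketch. This is not clippy's actual code: the function name `inline_images` and the `load_image` callback are hypothetical, and the regex-based rewrite is a simplification that assumes quoted `src` attributes. The idea is only that each image reference is replaced by a base64 `data:` URI, so the saved page no longer depends on external files.

```python
import base64
import mimetypes
import re

# Matches the src attribute of an <img> tag. Requiring whitespace before
# "src" keeps attributes such as data-src from being rewritten.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def inline_images(html, load_image):
    """Return html with image sources replaced by base64 data URIs.

    load_image is a hypothetical callback: it takes the src string and
    returns the image bytes, or None if the image cannot be loaded.
    """

    def replace(match):
        prefix, quote, src = match.groups()
        if src.startswith("data:"):
            return match.group(0)  # already inlined
        data = load_image(src)
        if data is None:
            return match.group(0)  # leave the original link in place
        # Guess the MIME type from the file name; fall back to a generic type.
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        payload = base64.b64encode(data).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{payload}{quote}"

    return IMG_SRC.sub(replace, html)
```

In a real clipper, `load_image` would presumably download the image or read it from a cache; passing it in as a callback keeps the rewrite step independent of how the bytes are obtained. Images that cannot be loaded keep their original link, which matches the fallback the comment describes for Markdown output.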

