Web scraping library

This page summarizes the projects mentioned and recommended in the original post on /r/haskell.

  • scalpel

    A high-level web scraping library for Haskell (by fimad).

    This may be of interest: https://github.com/fimad/scalpel

  • marlo

    A search engine for humans.

    Check out my library lasercutter. It's good at scraping data out of trees. As-is, it's probably too low-level for what you want, but it includes an HTML tree parser and a set of combinators, which I'd be happy to turn into a library if people want it.

  • webdriver

    A Haskell client for the Selenium WebDriver protocol.

    Here's a slightly different approach that could work: this Haskell library for Selenium works fine; I've used it. You could navigate to the page with Selenium and any supported browser you like (Chrome, Firefox, Edge, etc.), then use the Selenium API to evaluate a JavaScript snippet on the page that returns the value you want. One potential advantage is that this also works on pages that depend heavily on JavaScript.
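The scalpel entry above describes a high-level scraping library. As a minimal sketch, assuming the scalpel package's `Text.HTML.Scalpel` module, and using a hypothetical HTML fragment (`sampleHtml`) and scraper name (`links`), the following pulls the text and `href` of every link inside `<li class="item">` elements:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel

-- | Hypothetical page fragment, used only for illustration.
sampleHtml :: String
sampleHtml =
  "<ul>\
  \<li class=\"item\"><a href=\"/a\">First</a></li>\
  \<li class=\"item\"><a href=\"/b\">Second</a></li>\
  \</ul>"

-- | For every <li class="item">, return the link text and its href.
links :: Scraper String [(String, String)]
links = chroots ("li" @: [hasClass "item"]) $ do
  label <- text "a"
  href  <- attr "href" "a"
  return (label, href)

main :: IO ()
main = print (scrapeStringLike sampleHtml links)
```

For a live page, scalpel's `scrapeURL` takes a URL in place of the HTML string and returns the result in `IO`; the selector-plus-predicate style (`@:`, `hasClass`) stays the same.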
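The lasercutter comment above describes an HTML tree parser plus combinators for pulling data out of the tree. lasercutter's own API isn't shown in the post, so this is not it: it is a rough sketch of the general idea that swaps in the tagsoup package's tree parser (`Text.HTML.TagSoup.Tree`), with hypothetical helper names (`elements`, `children`, `textOf`, `attrOf`, `listLinks`) chosen for illustration:

```haskell
import Text.HTML.TagSoup (innerText)
import Text.HTML.TagSoup.Tree (TagTree (..), flattenTree, parseTree, universeTree)

-- | Every element (at any depth) in the forest with the given tag name.
elements :: String -> [TagTree String] -> [TagTree String]
elements name forest =
  [t | t@(TagBranch n _ _) <- universeTree forest, n == name]

-- | Direct children of an element; leaves have none.
children :: TagTree String -> [TagTree String]
children (TagBranch _ _ cs) = cs
children (TagLeaf _) = []

-- | Concatenated text content of an element and its descendants.
textOf :: TagTree String -> String
textOf t = innerText (flattenTree [t])

-- | Value of an attribute on an element, if present.
attrOf :: String -> TagTree String -> Maybe String
attrOf key (TagBranch _ attrs _) = lookup key attrs
attrOf _ (TagLeaf _) = Nothing

-- | Combining the pieces: link text and href for every <a> inside an <li>.
listLinks :: [TagTree String] -> [(String, Maybe String)]
listLinks doc =
  [ (textOf a, attrOf "href" a)
  | li <- elements "li" doc
  , a <- elements "a" (children li)
  ]

-- | Hypothetical input, used only for illustration.
sampleHtml :: String
sampleHtml = "<ul><li><a href=\"/a\">A</a></li><li><a href=\"/b\">B</a></li></ul>"

main :: IO ()
main = print (listLinks (parseTree sampleHtml))
```

Because each helper is an ordinary function over the tree, queries compose with list comprehensions or `concatMap`, which is roughly the appeal of a combinator approach over hand-written traversals.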
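The Selenium suggestion above (navigate with a real browser, then evaluate JavaScript to get the value) might look like this with the webdriver package's `Test.WebDriver` module. This is a minimal sketch, assuming a Selenium server is already listening on its default address (localhost:4444); the URL, the `"h1"` selector, and the names `elementTextJS` and `scrapeWithJS` are hypothetical:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Test.WebDriver

-- | JavaScript evaluated in the page. Selenium exposes the extra
-- arguments passed from Haskell as arguments[0], arguments[1], ...
elementTextJS :: Text
elementTextJS =
  "var el = document.querySelector(arguments[0]);\
  \ return el ? el.textContent : null;"

-- | Open a page and return the text of the first element matching a
-- CSS selector; a JavaScript null decodes to Nothing.
scrapeWithJS :: String -> Text -> WD (Maybe Text)
scrapeWithJS url selector = do
  openPage url
  executeJS [JSArg selector] elementTextJS

main :: IO ()
main = do
  -- Assumes a running Selenium server; the target page is hypothetical.
  result <- runSession defaultConfig $ do
    value <- scrapeWithJS "https://example.com/" "h1"
    -- Not reached if an earlier command throws; real code should
    -- close the session on exceptions too.
    closeSession
    return value
  putStrLn (maybe "no match" T.unpack result)
```

Since the browser runs the page's scripts before `executeJS` is evaluated, the snippet sees the DOM as rendered, which is the advantage the comment points out for JavaScript-heavy pages.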
