Web scraping library

This page summarizes the projects mentioned and recommended in the original post on /r/haskell.

  • scalpel

    A high level web scraping library for Haskell. (by fimad)

    This may be of interest: https://github.com/fimad/scalpel

  • marlo

    a search engine for humans

    Check out my library lasercutter. It's good at scraping data out of trees. As-is, it's probably too low-level for what you want, but there is an HTML tree parser and a set of combinators, which I'd be happy to turn into a library if it were desired.

  • webdriver

    A Haskell client for the Selenium WebDriver protocol.

    Here's a slightly different solution which could work: this Haskell library for Selenium works fine - I've used it. You could navigate to the page using Selenium and whatever supported browser you like (Chrome, Firefox, Edge, etc.) and then evaluate a JavaScript snippet on the page, via the Selenium API, to retrieve the value you want. One potential advantage of this is that it'll work on highly JavaScript-dependent pages.
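
For the scalpel suggestion above, here is a minimal sketch of what a scraper might look like. It runs against an inline HTML string rather than a live page, so it needs no network access. The `Link` record, the `sampleHtml` markup, and the `project` class name are hypothetical and exist only for illustration; the calls used (`scrapeStringLike`, `chroots`, `text`, `attr`, `hasClass`, `@:`) are assumed to come from the `scalpel` package's `Text.HTML.Scalpel` module.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Text.HTML.Scalpel
  ( Scraper, attr, chroots, hasClass, scrapeStringLike, text, (@:) )

-- Hypothetical record for one scraped entry (not part of scalpel).
data Link = Link
  { linkTitle :: String
  , linkUrl   :: String
  } deriving (Show, Eq)

-- Hypothetical markup standing in for a downloaded page.
sampleHtml :: String
sampleHtml = concat
  [ "<ul>"
  , "<li class=\"project\"><a href=\"https://github.com/fimad/scalpel\">scalpel</a></li>"
  , "<li class=\"project\"><a href=\"https://hackage.haskell.org/package/webdriver\">webdriver</a></li>"
  , "<li><a href=\"https://example.com/ad\">ad</a></li>"
  , "</ul>"
  ]

-- For every <li class="project">, collect the text and href of its <a> tag.
-- List items without the "project" class are skipped by the selector.
links :: Scraper String [Link]
links = chroots ("li" @: [hasClass "project"]) $ do
  title <- text "a"
  url   <- attr "href" "a"
  pure (Link title url)

main :: IO ()
main = print (scrapeStringLike sampleHtml links)
```

To scrape a real page, the same `links` scraper could be passed to scalpel's `scrapeURL` instead of `scrapeStringLike`. Since that fetches plain HTML without running scripts, the Selenium approach described above is probably the better fit for pages that build their content in JavaScript.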
