[D] Datasets and Models for Structured Information Extraction on HTML

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • If you get the chance to try it out, please post your findings in the Issues section of the GitHub repository for the dataset. We're all very curious :)

  • webtraversallibrary

    The Web Traversal Library (WTL) is a Python library for abstracting web interactions on top of a base execution layer such as Selenium.

  • TL;DR: Dataset, pre-print with comparisons of NN architectures for the Web, useful library for Web manipulation



Related posts

  • [R] A new dataset and a library that you can use for ML and RL over the Web

    2 projects | /r/MachineLearning | 25 Apr 2022
  • Web Scraping in a professional setting: Selenium vs. BeautifulSoup

    2 projects | /r/Python | 26 Oct 2021
  • What is the most interesting / funniest solution you have seen done with Python & Selenium?

    3 projects | /r/Python | 14 Sep 2021