[R] A new dataset and a library that you can use for ML and RL over the Web


product-page-dataset


1) We've open-sourced a dataset of about 50k labeled product web pages from roughly 8000 distinct e-commerce merchants. The pages are available as MHTML files and as WebTraversalLibrary clones (see the next point :) ). Not all of the MHTMLs render correctly, but the ones that do also have screenshots in a corresponding dataset for CV applications. Documentation on how to download these datasets, along with some example code, is linked from the original post. You can read about the dataset (more statistics, biases, labelling procedure, challenges etc.) and find some initial benchmarks we've run in this pre-print: https://arxiv.org/abs/2111.02168
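Since MHTML is a MIME multipart format, a page snapshot can be opened with nothing more than Python's standard `email` package. The following is a minimal sketch, not code from the dataset's repository: the helper name `extract_html` and the in-memory sample are made up for illustration, and the actual file layout of the downloaded dataset is not assumed here.

```python
import email
from email import policy


def extract_html(mhtml_bytes: bytes) -> str:
    """Return the first text/html part of an MHTML (MIME multipart) document."""
    message = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() decodes the transfer encoding and charset for us.
            return part.get_content()
    raise ValueError("no text/html part found")


# Tiny hand-written MHTML-like sample so the sketch runs without the dataset.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Location: https://example.com/product\r\n"
    b"\r\n"
    b"<html><body><h1>Example product</h1></body></html>\r\n"
    b"--BOUNDARY--\r\n"
)

html = extract_html(SAMPLE)
```

For a file on disk you would pass the result of `open(path, "rb").read()` instead of `SAMPLE`; the extracted HTML can then go to whatever parser or model you are using.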

webtraversallibrary


The Web Traversal Library (WTL) is a Python library for abstracting web interactions on top of a base execution layer such as Selenium.

2) If interacting with the Web is more your thing, you can also check out the WebTraversalLibrary, which you can use to script agents that interact with the Internet via a browser. The library abstracts the browser up to a state/action level, so you don't have to write any code against the low-level browser implementation and can focus on the RL part. You can find quite a few example scripts in the repo.
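To make "state/action level" concrete, here is a toy sketch of that kind of loop. It is not the WebTraversalLibrary API: `State`, `Click`, `ToyBrowser`, `run` and `first_link_policy` are all hypothetical names, and the "browser" is a hard-coded link graph rather than Selenium. The point is only that the policy sees a state and returns an action, and never touches the browser directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class State:
    """What the agent observes: the current URL and the labels it may click."""
    url: str
    clickable: Tuple[str, ...]


@dataclass(frozen=True)
class Click:
    """An action: click the element with this label."""
    target: str


class ToyBrowser:
    """Stand-in for a real browser backend; pages are a fixed link graph."""

    def __init__(self, pages: Dict[str, Dict[str, str]], start: str):
        self.pages = pages
        self.url = start

    def observe(self) -> State:
        return State(self.url, tuple(self.pages[self.url]))

    def apply(self, action: Click) -> None:
        self.url = self.pages[self.url][action.target]


Policy = Callable[[State], Optional[Click]]


def run(browser: ToyBrowser, policy: Policy, max_steps: int = 10) -> List[str]:
    """Observe, ask the policy for an action, apply it; stop when it returns None."""
    visited = [browser.url]
    for _ in range(max_steps):
        action = policy(browser.observe())
        if action is None:
            break
        browser.apply(action)
        visited.append(browser.url)
    return visited


def first_link_policy(state: State) -> Optional[Click]:
    """Trivial policy: always click the first available element."""
    return Click(state.clickable[0]) if state.clickable else None


PAGES = {
    "/home": {"Shop": "/shop"},
    "/shop": {"Product": "/product"},
    "/product": {},
}

trail = run(ToyBrowser(PAGES, "/home"), first_link_policy)
```

In an RL setting, `first_link_policy` would be replaced by a learned policy; the example scripts in the repo show how the real library wires this up.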


Related posts

[D] Datasets and Models for Structured Information Extraction on HTML

3 projects | /r/MachineLearning | 31 May 2022

Web Scraping in a professional setting: Selenium vs. BeautifulSoup

2 projects | /r/Python | 26 Oct 2021

What is the most interesting / funniest solution you have seen done with Python & Selenium?

3 projects | /r/Python | 14 Sep 2021

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning
klarna
Post date: 25 Apr 2022
