Advice on building a model for web elements / browsing a specific site

This page summarizes the projects mentioned and recommended in the original post on /r/MLQuestions.

  • web2text

    Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18

  • The only paper and code I’m aware of is Web2Text (https://github.com/dalab/web2text), which is written in Scala. They originally used a CNN. I think their training data was way too small.

  • pix2struct

  • This is related to document AI. Google recently released a model called pix2struct. Some of the tasks they considered and datasets they used include:

  • blackmaria

    Python package for webscraping in Natural language

  • I have also seen several tools that try to use LLMs for web scraping, though I didn't look into the details: https://www.reddit.com/r/MachineLearning/comments/12v0vda/p_i_built_a_tool_that_autogenerates_scrapers_for/ and https://github.com/Smyja/blackmaria


