Advice building model for web elements/ browsing specific site

web2text

Mentions: 2 · Stars: 162 · Activity: 0.0 · Language: HTML

Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18

The only paper and code I’m aware of is web2text (https://github.com/dalab/web2text), which is written in Scala. They originally used a CNN. I think their training data was way too small.
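The sketch below makes the boilerplate-removal idea concrete. It shows the preprocessing such systems start from: the page is split into text blocks, and features are computed for each block. This is not web2text's code, which is Scala and applies a learned model over a "collapsed DOM". The set of block-boundary tags and the `looks_like_content` rule are hypothetical stand-ins for a trained classifier. That rule keeps blocks with at least 10 words and link density below 0.33, in the spirit of simple word-count/link-density heuristics.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical choice: tags treated as block boundaries. web2text uses a richer
# "collapsed DOM" segmentation; this set is only a simplification.
BLOCK_TAGS = {
    "body", "div", "p", "li", "ul", "ol", "table", "tr", "td", "th",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "article", "section", "header", "footer", "nav", "aside",
}
SKIP_TAGS = {"script", "style"}  # their contents are never visible text


@dataclass
class Block:
    text: str
    link_chars: int  # characters of text that sat inside <a> elements

    @property
    def word_count(self) -> int:
        return len(self.text.split())

    @property
    def link_density(self) -> float:
        return self.link_chars / len(self.text) if self.text else 0.0


class BlockSplitter(HTMLParser):
    """Collects whitespace-normalised text blocks separated by block-level tags."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[Block] = []
        self._chunks: list[str] = []
        self._link_chars = 0
        self._link_depth = 0
        self._skip_depth = 0

    def _flush(self) -> None:
        # Emit the current block (if it has any text) and start a new one.
        if self._chunks:
            self.blocks.append(Block(" ".join(self._chunks), self._link_chars))
        self._chunks, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)

    def handle_data(self, data):
        chunk = " ".join(data.split())  # collapse runs of whitespace
        if self._skip_depth or not chunk:
            return
        self._chunks.append(chunk)
        if self._link_depth:
            self._link_chars += len(chunk)

    def close(self) -> None:
        super().close()
        self._flush()  # emit whatever text is left at end of input


def split_blocks(html: str) -> list[Block]:
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    return parser.blocks


def looks_like_content(block: Block) -> bool:
    # Hypothetical heuristic standing in for a trained classifier (web2text used a CNN).
    return block.word_count >= 10 and block.link_density < 0.33


def extract_content(html: str) -> list[str]:
    return [b.text for b in split_blocks(html) if looks_like_content(b)]


page = """<html><body>
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<article><p>This is the main article text with enough words to be treated as real content by the heuristic.</p></article>
<footer>Copyright 2023</footer>
<script>var x = 1;</script>
</body></html>"""

print(extract_content(page))
```

In a learned approach like web2text's, `looks_like_content` would be replaced by a model trained on labelled blocks. That model would use these features (and many more) as inputs. The hand-set thresholds here are only a baseline, which is also why the size of the training data matters so much.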

pix2struct

Mentions: 5 · Stars: 540 · Activity: 4.4 · Language: Python

It is related to document AI. Google recently released a model called pix2struct. Some of the tasks they considered and datasets they used include:

blackmaria

Mentions: 6 · Stars: 150 · Activity: 4.9 · Language: Python

Python package for webscraping in Natural language

I have also seen several tools that try to use LLMs for web scraping, though I didn't look into the details:
https://www.reddit.com/r/MachineLearning/comments/12v0vda/p_i_built_a_tool_that_autogenerates_scrapers_for/
https://github.com/Smyja/blackmaria

This page summarizes the projects mentioned and recommended in the original post on /r/MLQuestions (post date: 3 May 2023).
