[P] I built a tool that auto-generates scrapers for any website with GPT

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • llm_data_parser

    This is a proof of concept of using an LLM to find and extract meaningful data without heavily parsing the HTML.

  • I actually tried doing this with LangChain and GPT-3 and uploaded it to GitHub a week ago; you can find it here: https://github.com/repollo/llm_data_parser It's really rough right now because I only wanted to show rpilocator.com’s owner that it was possible, since he’s having to go through each spider/scraper and update it every time a website gets modified. But it's really cool to see a whole platform built for this very purpose! Would be great to see support for multiple libraries and programming languages!

    You'll want to check out https://github.com/cavi-au/Consent-O-Matic

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
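The extraction approach described for llm_data_parser — hand the raw HTML to an LLM and ask for structured data back, instead of maintaining brittle CSS/XPath selectors — can be sketched as below. This is a minimal illustration, not the project's actual code: the `build_prompt` and `parse_reply` helpers are hypothetical, and the model call itself is stubbed out with a canned reply since it would require an API key.

```python
import json


def build_prompt(html: str, fields: list[str]) -> str:
    """Compose an extraction prompt: the raw HTML plus the fields we want as JSON."""
    return (
        "Extract the following fields from the HTML below and reply "
        "with a single JSON object and no extra text.\n"
        f"Fields: {', '.join(fields)}\n\nHTML:\n{html}"
    )


def parse_reply(reply: str) -> dict:
    """Pull the first JSON object out of the model's reply.

    Models often wrap JSON in chatty text, so we slice from the first
    '{' to the last '}' before decoding.
    """
    start, end = reply.find("{"), reply.rfind("}") + 1
    return json.loads(reply[start:end])


# In a real scraper the prompt would go to an LLM API (e.g. OpenAI or a
# local model); here we demonstrate the round trip with a canned reply.
html = '<div class="item"><h2>Raspberry Pi 4</h2><span class="price">$45</span></div>'
prompt = build_prompt(html, ["name", "price"])
reply = 'Sure! {"name": "Raspberry Pi 4", "price": "$45"}'
print(parse_reply(reply))  # {'name': 'Raspberry Pi 4', 'price': '$45'}
```

Because the model reads the page semantically rather than positionally, the same prompt keeps working after cosmetic markup changes — which is exactly the maintenance burden the rpilocator.com comment describes.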

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number generally indicates a more popular project.
