Tarsier Alternatives
Similar projects and alternatives to tarsier
-
AutoCrawler
Official implementation of the paper "AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation"
-
datadoubleconfirm
Simple datasets and notebooks for data visualization, statistical analysis and modelling - with write-ups here: http://projectosyo.wix.com/datadoubleconfirm.
-
Data-extraction-and-text-analysis
The objective of this assignment is to extract article text from the given URLs and perform text analysis to compute variables.
-
dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
tarsier reviews and mentions
-
ScrapeGraphAI: Web scraping using LLM and direct graph logic
Agreed!
Apify's Website Content Crawler[0] does a decent job of this for most websites in my experience. It allows you to "extract" content via different built-in methods (e.g. Extractus [1]).
We currently use this at Magic Loops[2] and it works _most_ of the time.
The long-tail is difficult though, and it's not uncommon for users to back out to raw HTML, and then have our tool write some custom logic to parse the content they want from the scraped results (fun fact: before GPT-4 Turbo, the HTML page was often too large for the context window... and sometimes it still is!).
Would love a dedicated tool for this. I know the folks at Reworkd[3] are working on something similar, but not sure how much is public yet.
[0] https://apify.com/apify/website-content-crawler
[1] https://github.com/extractus/article-extractor
[2] https://magicloops.dev/
[3] https://reworkd.ai/
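As a rough illustration of the raw-HTML fallback described in the comment above, here is a minimal sketch that strips a page down to its visible text and trims it to a size budget before handing it to a model. It uses only Python's standard library. The names `VisibleTextExtractor`, `html_to_text`, and `max_chars` are hypothetical, and the character limit is an assumed, crude stand-in for a real token count; this is not how Magic Loops or Apify do it.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript contents."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html, max_chars=8000):
    """Return visible text from `html`, cut to at most `max_chars` characters.

    `max_chars` is an assumed character budget, not a token count.
    """
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)[:max_chars]


if __name__ == "__main__":
    page = (
        "<html><head><style>p{}</style><script>var x=1;</script></head>"
        "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    )
    print(html_to_text(page))
```

Truncating by characters is the simplest possible budget. A real pipeline would more likely count tokens with the target model's tokenizer and pick the relevant part of the page rather than just its beginning.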
- Control the browser using GPT-4 vision by AgentGPT team
- Show HN: GPT-4 vision utilities to browse the web
Stats
reworkd/tarsier is an open source project licensed under MIT License which is an OSI approved license.
The primary programming language of tarsier is Jupyter Notebook.