estela
browserless
| | estela | browserless |
|---|---|---|
| Mentions | 10 | 21 |
| Stars | 154 | 7,893 |
| Growth | 2.0% | 8.1% |
| Activity | 8.1 | 9.8 |
| Last commit | 3 months ago | 7 days ago |
| Language | Python | TypeScript |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
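As a rough illustration of the "month over month growth in stars" figure described above, the sketch below computes a percentage change between two monthly star counts. The function name and the sample counts are hypothetical, and the site's exact formula is not stated here, so treat this as one plausible reading rather than the actual calculation.

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Return the percentage change in stars between two monthly snapshots.

    Assumption: growth is the relative change against the previous month.
    """
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) / previous_stars * 100.0


# Hypothetical snapshots, not taken from either project above.
growth = month_over_month_growth(previous_stars=200, current_stars=210)
print(f"{growth:.1f}%")
```

Under this reading, a project with few stars can show a high growth percentage from a small absolute gain, which is worth keeping in mind when comparing the two columns.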
estela
- Struggling to scrape specific website - any advice?
  This solution uses requests; you can also do this in Scrapy, and if you plan to run more crawlers you can use estela, which is a spider management solution.
- How to run web scraping script every 15 minutes
  You may want to check out [estela](https://estela.bitmaker.la/docs/), a spider management solution developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders.
- Deploying Scrapy Projects on the Cloud
  We are currently running a closed beta of Bitmaker Cloud (free and unlimited). Bitmaker Cloud gives you easy management of scraping workloads via a web dashboard and API. Only Scrapy spiders are supported at the moment (additional languages/frameworks are on the roadmap). Bitmaker Cloud is powered by estela, an elastic web scraping cluster running on Kubernetes. estela is a modern alternative to proprietary platforms such as Scrapy Cloud, as well as OSS projects such as scrapyd. The source code of estela and estela-cli is available on GitHub.
- What's new in the Webscraping Ecosystem? from OxyCon 2022
  Estela: a web scraping framework on top of Kubernetes that manages scaling (by Breno Colom)
- estela, an OSS elastic web scraping cluster
- Show HN: estela, a modern elastic web scraping cluster
- Ask HN: What are the best tools for web scraping in 2022?
  We released estela for this and other purposes. Check it out, maybe it will suit your needs:
  https://github.com/bitmakerla/estela
  Only Scrapy is supported atm, but additional scraping frameworks/languages are on the roadmap. It would be good to know which ones to prioritize over others :-)
browserless
- How and why we ripped our Open Source product apart for a full rebuild
  The core product is managed, cloud hosted browsers. We run thousands at a time using AWS and DigitalOcean, for people to use with Puppeteer and Playwright scripts. Our container is also available to self deploy under an open-source license.
- Self-hosted browserless.io alternative ?
  You should search for "Puppeteer as a service". There are some projects on GitHub that you could deploy, such as https://github.com/browserless/chrome
- Remote Server Compromised
  So I recently installed changedetection.io on my server. It requires either selenium/standalone-chrome-debug:3.141.59 or browserless/chrome. I installed it with Selenium in a Docker container since I noticed that it was running better than the browserless/chrome service.
- Angular docker base image
  I had a look at this one: https://github.com/browserless/chrome ... but it is not suitable for builds, e.g. set to production mode, user permissions and so on.
- browserless chrome (Web browser automation built for everyone)
- Ask HN: What are the best tools for web scraping in 2022?
- Using changedetection.io (installed via pip, not docker). How do I set up "WebDriver Chrome/Javascript"
  git clone https://github.com/browserless/chrome /opt/browserless
- How to automate PDF generation of dashboards/web pages with open-source web automation
- Starring your repo does not give you permission to spam me
What are some alternatives?
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
Dompdf - HTML to PDF converter for PHP
colly - Elegant Scraper and Crawler Framework for Golang
PHP-Proxy - Proxy Application built on php-proxy library ready to be installed on your server
wi-page - Rank Wikipedia Article's Contributors by Byte Counts.
Twitch-Drops-Bot - A Node.js bot that will automatically watch Twitch streams and claim drop rewards.
pup - Parsing HTML at the command line
browsershot - Convert HTML to an image, PDF or string
linkedom - A triple-linked lists based DOM implementation.
selenoid - Selenium Hub successor running browsers within containers. Scalable, immutable, self hosted Selenium-Grid on any platform with single binary.
crawlee - Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
FPDI - FPDI is a collection of PHP classes facilitating developers to read pages from existing PDF documents and use them as templates in FPDF.