chrome-aws-lambda
estela
 | chrome-aws-lambda | estela |
---|---|---|
Mentions | 12 | 10 |
Stars | 3,136 | 153 |
Growth | - | 3.9% |
Activity | 0.0 | 8.1 |
Last commit | 11 months ago | 3 months ago |
Language | TypeScript | Python |
License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
chrome-aws-lambda
- Lambdas vs EC2
Lambda would be my choice for this. You could even stay within the free tier, depending on how often you run this process. You can orchestrate Puppeteer UI flows in Lambda using this package: https://github.com/alixaxel/chrome-aws-lambda. My team does this and it works great.
- Building a PDF Generator using AWS Lambda
git clone --depth=1 https://github.com/alixaxel/chrome-aws-lambda.git && \
  cd chrome-aws-lambda && \
  make chrome_aws_lambda.zip
- Best way to scrape header + image from articles on scale?
- Ask HN: What are the best tools for web scraping in 2022?
- Is it possible to use functions requiring a GPU in a serverless google cloud function?
- Dynamic Open Graph images with Next.js
When requesting the API route, the Next.js serverless function will actually spin up a web browser on the server (a headless instance of Chromium, using chrome-aws-lambda). Next, a webpage will be generated with HTML we can define ourselves. This HTML will be used to construct the image. That means that as a developer we can generate images using HTML and CSS, technologies we are already familiar with!
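The HTML-to-image flow described above can be sketched roughly as follows. This is a minimal illustration, not the article's code: `escapeHtml`, `buildOgHtml`, `HtmlRenderer`, and `renderOgImage` are hypothetical names, and the 1200x630 size is an assumed (commonly used) Open Graph dimension. The browser is hidden behind a small interface so the template logic can be run without Chromium installed.

```typescript
// All names here are hypothetical; only the "HTML in, image out" idea comes from the text.

/** Escape text so a user-supplied title cannot inject markup into the template. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

/** Build the page that a headless browser would screenshot. */
function buildOgHtml(title: string, subtitle: string): string {
  return [
    "<!DOCTYPE html>",
    "<html><head><style>",
    "body { margin: 0; width: 1200px; height: 630px; display: flex;",
    "  flex-direction: column; justify-content: center; align-items: center;",
    "  font-family: sans-serif; background: #111; color: #fff; }",
    "h1 { font-size: 72px; margin: 0; } p { font-size: 32px; }",
    "</style></head><body>",
    `<h1>${escapeHtml(title)}</h1>`,
    `<p>${escapeHtml(subtitle)}</p>`,
    "</body></html>",
  ].join("\n");
}

/** Assumed contract for anything that turns HTML into PNG bytes. */
interface HtmlRenderer {
  render(html: string, width: number, height: number): Promise<Uint8Array>;
}

/** Render the Open Graph image using whichever renderer is supplied. */
async function renderOgImage(
  renderer: HtmlRenderer,
  title: string,
  subtitle: string
): Promise<Uint8Array> {
  return renderer.render(buildOgHtml(title, subtitle), 1200, 630);
}
```

In a deployed function, the renderer would typically be backed by chrome-aws-lambda: launch the browser with `chromium.puppeteer.launch(...)`, load the markup with `page.setContent(html)`, and capture it with `page.screenshot({ type: "png" })`. Keeping that behind an interface is a design choice for testability, not something the original article prescribes.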
- How we keep our Serverless deploy times short and avoid headaches
This plugin is used for all our AWS Lambda deployments, using a wide range of Node modules, some with more quirks than others. We use it together with Lambda Layer Sharp and Chrome AWS Lambda.
- How to create a chrome profile programmatically in aws lambda?
I was able to successfully run Chrome with Puppeteer in AWS Lambda for a similar use case. I used an "optimized" version of Chrome packaged as an AWS Lambda Layer.
- Create PDF documents with AWS Lambda + S3 with NodeJS and Puppeteer
git clone --depth=1 https://github.com/alixaxel/chrome-aws-lambda.git && \
  cd chrome-aws-lambda && \
  make chrome_aws_lambda.zip
- chrome binary not found aws lambda
The simplest method is to use [Puppeteer](https://blog.risingstack.com/pdf-from-html-node-js-puppeteer/) with chrome-aws-lambda.
estela
- Struggling to scrape specific website - any advice?
This solution uses requests. You can also do this in Scrapy, and if you are planning to run more crawlers you can use estela, which is a spider management solution.
- How to run web scraping script every 15 minutes
You may want to check out [estela](https://estela.bitmaker.la/docs/), a spider management solution developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders.
- Deploying Scrapy Projects on the Cloud
We are currently running a closed beta of Bitmaker Cloud (free and unlimited). Bitmaker Cloud gives you easy management of scraping workloads via a web dashboard and API. Only Scrapy spiders are supported at the moment (additional languages/frameworks are on the roadmap). Bitmaker Cloud is powered by estela, an elastic web scraping cluster running on Kubernetes. estela is a modern alternative to proprietary platforms such as Scrapy Cloud, as well as OSS projects such as scrapyd. The source code of estela and estela-cli is available on GitHub.
- What's new in the Webscraping Ecosystem? From OxyCon 2022
Estela: a web scraping framework on top of Kubernetes, which manages scaling (by Breno Colom)
- estela, an OSS elastic web scraping cluster
- Show HN: estela, a modern elastic web scraping cluster
- Ask HN: What are the best tools for web scraping in 2022?
We released estela for this and other purposes, check it out, maybe it will suit your needs:
https://github.com/bitmakerla/estela
Only Scrapy support atm, but additional scraping frameworks/languages are on the roadmap. Would be good to know which ones to prioritize over others :-)
What are some alternatives?
terraform-aws-next-js - Terraform module for building and deploying Next.js apps to AWS. Supports SSR (Lambda), Static (S3) and API (Lambda) pages.
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
puppeteer - Node.js API for Chrome
colly - Elegant Scraper and Crawler Framework for Golang
chrome-aws-lambda-layer - 58 MB Google Chrome to fit inside AWS Lambda Layer compressed with Brotli
wi-page - Rank Wikipedia Article's Contributors by Byte Counts.
serverless-webpack - Serverless plugin to bundle your lambdas with Webpack
pup - Parsing HTML at the command line
lambda-layer-sharp - An AWS Lambda Layer for the Sharp node module. Automatically published on updates.
linkedom - A triple-linked lists based DOM implementation.
serverless-graphql - Serverless GraphQL Examples for AWS AppSync and Apollo
crawlee - Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.