linkedom
estela
Metric | linkedom | estela
---|---|---
Mentions | 13 | 10
Stars | 1,463 | 153
Growth | - | 3.9%
Activity | 8.1 | 8.1
Last commit | 30 days ago | 3 months ago
Language | HTML | Python
License | ISC License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
linkedom
-
Alternative for DOMParser for background script (Service worker) in manifest v3?
linkedom is your answer
-
Task: Save Article to Markdown
linkedom - to parse HTML into a workable DOM. I used to use jsdom, but I switched for performance reasons.
-
Ask HN: What are the best tools for web scraping in 2022?
For simple scraping where the content is fairly static, or when performance is critical, I will use linkedom to process pages.
https://github.com/WebReflection/linkedom
When the content is complex or involves clicking, Playwright is probably the best tool for the job.
https://github.com/microsoft/playwright
-
The Fetch API is finally coming to Node.js
I recently started using linkedom for this and it has been an absolute joy
- LinkeDOM: A Jsdom Alternative (2021)
-
Happy-DOM: a jsdom alternative that can server side render web components
This looks great. I wonder how it compares to linkedom (repo[1], writeup[2]), which I have found to be fantastic.
[1]: https://github.com/WebReflection/linkedom
[2]: https://webreflection.medium.com/linkedom-a-jsdom-alternativ...
-
Testing Solid.js code beyond jest
linkedom, fastest, but lacks essential features
-
Using Mocha to test ClojureScript
Other things to do would be to use linkedom instead of JSDom, look into a better assertion library than assert etc.
-
Idiosyncrasies of the HTML Parser
Sounds somewhat similar to linkedom[1], which performs nicely.
[1]: https://github.com/WebReflection/linkedom
- LinkeDOM – A triple-linked lists based DOM [Live with Andrea Giammarchi]
estela
-
Struggling to scrape specific website - any advice?
This solution uses requests. You can also do this in Scrapy, and if you are planning to run more crawlers you can use estela, which is a spider management solution.
-
How to run web scraping script every 15 minutes
You may want to check out [estela](https://estela.bitmaker.la/docs/), which is a spider management solution, developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders.
-
Deploying Scrapy Projects on the Cloud
We are currently running a closed beta of Bitmaker Cloud (free and unlimited). Bitmaker Cloud gives you easy management of scraping workloads via a web dashboard and API. Only Scrapy spiders are supported at the moment (additional languages/frameworks are on the roadmap). Bitmaker Cloud is powered by estela, an elastic web scraping cluster running on Kubernetes. estela is a modern alternative to proprietary platforms such as Scrapy Cloud, as well as OSS projects such as scrapyd. The source code of estela and estela-cli is available on GitHub.
-
What's new in the Webscraping Ecosystem? from OxyCon 2022
Estela: A webscraping framework on top of Kubernetes, which manages scaling (by Breno Colom)
- estela, an OSS elastic web scraping cluster
- Show HN: estela, a modern elastic web scraping cluster
-
Ask HN: What are the best tools for web scraping in 2022?
We released estela for this and other purposes, check it out, maybe it will suit your needs:
https://github.com/bitmakerla/estela
Only Scrapy support atm, but additional scraping frameworks/languages are on the roadmap. Would be good to know which ones to prioritize over others :-)
What are some alternatives?
happy-dom - A JavaScript implementation of a web browser without its graphical user interface
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
HTMLKit - An Objective-C framework for your everyday HTML needs.
colly - Elegant Scraper and Crawler Framework for Golang
wpt - Test suites for Web platform specs — including WHATWG, W3C, and others
undetected-chromedriver - Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
haste-perch - Create dynamic HTML easy in the browser using declarative notation
crawlee - Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
vite - Next generation frontend tooling. It's fast!
pup - Parsing HTML at the command line
jsdom - A JavaScript implementation of various web standards, for use with Node.js
scrapyd - A service daemon to run Scrapy spiders