I'm scraping about 30 sites for work at the moment, and a few of them use Cloudflare, which has been a b*tch to deal with. I've tried numerous libraries and different proxy providers, but reliability is patchy. Previous fixes like https://github.com/Anorov/cloudflare-scrape don't seem to work anymore after Cloudflare's updates, so I've switched to a pretty optimised headless browser with good proxies instead.
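For anyone wondering what the proxy side of that looks like, here is a minimal sketch of the retry-and-rotate scaffolding, not the poster's actual setup. It does nothing to get past Cloudflare by itself; `fetch` is a hypothetical callable you supply (for example, a wrapper around your headless browser) that raises the hypothetical `BlockedError` when it lands on a challenge page.

```python
from itertools import cycle
from typing import Callable, Sequence


class BlockedError(Exception):
    """Raised by the caller-supplied fetcher when a response looks blocked."""


def fetch_with_rotation(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], str],
    max_attempts: int = 5,
) -> str:
    """Try `url` through successive proxies until one attempt succeeds.

    `fetch(url, proxy)` is an assumption: it should return the page body,
    or raise BlockedError if the proxy was challenged or banned.
    """
    if not proxies:
        raise ValueError("need at least one proxy")

    pool = cycle(proxies)  # round-robin over the proxy list
    last_error = None
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except BlockedError as exc:
            last_error = exc  # remember the failure and move to the next proxy

    raise RuntimeError(
        f"all {max_attempts} attempts were blocked for {url}"
    ) from last_error
```

Keeping the fetcher as a plain callable means you can swap a headless browser for a simple HTTP client on the sites that don't need one, without touching the rotation logic.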
You can do it as a service, but that is highly competitive and basically trading time for money. The best way to make money is to productize it:
- Build an on-demand data API for a specific type of data and charge a premium for it. A good example is https://serpapi.com/, who do Google data and charge a ~10X markup on proxy costs.
- Proxy solutions make good money. To scrape at scale you need proxies, and lots of users pay $1-5k per month. Lots of proxy solutions are doing $100k+ per month.
- Build a tool that takes web-scraped data, analyses/filters it and displays it to users. Lots of the biggest web scrapers are doing this, e.g. product monitoring tools for e-commerce companies. There's lots of competition there, but you can do it in new markets, like NFTs.
- Hedge funds will pay huge money for web data if you have 5 years of continuous data, so they can backtest it.
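On the "continuous data" point, it is worth checking your archive for holes before pitching it. A rough sketch, assuming you scrape once a day and can list the dates you actually have a snapshot for (`find_gaps` and `max_gap_days` are names made up for this example):

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def find_gaps(
    snapshot_dates: Iterable[date], max_gap_days: int = 1
) -> List[Tuple[date, date]]:
    """Return (first_missing_day, last_missing_day) for each hole in the series.

    Two consecutive snapshots more than `max_gap_days` apart count as a gap.
    """
    days = sorted(set(snapshot_dates))  # dedupe and order the snapshot dates
    gaps = []
    for prev, curr in zip(days, days[1:]):
        if (curr - prev).days > max_gap_days:
            gaps.append((prev + timedelta(days=1), curr - timedelta(days=1)))
    return gaps


if __name__ == "__main__":
    have = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
    for start, end in find_gaps(have):
        print(f"missing {start} to {end}")
```

An empty result means no holes between the first and last snapshot; it says nothing about whether the series goes back far enough.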