Top 5 Python robots-txt Projects
the-great-gpt-firewall
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
sitemap_grabber
A python library to recursively crawl every sitemap.xml for a website. Also handles robots.txt and other well-knowns.
Project mention: Ask HN: What Are You Working On? (October 2024) | news.ycombinator.com | 2024-10-27
I've been unwinding my side-projects into their component parts and publishing them as their own python packages. Some examples:
1. https://github.com/simplecto/django-reference-implementation -- My personal production-ready Django boilerplate. "There are many like it, but this one is mine"
2. https://github.com/simplecto/sitemap_grabber -- A python library to recursively crawl every sitemap.xml for a website. Also handles robots.txt and other well-knowns.
3. https://github.com/heysamtexas/django-oauth2-capture -- A Django app to capture OAuth2 tokens for non-authentication purposes, enabling your application to act on behalf of users across external platforms like GitHub, LinkedIn, and X (Twitter)
I'm also taking popular and helpful software and wrapping them in RESTful apis as part of a larger api project I call the JOAT (Jack Of All Trades).
4. https://github.com/heysamtexas/REST-headless-browser -- Playwright headless browser wrapped in a FastAPI REST application, running inside a docker container
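To illustrate what "handling robots.txt" typically involves for projects in this list, here is a minimal sketch using only Python's standard-library `urllib.robotparser`. It is not the API of sitemap_grabber or of any other project listed here; the robots.txt content, the `example.com` URLs, the `MyCrawler` agent name, and the `load_robots` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so that
# the sketch runs without any network access.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def load_robots(text):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_robots(ROBOTS_TXT)

# A crawler blocked site-wide, as in the lists of AI-restricting sites.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/page")

# Any other crawler falls back to the "*" rules.
other_allowed = parser.can_fetch("MyCrawler", "https://example.com/page")
other_private = parser.can_fetch("MyCrawler", "https://example.com/private/x")

# Sitemap URLs declared in robots.txt (requires Python 3.8+).
sitemaps = parser.site_maps()

print(gptbot_allowed, other_allowed, other_private, sitemaps)
```

A sitemap crawler would usually start from the URLs returned by `site_maps()` and then follow any nested sitemap indexes from there.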
Index
What are some of the best open-source robots-txt projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | advertools | 1,160 |
2 | gflare-tk | 159 |
3 | the-great-gpt-firewall | 79 |
4 | AI-Data-Guard | 3 |
5 | sitemap_grabber | 0 |