Who kept the bots out? Stopping content being harvested by AI

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • askai

    Command-line interface for OpenAI ChatGPT (by yudax42)

  • AI-powered content generation has exploded in popularity recently, with bots like ChatGPT and Bard, but the vast amounts of data these bots require come from harvesting the web. What if you don’t want your content feeding the bots? Some respect robots.txt; others honor a new ‘noai’ header directive.

  • img2dataset

    Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

  • The particular tool mentioned in the Vice article is img2dataset, and right now it does not pay attention to the robots.txt file, the normal mechanism for dissuading well-behaved bots from indexing your content. However, it does respect a new HTTP header directive, X-Robots-Tag: noai (and also noindex, an existing and well-known robots directive, though one that is set through the X-Robots-Tag header or a robots meta tag rather than in robots.txt itself).
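
To make the header mechanism concrete, here is a minimal Python sketch of how a well-behaved harvester might check a response for the X-Robots-Tag opt-out before keeping the content. It is an illustration only, not img2dataset's actual code: the function names `opt_out_directives` and `may_harvest` are hypothetical, only the `noai` and `noindex` directives named above are checked, and directives scoped to a specific bot (such as `googlebot: noindex`) are deliberately ignored to keep the example short.

```python
# Directives that signal "do not harvest this content".
# Only the two named in the post are included; this set is an assumption.
OPT_OUT_DIRECTIVES = {"noai", "noindex"}


def opt_out_directives(header_value):
    """Return the opt-out directives present in one X-Robots-Tag value."""
    # The header carries a comma-separated list, e.g. "noai, nofollow".
    directives = {part.strip().lower() for part in header_value.split(",")}
    return directives & OPT_OUT_DIRECTIVES


def may_harvest(response_headers):
    """Return False if any X-Robots-Tag header opts the content out."""
    for name, value in response_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag" and opt_out_directives(value):
            return False
    return True


if __name__ == "__main__":
    print(may_harvest({"X-Robots-Tag": "noai"}))       # opted out
    print(may_harvest({"Content-Type": "image/png"}))  # no opt-out header
```

On the publishing side, the equivalent step is to have the web server add `X-Robots-Tag: noai` to its responses. Only tools that choose to check the header will honor it.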
