What Does Copyright Say about Generative Models?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • PD-Diffusion

  • >But how much of a song or a painting can you reproduce?

    The reason why fair use is vague is specifically to confuse people who ask these kinds of questions. The Supreme Court needed a tool that artists could use to legally smack down people who republish fragments of other people's work, but didn't want to abolish the 1st Amendment in the process. So basically judges have the final say as to whether or not something is novel creativity or in debt to the original. Any hard-and-fast rule beyond "binding precedent applies" is effectively copyright abolition by degrees.

    >We lost most of Elizabethan theater because there was no copyright. [..] Without some kind of protection, authors had no interest in publishing at all, let alone publishing accurate texts.

    This is a dated example, if only because creative works leave a lot more evidence now than they used to. People today will act to preserve art against the artists' own wishes and at great personal risk.

    >and it’s easy to suspect that the actual payments will be similar to the royalties musicians get from streaming services: microcents per use

    Given the amount of data these systems need (read: more than humanity can provide) I'd say microcents is arguably too high. Remember that you can't actually derive a clear chain of value between one particular training set entry and one particular execution of the model. It's all chucked into a blender that runs on almost-linear algebra and calculus. At best you can detect if parts of the image resemble specific training set examples[0] and pay people slightly more if the model regurgitates training set data.

    Let's also keep in mind that a good chunk of the licensing system is based on being able to say no to specific users, or write very tailor-made licensing agreements for specific works or conditions. That's still going to be threatened, even if we can pay sub-Spotify-tier royalties every time a model trains itself on your work.

    >It is easy to imagine an AI system that has been trained on the (many) Open Source and Creative Commons licenses.

    Working on it: https://github.com/kmeisthax/PD-Diffusion

    The thing is, we already have a good database of reusable, public-domain, no-attribution-necessary images; it's called Wikimedia Commons. I really can't fathom why OpenAI didn't start there, other than just an assumption that they were entitled to larger datasets or a feeling that they could get established before anyone sued.

    Even then, OpenAI already tried this with computer code and they're getting sued for it anyway, because they never bothered with attribution in the case of training set regurgitation.

    [0] This is possible because part of the prompt guidance process involves a model called CLIP, which maps both images and text into the same embedding space, so the two can be compared directly.
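
As a rough sketch of the detection idea above: if you already have embedding vectors for the training images and for a generated image (for example from a CLIP-style image encoder, which is not called here), you can flag training entries whose cosine similarity to the output exceeds some cutoff. The function names, the toy three-dimensional vectors, and the 0.95 threshold are all assumptions made for illustration; real embeddings have hundreds of dimensions and a sensible cutoff would need to be tuned.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_resembling_entries(output_embedding, training_embeddings, threshold=0.95):
    """Return (entry_id, score) pairs at or above the threshold, best match first.

    training_embeddings is a hypothetical dict of entry id -> embedding vector.
    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    matches = []
    for entry_id, embedding in training_embeddings.items():
        score = cosine_similarity(output_embedding, embedding)
        if score >= threshold:
            matches.append((entry_id, score))
    matches.sort(key=lambda match: match[1], reverse=True)
    return matches


# Toy data standing in for real image embeddings.
training = {
    "work_a": [1.0, 0.0, 0.0],
    "work_b": [0.0, 1.0, 0.0],
    "work_c": [0.7, 0.7, 0.1],
}
generated = [0.99, 0.05, 0.0]

matches = find_resembling_entries(generated, training)
for entry_id, score in matches:
    print(f"{entry_id}: {score:.3f}")
```

This only catches outputs that land close to a specific training example. It says nothing about how much any one work contributed to an output that resembles none of them, which is the harder attribution problem described above.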

  • SNKRX

    A replayable arcade shooter where you control a snake of heroes.
