Arkansas State University

Copyright: Copyright and Artificial Intelligence

This guide is intended to give you basic information about copyright.

Citing AI

If you use AI-generated works in your papers or research, you need to acknowledge them and will most likely need to cite them!

Check out our AI Citation Guide!

Plagiarism and AI

Plagiarism is the stealing of someone else's work without crediting the source. In some cases, using AI in your assignments can meet the definition of plagiarism.

Learn more at our Plagiarism and AI Guide!

Artificial Intelligence

We use artificial intelligence every day, often without knowing it. Artificial intelligence (AI) is computer programming and learning that simulates human intelligence and learning. Google, Siri, search engines, self-driving cars, automated stocks, and virtual assistants are examples of AI.

Recently, a new form of AI, called generative AI, has been developed. Generative AI can take raw data and make something new out of it. The most famous examples of generative AI are ChatGPT for text-based applications and DALL-E for image-based applications. When used as a tool, AI can be amazing. Students can use it to come up with paper topics, highlight relevant content, test their critical thinking skills, and help make existing content more accessible for those with disabilities. The key is making sure you're using AI as a tool and not reusing AI-generated work for your assignments.

Copyright and Artificial Intelligence

Since generative AI is still a fairly new (and amazing) concept, we are still learning how to incorporate it into work and education. One area that is still in flux is copyright.

According to the U.S. Copyright Office, copyright can only be awarded to works created by human beings. So far, works generated by AI have been denied copyright. In cases where works are created by a combination of human and AI, the human-generated portion has been granted copyright and the AI-generated portion denied. For example, Kristina Kashtanova was awarded copyright for the authorship of Zarya of the Dawn, while copyright for the art created by the AI program Midjourney was denied since "generated material is not the product of human authorship." This requirement for human authorship has so far been upheld in the courts.

Copyrighted Materials in AI Datasets

Perhaps the bigger question of AI and copyright involves the materials in the datasets used to train AI. Generative AI programs require text and images to be "scraped" from existing resources and entered into their datasets. In some cases, companies are transparent about where they get this material. In many cases, though, we don't know where the scraped data comes from, and evidence shows that scraped resources may come from copyrighted works without the original creator's permission or knowledge (or compensation). Additionally, it's easy to purposefully or accidentally prompt these programs into reusing other people's copyrighted works. So the "generated" content may actually be someone else's.
 

Artificial Intelligence and Fair Use

In multiple cases, courts have declared the copying of copyrighted work for indexing purposes to be fair use -- for example, Google copying webpages to search them faster. Some proponents of artificial intelligence claim that the scraping of copyrighted content is similar and should also fall under fair use. However, these acts are done with different intentions. Indexing helps users find copyrighted material more easily, while AI-generated material competes with the copyrighted material. So until the courts or the Copyright Office say otherwise, fair use should not be assumed when scraping data for AI datasets.

Examples of Copyright Infringement in AI Datasets

In December 2023, The New York Times filed a lawsuit against OpenAI, the parent company of ChatGPT. Included in the lawsuit are 100 examples of ChatGPT "generating" responses nearly identical to New York Times articles. In the example below, the red text (approximately 97% of both articles) is text that ChatGPT created that directly matches an article in The New York Times.

Screenshot showing ChatGPT generated content nearly identical to a New York Times article.


In January 2024, Gary Marcus and Reid Southen demonstrated multiple instances of generative AI programs creating images that include, or are extremely similar to, copyrighted characters. For example, the prompt "yellow 3d cartoon character with goggles and overalls" in Midjourney created these Minion-like characters:

Screenshot of 4 yellow characters that appear extremely similar to Minions.