Wednesday, December 3, 2025

Is it plagiarism or aLLMost plagiarism?

From a discussion on a LinkedIn post by Aron Brand:

The question arose as to whether LLMs store "representations" of their training data and, if so, why it is not plagiarism to use those representations when responding to users' prompts.

I think that's a very nuanced and insightful question that comes down, I suppose, to the definition of "stored representations."

At first blush, it seems obvious: Let's say an LLM ingests the post you're reading at this moment. For the sake of argument, assume it's not here on a blog, but in a book that I've published and copyrighted. Of course, I've included the standard notice that "no part may be stored, transmitted, reproduced, etc. without written permission." The owner of the LLM has bought and owns a copy of my book.

Later, you ask it about my opinions on LLMs and plagiarism, and it summarizes what I've written. I allege that it has "stolen" the content and used it unlawfully, without my permission.

Has the LLM "stored" my content?

Thursday, July 10, 2025

An AI Apology

I'd been working late into the night on part of a large personal project. It had been progressing slowly but steadily since early morning, and I was learning tons. (Translation: I was spending more time fixing than creating, trying various approaches to accomplish each step, and discovering clever new ways to make mistakes. You know, the kind where you don't know whether it'll be harder to fix it, or to figure out why it didn't work in the first place.) To be fair, I don't have a lot of experience in this particular area, so I was exploring alternatives and poking at the options that were available. Given the luxury of time, I've found this strategy to provide a far deeper understanding than just getting it right the first time and moving on.

The drawing from Allie Brosh's "Hyperbole and a Half" with the caption, "PRESS ALL THE BUTTONS!"