“* People ask LLMs to write code
* LLMs recommend imports that don’t actually exist
* Attackers work out what these imports’ names are, and create & upload them with malicious payloads
* People using LLM-written code then auto-add malware themselves”
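The last step is the one you can defend against. Here is a minimal sketch in Python of a sanity check you could run before installing anything an LLM suggested: it queries PyPI’s public JSON endpoint (https://pypi.org/pypi/&lt;name&gt;/json) and flags names that aren’t registered at all. The package names in the example list are hypothetical.

```python
import urllib.request
import urllib.error

def exists_on_pypi(package: str) -> bool:
    """Return True if the package name is registered on PyPI."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:   # name is unclaimed -> a squatting target
            return False
        raise                 # any other error: don't guess

# Hypothetical packages an LLM suggested for some script:
suggested = ["requests", "pandas", "totally-made-up-pkg-123"]

for name in suggested:
    status = "exists" if exists_on_pypi(name) else "NOT on PyPI - do not install blindly"
    print(f"{name}: {status}")
```

Note that a check like this only tells you the name is claimed, not that the package is trustworthy: once an attacker has registered the hallucinated name, it “exists” like any other package, which is exactly why the attack in the quoted list works.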
What so many people don’t understand: LLMs like ChatGPT are nothing but statistical engines. They break their incoming text into tokens and learn which tokens usually follow which others. When they generate output, they just roll the dice: after tokens A, B, and C, the statistically likely next token is D, so out comes a D.
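To make the “roll the dice” idea concrete, here is a toy sketch in Python of pure next-token statistics: count which token follows which in some training text, then sample continuations in proportion to those counts. (Real LLMs learn these probabilities with a neural network over far more context than a single previous token, but the predict-the-next-token loop is the same.)

```python
import random
from collections import defaultdict

# Toy next-token statistics: for each token, remember every token
# that followed it in the "training" text.
training_text = "the cat sat on the mat the cat ate the fish"
tokens = training_text.split()

follows = defaultdict(list)
for current, nxt in zip(tokens, tokens[1:]):
    follows[current].append(nxt)

def generate(start: str, length: int = 8) -> str:
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:                 # no observed continuation
            break
        out.append(random.choice(candidates))  # roll the dice, weighted by frequency
    return " ".join(out)

print(generate("the"))
# e.g. "the cat sat on the mat the cat ate"
```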
The point is: they have no understanding. If their training data included a good code example, they might regurgitate it. If it included broken code, they might regurgitate that instead. Or they could mix it all together and produce something weird. It’s a lottery, based on whatever they sucked out of Stack Overflow and other places.