It’s clear that companies are currently unable to make chatbots like ChatGPT comply with EU law, when processing data about individuals. If a system cannot produce accurate and transparent results, it cannot be used to generate data about individuals. The technology has to follow the legal requirements, not the other way around.
This is the problem with training LLMs on Reddit. It doesn’t know how to say “I don’t know”. So, like Redditors…. It just makes shit up.
It’s not that it doesn’t know how to say “I don’t know”. It simply doesn’t know. Period. LLMs are not sentient and they don’t think about the questions they are asked, let alone if the answer they provide is correct. They string words together. That’s all. That we’ve gotten those strings of words to strongly resemble coherent text is very impressive, but it doesn’t make the program intelligent in the slightest.
What amazes me is that people don’t find it significant that they don’t ask questions. I would argue there is no such thing as intelligence without curiosity.
What do you even mean with that? Pi asks questions and certainly feels curious and engaged in conversation. Even chatgpt will ask for more information if it doesn’t find the requested information in, for example, an Excel spreadsheet you upload.
They’re trained on far more than reddit. But it’s not a training data problem, it’s a wrong tool problem. It’s called “generative AI” for a reason: it generates text, same way a Markov chain does. You want it to tell you something, it’ll tell you. You want factual data, don’t ask a storyteller.
What I think is especially funny though is that both the other person and myself have done enough (not horrific) things in our lives to have things like mainstream media mentions but it still got it entirely wrong.
I’m not famous but it definitely should have known who I am.
How can a Flying Squid not be famous? Haven’t the tonight show contacted you about doing aerobatics?
We’re far more common than you’d think.
https://phys.org/news/2013-02-bird-plane-squid.html
If an LLM had to say “I don’t know” when it doesn’t know, that’s all it would be allowed to say! They literally don’t know anything. They don’t even know what knowing means. They are complex (and impressive, admittedly) text generators.