New Theory Suggests Chatbots Can Understand Text

kromem@lemmy.world · edit-2 1 year ago

bionicjoey@lemmy.ca · edit-2 1 year ago

One joke is a fluke, especially if the joke is out in the public discourse and appeared in some form in the training set. Call me when it can explain any novel joke written by a human where no explanation of that joke exists anywhere in the training data.

kromem@lemmy.world · 1 year ago

Ok, give me a sample of what you think it will get wrong, and let’s see.

bionicjoey@lemmy.ca · 1 year ago

Sure, here’s a prompt:

Please solve and explain the following cryptic crossword clue: “Eddy’s mother receiving broadcast alerts (9)”

Correct answer: MAELSTROM. The clue works like this: the definition is “Eddy’s” (an eddy is a whirlpool, i.e. a maelstrom, and 's means “is”). The wordplay is: mother = MOM, “receiving” means put inside, “broadcast” is an anagram indicator, and “alerts” is the anagram fodder. So put an anagram of ALERTS inside MOM: M + AELSTR + OM = MAELSTROM.
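The letter mechanics of this kind of “container + anagram” wordplay can be checked mechanically. Below is a minimal sketch, assuming a hypothetical helper `check_container_anagram` (not from any crossword library). It only verifies the enumeration and that the answer is the outer word split around an anagram of the fodder. It does not judge whether the definition fits, which is the part that needs actual understanding.

```python
from collections import Counter


def is_anagram(a: str, b: str) -> bool:
    """True if a and b use exactly the same letters (case-insensitive)."""
    return Counter(a.upper()) == Counter(b.upper())


def check_container_anagram(answer: str, enumeration: int, outer: str, fodder: str) -> bool:
    """Hypothetical checker: is `answer` = `outer` with an anagram of `fodder` inserted inside it?"""
    answer, outer = answer.upper(), outer.upper()
    # The enumeration in brackets, e.g. (9), must match the answer length.
    if len(answer) != enumeration or len(answer) != len(outer) + len(fodder):
        return False
    # Try every split point strictly inside the outer word (M|OM, MO|M for MOM).
    for i in range(1, len(outer)):
        head, tail = outer[:i], outer[i:]
        if answer.startswith(head) and answer.endswith(tail):
            middle = answer[len(head):len(answer) - len(tail)]
            if is_anagram(middle, fodder):
                return True
    return False


print(check_container_anagram("MAELSTROM", 9, "MOM", "ALERTS"))  # True
```

Here the split M|OM leaves AELSTR in the middle, which uses exactly the letters of ALERTS.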

This is not a particularly complicated cryptic crossword clue for anyone who understands the rules of cryptics. But cryptic crossword clues are rarely explained online, so the training set undoubtedly doesn’t contain it. For context, I’m awful at cryptics (but I enjoy them) and I solved this one.

I haven’t tested this prompt but I’ve tested half a dozen other cryptic clues and it always got them fantastically wrong. Its explanations often didn’t make sense, and it frequently gave answers with the wrong number of letters, which is given in the clue. If by some miracle it does get this one right, I promise it’s a fluke and I can provide several more to give you a decent sample size.

If ChatGPT had a brain, it would be able to solve this sort of thing no problem. People like to pretend it’s superintelligent, and then as soon as it gets something wrong, they’re like “well it just needs more training data”. You can’t have it both ways. If it needs to be trained on cryptic clues, then it doesn’t really “understand” the rules of the game. It’s simply a bunch of linear algebra.

kromem@lemmy.world · edit-2 1 year ago

Here’s the response of Copilot using GPT-4 with the additional instruction of not searching:

Hello, this is Bing. I enjoy solving cryptic crosswords. 😊

The answer to the clue is MAELSTROM. Here is how I solved it:

The clue has two parts: a definition and a wordplay. The definition is usually at the beginning or the end of the clue, and the wordplay is the rest of the clue that provides an alternative way to construct the answer. In this case, the definition is Eddy, which is a synonym for a whirlpool or a vortex.
The wordplay is mother receiving broadcast alerts. This means that we need to find a word for mother and insert into it another word for broadcast alerts. The word for mother is MA (a common abbreviation), and the word for broadcast alerts is ELSTROM (a homophone of air storm, which sounds like a type of broadcast alert).
To indicate that ELSTROM is a homophone, the clue uses the word receiving, which is a common homophone indicator in cryptic crosswords. Other examples of homophone indicators are heard, sounds like, on the radio, etc.
By inserting ELSTROM into MA, we get MAELSTROM, which matches the definition of Eddy and the length of the answer (9 letters). Therefore, MAELSTROM is the correct answer. 🎉

bionicjoey@lemmy.ca · 1 year ago

You can see clearly that it has no understanding of the wordplay. Though I’ll concede it’s impressive that it got the right answer at all.

kromem@lemmy.world · 1 year ago

Confabulations, and surface statistics in general, often get in the way of the meat and potatoes of critical reasoning in the state-of-the-art (SotA) models.

A good example of this is comparing a variation of a common puzzle as-is versus the same puzzle with its tokens changed to representations, and having the model repeat the adjectives when working through chain of thought (CoT).

Often as soon as it makes a mistake and has that mistake in context, it just has no way of correcting course. A lot of my current work is related to that and using a devil’s advocate approach to self-correction.

But in reality, we won’t see a significant jump in things like being able to identify self-ignorance until hardware shifts in the next few years.

bionicjoey@lemmy.ca · 1 year ago

Or, there is no critical reasoning and better hardware will only hide that fact better. It will always be a Chinese room.

Redacted@lemmy.world · 1 year ago

Spot on.

New Theory Suggests Chatbots Can Understand Text | Quanta Magazine