Stubsack: weekly thread for sneers not worth an entire post, week ending 2 March 2025

BlueMonday1984@awful.systems · 4 days ago

BigMuffin69@awful.systems · 1 day ago

So they had the new Claude hooked up to some tools so that it could play Pokemon Red. Somewhat impressive (at least to me!): it was able to beat Lt. Surge after several days of play. They had a stream demoing it on Twitch, and despite the on-paper result of getting 3 gym badges, the poor fella got stuck in Viridian Forest trying to find the exit to the maze.

As far as finding the exit goes… I guess you could say he was stumped? (MODS PLEASE DONT BAN)

Strim if anyone is curious. Yes, I know this is clever advertising for Anthropic, but I do find it cute and maybe someone else will?

https://www.twitch.tv/claudeplayspokemon

o7___o7@awful.systems · 18 hours ago

It looks fun!

My inner grouch wanted to add:

There were a metric shit ton of hand-crafted, artisanal, exhaustive full-text walkthroughs for the OG Pokemon games even twenty years ago. They’re all part of the training corpus, so all you have to do to make this work is automate prompt generation based on the current game state and then capture the most likely keywords in the LLM’s outputs for conversion to game commands. Plus, a lot of “intelligence” could be hiding in the invisible “glue” that ties the whole thing together, up to and including an Actual Individual.

I’d be shocked if this worked for a 2025 release
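
For anyone who wants to picture the loop described above (game state in, prompt out, keywords back in as button presses), here is a minimal Python sketch. Everything in it is an assumption made up for illustration: `GameState`, `build_prompt`, `parse_buttons`, the `BUTTONS:` reply format, and the canned `fake_model` stub standing in for an actual LLM call. It says nothing about how the real stream is wired up.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Game Boy inputs; any other token in the model's reply is dropped.
VALID_BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")


@dataclass
class GameState:
    """Hypothetical snapshot of the game; field names are illustrative only."""
    location: str
    badges: int
    party: list[str]


def build_prompt(state: GameState) -> str:
    """Turn the current game state into a text prompt for the model."""
    party = ", ".join(state.party) or "none"
    return (
        f"You are playing Pokemon Red. Location: {state.location}. "
        f"Badges: {state.badges}. Party: {party}. "
        "Explain your plan, then end with a line 'BUTTONS: ...'."
    )


def parse_buttons(reply: str, limit: int = 5) -> list[str]:
    """Extract valid button names from the first 'BUTTONS:' line of the reply."""
    for line in reply.splitlines():
        if line.strip().upper().startswith("BUTTONS:"):
            tokens = re.split(r"[,\s]+", line.split(":", 1)[1].strip().lower())
            return [t for t in tokens if t in VALID_BUTTONS][:limit]
    return []  # no usable command: the caller decides whether to re-prompt


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: returns plain text)."""
    return "The exit is to the north.\nBUTTONS: up, up, A, jump"


def step(state: GameState, model: Callable[[str], str]) -> list[str]:
    """One turn of the loop: state -> prompt -> model reply -> button presses."""
    return parse_buttons(model(build_prompt(state)))


state = GameState(location="Mt. Moon 1F", badges=0, party=["Nidoran"])
buttons = step(state, fake_model)
```

Restricting the parse to one marked line rather than scanning the whole reply is a deliberate choice in this sketch: otherwise every stray "a" or "start" in the model's prose would turn into a button press.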

BigMuffin69@awful.systems · 15 hours ago

One more tidbit: I checked in and it’s been stuck on the first floor of Mt. Moon for 6 hours. Just out of curiosity, I asked an OAI model “what do I do if im stuck in mount moon 1F” and it spat out a step-by-step guide on how to navigate the cave, with the location of each exit and what to look for. So yeah, even without someone hardcoding hints in the model, just knowing the game state and querying what’s next suffices to get the next step to progress the game.

BigMuffin69@awful.systems · 17 hours ago

I had a similar discussion with one of my friends! Anthropic is bragging that the model was not trained to play Pokemon, but Pokemon Red has massive speedrunning wikis that, based on the reasoning traces, are clearly in the training data. Like, the model trace said it was “training a nidoran to level 12 b.c. at level 12 nidoran learns double kick which will help against brock’s rock type pokemon”, so it’s not going totally blind in the game. There were also a couple of outputs, when it got stuck for several hours, where it started printing things like “Based on the hint…”, which seemed kind of sus. I wouldn’t be surprised if there is some additional hand-holding going on in the back based on the game state (i.e., go to Oak’s, get a starter, go north to Viridian, etc.) that helps guide the model. In fact, I’d be surprised if this wasn’t the case.