Claude Plays Pokémon — The best way to measure AI?

PLUS: Octopi or Octop-AI?

Hello again, human brain, and welcome back to your daily munch of AI news.

Here’s what’s on the menu today:

  • AGI timeline crumbles to Pikachu🧠

    AI models can't beat Pokémon.

  • Your benchmarks are a lie🤡

    AI hasn’t improved since mid-2024.

Peep today's Spot the AI at the bottom. 👇

MAIN COURSE

Pokémon killed your AI future🎮

Anthropic's most advanced AI model, Claude 3.7 Sonnet, still can't beat Pokémon despite being touted as a step toward artificial general intelligence.

What? Claude is evolving! 🐥

Why do I care?

AI firms like Anthropic predict AGI (artificial general intelligence) by 2027. Claude's performance raises questions about these timelines.

How does Claude play?

It wasn't specifically trained for Pokémon. It views game screenshots and takes long pauses to “think” between moves.

The good?

Claude excels at text-based parts of the game, developing battle strategies and building knowledge about Pokémon types. It follows misleading instructions that confuse human children.

The bad?

It frequently gets stuck walking into walls, revisits completed areas, and talks to the same NPCs over and over. It struggles with basic 2D navigation.

This is Onyx-eptible. 🗿

Why does it suck?

Claude has limited “memory” and blindly trusts its own notes, even when they're wrong. Once it becomes convinced of incorrect information (like a forest exit location), it spends hours stubbornly retrying the same failed approach.

What's Anthropic saying?

Project developer David Hershey believes that improved screen understanding could help Claude beat the game.

SIDE SALAD

Your AI got dumber🤡

Dean Valentine, a founder who builds AI security tools, published an article claiming that recent model improvements are mostly hype and don’t translate to real-world performance.

What's the claim?

New models post impressive benchmark scores, but they’ve barely improved on real pentesting tasks since Claude 3.5 Sonnet launched in mid-2024.

Who's saying this?

He’s tested every major model release on vulnerability identification tasks. He finds that improvements come from better engineering, not better AI.

Is it just this dude?

Allegedly not:

Why the disconnect?

There are 3 possibilities:

  1. AI labs might be “cheating” on benchmarks

  2. Standard benchmarks might not measure real usefulness

  3. Models remain bottlenecked by alignment issues

What benchmarks matter?

The author suggests ignoring standardized test-like benchmarks in favor of complex tasks requiring long-term memory and planning, like Claude Plays Pokémon, plus direct experience using AI for real work.

YOUR DAILY MUNCH

Startup News 💰

OpenAI delayed GPT-5’s launch. Altman says it’s due to integration complexities and unexpected performance improvements.

DeepSeek unveiled a new dual-method AI reasoning technique. It boosts model performance by combining GRM and critique tuning.

Research 👨‍🔬 

KnowSelf — allows language agents to use self-awareness to determine when to rely on internal reasoning or seek external knowledge, inspired by human decision-making.

JavisDiT — a Joint Audio-Video Diffusion Transformer designed for high-quality, synchronized audio-video generation.

SPOT THE AI

3 of these are real octopuses. 1 is fake. 🐙

Which one is AI-generated? 👇

Ideas? Comments? Complaints?

Respond to this email or hit me up on 𝕏.

Until next time 🤖😋🧠
