Bot Eat Brain
Claude Plays Pokémon — The best way to measure AI?

PLUS: Octopi or Octop-AI?

Michael Parrish
April 09, 2025

Hello again, human brain, and welcome back to your daily munch of AI news.

Here’s what’s on the menu today:

AGI timeline crumbles to Pikachu🧠
AI models can't beat Pokémon.
Your benchmarks are a lie🤡
AI has barely improved since mid-2024.
New here? Subscribe! 😎

Want to sponsor Bot Eat Brain?

🌎 Reach: 23,000+ readers

📩 Open Rate: 40%+

📍 Location: 80% USA, Canada, & UK

Peep today's Spot the AI at the bottom. 👇

MAIN COURSE

Pokémon killed your AI future🎮

Anthropic's most advanced AI model, Claude 3.7 Sonnet, still can't beat Pokémon Red, despite being touted as a step toward artificial general intelligence.

What? Claude is evolving! 🐥

Source: ClaudePlaysPokemon

Why do I care?

AI firms like Anthropic predict AGI (artificial general intelligence) by 2027. Claude's performance raises questions about these timelines.

How does Claude play?

It wasn't specifically trained for Pokémon. It views game screenshots and takes long pauses to “think” between moves.

The good?

Claude excels at the text-based parts of the game: it develops battle strategies and builds up knowledge about Pokémon types. It can even follow misleading in-game instructions that trip up human children.

The bad?

It frequently gets stuck walking into walls, revisiting completed areas, and talking to the same NPCs repeatedly. It struggles with basic 2D navigation.

This is Onyx-eptible. 🗿

Why does it suck?

Claude has limited “memory” and blindly trusts its own notes, even when wrong. Once it becomes convinced of incorrect information (like a forest exit location), it spends hours stubbornly trying the same failed approach.

Source: Anthropic

What's Anthropic saying?

Project developer David Hershey believes that improved screen understanding could help Claude beat the game.

Watch Claude LIVE in action.

EXTRA FRIES

AI-powered cash💰

Smarter Growth for DTC Brands on Amazon

Ad spend keeps climbing. ROAS? Not so much.

The smartest Amazon sellers aren’t spending more—they’re spending smarter.

The Affiliate Shift Calculator models what could happen if you reallocated a portion of your ad budget into affiliate marketing.

Built for sellers doing $5M+ on Amazon.

Get My Custom Forecast

SIDE SALAD

Your AI got dumber🤡

Dean Valentine, a founder who builds AI security tools, published an article claiming that recent model improvements are mostly hype and don’t translate into real-world performance.

Source: Zeropath

What's the claim?

Despite impressive benchmark scores, new models have improved only minimally on real pentesting (penetration testing) tasks since Claude 3.5 Sonnet launched in mid-2024.

Who's saying this?

He’s tested every major model release on vulnerability identification tasks. He finds that improvements come from better engineering, not better AI.

Is it just this dude?

Allegedly not:

Source: Zeropath

Why the disconnect?

He offers 3 possibilities:

AI labs might be “cheating” on benchmarks
Standard benchmarks might not measure real usefulness
Models remain bottlenecked by alignment issues

What benchmarks matter?

The author suggests ignoring standardized, test-like benchmarks in favor of complex tasks that require long-term memory and planning, like Claude Plays Pokémon, plus direct experience using AI for real work.

YOUR DAILY MUNCH

Cool Tool 🛠️

Start learning AI in 2025

Keeping up with AI is hard – we get it!

That’s why over 1M professionals read Superhuman AI to stay ahead.

Get daily AI news, tools, and tutorials
Learn new AI skills you can use at work in 3 mins a day
Become 10X more productive

Startup News 💰

OpenAI delayed GPT-5’s launch. Altman says it’s due to integration complexities and unexpected performance improvements.

DeepSeek unveiled a new dual-method AI reasoning technique. It boosts model performance by combining generative reward modeling (GRM) with self-principled critique tuning.

Research 👨‍🔬

KnowSelf — allows language agents to use self-awareness to determine when to rely on internal reasoning or seek external knowledge, inspired by human decision-making.

JavisDiT — a Joint Audio-Video Diffusion Transformer designed for high-quality, synchronized audio-video generation.

MEME FOR DESSERT

SPOT THE AI

3 of these are real octopuses. 1 is fake. 🐙

Which one is AI-generated? 👇

Octopi vs Octop-AI? 🤔


Ideas? Comments? Complaints?

Respond to this email or hit me up on 𝕏.

Until next time 🤖😋🧠