Bot Eat Brain

VASA-1 generates realistic talking videos from images

PLUS: Meta. Meta everywhere.

Michael Parrish
April 23, 2024

Good morning, human brains, and welcome back to your daily munch of AI news.

Here’s what’s on the menu today:

Turn your shallow-fake into a deepfake 🦾 🤖
VASA-1 generates editable, realistic talking head videos.
53.7% more realistic than reality? 🌹 💨
PhysDreamer adds realistic motion to objects in AI-generated videos.
Meta has its own ChatGPT now 😳 💻
Meta launched a chat interface powered by Llama 3.

MAIN COURSE

Boom. Headshot. 🥸 👀

Last week, Microsoft researchers introduced VASA-1. It generates realistic talking head videos from an image and an audio clip.

Source: Microsoft Research

What does it do?

VASA-1 generates videos with precise lip-syncing, lifelike facial expressions, natural head movements, and more.

Is it good?

It outperforms existing state-of-the-art methods in lip synchronization, head motion realism, and video quality. You can also control gaze direction, head distance, emotional expressions, and more.

What kind of vids can it make?

It produces 512 × 512 pixel videos at up to 40 frames per second. It’s highly versatile and tackles artistic photographs, singing vocals, and speech in many languages.

Why do I care?

VASA-1 could enhance learning experiences, provide support for individuals with disabilities, improve patient care through advanced diagnostics and personalized treatment plans, and more.

SIDE SALAD

AI-generated motion > Real motion 🌹 💨

What kind of dreams do hotels have? 🏨 Suite dreams… 🤭

On Friday, researchers from MIT, Stanford, Columbia, and Cornell introduced PhysDreamer. It adds realistic, interactive dynamics to objects in AI-generated videos.

Source: PhysDreamer

It does what, now?

PhysDreamer is a technique that gives 3D objects realistic motion. It uses principles from physics and insights from video models to make 3D objects react more naturally to wind, touch, movement, and more.

How does it work?

It uses 3D Gaussians to represent objects, a neural field to model their physical properties, and a physics simulation technique called the Material Point Method (MPM) to simulate the motion.
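
Want a feel for why those physical properties matter? Here's a toy sketch. It is not PhysDreamer's code and it is not MPM: it swaps in a single damped spring as a stand-in for one point on an object, and every name in it (`simulate_sway`, `count_zero_crossings`, the stiffness values) is made up for illustration. The only point it makes is that the same poke produces very different motion depending on how stiff the material is, which is why a method would need to estimate stiffness before simulating.

```python
def simulate_sway(stiffness, mass=1.0, damping=0.2, x0=1.0, dt=0.001, steps=5000):
    """Displacement over time of one point pulled back toward rest by a spring.

    Toy stand-in only: a real MPM simulation tracks many particles on a grid.
    """
    x, v = x0, 0.0
    trajectory = []
    for _ in range(steps):
        # Restoring force from the spring plus a damping force that opposes motion.
        force = -stiffness * x - damping * v
        # Semi-implicit Euler: update velocity first, then position.
        v += (force / mass) * dt
        x += v * dt
        trajectory.append(x)
    return trajectory


def count_zero_crossings(trajectory):
    """Count how often the point swings through its rest position."""
    return sum(1 for a, b in zip(trajectory, trajectory[1:]) if a * b < 0)


# Hypothetical stiffness values: a floppy flower stem vs. a rigid one.
soft = simulate_sway(stiffness=10.0)
stiff = simulate_sway(stiffness=100.0)

print("soft crossings:", count_zero_crossings(soft))
print("stiff crossings:", count_zero_crossings(stiff))
```

The stiff point swings back and forth more often over the same five simulated seconds, so guessing the wrong stiffness would make the motion look off.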

Dope or nope?

In a user study, 53.7% of participants judged PhysDreamer's generated motion more realistic than the alternatives, even when the comparison was real video footage.

Source: PhysDreamer

A LITTLE SOMETHING EXTRA

Meta. Meta everywhere… 😳 💻

The internet had a… Meta-morphosis 🥴

On Thursday, Meta launched meta.ai. It’s a website with a chat interface that’s powered by Meta’s AI model, Llama 3.

Source: Meta.ai

It looks… Familiar… 🧐

What can it do?

Meta claims its Llama 3-powered assistant is the most powerful free AI assistant available on the web. It can help you plan trips, generate images from text prompts, convert images into GIFs, and more.

When can I use it?

Meta.ai is available now in the US, Australia, Canada, several African countries, and more.

YOUR DAILY MUNCH

Otio

Revolutionize Your Workflow with Otio

Enhance your research efficiency with Otio. Automatically summarize content from papers, videos, tweets, and more. Engage directly with insights through our interactive AI. Write, edit, and paraphrase with precision using our AI-powered text editor.

Try Otio for free today—No credit card required!

Tools

Direqt — a chatbot trained on your data that integrates onto your site.

CoachVox — clone yourself and charge people to interact with you.

MailSplash — an email marketing assistant that designs, writes, and more.

PhoneScreen — a recruitment tool that ranks your candidates with AI.

Think Pieces

How much AI is too much AI? A look at 11 recently introduced models and how they’re minuscule in the grand scheme of AI.

Open-source AI isn’t really open-source. Many of the largest open-source AI models don’t have their training data openly available.

The age of internet users is decreasing. A look at how U.K. regulators plan to combat malicious content involving children.

Startup News

OpenAI unveiled enhancements to its Assistants API. The improvements include efficient data handling, improved file search, vector storage, and more.

Meta launched chatbots powered by Llama 3. You’ll now see “Meta AI” on Facebook and Instagram.

Hugging Face released Open-Medical-LLM. It’s a benchmark that tests how generative AI models perform on various health-related tasks.

Research

HQ-Edit — generates pairs of before and after images that show detailed changes based on written instructions to enhance image editing software.

SIMA — a large dataset of gameplay from various video games that trains AI agents to follow instructions, comprehend real-world environments, and more.

TRIFORCE — a method that speeds up LLMs’ generation of long text sequences and minimizes memory usage and computational delays.