Google DeepMind Unveils 'Genie 2' for Interactive 3D Environments

Edited by: Veronika Nazarova

Google DeepMind has introduced 'Genie 2', an AI model that generates interactive 3D environments from a single image and is intended for training AI agents.

This 'Foundation World Model' can create complex 3D worlds where both humans and AI agents can interact using keyboard and mouse. Demonstration videos showcase its ability to model physical effects like gravity, smoke, and water reflections while maintaining environmental consistency and simulating non-player character (NPC) behavior.

Technically, Genie 2 is an autoregressive latent diffusion model trained on a large video dataset. It can sustain generated worlds for up to a minute, with most examples lasting 10 to 20 seconds.
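
To make the term "autoregressive latent diffusion" more concrete, here is a minimal sketch of the general idea: each new latent frame starts as noise, is refined over a few denoising steps conditioned on earlier frames and the current action, and is then added to the context for the next frame. This is a toy illustration, not DeepMind's implementation. The function names, the list-of-floats latent, the numeric action, and the hand-written `denoise_step` (a stand-in for a learned network) are all assumptions made for illustration.

```python
import random


def denoise_step(noisy, context, action, step, total_steps):
    """Hypothetical stand-in for a learned denoiser.

    Pulls the noisy latent toward a target derived from the most recent
    context frame and the action. A real model would predict this with a
    trained network instead of a fixed rule.
    """
    target = [c + action for c in context[-1]]
    blend = 1.0 / (total_steps - step)  # the final step lands on the target
    return [n + blend * (t - n) for n, t in zip(noisy, target)]


def generate_next_latent(context, action, rng, num_steps=4):
    """Start from Gaussian noise and refine it over num_steps iterations."""
    latent = [rng.gauss(0.0, 1.0) for _ in context[-1]]
    for step in range(num_steps):
        latent = denoise_step(latent, context, action, step, num_steps)
    return latent


def rollout(first_latent, actions, seed=0):
    """Autoregressive loop: every generated latent joins the context
    that conditions the next one."""
    rng = random.Random(seed)
    frames = [first_latent]
    for action in actions:
        frames.append(generate_next_latent(frames, action, rng))
    return frames
```

Calling `rollout([0.0, 0.0], [1.0, 1.0, -1.0])` returns four latent frames: the starting frame plus one per action. In a real system the starting latent would come from encoding the prompt image, and each latent would be decoded back into a video frame.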

Its predecessor, the original 'Genie', was limited to 2D platform games and ran slowly, at about one frame per second. In contrast, an unoptimized version of Genie 2 runs in real time, albeit at reduced quality.

A key goal of Genie 2 is training AI agents. DeepMind demonstrated this with SIMA (Scalable Instructable Multiworld Agent), which carried out instructions inside the generated environments. The research team aims to address structural challenges in training embodied agents and to reach the breadth it considers necessary for progress toward Artificial General Intelligence (AGI).

However, challenges remain, including variable output quality and the need for improved consistency in longer interactions.
