Nvidia has built the first generative network capable of producing a fully functional video game without an underlying game engine. The project began as a test of a theory: Could an AI learn to imitate a game well enough to duplicate it, without access to any of the underlying game logic?
The answer is yes, at least for a classic title like Pac-Man, which also happens to be celebrating its 40th anniversary today. That’s an impressive leap forward in overall AI capability.
GameGAN uses a type of AI known as a Generative Adversarial Network. In a GAN, two neural networks compete, each trying to beat the other.
Here’s a hypothetical: Imagine you wanted to train a neural network to determine whether an image was real or had been artificially generated. This AI starts with a base set of accurate images that it knows are real and it trains on identifying the telltale signs of a real versus a synthetic image. Once you’ve got your first AI model doing that at an acceptable level of accuracy, it’s time to build the generative adversary.
The goal of the first AI (the discriminator) is to determine whether an image is real or fake. The goal of the second AI (the generator) is to fool the first. The second AI creates an image and checks whether the first AI rejects it. In this type of model, the first AI’s verdicts are what train the second, and both networks are updated through backpropagation as training proceeds, so the generator produces better fakes and the discriminator gets better at detecting them.
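To make that back-and-forth concrete, here is a minimal sketch of an adversarial training loop in plain Python. It is a toy, not Nvidia's code: the "images" are single numbers, the first AI (usually called the discriminator) is a one-input logistic classifier, and the second AI (the generator) is a linear function of random noise. The names `train_toy_gan`, `a`, `b`, `w`, and `c`, the learning rate, and the hand-derived gradients are all assumptions made for illustration.

```python
import math
import random


def sigmoid(t):
    # Logistic function, written to avoid overflow for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, seed=0):
    """Train a 1-D toy GAN; returns ((a, b), (w, c))."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator:     G(z) = a * z + b
    w, c = 0.1, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)  # a "real" sample
        z = rng.gauss(0.0, 1.0)             # noise fed to the generator
        x_fake = a * z + b                  # the generator's fake sample

        # Discriminator update: minimize
        #   -log D(x_real) - log(1 - D(x_fake))
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator update: minimize -log D(G(z)), i.e. try to make the
        # (just-updated) discriminator call the fake sample real.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - d_fake) * w  # gradient w.r.t. the fake sample
        a -= lr * grad_x * z          # chain rule: dx_fake/da = z
        b -= lr * grad_x              # chain rule: dx_fake/db = 1

    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print("generator offset b:", b)
    print("D(real_mean):", sigmoid(w * 4.0 + c))
```

The point to notice is that the generator never sees a real sample. Its only training signal is the gradient that flows back through the discriminator's judgment, which is the sense in which the first AI trains the second. Real GANs are prone to unstable training, so the values this toy settles on should not be read as typical.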
The GameGAN model was trained by allowing it to ingest both video of Pac-Man plays and the associated keyboard actions used by the player at the same moment in time. One of Nvidia’s major innovations that makes GameGAN work is a decoder that learns to disentangle static and dynamic components within the model over time, with the option to swap out various static elements. This theoretically allows for features like palette or sprite swaps.
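As a rough illustration of those two ideas, the sketch below shows one way recorded gameplay could be turned into training examples, and how a frame could be composed from separate static and dynamic layers. None of this is taken from Nvidia's implementation: the function names, the nested-list "frames", and the per-pixel mask blend are hypothetical choices made to keep the example small.

```python
def make_training_pairs(frames, actions):
    """Pair each (frame, action) with the frame that followed it.

    frames:  recorded frames, one more than the number of actions
    actions: the key pressed between consecutive frames
    """
    if len(frames) != len(actions) + 1:
        raise ValueError("expected one more frame than actions")
    return [((frames[t], actions[t]), frames[t + 1])
            for t in range(len(actions))]


def compose(static, dynamic, mask):
    """Blend two layers per pixel.

    Where mask is 1 the dynamic layer (moving sprites) shows through;
    where it is 0 the static layer (the maze) does.
    """
    return [
        [m * d + (1 - m) * s for s, d, m in zip(s_row, d_row, m_row)]
        for s_row, d_row, m_row in zip(static, dynamic, mask)
    ]


if __name__ == "__main__":
    pairs = make_training_pairs(["f0", "f1", "f2"], ["left", "up"])
    print(pairs[0])  # (('f0', 'left'), 'f1')

    maze = [[1, 1], [1, 1]]      # static layer
    sprites = [[9, 9], [9, 9]]   # dynamic layer
    mask = [[0, 1], [0, 0]]      # one pixel belongs to a sprite
    print(compose(maze, sprites, mask))  # [[1, 9], [1, 1]]

    # Swapping the static layer changes the maze, not the sprite.
    new_maze = [[5, 5], [5, 5]]
    print(compose(new_maze, sprites, mask))  # [[5, 9], [5, 5]]
```

Keeping the layers separate is what makes a swap cheap in this sketch: replacing `maze` with `new_maze` leaves the sprite pixel untouched, which is the kind of palette or sprite substitution the article describes.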
Above is a video of GameGAN in action. The team has an approach that improves graphics quality beyond what is shown here, and the jerkiness is reportedly due to limitations in capturing the video output rather than a fundamental problem with the game.
I’m not sure how much direct applicability this has for gaming. Games are great for certain kinds of AI training because they combine limited inputs and outcomes that are simple enough for an AI model to learn from but complex enough to represent a fairly sophisticated task.
What we’re talking about here, fundamentally, is an application of observational learning in which the AI has trained to generate its own game that conforms to Pac-Man’s rules without ever having an actual implementation of Pac-Man. If you think about it, that’s far closer to how humans game.
While it’s obviously possible to sit down and read the manual (which would be the rough equivalent of having underlying access to the game engine), plenty of folks learn both computer and board games by watching other people play them before jumping in to try themselves. Like GameGAN, we perform static asset substitution without a second thought. You can play checkers with classic red and black pieces or a handful of pebbles. Once you’ve watched someone else play checkers a few times, you can share the game with a friend, even if they’ve never played before.
Advances like GameGAN strike me as significant because they don’t just represent an AI learning how to play a game. The AI is actually learning something about how the game is implemented purely from watching someone else play it. That’s closer, conceptually, to how humans learn, and it’s interesting to see AI algorithms, approaches, and concepts improving as the years roll by.