The year 2025 began with a shift in the tech industry: the arrival of Muse, developed by Microsoft and Xbox, and the debut of Nvidia's ACE, two generative AIs capable of building interactive virtual environments for video games. With these announcements, artificial intelligence moved beyond simple image and video generation, which had already shown signs of creative exhaustion, to dive fully into the creation of complete worlds with their own rules and landscapes in constant transformation. And here comes the inevitable question:
What do these new generative virtual worlds sound like?
The tech industry is training us not only to inhabit virtual environments that are increasingly similar to reality, but also to understand how they are created.
These new technologies build virtual landscapes that shape our relationship with our immediate environment, and in that process they force us to rethink the role that designers of immersive experiences are starting to play.
The Invisible Architecture
Sound has always been an invisible architecture, a language that speaks directly to the body. The reverberation of an underground tunnel tells us more about its depth than any 4K render.
The crunch of gravel underfoot, the density of fog on an empty street, even the emptiness of artificial silence—these are all elements traditionally designed by human hands, with human decisions. But what happens when these landscapes begin to generate themselves? When sound stops being composed and starts being predicted?
A reverberation tells us the depth of a tunnel before we see it; the directionality of a sound makes us turn our heads in a dark room.
If sound has always connected us to space intuitively, technology now redefines that relationship.
A few years ago, I recorded a radio program inside the South Water Tower (Wasserturm Süd) in Halle (Saale), Germany.
The tower was an immense cylinder, empty, a shell of concrete and brick. That experience shaped my understanding of reverberation. In that vast circular space, a voice, a bandoneon, and a clarinet transformed an empty, semi-abandoned shell into a place full of life.
I wonder what will happen when the acoustics of a space stop being a physical consequence and become an algorithmic decision.
When an AI determines the exact duration of a reverberation or the sonic depth of a tunnel before someone walks through it, will we still perceive the space in the same way? Or will we adapt to a sound architecture that no longer responds to our presence, but rather to a calculated logic about how a space sounds?
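To make the contrast concrete: classical acoustics treats reverberation time as something derived, not chosen. Sabine's formula estimates it from a room's volume and total absorption. The sketch below is only an illustration, in Python, with hypothetical dimensions for a tower like the one described; a generative system, by contrast, would simply output whatever decay time it has learned to consider plausible.

```python
# A minimal sketch of reverberation as a physical consequence: Sabine's
# classic approximation, where decay time follows from the room itself
# rather than from anyone's decision. All dimensions below are assumed,
# not measurements of the actual tower.
import math

def rt60_sabine(volume_m3: float, absorption_sabins: float) -> float:
    """Estimate reverberation time (RT60, in seconds) with Sabine's formula."""
    return 0.161 * volume_m3 / absorption_sabins

# A tall brick cylinder with hard, reflective surfaces: large volume,
# very little absorption, hence a long, cathedral-like decay.
volume = math.pi * 6 ** 2 * 30        # hypothetical radius 6 m, height 30 m
absorption = 120.0                    # hypothetical total absorption, in m² sabins

print(f"RT60 ≈ {rt60_sabine(volume, absorption):.1f} s")   # roughly 4.6 s
```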
The Future Sounds Like Data
But the video game industry views this advancement not only with enthusiasm but also with concern. WHAM (World and Human Action Model), the technology powering Muse, is already demonstrating that prediction and real-time generation of environments are the new frontier.
From just one second of human gameplay, equivalent to ten frames, it can predict how a video game session will evolve.
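Microsoft has not published a drop-in API for this, so the following is only a rough mental model, not WHAM's actual architecture: a world model that, given a short window of frames and the actions recorded with them, keeps predicting what comes next and feeds its own output back in as context. The WorldModel stub and its predict_next method are hypothetical placeholders.

```python
# A schematic, deliberately simplified picture of how a world model might
# continue a play session from a one-second prompt. NOT WHAM's real code.
from dataclasses import dataclass
import random

@dataclass
class Frame:
    pixels: list        # stand-in for image data
    action: str         # controller input recorded with the frame

class WorldModel:
    """Hypothetical stand-in: predicts the next frame from recent context."""
    def predict_next(self, context: list[Frame]) -> Frame:
        last = context[-1]
        # A real model would generate new pixels; here we just echo a placeholder.
        return Frame(pixels=last.pixels, action=random.choice(["jump", "run", "idle"]))

def continue_session(model: WorldModel, prompt: list[Frame], steps: int) -> list[Frame]:
    """Autoregressive rollout: feed each prediction back in as new context."""
    frames = list(prompt)
    for _ in range(steps):
        frames.append(model.predict_next(frames[-10:]))  # ~1 s of context at 10 fps
    return frames

# Ten frames (about one second at 10 fps) seed a much longer generated session.
seed = [Frame(pixels=[0] * 16, action="idle") for _ in range(10)]
rollout = continue_session(WorldModel(), seed, steps=100)
print(len(rollout), "frames generated from a 10-frame prompt")
```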
If Muse creates worlds and WHAM anticipates them, it’s only a matter of time before similar systems also handle the organization of sound in space.
AI will not only be able to generate sounds in real time, but also decide where to place them, how they behave in relation to the environment, and how they interact with the user.
The technological evolution of audio spatialization, from stereophony to object-based audio, has already transformed the way we experience sound in space. AI could take it one step further: a sound space that constantly adapts to the user's position, speed, visual environment, and even their emotional state.
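What could "a sound space that adapts to the user" mean in practice? As a hedged illustration (no particular engine's API; the parameter names, the mood label, and the mappings are all assumptions), here is a small function that turns a listener's state into playback parameters such as gain, panning, and reverb send.

```python
import math
from dataclasses import dataclass

@dataclass
class ListenerState:
    position: tuple[float, float, float]   # listener coordinates in metres
    speed: float                           # movement speed in m/s
    mood: str                              # crude emotional label (assumed input)

def spatialize(source_pos: tuple[float, float, float], listener: ListenerState) -> dict:
    """Map listener state to illustrative playback parameters."""
    dx, dy, dz = (s - l for s, l in zip(source_pos, listener.position))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    gain = 1.0 / max(distance, 1.0)                        # inverse-distance attenuation
    pan = max(-1.0, min(1.0, dx / max(distance, 1e-6)))    # left/right from relative x
    reverb_send = 0.2 if listener.mood == "calm" else 0.5  # assumed mood mapping
    pitch = 1.0 + listener.speed / 343.0                   # naive motion-based shift
    return {"gain": gain, "pan": pan, "reverb_send": reverb_send, "pitch": pitch}

listener = ListenerState(position=(0.0, 0.0, 0.0), speed=2.0, mood="tense")
print(spatialize((4.0, 0.0, 3.0), listener))
```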
We will no longer talk about sound designers manually programming each effect, but about systems trained to generate complete sound environments.
But it’s not just about creating sounds; it’s about defining the rules by which they will be distributed in the virtual space.
If AI learns to generate sound environments, the question is: what data will it be trained on? If it is fed only conventional soundscapes, it could end up replicating a static world, lacking the diversity and organic quality of real sound. AI does not improvise or dream; it only replicates what we give it.
If we train a system on recordings of urban spaces that reflect only hegemonic cultures, the resulting soundscapes could end up as an artificial echo of the great powers and a poor representation of more marginalized regions.
If the database comes exclusively from cinematic recordings or sound libraries already available on the internet, we will lose the richness of spontaneous and natural sound.
The way we curate these datasets will determine the type of landscapes that will be built in the future.
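One modest way to picture what "curating these datasets" might involve: tagging every recording with where and how it was captured, then auditing how the collection is distributed before a model ever sees it. The catalogue entries and metadata fields below are hypothetical.

```python
from collections import Counter

# Hypothetical catalogue: each recording tagged with where and how it was captured.
catalogue = [
    {"file": "street_market.wav", "region": "Latin America", "source": "field recording"},
    {"file": "subway_platform.wav", "region": "North America", "source": "sound library"},
    {"file": "rain_on_tin_roof.wav", "region": "West Africa", "source": "field recording"},
    {"file": "car_chase_fx.wav", "region": "North America", "source": "cinematic stem"},
]

def audit(entries: list[dict], key: str) -> dict:
    """Report how the collection is distributed along one metadata axis."""
    counts = Counter(e[key] for e in entries)
    total = len(entries)
    return {k: f"{100 * v / total:.0f}%" for k, v in counts.items()}

print("By region:", audit(catalogue, "region"))
print("By source:", audit(catalogue, "source"))
```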
But who will be the next curators of these data banks? How do we prepare for these new models in which art is conditioned by AI? Will users truly notice the difference?
The video game industry has already split into two clear camps. On one side are those who see the new generative AIs as a technological tool that streamlines and perfects world creation, among them Brendan Greene, the designer responsible for popularizing the battle royale genre from which games like Fortnite emerged.
On the other side are those who believe AI will replace workers in the industry with models that lack creativity, and that players will notice the change. This group includes the creators of the No GEN AI label, a collective of independent developers who have designed a logo that studios can display on digital store pages to indicate that no generative AI was used in the creation of the video game.
The Great Challenge: Between Creation and Curation
Will we continue designing sounds, or will we start curating sound data for AI?
From my point of view, the sound designer of the future will no longer compose an atmosphere from scratch; instead, they will select, refine, and model datasets of millions of sounds, thus training a system that will create sounds in real time.
Like a digital archaeologist, they will not so much create as direct the machine's possibilities. Their task will be to choose the sounds, define their characteristics, and establish the parameters with which the AI will operate within an immersive environment.
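If "establishing the parameters with which the AI will operate" becomes the core of the job, one can imagine that work taking the form of an explicit, human-authored constraint set handed to the generative system. The format below is a hypothetical sketch, not an existing standard or tool.

```python
# A hypothetical "sound brief": human-authored constraints a designer might
# hand to a generative audio system instead of composing each sound by hand.
sound_brief = {
    "scene": "abandoned water tower",
    "reverb_time_s": (3.0, 6.0),        # acceptable RT60 range, in seconds
    "max_simultaneous_sources": 12,      # density ceiling for the soundscape
    "loudness_target_lufs": -23.0,       # overall loudness target
    "allowed_sources": ["field recording", "synthesized texture"],
    "excluded_sources": ["cinematic stem"],
}

def validate_candidate(candidate: dict, brief: dict) -> bool:
    """Check a generated sound event against the designer's constraints."""
    lo, hi = brief["reverb_time_s"]
    return (lo <= candidate["reverb_time_s"] <= hi
            and candidate["source_type"] in brief["allowed_sources"])

print(validate_candidate({"reverb_time_s": 4.5, "source_type": "field recording"}, sound_brief))
```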
If the role of the sound designer increasingly leans toward data management and model training, our way of perceiving space will change completely.
It’s not just about adapting but deciding whether we want the sound of the future to be a mere replica of what we already know or an opportunity to reimagine our relationship with listening, both in virtual and physical spaces.
I am not against AI, but I don’t see it as an absolute solution either. I believe it is essential to understand its implications, explore its possibilities, and question its limits.
The answers to these questions might give us the chance to find gaps in a system that will determine how we conceive of immediate reality.