
Module 7: Ambisonic Sound Field

The Scene-Based Philosophy of Ambisonics

Ambisonics was developed in the 1970s in the United Kingdom as a method to capture a complete, full-sphere (360-degree) sound field, unlike traditional channel-based systems, which reproduce sound only in the horizontal plane. Its philosophy, known as “scene-based” or “soundfield-kernel,” represents a full three-dimensional sound field using the B-format, independent of the speaker layout used for playback. This separation between capturing or creating the sound field and decoding it allows the engineer or producer to work abstractly, without needing to know how many speakers the listener has or how they are arranged.

Because Ambisonics describes the sound field itself rather than individual speaker feeds, filters and spatial manipulations can be applied directly to the sound field, something that is difficult to achieve with channel-based formats. Although Ambisonics was not originally conceived as an object-based authoring format, it can be complemented with such methods; in fact, modern standards like MPEG-H 3D Audio combine channels, objects, and Ambisonics components into a single data stream.

Fundamental Principles of Ambisonics

To understand the fundamental principles of Ambisonics, it is helpful to start with a familiar concept: mid/side (M/S) stereo recording. In M/S, an omnidirectional or cardioid microphone (the Mid channel) captures the central, monophonic information of the sound source, while a figure-eight (bidirectional) microphone aimed sideways (the Side channel) captures the lateral information, with sounds from the left and right arriving in opposite polarity. Summing and differencing the two channels (Left = Mid + Side, Right = Mid − Side) produces a complete stereo image.
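
The sum-and-difference step in M/S can be sketched in a few lines of Python. This is only an illustration: the function names `ms_to_lr` and `lr_to_ms` are hypothetical, signals are plain lists of samples, and the Side microphone's positive lobe is assumed to face left.

```python
def ms_to_lr(mid, side):
    """Decode Mid/Side samples to Left/Right by sum and difference."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def lr_to_ms(left, right):
    """Encode Left/Right back to Mid/Side; the 0.5 factor undoes ms_to_lr."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


if __name__ == "__main__":
    left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
    print(left, right)
```

The same idea, one pressure signal plus signed directional signals, is what the four Ambisonics channels generalize.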

Ambisonics extends this principle to three dimensions. The omnidirectional Mid channel corresponds to the Ambisonics W channel, which captures the total sound pressure coming from all directions. The Side channel is expanded into three bidirectional components, one for each spatial axis: X (front-back), Y (left-right), and Z (up-down). Capturing these four signals allows the complete sound field to be represented at a specific point in space.

Ambisonic Formats: A-format and B-format

A-format

The A-format corresponds to the signals captured directly by the microphones of an ambisonic array. Typically, four capsules are used, arranged in a tetrahedral shape, with one capsule at each vertex. Each microphone records the sound pressure according to its specific orientation.

These signals depend entirely on the physical arrangement: the number of capsules, their positions, and their polar patterns determine how the sound is captured.

At this stage, the signals do not represent an interpretable three-dimensional sound field, but rather four individual recordings from different angles.

The A-format varies depending on the manufacturer, as each ambisonic microphone design (Soundfield, Sennheiser, Core Sound, Rode, etc.) produces its own characteristics.

In general terms, the A-format constitutes the raw material of ambisonic recording: a set of unprocessed recordings that can later be transformed into a standard representation of the sound field.

B-format

The B-format is the result of mathematically transforming the A-format signals into a standard representation of the three-dimensional sound field. This transformation produces four components: W, X, Y, and Z, which encode the sound pressure at a point in space and how it varies along each spatial axis, independently of the capture arrangement or the playback system.

  • W: omnidirectional component (zero-order), which records the total sound pressure coming from all directions.

  • X: first-order component along the front-back axis.

  • Y: first-order component along the left-right axis.

  • Z: first-order component along the up-down axis.
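
For a tetrahedral array, the core of the A-format to B-format transformation is a set of sums and differences of the four capsule signals. The sketch below assumes a common capsule layout (front-left-up, front-right-down, back-left-down, back-right-up) and a hypothetical function name `a_to_b`. It leaves out the gain scaling and the frequency-dependent correction filters that real converters apply, which differ between manufacturers.

```python
def a_to_b(flu, frd, bld, bru):
    """Convert four tetrahedral capsule signals (lists of samples) to W, X, Y, Z.

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up. No scaling or filtering is applied.
    """
    capsules = list(zip(flu, frd, bld, bru))
    w = [a + b + c + d for a, b, c, d in capsules]  # all capsules: pressure
    x = [a + b - c - d for a, b, c, d in capsules]  # front minus back
    y = [a - b + c - d for a, b, c, d in capsules]  # left minus right
    z = [a - b - c + d for a, b, c, d in capsules]  # up minus down
    return w, x, y, z


if __name__ == "__main__":
    print(a_to_b([1.0], [1.0], [1.0], [1.0]))
```

When all four capsules receive the same signal, the differences cancel and only W is non-zero, which matches the idea of W as the omnidirectional component.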

Unlike the A-format, the B-format does not depend on specific microphones or particular speakers. It allows manipulation, rotation, and decoding of the sound field for different playback systems, from multichannel setups to binaural headphones.
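
As an illustration of manipulating the sound field directly, the following sketch rotates a first-order B-format signal about the vertical axis. It assumes the axis convention X = front, Y = left, Z = up, with positive angles turning the scene toward the left; the function name `rotate_yaw` is hypothetical. Only X and Y are mixed, while W and Z pass through unchanged.

```python
import math


def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate a first-order B-format scene about the vertical (Z) axis.

    Assumes X = front, Y = left, Z = up; a positive angle moves sources
    counterclockwise when seen from above (front moves toward left).
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    x_rot = [c * xi - s * yi for xi, yi in zip(x, y)]
    y_rot = [s * xi + c * yi for xi, yi in zip(x, y)]
    return list(w), x_rot, y_rot, list(z)


if __name__ == "__main__":
    # A source straight ahead (X = 1, Y = 0) rotated by 90 degrees
    print(rotate_yaw([1.0], [1.0], [0.0], [0.0], 90.0))
```

The rotation touches every source in the scene at once, without needing to know where any individual source is.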

In practical terms, the B-format constitutes the standard, scalable representation of the sound field, ready for encoding, processing, and flexible playback in multiple environments.
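
One simple way to picture decoding is to point "virtual microphones" into the B-format scene, one per output channel. The sketch below derives a basic stereo pair from two virtual cardioids. It is an illustration under stated assumptions, not a production decoder: W is assumed to carry the pressure at unit gain (other normalization conventions scale W differently), the axes are X = front, Y = left, Z = up, and the names `virtual_cardioid` and `decode_stereo` are hypothetical. Loudspeaker and binaural decoders used in practice are more elaborate.

```python
import math


def virtual_cardioid(w, x, y, z, azimuth_deg, elevation_deg=0.0):
    """Signal of a virtual cardioid microphone aimed at the given direction.

    Assumes W at unit gain and X = front, Y = left, Z = up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    dx = math.cos(az) * math.cos(el)
    dy = math.sin(az) * math.cos(el)
    dz = math.sin(el)
    return [0.5 * wi + 0.5 * (dx * xi + dy * yi + dz * zi)
            for wi, xi, yi, zi in zip(w, x, y, z)]


def decode_stereo(w, x, y, z, spread_deg=90.0):
    """Left/right feeds from two virtual cardioids at +/- spread_deg."""
    left = virtual_cardioid(w, x, y, z, +spread_deg)
    right = virtual_cardioid(w, x, y, z, -spread_deg)
    return left, right


if __name__ == "__main__":
    # A source hard left: W = 1, Y = 1
    print(decode_stereo([1.0], [0.0], [1.0], [0.0]))
```

Changing the aiming angles or adding more virtual microphones adapts the same B-format material to a different playback layout.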

Spherical Harmonics and Sound Field Representation

The B-format represents the sound field in three dimensions using a mathematical model called spherical harmonics. These are functions defined on the surface of a sphere: each one corresponds to a directional pattern, and together they describe how sound arrives from any direction around a point.

In first-order B-format, there are four channels:

  • W: omnidirectional, capturing the total sound pressure from all directions (equivalent to the Mid in M/S).

  • X, Y, Z: bidirectional, figure-eight patterns that record how sound pressure changes along the three spatial axes: front-back, left-right, and up-down.

Higher orders add channels with more complex patterns, increasing spatial accuracy and resolution. A representation of order N uses (N + 1)² channels: 4 at first order, 9 at second order, and 16 at third order. To encode a sound, its signal is multiplied by the value of each spherical harmonic in the direction of the source, which distributes the sound across the B-format channels in a controlled and complete manner.
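
The encoding step can be sketched for the first-order case as follows. This is a minimal illustration with assumptions stated up front: azimuth is measured counterclockwise from the front (positive toward the left), elevation is positive upward, and W is given unit gain. Some conventions scale W differently, for example by 1/√2, so the gains here should not be read as the only valid choice. The names `encode_first_order` and `channel_count` are hypothetical.

```python
import math


def encode_first_order(signal, azimuth_deg, elevation_deg=0.0):
    """Encode a mono signal (list of samples) into first-order W, X, Y, Z.

    Assumes X = front, Y = left, Z = up, and unit gain on W.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gain_w = 1.0
    gain_x = math.cos(az) * math.cos(el)
    gain_y = math.sin(az) * math.cos(el)
    gain_z = math.sin(el)
    w = [gain_w * s for s in signal]
    x = [gain_x * s for s in signal]
    y = [gain_y * s for s in signal]
    z = [gain_z * s for s in signal]
    return w, x, y, z


def channel_count(order):
    """Number of channels in a full-sphere Ambisonics signal of this order."""
    return (order + 1) ** 2


if __name__ == "__main__":
    # A source hard left, at ear height
    print(encode_first_order([1.0, 0.5], 90.0))
    print([channel_count(n) for n in range(4)])
```

A source straight ahead puts its signal into W and X only; moving it to the left shifts energy from X into Y, and raising it shifts energy into Z.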
