works@solrezza.net

Module 7: Ambisonic Sound Field

The Scene-Based Philosophy of Ambisonics

Ambisonics was developed in the 1970s in the United Kingdom as a method to capture a complete 360-degree sound field, unlike traditional channel-based systems that only project sound horizontally. Its philosophy, known as “scene-based” or “soundfield-kernel,” represents a full three-dimensional sound field using the B-format, independent of the speaker layout used for playback. This separation between the capture or creation of the sound field and its final decoding allows the engineer or producer to work abstractly, without needing to know the number or arrangement of the listener’s speakers.

Because the sound field is stored as a whole, filters and spatial manipulations can be applied directly to it, something that is difficult to achieve with channel-based formats. Although Ambisonics was not originally conceived as an object-based authoring format, it can be complemented with such methods; in fact, modern standards like MPEG-H 3D Audio combine channels, objects, and Ambisonic components into a single data stream.

Fundamental Principles of Ambisonics

To understand the fundamental principles of Ambisonics, it is helpful to start with a familiar concept: mid/side (M/S) stereo recording. In M/S, an omnidirectional or cardioid microphone (the Mid channel) captures the central, monophonic information of the sound source, while a bidirectional (figure-eight) microphone (the Side channel) captures the difference between the sound arriving from the left and from the right. The combination of both channels produces a complete stereo image.

Ambisonics extends this principle to three dimensions. The omnidirectional Mid channel corresponds to the Ambisonics W channel, which captures the total sound pressure coming from all directions. The Side channel is expanded into three bidirectional components, one for each spatial axis: X (front-back), Y (left-right), and Z (up-down).
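The M/S relationship described above can be sketched as a sum and a difference. This is only an illustration: the function names are hypothetical, and the 1/2 scaling on the way back to Mid and Side is one common convention assumed here, not something the module specifies.

```python
def ms_to_lr(mid, side):
    """Decode Mid/Side sample lists into Left/Right by sum and difference."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def lr_to_ms(left, right):
    """Recover Mid/Side from Left/Right (assumed 1/2 scaling convention)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


if __name__ == "__main__":
    left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
    print(left, right)
```

In Ambisonics, W takes the role of Mid, and X, Y, and Z each take the role of a Side channel along one axis.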
Capturing these four signals (W, X, Y, and Z) allows the complete sound field to be represented at a specific point in space.

Ambisonic Formats: A-format / B-format

A-format

The A-format corresponds to the signals captured directly by the microphones of an ambisonic array. Typically, four capsules are used, arranged in a tetrahedral shape, with one capsule at each vertex. Each capsule records the sound pressure according to its specific orientation.

These signals depend entirely on the physical arrangement: the number of capsules, their positions, and their polar patterns determine how the sound is captured. At this stage, the signals do not represent an interpretable three-dimensional sound field, but rather four individual recordings from different angles.

The A-format varies depending on the manufacturer, as each ambisonic microphone design (Soundfield, Sennheiser, Core Sound, Rode, etc.) produces its own characteristics. In general terms, the A-format constitutes the raw material of ambisonic recording: a set of unprocessed recordings that can later be transformed into a standard representation of the sound field.

B-format

The B-format is the result of mathematically transforming the A-format signals into a standard representation of the three-dimensional sound field. This transformation produces four components: W, X, Y, and Z, which abstractly encode the sound pressure at a point in space and its variation along each axis, independently of the capture arrangement or the playback system.

W: The W channel is the zero-order omnidirectional component of the B-format.
This means it captures all the sound energy coming from any direction, as if it were a “microphone listening everywhere at once.” To keep its level consistent with the other channels, it is traditionally attenuated by 3 dB (a gain factor of 1/√2).
X: first-order component along the front-back axis.
Y: first-order component along the left-right axis.
Z: first-order component along the up-down axis.

Unlike the A-format, the B-format does not depend on specific microphones or particular speakers. It allows manipulation, rotation, and decoding of the sound field for different playback systems, from multichannel setups to binaural headphones.

In practical terms, the B-format constitutes the standard, scalable representation of the sound field, ready for encoding, processing, and flexible playback in multiple environments.

FuMa (Furse-Malham)

FuMa is the original channel format of Ambisonics. For first-order Ambisonics, it uses the WXYZ nomenclature: W is omnidirectional, X front/back, Y left/right, and Z up/down. This scheme was based on a direct correspondence with microphone polar patterns and the axes of a coordinate system.

With the introduction of higher orders, the FuMa format became less practical. The arrangement of spherical harmonics in higher orders is more symmetrical, but FuMa placed the horizontal components at the end of the channel list, making interpretation difficult. Although it was extended up to third order, its adoption remained limited due to complexity and a lack of consensus on normalizations, resulting in multiple incompatible variants.

AmbiX (Ambisonic Channel Number, ACN)

AmbiX was developed to address the interoperability issues of FuMa, especially in higher orders.
It uses the Ambisonic Channel Number (ACN) convention, proposed in 2009 by Michael Chapman during the first Ambisonics Symposium, which defines a predictable and scalable channel order for any Ambisonic order.

The key principle of AmbiX is that the Ambisonic order can be directly deduced from the number of channels, since a full three-dimensional signal of order n always has (n + 1)^2 channels. This consistency eliminated historical confusion and allowed AmbiX to become the de facto standard for Ambisonic audio exchange.

Spherical Harmonics and Sound Field Representation

The B-format represents the sound field in three dimensions using a mathematical model called spherical harmonics. These are functions that describe how sound arrives from any direction around a point.

In first-order B-format, there are four channels:
W: omnidirectional, capturing the total sound energy from all directions (equivalent to the Mid in M/S).
X, Y, Z: bidirectional, figure-eight patterns that record how sound pressure changes along the three spatial axes: front-back, left-right, and up-down.

Higher orders add additional channels with more complex patterns, increasing spatial accuracy and resolution. To encode a sound, its signal is multiplied by the spherical harmonic coefficients evaluated at the source direction, which distributes the sound across the B-format channels in a controlled and complete manner.

Ambisonic Orders and Spatial Resolution

The concept of "order" in Ambisonics directly indicates the spatial resolution and accuracy in representing the sound field. As the order increases, the number of channels grows quadratically, allowing a more detailed description of 3D sound. For a three-dimensional sound field, the total number of channels is calculated using the formula (n + 1)^2, where n is the Ambisonic order.
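As a minimal sketch of this relationship, the helpers below compute the channel count for an order, recover the order from a channel count, and give the ACN position of a spherical harmonic. The function names are hypothetical, and the ACN index formula l^2 + l + m is the standard definition of the convention rather than something stated in this module.

```python
import math


def channel_count(order):
    """Number of channels for a full 3D sound field of the given order."""
    return (order + 1) ** 2


def order_from_channels(channels):
    """Recover the order from a channel count; reject counts that are not perfect squares."""
    if channels < 1 or math.isqrt(channels) ** 2 != channels:
        raise ValueError(f"{channels} channels is not a full 3D Ambisonic set")
    return math.isqrt(channels) - 1


def acn_index(degree, index):
    """ACN channel number for spherical harmonic degree l and index m (-l <= m <= l)."""
    return degree * degree + degree + index


if __name__ == "__main__":
    for n in range(4):
        print(n, channel_count(n))
```

With this ordering, the first-order channels land at ACN 0 to 3 in the sequence W, Y, Z, X, which differs from the W, X, Y, Z sequence used by FuMa.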
Order (n) | Channels | Example | Typical Use Case
0 | 1 | W | Mono/Omnidirectional
1 (FOA - First-Order Ambisonics) | 4 | W, X, Y, Z | Good for ambiances and VR/AR
2 (SOA - Second-Order Ambisonics) | 9 | FOA + 5 new channels | Higher spatial resolution
3 (HOA - Higher-Order Ambisonics) | 16 | 2nd Order + 7 new channels | Better localization and sweet spot

Higher-Order Ambisonics (HOA) requires more channels and speakers, but it improves spatial resolution and localization accuracy and enlarges the sweet spot, the area where the sound field is reproduced faithfully. Outside this area, accuracy decreases. However, the system does not fail: at low frequencies, it physically reconstructs the sound field, and at high frequencies, it prioritizes the essential directional cues for auditory localization. This hybrid approach allows the perception of position to be maintained even beyond the physical limits of the sweet spot, with accurate sound energy up to about 4 kHz and psychoacoustic cues at higher frequencies.

Encoding, Decoding & Rendering

Encoding

Encoding is the process of transforming one or more sound sources into an Ambisonic format (B-format or other).
It starts from a mono or multichannel signal.
Each source is distributed across the B-format channels using basis functions (spherical harmonics).
For example, in first-order Ambisonics: W (omnidirectional), X (front/back axis), Y (left/right axis), and Z (up/down axis).
The result is a sound field independent of the speaker layout, containing all directional and amplitude information.
Purpose: to represent the sound field in 3D so that it can be manipulated and reproduced on any speaker system or headphones.

Decoding

Decoding converts the encoded signal (B-format) into signals specific to a particular speaker or headphone setup.
It works as a gain matrix that distributes the B-format channels to each speaker according to its position.
Common strategies include Projection (SAD), Pseudo-inverse (MMAD), Regularized Pseudo-inverse (RMMAD), Energy-Preserving (EPAD), and AllRAD.
Each strategy balances accuracy, stability, and compatibility with irregular layouts.
Fundamental rule: the number of speakers must be at least equal to the number of Ambisonic channels to maintain fidelity.
Purpose: to reproduce the encoded sound field as accurately as possible on a given physical system.

Rendering

Rendering is the final step, where the decoded audio is delivered to the listener through speakers or headphones.
With speakers: the sound field is physically reconstructed.
With headphones: binaural rendering is required, using HRTFs that simulate how the head, torso, and ears affect the sound.
It considers psychoacoustic cues: ITD (interaural time difference) for low frequencies, and ILD (interaural level difference) for high frequencies.
Advanced techniques, such as Ambisonic ILD Optimization, correct low-order limitations to improve spatialization and localization.
Purpose: to create the final immersive experience, ensuring that the direction, elevation, and depth of the sound are perceived accurately.
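The encoding and decoding steps described in this module can be sketched for first-order Ambisonics as follows. This is an illustration under stated assumptions, not a reference implementation: the function names and the eight-speaker cube layout are hypothetical, azimuth is assumed to be measured counterclockwise from the front (so positive Y points left), the channels are ordered W, Y, Z, X with W at full gain (the AmbiX-style convention rather than the attenuated FuMa W), and the decoder is a simple projection-style gain matrix whose weights of 1 for W and 3 for the first-order channels are an assumed choice.

```python
import math

# Elevation of a cube vertex seen from the cube's centre (about 35.26 degrees).
CUBE_ELEVATION = math.degrees(math.atan(1 / math.sqrt(2)))

# Hypothetical eight-speaker cube layout as (azimuth, elevation) pairs in degrees.
CUBE_LAYOUT = [(az, el)
               for el in (CUBE_ELEVATION, -CUBE_ELEVATION)
               for az in (45, 135, 225, 315)]


def direction_vector(azimuth_deg, elevation_deg):
    """Unit vector (x front, y left, z up) for a direction given in degrees."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el),
            math.sin(el))


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format, ordered W, Y, Z, X."""
    x, y, z = direction_vector(azimuth_deg, elevation_deg)
    return [sample, sample * y, sample * z, sample * x]


def projection_decoder(layout):
    """Build a gain matrix with one row per speaker and one column per channel."""
    count = len(layout)
    matrix = []
    for azimuth_deg, elevation_deg in layout:
        x, y, z = direction_vector(azimuth_deg, elevation_deg)
        # Assumed weighting: 1 for W and 3 for the first-order channels.
        matrix.append([1 / count, 3 * y / count, 3 * z / count, 3 * x / count])
    return matrix


def decode(bformat, matrix):
    """Apply the gain matrix to one B-format frame, giving one feed per speaker."""
    return [sum(g * c for g, c in zip(row, bformat)) for row in matrix]


if __name__ == "__main__":
    # Place a source exactly at the first speaker and inspect the feeds.
    frame = encode_first_order(1.0, 45, CUBE_ELEVATION)
    feeds = decode(frame, projection_decoder(CUBE_LAYOUT))
    for (az, el), feed in zip(CUBE_LAYOUT, feeds):
        print(f"speaker az={az:3d} el={el:6.2f} gain={feed:+.3f}")
```

The cube has eight speakers for four Ambisonic channels, so it satisfies the rule that the speaker count must be at least the channel count. A source placed at one speaker produces the largest feed there and smaller or negative feeds elsewhere, which is typical of a basic first-order decode; the other strategies listed above trade this behavior against stability and layout irregularity.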