80 Years On, It's About Time

Getting it Right for Speech Reinforcement

April 27 marked the 80th anniversary of a milestone in the history of audio. On this date in 1933, the Philadelphia Orchestra under deputy conductor Alexander Smallens was picked up by three microphones at the Academy of Music in Philadelphia (left, center, and right of the orchestra stage), and the audio was transmitted over wire lines to Constitution Hall in Washington, where it was replayed to an audience of invited guests over three loudspeakers placed in similar positions. Music director Leopold Stokowski manipulated the audio controls at the receiving end in Washington.

This historic event was reported and analyzed by audio pioneers Harvey Fletcher, J.C. Steinberg and W.B. Snow, E.C. Wente and A.L. Thuras, and others, in a collection of six papers published in January 1934 as the Symposium on Auditory Perspective in Electrical Engineering, the journal of the American Institute of Electrical Engineers (AIEE, a predecessor of today's IEEE). Paul Klipsch referred to the Symposium as "one of the most important papers in the field of audio."

 


April 27, 1933: Leopold Stokowski at the controls with Harvey Fletcher observing
 

Prior to 1933, Fletcher had been working on what has since been termed the wall of sound. “Theoretically, there should be an infinite number of such ideal sets of microphones and sound projectors [i.e., loudspeakers] and each one should be infinitesimally small,” he wrote.


Fletcher’s dual curtains of microphones and loudspeakers
 

Fletcher continued, “Practically, however, when the audience is at a considerable distance from the orchestra, as usually is the case, only a few of these sets are needed to give good auditory perspective; that is, to give depth and a sense of extensiveness to the source of the music.”

In this regard, Floyd Toole’s conclusions—following a career spent researching loudspeakers and listening rooms—are especially noteworthy. In his 2008 magnum opus, Sound Reproduction: Loudspeakers and Rooms, Toole noted that the “feeling of space”—apparent source width plus listener envelopment—which turns up in the research as the largest single factor in listener perceptions of “naturalness” and “pleasantness,” two general measures of quality, is increased by the use of surround loudspeakers in typical listening rooms and home theatres.

Given that these smaller spaces cannot be compared in either size or purpose to concert halls where sound is originally produced, Toole noted that in the 1933 experiment, “there was no need to capture ambient sounds, as the playback hall had its own reverberation."

 

Localization Errors

Recognizing that systems of as few as two and three channels were “far less ideal arrangements,” Steinberg and Snow observed that, nevertheless, “the 3-channel system was found to have an important advantage over the 2-channel system in that the shift of the virtual position for side observing positions was smaller."

In other words, for listeners away from the sweet spot along the hall’s center axis, localization errors due to shifts in the phantom images between loudspeakers were smaller in the case of a Left-Center-Right system compared with a Left-Right system.

Significantly, Fletcher did not include localization along with “depth and a sense of extensiveness” among the characteristics of "good auditory perspective.”

Regarding localization, Steinberg and Snow realized that “point-for-point correlation between pick-up stage and virtual stage positions is not obtained for 2-and 3-channel systems.” Further, they concluded that the listener “is not particularly critical of the exact apparent positions of the sounds so long as he receives a spatial impression. Consequently 2-channel reproduction of orchestral music gives good satisfaction, and the difference between it and 3-channel reproduction for music probably is less than for speech reproduction or the reproduction of sounds from moving sources.”

The 1933 experiment was intended to investigate “new possibilities for the reproduction and transmission of music,” in Fletcher’s words. Since then, however, many, if not most, of the developments in multichannel sound have been motivated and financed by the film industry, in the wake of Hollywood's massive investment in the "talkies," which sounded the death knell of vaudeville and led to the conversion of a great many theatres into cinemas.

Given that the growth of the audio industry stemmed from research and development into the reproduction and transmission of sound for the burgeoning telephone, film, radio, television, and recorded music industries, it is curious that the term “theatre” continued (and still continues to this day) to be applied to the buildings and facilities of both cinemas and theatres. This reflects the confusion not only in their architecture, on which the noted theatre consultant Richard Pilbrow commented in his wonderful 2011 memoir A Theatre Project, but also in the development of their respective audio systems.

 

Theatre is Not Cinema: The Differing Requirements of Speech Reinforcement

Sound reinforcement was an early offshoot, eagerly adopted by demagogues and traveling salesmen alike to bend crowds to their way of thinking; yet, as Don Davis noted in 2013 in Sound System Engineering, “Even today, the most difficult systems to design, build, and operate are those used in the reinforcement of live speech. Systems that are notoriously poor at speech reinforcement often pass reinforcing music with flying colors. Mega churches find that the music reproduction and reinforcement systems are often best separated into two systems.”

The difference lies partly in the relatively low channel count of audio reproduction systems, which makes localization of talkers next to impossible. Since delayed loudspeakers were widely introduced into the live sound industry in the 1970s, they have been used almost exclusively to reinforce the main house sound system, not the performers themselves. This undoubtedly arose from the sheer magnitude of the sound pressure levels involved in the stadium rock concerts and outdoor festivals of the era.

However, in the case of, say, an opera singer, the depth, sense of extensiveness, and spatial impression that lent appeal to the reproduced sound of the symphony orchestra back in 1933 likely won’t prove satisfying if the audience cannot accurately localize the sound image of the singer’s voice. Perhaps this is one reason why “amplification” has become such a dirty word among opera aficionados.

In the 1980s, however, the English theatre sound designer Rick Clarke and others began to explore techniques of making sound appear to emanate from the lips of performers rather than from loudspeaker boxes. They were among a handful of pioneers who used the psychoacoustics of delay and the Haas effect “to pull the sound image into the heart of the action,” as sound designer David Collison recounted in his 2008 volume, The Sound of Theatre.

Out Board Electronics in the UK has since taken up the cause of speech sound reinforcement, with a unique delay-based input-output matrix in its TiMax2 Soundhub that enables each performer’s radio mic to be fed to dozens of loudspeakers—if necessary—arrayed throughout the house, with unique levels and delays to each loudspeaker such that more than 90 per cent of the audience is able to localize the voice back to the performer via Haas effect-based perceptual precedence, no matter where they are seated. Out Board refers to this approach as source-oriented reinforcement (SOR).
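To make the idea of a delay matrix concrete, here is a minimal sketch of how one delay per performer-loudspeaker cross point might be derived. It is not Out Board's algorithm, which the sources quoted here do not describe. It assumes a common delay-alignment rule: hold each loudspeaker back by the acoustic travel time from the performer to that loudspeaker, plus a small offset, so that the performer's natural sound tends to arrive first. The speed of sound, the 10 ms offset, the coordinates, and all function names are illustrative assumptions, and per-loudspeaker levels are left out.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C
PRECEDENCE_OFFSET_MS = 10.0     # hypothetical extra delay, not a vendor figure


def crosspoint_delay_ms(source_xy, speaker_xy, offset_ms=PRECEDENCE_OFFSET_MS):
    """Delay for one performer-to-loudspeaker cross point, in milliseconds.

    The loudspeaker is held back by the time sound needs to travel from the
    performer to the loudspeaker, plus a fixed offset.
    """
    distance_m = math.dist(source_xy, speaker_xy)
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0 + offset_ms


def delay_matrix(sources, speakers):
    """Build {performer: {loudspeaker: delay_ms}} for every cross point."""
    return {
        source_name: {
            speaker_name: crosspoint_delay_ms(source_xy, speaker_xy)
            for speaker_name, speaker_xy in speakers.items()
        }
        for source_name, source_xy in sources.items()
    }


# Hypothetical stage layout, coordinates in metres.
performers = {"mic_1": (0.0, 0.0), "mic_2": (4.0, 1.0)}
loudspeakers = {"pros_left": (-6.0, 2.0), "pros_right": (6.0, 2.0), "delay_1": (0.0, 15.0)}

matrix = delay_matrix(performers, loudspeakers)
for performer, row in matrix.items():
    print(performer, {name: round(ms, 1) for name, ms in row.items()})
```

Each performer gets a different row of delays, which is the point of the matrix: a single delay per loudspeaker cannot be right for two performers standing in different places.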

The delay matrix approach to SOR originated in the former DDR (East Germany), where in the 1970s, Gerhard Steinke, Peter Fels and Wolfgang Ahnert introduced the concept of Delta-Stereophony in an attempt to increase loudness in large auditoriums without compromising directional cues emanating from the stage. In the 1980s, Delta-Stereophony was licensed to AKG and embodied in the DSP 610 processor. While it offered only six inputs and 10 outputs, it came at the price of a small house.

Out Board started working on the concept in the early 1990s and released TiMax (now known as TiMax Classic) around the middle of the decade, progressively developing and enlarging the system up to the 64 x 64 input-output matrix, with 4,096 cross points, that characterizes the current generation, TiMax2.

The TiMax Tracker, an ingenious radar-based location system, locates performers to within six inches in any direction, so that the system can interpolate softly between pre-established location image definitions in the Soundhub for up to 24 performers simultaneously. The audience is thereby enabled to localize performers’ voices accurately as they move around the stage, or up and down on risers, thus addressing the deficiency of conventional systems regarding the localization of both speech and moving sound sources.
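How the Soundhub interpolates between image definitions is not spelled out above, so the following is only a rough sketch of the general idea, under stated assumptions: each image definition is modelled as a set of per-loudspeaker (delay, level) pairs stored for a known stage position, and a tracked position between two stored positions is handled by a straight linear blend along one stage axis. The data layout, the linear blend, and every name in the code are hypothetical.

```python
def interpolate_image(image_a, image_b, x_a, x_b, x_tracked):
    """Blend two stored image definitions for a performer between them.

    image_a and image_b map loudspeaker name -> (delay_ms, level_db) and were
    set up for stage positions x_a and x_b (metres across the stage).
    x_tracked is the performer's current position from the tracking system.
    """
    if x_a == x_b:
        raise ValueError("the two stored positions must differ")
    # Fraction of the way from position A to position B, clamped to [0, 1].
    t = (x_tracked - x_a) / (x_b - x_a)
    t = min(1.0, max(0.0, t))
    return {
        speaker: (
            (1.0 - t) * image_a[speaker][0] + t * image_b[speaker][0],
            (1.0 - t) * image_a[speaker][1] + t * image_b[speaker][1],
        )
        for speaker in image_a
    }


# Hypothetical definitions for stage left (x = -4 m) and stage right (x = +4 m).
stage_left = {"L": (0.0, 0.0), "R": (10.0, -6.0)}
stage_right = {"L": (10.0, -6.0), "R": (0.0, 0.0)}

centre = interpolate_image(stage_left, stage_right, -4.0, 4.0, 0.0)
print(centre)
```

A real system would presumably interpolate in two or three dimensions and smooth the changes over time to avoid audible artifacts; the sketch ignores both.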

 

Source-Oriented Reinforcement

Out Board director Dave Haydon put it this way: “First thing to know about source-oriented reinforcement is that it’s not panning. Audio localization created using SOR makes the amplified sound actually appear to come from where the performers are on stage. With panning, the sound usually appears to come from the speakers, but biased to relate roughly to a performer’s position on stage. Most of us are also aware that level panning only really works for people sitting near the center line of the audience. In general, anybody sitting much off this center line will mostly perceive the sound to come from whichever stereo speaker channel they’re nearest to.

“This happens because our ear-brain combo localizes to the sound we hear first, not necessarily the loudest. We are all programmed to do this as part of our primitive survival mechanisms, and we all do it within similar parameters. We will localize even to a 1 ms early arrival, all the way up to about 25 ms, then our brain stops integrating the two arrivals and separates them out into an echo. Between 1 ms and about 10 ms arrival time differences, there will be varying coloration caused by phasing artifacts.

“This localization effect, called precedence or Haas Effect after the scientist who discovered it, works within a 6-8 dB level window. This means the first arrival can be up to 6-8 dB quieter than the second arrival and we’ll still localize to it. This is handy as it means we can actively apply this localization effect and at the same time achieve useful amplification.

“If we don’t control these different arrivals they will control us. All the various natural delay offsets between the loudspeakers, performers and the different seat positions cause widely different panoramic perceptions across the audience. You only have to move 13 inches to create a differential delay of 1 ms, causing significant image shift. Pan pots just controlling level can't fix this for more than a few audience members near the center. You need to manage delays, and ideally control them differentially between every mic and every speaker, which requires a delay-matrix and a little cunning, coupled with a fairly simple understanding of the relevant physics and biology,” Haydon said.
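Haydon's figures can be checked with a few lines of arithmetic. The sketch below converts an extra path length into a delay and then applies his rules of thumb as a crude classifier. It assumes a speed of sound of 343 m/s, and it treats the quoted 1 ms, 25 ms, and 6 dB values as hard thresholds, which real hearing does not. The function names and the category labels are illustrative only.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C
INCH_IN_M = 0.0254


def path_difference_to_ms(extra_path_m):
    """Delay in milliseconds caused by an extra path length in metres."""
    return extra_path_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def classify_arrivals(lead_ms, second_louder_by_db,
                      fusion_limit_ms=25.0, level_window_db=6.0):
    """Crude reading of the quoted rules of thumb for two arrivals.

    lead_ms: how much earlier the first arrival reaches the listener.
    second_louder_by_db: how much louder the second arrival is than the first.
    The thresholds are the figures quoted in the article, used as hard limits.
    """
    if lead_ms < 1.0:
        return "below quoted range"
    if lead_ms > fusion_limit_ms:
        return "echo"
    if second_louder_by_db > level_window_db:
        return "later arrival dominates"
    return "first arrival"


# Moving 13 inches closer to one loudspeaker than the other:
print(round(path_difference_to_ms(13 * INCH_IN_M), 2), "ms")

# Performer's voice arrives 5 ms before a loudspeaker that is 4 dB louder:
print(classify_arrivals(5.0, 4.0))
```

At 343 m/s, 13 inches works out to a little under 1 ms, in line with the figure Haydon gives. The second call shows the useful case for reinforcement: the loudspeaker adds level while the listener still localizes to the performer.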

 

Into the Mainstream

More and more theatres are adopting this approach, including New York’s City Center and the UK’s Royal Shakespeare Company. A number of Raymond Gubbay productions of opera-in-the-round at the notoriously difficult Royal Albert Hall (including Aida, Tosca, The King and I, La Bohème and Madam Butterfly), as well as Carmen at the O2 Arena, have benefited from source-oriented reinforcement, as have recent productions of Les Misérables, Jesus Christ Superstar, Into the Woods, The Beggar’s Opera, Marie Antoinette, Andromache, Tanz der Vampire, Lord of the Flies, Fela!, and many others at venues around the world.

Veteran West End sound designer Gareth Fry employed the technique earlier this year at the Barbican Theatre for The Master and Margarita, to make it possible for all audience members to continuously localize to the actors’ voices as they moved around the Barbican’s very wide stage. He noted that, in the three-hour show with a number of parallel story threads, this helped greatly with intelligibility to ensure the audience’s total immersion in the show’s complex plot lines.

Based on the experience, Fry said, “I’m quite sure that in the coming years, SOR will be the most common way to do vocal reinforcement in drama.”

As we mark the 80th anniversary of that historic first live stereo transmission, it’s worth noting that, in spite of the proliferation of surround formats for sound reproduction that has to date culminated in the cinematic marvel of 64-channel Dolby Atmos, we are only now getting onto the right track with regard to speech reinforcement.

It’s about time.


(photo source: http://www.stokowski.org)