Presentation to L'Académie Internationale de Musique Électroacoustique

Bourges, France, June 1997

Published in Organised Sound, 3(2), 141-146

COMPOSITION & DIFFUSION: SPACE IN SOUND IN SPACE

I. Introduction
Composition and diffusion can be understood as two complementary and related processes: bringing sounds together, and spreading them out again in an organized fashion. In the Western tradition, these two processes are frequently carried out by different people at different times, each drawing on specialized knowledge. The electroacoustic tradition, although much briefer, allows the composer to design and implement both aspects of the music and to interrelate them in highly specific ways. Computer control offers the greatest precision in dealing with the complexity of these processes, even though separate programs are usually required at present.

I am mainly referring to the practice of timbral composition, which may be thought of as shaping the space within the sound, that is, its perceived volume (Truax, 1992). By this term I mean not merely the loudness of the sound, but rather its spectral and temporal shape, both of which contribute to its perceived magnitude and form. Diffusion, as the performance mode for these sounds, refers to the distribution of the (usually stereo) sound in a space through the use of a mixer and multiple loudspeakers. However, we can also understand the success of such a performance as a matching of the space within the sound with the space into which it is projected. This can be done even more effectively with multiple input channels, where each soundtrack can be kept discrete and projected independently of all the others.

At Simon Fraser University (SFU) we have been developing specific digital signal processing (DSP) techniques for each of these operations. The main techniques used for timbral composition are digital resonators, using variable-length delay lines with controllable feedback, and granulation of sampled sound for time-stretching, both of which allow the composer to shape the volume of the sound (Truax, 1994). Recently both processes have been integrated into the same program (GSAMX). The diffusion project is a custom-designed multi-DSP box, the DM-8, designed by Harmonic Functions in collaboration with SFU. At its centre is a computer-controlled 8 by 8 matrix with which 8 input streams may be simultaneously routed to any of 8 output channels, in either fixed or dynamic trajectory patterns. A commercial version with a 16 by 16 matrix is also being developed.

II. Shaping the space inside the sound

The volume, or perceived magnitude, of a sound depends on its spectral richness, its duration, and the presence of unsynchronized temporal components, such as those produced by the acoustic choral effect and by reverberation. Electroacoustic techniques expand the range of methods by which the volume of a sound may be shaped. Granular time-stretching is perhaps the single most effective approach, as it contributes to all three of the variables just described. It prolongs the sound in time and overlays several unsynchronized streams of simultaneous grains derived from the source, such that prominent spectral components are enhanced. In addition, my GSAMX software allows each grain stream to have its own pitch transposition, either downwards or upwards, according to a scheme in which the untransposed pitch is the 4th harmonic in the scale of transpositions. That is, three downward harmonic pitches are available (the 1st to 3rd harmonics), plus four harmonics in the octave above the original pitch and progressively more in each higher octave. Processing the material through one or more resonators (using a waveguide or delay line) prior to granulation also shapes the spectrum of the sound quite strongly and brings out particular harmonic or formant regions.
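The harmonic transposition scheme can be made concrete by reading it as a set of frequency ratios n/4, where n is the harmonic number and the 4th harmonic is the untransposed pitch. The following is a minimal sketch under that reading; the function name and the ceiling of 16 harmonics are illustrative assumptions, not GSAMX code.

```python
from fractions import Fraction

def transposition_ratios(max_harmonic=16, reference=4):
    """Frequency ratios n/reference for harmonics n = 1..max_harmonic.

    Harmonic `reference` (the 4th) is the untransposed pitch, ratio 1."""
    return [Fraction(n, reference) for n in range(1, max_harmonic + 1)]

ratios = transposition_ratios()
below = [r for r in ratios if r < 1]                  # 1/4, 1/2, 3/4
first_octave_up = [r for r in ratios if 1 < r <= 2]   # 5/4, 3/2, 7/4, 2
second_octave_up = [r for r in ratios if 2 < r <= 4]  # 9/4 ... 4 (eight ratios)
```

Because the harmonic series doubles its density in each octave, the count of available upward transpositions grows from four in the first octave to eight in the second.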

The Karplus-Strong model of a recursive waveguide with filter has long been regarded as an efficient synthesis technique for plucked string sounds (Karplus & Strong, 1983). The basic model uses a delay line of p samples, which determines the resonant frequency of the string, a low-pass filter, which simulates the energy loss caused by the reflection of the wave, and feedback of the filtered sample into the delay line. The initial energy input is simulated by initializing the delay line with random values, that is, by introducing a noise burst whose spectrum decays towards a sine wave, more slowly the longer the delay line. The model applies equally to a string fixed at both ends or a tube open at both ends, at least in that the resonant frequencies are all harmonics of the fundamental. If the sample is negated before being fed back into the delay line, the resulting change of phase models a tube closed at one end: only the odd harmonics are resonant, and the fundamental frequency is lowered by an octave, since the negation effectively doubles the length of the delay line. For the basic model, the fundamental resonance equals SR/(p + 1/2), where SR is the sampling rate and p is the length of the delay line in samples; the extra half sample is the delay introduced by the two-point averaging filter.
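The basic model can be sketched in a few lines, assuming the classic two-point averaging filter, a uniform noise burst, and a 44.1 kHz sampling rate (the text does not state one). The function names are hypothetical; the point is to show how the loop delay of p + 1/2 samples yields the SR/(p + 1/2) fundamental, and how negation halves it.

```python
import random

def karplus_strong(p, n_samples, negate=False, seed=1):
    """Plucked-string sketch of y[n] = s * 0.5 * (y[n-p] + y[n-p-1]).

    s is -1 when `negate` is set (closed-tube model), otherwise +1. The first
    p output samples are the noise burst that initializes the delay line."""
    rng = random.Random(seed)
    y = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    sign = -1.0 if negate else 1.0
    for n in range(p, n_samples):
        older = y[n - p - 1] if n - p - 1 >= 0 else 0.0
        # Two-point average: the low-pass filter modelling loss at each reflection.
        y.append(sign * 0.5 * (y[n - p] + older))
    return y[:n_samples]

def fundamental(sr, p, negate=False):
    """SR / (p + 1/2); negation doubles the loop and halves the result."""
    f0 = sr / (p + 0.5)
    return f0 / 2 if negate else f0

tone = karplus_strong(p=99, n_samples=44100)   # about 443 Hz at 44.1 kHz
```

Since each new sample is an average of two earlier ones, no sample can exceed the peak of the initial burst, and the high harmonics die away first, leaving a nearly sinusoidal tail.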

However, because the technique models a resonating tube as well as a fixed string, it is equally suited to processing sampled sound. Since an ongoing signal, rather than an initial noise burst, excites the resonator, a feedback gain factor must be used to prevent amplitude overflow and to control the amount of resonance in the resulting sound. The current real-time implementation offers a choice of delay line configurations (single, in parallel or in series), plus the options of adding a comb filter (to add or subtract a delayed signal) and signal negation (which lowers the fundamental frequency by an octave and produces odd harmonics only). Particularly interesting effects occur when the length of the Karplus-Strong delay and the comb filter delay are related by simple ratios. Each delay line has real-time control over its length, and hence its tuning, up to a maximum of 511 samples. The user also controls the feedback level, which can be finely adjusted to ride just below saturation, in combination with the input amplitude, which can be lowered to allow higher feedback levels. Sample negation also makes high feedback levels easier to control, since the length of the feedback loop is effectively doubled.
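To show how the same loop becomes a resonator for sampled sound, here is a minimal offline sketch with a feedback gain, optional negation, and a comb filter. It assumes a 44.1 kHz rate and places the comb as a feedforward stage on the input before the resonator; the text does not say where the comb sits, and `delay_for_pitch` and `resonate` are hypothetical names, not the real-time implementation.

```python
MAX_DELAY = 511  # maximum delay-line length quoted in the text

def delay_for_pitch(freq, sr=44100, negate=False):
    """Delay length p (possibly fractional) tuning the resonator to `freq`."""
    loop = sr / freq          # loop length in samples
    if negate:
        loop /= 2             # negation doubles the effective loop length
    return loop - 0.5         # minus the half-sample delay of the averaging filter

def resonate(x, p, gain=0.95, negate=False, comb_delay=0, comb_sign=1.0):
    """Pass an input signal through a Karplus-Strong style resonator.

    y[n] = u[n] + s * gain * 0.5 * (y[n-p] + y[n-p-1]), with s = -1 if negate.
    u is the input, optionally after a feedforward comb that adds (comb_sign=+1)
    or subtracts (comb_sign=-1) a copy of the input delayed by comb_delay samples."""
    if not 1 <= p <= MAX_DELAY:
        raise ValueError("delay length must be between 1 and %d samples" % MAX_DELAY)
    sign = -1.0 if negate else 1.0
    y = []
    for n, sample in enumerate(x):
        u = sample
        if comb_delay and n >= comb_delay:
            u += comb_sign * x[n - comb_delay]
        fb = 0.0
        if n >= p:
            older = y[n - p - 1] if n > p else 0.0
            fb = sign * gain * 0.5 * (y[n - p] + older)
        y.append(u + fb)
    return y

lowest = 44100 / (MAX_DELAY + 0.5)   # about 86 Hz without negation
lowest_negated = lowest / 2          # about 43 Hz with negation
```

Under the 44.1 kHz assumption, the 511-sample limit puts the lowest fundamental near 86 Hz, and negation extends it down to about 43 Hz, which is one practical reason negation helps with low tunings as well as with high feedback.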

The complex behaviour of these resonators, particularly when driven to their maximum feedback level (termed hyper-resonance), cannot be followed by the ear at normal speed; when such sounds are time-stretched, their internal variations become much more evident. In practice, the sound may be resonated first, using a chain of up to two or three resonators, then resampled and granulated; alternatively, a single resonator can be introduced directly into the processing chain during granulation, using a specific option in the GSAMX program. Such processing extends the decay of the resonance to an arbitrary duration, suggesting a very large space, while keeping the resonant frequencies intact. That is, resonant frequencies associated with relatively short tubes appear to emanate from spaces with much larger volumes, as in my work Basilica (1992). Vocal sounds subjected to this processing resemble 'overtone singing' in a reverberant cathedral, because the resonant frequencies are strong enough to be heard as pitches. Adding simple harmonization at the granulation stage, such as a transposition an octave lower, enriches the sound further and gives the impression of a choir.
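The time-stretching step can be illustrated with a minimal offline granulator: grains are read from a position that advances more slowly than output time, and several streams with independent random offsets are overlaid. The Hann envelope, grain length, hop, and jitter values below are assumptions chosen for illustration, not GSAMX's actual parameters.

```python
import math
import random

def granulate(src, stretch=4.0, grain_len=1024, hop=256, streams=3, jitter=512, seed=7):
    """Time-stretch `src` by overlap-adding enveloped grains from several streams.

    The nominal read position moves `stretch` times slower than output time; each
    stream starts at a random offset and jitters its read position independently,
    so the streams stay unsynchronized."""
    if len(src) < grain_len:
        raise ValueError("source shorter than one grain")
    rng = random.Random(seed)
    # Hann envelope for each grain (an assumed shape).
    env = [0.5 - 0.5 * math.cos(2 * math.pi * i / (grain_len - 1)) for i in range(grain_len)]
    out_len = int(len(src) * stretch)
    out = [0.0] * (out_len + grain_len)
    for _ in range(streams):
        t = rng.randrange(hop)                  # stagger this stream's grain onsets
        while t < out_len:
            start = int(t / stretch) + rng.randint(-jitter, jitter)
            start = max(0, min(start, len(src) - grain_len))
            for i in range(grain_len):
                out[t + i] += env[i] * src[start + i] / streams
            t += hop
    return out[:out_len]

source = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
stretched = granulate(source)   # 0.1 s of source becomes 0.4 s
```

Feeding the output of a resonator sketch into `granulate` corresponds to the two-stage approach described above; the integrated approach would instead place the resonator inside the grain loop.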

The two-stage version of this processing (resonance, then time-stretching with or without harmonization) was used in my electroacoustic music theatre work Powers of Two: The Artist (1995) (Truax, 1996), which is the second act of the opera Powers of Two. The sounds employed in subsequent acts have been created using the integrated approach, where the resonance is added during the time-stretching process. In one particularly striking example, found in Powers of Two: The Sibyl (1997), natural sounds such as a recording of rain and thunder, and another of ocean waves, are hyper-resonated to the point where the original sounds are engulfed by a low resonant mass of sound pitched at 60 Hz (the North American electrical frequency). Then, as the scene progresses, the amount of feedback in the process is gradually reduced until the original sound is once again audible. This effect underlines the tension in each scene between a character associated with the modern, technological world and one associated with traditional visionary insight.

III. Shaping the sound inside a space

Although conventional diffusion is remarkably effective with a stereo source, the two-channel bottleneck, the limitations of manual control, and insufficient rehearsal time are currently the weak links in the performance of electroacoustic music. Having 8 discrete sources available, all independently controllable, is not only acoustically richer for tape music (since detail is not lost through stereo mixing) but also compositionally challenging, since a spatial conception must be integrated into the work. The same system can also be used for live performance, or for mixed live and tape performance, since nothing is assumed about the relation of the 8 input signals.

The DM-8 system is essentially an 8 by 8 matrix which routes 8 channels of input (for us, from a Tascam DA-88) to 8 channels of output, typically feeding a conventional amplifier and loudspeaker configuration. The hardware is a custom-designed box, external to the host Macintosh, equipped with four Motorola 56001 chips and a 68000 controller, which communicates with the graphic front end via MIDI system exclusive messages. The user-control software is a Max application, written by Chris Rolfe, which can be used either in a live performance mode with mouse-triggered events, or as a pre-programmed score synchronized with the MIDI timecode on the tape. Presets and an editable mixing score allow the amplitude of each input track to be controlled. These mixing levels can be entered graphically, or recorded in real time as the user moves virtual potentiometers. The recorded levels are analyzed and compressed by the program for an optimum data representation, and can later be edited by the user.
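The text does not say how the recorded levels are analyzed and compressed. Purely as a hypothetical illustration of the idea, a fader recording stored as (time, level) breakpoints can be thinned by dropping points that a straight line between their neighbours already reproduces within a tolerance; the function name and tolerance are assumptions, not the DM-8 software's method.

```python
def thin_breakpoints(points, tol=0.5):
    """Drop (time, level) breakpoints that linear interpolation reproduces within tol.

    Greedy single pass: a point is dropped if the line from the last kept point to
    the next point passes within `tol` of it. Times must be strictly increasing."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    for i in range(1, len(points) - 1):
        (t0, v0), (t1, v1), (t2, v2) = kept[-1], points[i], points[i + 1]
        predicted = v0 + (v2 - v0) * (t1 - t0) / (t2 - t0)
        if abs(predicted - v1) > tol:
            kept.append(points[i])
    kept.append(points[-1])
    return kept

# A fader moved at a steady rate, sampled every 0.1 s, collapses to two points.
fader_move = [(i / 10, 2.0 * i) for i in range(50)]
```

A reduced breakpoint list of this kind is also easy to display and edit graphically, which fits the later editing step described above.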

The software is described in 20 pages of documentation, but a few highlights can be given here. The 8 by 8 matrix allows manual input/output connections to be made (i.e. speakers turned on and off), and patterns of these connections can be stored as presets and recalled with variable fade times. A cross-fade from one configuration, say a stereo reduction, to another, for example a multi-channel distribution, over 5 to 10 seconds is a typical operation that would be difficult to achieve manually but is aurally very attractive.
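The matrix operation and a timed cross-fade between two stored configurations can be sketched as follows, assuming gains in a plain 8 by 8 list (rows are inputs, columns are outputs) and a linear fade; the DM-8's actual fade law and data format are not given in the text, and all names here are hypothetical.

```python
N = 8  # inputs and outputs of the matrix

def apply_matrix(inputs, gains):
    """Route one frame of 8 input samples: out[j] = sum over i of inputs[i] * gains[i][j]."""
    return [sum(inputs[i] * gains[i][j] for i in range(N)) for j in range(N)]

def crossfade(preset_a, preset_b, elapsed, fade_time):
    """Gain matrix `elapsed` seconds into a linear fade from preset_a to preset_b."""
    a = 1.0 if fade_time <= 0 else min(max(elapsed / fade_time, 0.0), 1.0)
    return [[(1 - a) * preset_a[i][j] + a * preset_b[i][j] for j in range(N)]
            for i in range(N)]

# Stereo reduction: tracks with even index (0, 2, 4, 6) to output 0, odd to output 1.
stereo = [[1.0 if j == i % 2 else 0.0 for j in range(N)] for i in range(N)]
# Multi-channel distribution: track i to output i.
spread = [[1.0 if j == i else 0.0 for j in range(N)] for i in range(N)]

halfway = crossfade(stereo, spread, elapsed=4.0, fade_time=8.0)
```

Recomputing the interpolated matrix at each control tick over a 5 to 10 second fade gives the kind of smooth transition between presets that is hard to perform by hand.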

A set of 'players' extends the matrix control to either 'static' speaker lists or 'dynamic' trajectories. Unlike the matrix operation, the players automate both the turning off of outgoing channels and the turning on of new ones. The dynamic assignments generate a series of cross-fades, moving an input from speaker to speaker in what we call a 'trajectory', at a specific rate and with adjustable fade patterns. Pre-defined speaker patterns can be looped, cycled (forward and reverse), or randomly assigned. Since 8 such patterns can run simultaneously, very complex movements can easily be generated. All of the player parameters transfer directly to the score method of control, so a particular trajectory configuration can be tested in real time, then copied into the score at its precise point of implementation.
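The three ways of stepping through a speaker pattern can be sketched as a generator of speaker orders. This reads "cycled (forward and reverse)" as a back-and-forth sweep that does not repeat the end speakers, which is an assumption; `speaker_order` is a hypothetical name.

```python
import itertools
import random

def speaker_order(speakers, mode="loop", steps=8, seed=0):
    """First `steps` speakers visited by a dynamic player over a speaker list."""
    speakers = list(speakers)
    if mode == "loop":          # 1 2 3 4 1 2 3 4 ...
        seq = itertools.cycle(speakers)
    elif mode == "cycle":       # forward then reverse: 1 2 3 4 3 2 1 2 ...
        seq = itertools.cycle(speakers + speakers[-2:0:-1])
    elif mode == "random":      # independent random choice at each step
        rng = random.Random(seed)
        seq = (rng.choice(speakers) for _ in itertools.count())
    else:
        raise ValueError("unknown mode: " + mode)
    return list(itertools.islice(seq, steps))
```

Running eight such players, one per input, each with its own list, mode, and rate, is enough to produce the dense, overlapping movements described above.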

Of interest to electroacoustic composers is the ease with which a given set of speakers can be substituted for another when a new performance configuration is encountered, or when a mixdown is needed. A speaker list is defined once and labelled (e.g. left, circle, etc.) with nothing assumed about where those speakers are located. To change to a different speaker configuration, only the list needs to be edited, not each instance of its use. The label also assists the composer in dealing with particular spatial configurations independently of the often confusing lists of speaker numbers.
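The benefit of labelled speaker lists is a single level of indirection: the score names labels, and only the label table changes from hall to hall. A minimal sketch, with hypothetical labels, speaker numbers, and score times:

```python
# Hypothetical labelled speaker lists for one hall (speaker numbers are illustrative).
speaker_lists = {
    "left": [1, 3, 5, 7],
    "circle": [1, 2, 4, 6, 8, 7, 5, 3],
}

# The score refers only to labels: (time in seconds, label).
score = [(0.0, "left"), (12.5, "circle"), (40.0, "left")]

def resolve(score, lists):
    """Replace each label in the score with the speaker numbers it currently names."""
    return [(time, lists[label]) for time, label in score]

# A smaller venue: redefine "left" once; every use in the score follows.
small_hall = dict(speaker_lists, left=[1, 2])
```

The same mechanism covers a mixdown: mapping every label to the two front speakers folds the whole score into stereo without touching it.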

The nature of cross-fades between speakers is a particularly tricky subject, and the software assists the user with both graphic displays of the levels involved and real-time aural tests of the effect. Cross-fade percentage is a key variable, allowing a continuum of effects to be achieved, from abrupt jumps between channels to completely smooth transitions. A 'sustain delay' parameter delays the fade-out of the previous speakers in a dynamic sequence to create a more 'polyphonic' effect (analogous to the vapour trail behind a jet). Finally, the 'fade increment' is a simple method of generating the cascaded entry of a speaker list, similar to the way one might bring in a set of speakers incrementally in conventional manual diffusion, creating another polyphonic effect.
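The three fade parameters can be sketched as gain envelopes. The text does not give the fade law, so this assumes linear ramps and reads 'cross-fade percentage' as the fraction of each trajectory step spent in transition; the parameter names `xfade_pct`, `sustain`, and `increment` are hypothetical stand-ins for the software's controls.

```python
def ramp(t, t0, length):
    """0 before t0, 1 after t0 + length, linear in between (a step if length is 0)."""
    if length <= 0:
        return 1.0 if t >= t0 else 0.0
    return min(max((t - t0) / length, 0.0), 1.0)

def trajectory_gain(k, t, step, xfade_pct=50.0, sustain=0.0, last=False):
    """Gain at time t (seconds) of the k-th speaker visited by a dynamic trajectory.

    Speaker k takes over at k * step. Its fade-in, and the fade-out at the next
    handover, each last xfade_pct percent of a step; `sustain` postpones the fade-out."""
    fade = step * xfade_pct / 100.0
    gain = ramp(t, k * step, fade)
    if not last:
        gain -= ramp(t, (k + 1) * step + sustain, fade)
    return max(gain, 0.0)

def cascaded_entry(n_speakers, t, increment, fade=0.0):
    """Gains of a static speaker list brought in one speaker every `increment` seconds."""
    return [ramp(t, i * increment, fade) for i in range(n_speakers)]
```

With no sustain delay, the outgoing and incoming gains sum to 1 during each handover; a 0 percent cross-fade gives hard jumps, while a positive sustain lets the previous speaker hold on after the next one enters, producing the 'vapour trail' overlap. Equal-power curves would be a common alternative to the linear ramps assumed here.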

Although the system is designed for controlling 8 source channels, other uses are possible. For instance, a stereo source could be duplicated up to four times at the input of the matrix, and four pairs of distinct trajectories or speaker assignments defined. The composer could then use the mixing score or manually controlled input levels to cross-fade between the different spatial treatments. Alternatively, the entire matrix could be treated as an effects send and return system for studio work, with, for instance, two 'dry' channels and six channels of processing being mixed together.
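The duplicated-stereo configuration amounts to feeding four (L, R) copies into the matrix and moving between them with input levels. A minimal sketch, assuming copy 0 occupies inputs 1-2, copy 1 inputs 3-4, and so on, with a linear level cross-fade; both functions are hypothetical.

```python
def pair_levels(from_copy, to_copy, x, copies=4):
    """Input levels for four copies of a stereo source, x of the way from one copy to another."""
    levels = [0.0] * copies
    levels[from_copy] += 1.0 - x
    levels[to_copy] += x
    return levels

def duplicated_stereo_frame(left, right, levels):
    """Present one stereo frame at all 8 matrix inputs as (L, R) pairs, one pair per copy."""
    frame = []
    for level in levels:
        frame += [left * level, right * level]
    return frame

# A quarter of the way from the spatial treatment on inputs 1-2 to the one on inputs 5-6.
frame = duplicated_stereo_frame(0.8, -0.4, pair_levels(0, 2, 0.25))
```

Each pair then follows its own trajectory or speaker assignment in the matrix, so changing input levels alone morphs between spatial treatments.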

The DM-8 has been used in performance at the 1995 International Computer Music Conference in Banff and in 1996 at various Vancouver New Music electroacoustic concerts, and is currently available for use in the Sonic Research Studio at SFU. Although an extended (16 x 16) commercial version has been developed, the existing hardware and software configuration is already extremely useful for electroacoustic diffusion. The software could also be extended by programmers wishing to add new features or more complex lower level control patterns.

IV. Recent compositional applications

As mentioned above, the 8-channel tape component of my electroacoustic opera Powers of Two was realized using the techniques described here, both for the design of the component sounds and for their static and dynamic distribution over 8 channels. Two other recent compositions for solo performer and stereo tape also illustrate the timbral work, namely Wings of Fire (1996) for female cellist and tape, based on a poem by B.C. poet Joy Kirstin, and Androgyne, Mon Amour (1997) for male double bassist and tape, based on poems by Tennessee Williams. In both works, the source material is derived from a reading of the poems as well as from sounds recorded from the live instrument, processed with granulation and/or resonators simulating the open strings of the instrument. When the voice is processed in this way on tape, it takes on some of the character of the instrument, and in each piece the love poetry appears to be addressed to the instrument as the lover. In other words, the spoken voice on tape appears to be resonated through the instrument being played, symbolizing their union as lovers.

Other sounds recorded from the cello and bass are also used to excite the resonators. These include bowing on the bridge, natural and artificial harmonics, col legno attacks, snap pizzicato and various kinds of body percussion sounds. By raising the feedback level of the resonators (tuned to the open strings), a noisy sound such as bowing on the bridge slowly changes from resembling breathing to regular bowing on the strings, once again highlighting the intimate relation between the performer and the instrument. Interestingly enough, when the length of the delay line is shortened to produce a very high pitch, the noise component once again becomes dominant, as at the end of the opening section of Wings of Fire. In Androgyne, Mon Amour the tuning of the resonators (independently controlled on each channel) changes more frequently during the reading of the text, suggesting a kind of harmonic accompaniment performed by the instrument. The live instrument, which is frequently played in a number of unconventional postures, sometimes mimics this accompaniment, or creates a counterpoint to it.

At present, the processes of shaping the 'volume' of the sound, its internal space, and distributing the sound via multiple loudspeakers into the external performance space, occur in two different design stages, much as traditional studio composition and live diffusion have been carried out. The compositional challenge is to create significant relationships between the two processes. However, if we continue to use similar DSP technology for both, it may well become feasible in future to integrate them into a single algorithm in which the individual components that create the volume of the sound are given spatial placement and definition within the performance environment. Sound and space would become inextricably linked, and composition then could truly be regarded as the acoustic design of space.

References:

K. Karplus & A. Strong, "Digital Synthesis of Plucked String and Drum Timbres," Computer Music Journal, 7(2), 1983.

B. Truax, "Musical Creativity and Complexity at the Threshold of the 21st Century," Interface, 21(1), 1992, 29-42.

B. Truax, "Discovering Inner Complexity: Time-Shifting and Transposition with a Real-time Granulation Technique," Computer Music Journal, 18(2), 1994, 38-48 (sound sheet examples in 18(1)).

B. Truax, "Sounds and Sources in Powers of Two: Towards a Contemporary Myth," Organised Sound, 1(1), 1996.
