Music and Science Meet at the Micro Level: Time-Frequency Methods and Granular Synthesis
Presented at the Musica Viva conference, Coimbra, Portugal, Sept. 2003
Musical research over the last century has become increasingly entwined with the scientific areas of acoustics, psychoacoustics, and electroacoustics, among others. During the last half century, the computer has become the central site of this research, including sound synthesis, digital signal processing and computer-assisted composition. One of the most striking developments in this encounter has been to push the frontiers of models of sound and music to the micro level, now increasingly termed "microsound". At this level, concepts of frequency and time are conjoined by a quantum relationship, with an uncertainty principle relating them that is precisely analogous to the more famous uncertainty principle of quantum physics. Dennis Gabor articulated this quantum principle of sound in 1947 in his critique of the "timeless" Fourier theorem.
Gabor illustrated the quantum as a rectangular area in the time and frequency domain, such that when the duration of a sound is shortened, its spectrum in the frequency domain is enlarged. In other words, a sine tone whose duration is less than 50 ms increasingly becomes a broadband click as its duration becomes shorter. Conversely, to narrow the uncertainty in frequency, a longer "window" of time is required, both in analysis and synthesis. The auditory system balances its frequency and temporal resolution in a manner that is consistent with the perception of linguistic phonemes, where the simultaneous recognition of both spectral and temporal shapes plays a crucial role in the rapid identification of speech. The analogy to the Heisenberg uncertainty principle of quantum physics is not metaphorical but exact, because just as velocity is the rate of change of position (hence the accuracy of determination of one is linked to a lack of accuracy in the other), so frequency can be thought of as the rate of change of temporal phase.
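The trade-off between duration and bandwidth can be checked numerically. The following is a minimal Python sketch, not drawn from Gabor's paper: the 8 kHz sample rate, the Hann envelope, the 6 dB bandwidth criterion, the 1 kHz test tone and all function names are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; an assumed value, kept low so the naive transform stays fast


def sine_burst(freq_hz, duration_s):
    """Return a Hann-enveloped sine tone lasting duration_s seconds."""
    n = int(round(duration_s * SAMPLE_RATE))
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def magnitude_at(signal, freq_hz):
    """Magnitude of the discrete-time Fourier transform at a single frequency."""
    w = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(w * i) for i, x in enumerate(signal)))


def bandwidth_hz(signal, step_hz=5.0):
    """Width of the frequency band lying within 6 dB (half amplitude) of the peak."""
    freqs = [k * step_hz for k in range(int(SAMPLE_RATE / 2 / step_hz) + 1)]
    mags = [magnitude_at(signal, f) for f in freqs]
    peak = max(mags)
    above = [f for f, m in zip(freqs, mags) if m >= peak / 2.0]
    return max(above) - min(above)


# Measure the same 1 kHz tone at three durations: 50 ms, 10 ms and 2 ms.
widths = {d: bandwidth_hz(sine_burst(1000.0, d)) for d in (0.050, 0.010, 0.002)}
for duration, width in widths.items():
    print(f"{duration * 1000:5.1f} ms burst -> roughly {width:.0f} Hz wide")
```

Under these assumptions the measured bandwidth grows as the burst shortens, which is the behaviour the quantum relationship predicts.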
A class of contemporary methods of sound synthesis and signal processing known as time-frequency models, which emerged over the last two decades, has its basis at this quantum level, such that changes in a signal's time domain result in spectral alterations and vice versa. The best known of these methods are granular synthesis and the granulation of sampled sound, which produce their results by generating high densities of acoustical quanta called grains. These grains are composed of enveloped waveforms, usually less than 50 ms in duration (meaning a repetition rate of more than 20 Hz), such that a sequence of grains fuses into a continuous sound, just as the perception of pitch emerges with pulses repeating at rates above 20 Hz. So-called "Gabor grains" have the frequency of the waveform independent of the grain duration, whereas "wavelets" maintain an inverse relation between frequency and duration, and hence are useful in analysis and re-synthesis models.
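The basic idea of granular synthesis with Gabor grains can be sketched in a few lines of Python. This is an illustrative offline sketch only, not the real-time implementation discussed later in this paper: the Hann envelope, the 30 ms grain duration, the frequency range, the density and every function name are assumed for the example.

```python
import math
import random

SAMPLE_RATE = 44100  # Hz; an assumed value


def make_grain(freq_hz, duration_s):
    """One Gabor-style grain: a sine wave under a Hann envelope.

    The waveform frequency is chosen independently of the grain duration.
    """
    n = int(round(duration_s * SAMPLE_RATE))
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def granular_texture(total_s, grains_per_second, grain_s=0.03,
                     freq_range=(300.0, 600.0), seed=0):
    """Scatter grains at random onsets and sum them into one output buffer."""
    rng = random.Random(seed)
    out = [0.0] * int(total_s * SAMPLE_RATE)
    for _ in range(int(total_s * grains_per_second)):
        grain = make_grain(rng.uniform(*freq_range), grain_s)
        onset = rng.randrange(len(out) - len(grain))
        for i, x in enumerate(grain):
            out[onset + i] += x  # overlapping grains simply add
    return out


# One second of texture at 200 grains per second (samples are not normalized).
texture = granular_texture(1.0, 200)
```

At a density of 200 grains per second with 30 ms grains, several grains overlap at any moment, which is the condition under which the individual quanta fuse into a continuous texture.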
However, several other established synthesis methods are now regarded as time-frequency models, for instance the VOSIM and FOF models, both originally designed for speech simulation. Each is based on an enveloped, repeating waveform: a sine-squared pulse with a DC component in the case of VOSIM, and an asymmetrically enveloped sine wave in the case of FOF. Moreover, it is the time domain parameters involved in each model that control the bandwidth of the result, usually intended to shape the formant regions of the simulated vowels. Michael Clarke realized the relationship of the FOF method to granular synthesis early on, and has proposed a hybrid version called FOG. In his work, a fused, formant-based sound can disintegrate into a rhythmic pattern or granular texture, and then revert to the original sound, even maintaining phase coherence in the process.
In my own work, the granular concept has informed most of my processing of sampled sound, the most striking application being to stretch the sound in time without necessarily changing its pitch. It is a revealing paradox that by linking time and frequency at the micro level, one can manipulate them independently at the macro level. In fact, all of the current methods for stretching sound are based on some form of windowing operation, usually with overlapping envelopes whose shape and frequency of repetition are controllable. The perceptual effect of time stretching is also very suggestive. As the temporal shape of a sound becomes elongated, whether by a small percentage or a very large amount, one's attention shifts towards the spectral components of the sound, either discrete frequency components, harmonics or inharmonics, or resonant regions and broadband textures. I often refer to this process as listening "inside" the sound, and typically link the pitches that emerge from the spectrum with those used by live performers. In other cases, the expanded resonances of even a simple speech or environmental sound suggest a magnification of its imagery and associations, as in my work Basilica where the stretched bell resonances suggest entering the large volume of the church itself.
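The principle behind granular time stretching can be illustrated with a short Python sketch. It is a deliberately naive overlap-add version under stated assumptions, not the real-time granulation technique used in my own work: the periodic Hann window, the 50% overlap, the grain length and the function names are all chosen for the example, and no attempt is made to align the phases of successive grains.

```python
import math

SAMPLE_RATE = 8000  # Hz; an assumed value for the demonstration tone


def hann(n):
    """Periodic Hann window; copies overlapped by half their length sum to 1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / n)) for i in range(n)]


def granular_stretch(samples, stretch, grain_len=1024):
    """Change duration by the factor `stretch` without resampling the grains.

    Grains are written at a fixed output hop but read from the source at a
    hop divided by `stretch`, so the read position advances more slowly than
    the write position when stretch > 1. Each grain is copied at its original
    rate, which is why the pitch within a grain is unchanged.
    """
    hop_out = grain_len // 2
    hop_in = hop_out / stretch
    window = hann(grain_len)
    out_len = int(len(samples) * stretch)
    out = [0.0] * (out_len + grain_len)
    k = 0
    while True:
        start_in = int(k * hop_in)
        start_out = k * hop_out
        if start_in + grain_len > len(samples) or start_out >= out_len:
            break
        for i in range(grain_len):
            out[start_out + i] += window[i] * samples[start_in + i]
        k += 1
    return out[:out_len]


# Half a second of a 440 Hz tone, stretched to one second.
tone = [math.sin(2.0 * math.pi * 440.0 * i / SAMPLE_RATE)
        for i in range(SAMPLE_RATE // 2)]
stretched = granular_stretch(tone, 2.0, grain_len=256)
```

Because the read and write positions are decoupled, duration becomes a free parameter at the macro level even though time and frequency remain linked inside each grain. In this simple version the very end of the output fades out where the source runs short, and overlapping grains with mismatched phases can colour the result.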
Complex Systems with Emergent Form
Such a radical shift of focus in the basis of sound has had profound implications not only for our models of sound design, but also for the compositional methods that emerge from their use, as well as for the role of the artist in guiding complex processes. I suggest that these models are examples of a class of complex systems exhibiting emergent form that have been explored recently in other digital artistic applications such as non-linear chaotic systems, cellular automata and genetic algorithms. All of these methods, as diverse as their results may be, are linked by a set of common characteristics.
First, all of these methods follow the pattern that simple rules generate complex behaviour. The best known example of this process is non-linear dynamic systems, where an equation as simple as the logistic map (where the non-linearity is merely an x² factor) exhibits bifurcation and chaotic regions under iteration when the non-linear component is scaled beyond a certain point. Such results defy the conventional assumption based in linear models that complex behaviour results from complex factors. It is also characteristic of these models that their component elements act as cells or quanta, and that the global behaviour emerges over large numbers of iterations, usually such that a computer is required for the intensive calculations involved. Both cellular automata and genetically based algorithms start with small discrete units and simple rules for their propagation, just as granular synthesis does in the acoustic domain.
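The behaviour of the logistic map is easy to observe directly. The Python sketch below assumes the standard form of the map, x[n+1] = r * x[n] * (1 - x[n]); the starting value, the number of iterations discarded as a transient, the rounding precision and the particular values of r are my own choices for illustration.

```python
def logistic_orbit(r, x0=0.5, transient=1000, keep=64):
    """Iterate the logistic map and return `keep` values after a transient."""
    x = x0
    for _ in range(transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def distinct_states(r):
    """Count the distinct values (to 6 decimals) the settled orbit visits."""
    return len({round(x, 6) for x in logistic_orbit(r)})


# Scale the non-linear term through progressively larger values of r.
for r in (2.8, 3.2, 3.5, 3.9):
    print(f"r = {r}: {distinct_states(r)} distinct state(s) after the transient")
```

As r increases, the settled orbit goes from a single fixed value to an alternation between two values, then four, and finally to a region in which the orbit no longer repeats within the sampled window.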
There is a debate within computing science as to whether conventional serial programming architecture is the most efficient way to handle the modeling of complex systems, compared with parallel models that distribute local tasks among many different sub-processors. This argument is beyond the scope of the paper, except to comment that I was able to implement real-time granular synthesis in 1986 using an early micro-programmable DSP system known as the DMX-1000 controlled by a PDP LSI-11/23 microcomputer, only because the system architecture allowed multiple layers of calculation to proceed simultaneously at the so-called background and foreground, interrupt driven levels, in addition to the DMX's programmed execution. In other words, the higher speed DSP calculations computed the waveforms of the grains, and the host computer handled the scheduling of grain events, the calculation of random values for those events, as well as servicing the user's keyboard commands specifying control variables, and updating the screen display of those variables.
Finally, there is a tantalizing relationship between these models of complex systems and aspects of biological and physical systems of the real world. Researchers are beginning to use models of non-linear systems in everything from engineering to economics; even acoustics has used these factors in the physical modeling of musical instruments with regard to attack transients, inharmonicity and multiphonics. The most popular examples seem to be the application of fractals to computer graphics, including the simulation of landscapes, organic forms and clouds. However, at a metaphorical level, there is an intuitive sense that these systems somehow capture the complexity of the real world and hence they evoke a deep fascination for the public. It does not seem coincidental that the simplest and best known cellular automaton system, introduced by the mathematician John Horton Conway and popularized in Scientific American in the 1970s, was called "The Game of Life". Its little square pixel shapes could hardly be called realistic, but something about their behaviour, in which shapes emerged or "died" and often exhibited complex patterns that developed only after extensive iteration, excited the imagination. Perhaps it is what we now would call "artificial lifeforms" that are a simulacrum of the real world, both its reflection and an extension. The popularity of both the commercial and artistic versions of such lifeforms demonstrates that humans are readily disposed to conferring the status of sentient being onto such phenomena, perhaps as mirrors of ourselves.
For the artist, the concept of emergent form is equally intriguing, and it has the potential to change the role of the artist from the person who is required to generate every detail of the intended result, to one who guides the processes that produce the result. Otto Laske has described the transition within 20th century practice from model-based work (essentially an imitation of existing structures) to a rule-based, generative model, or in G. M. Koenig's formulation: Given the rules, find the music. Each of these ideas changes the artist-machine relationship to a more equal partnership, or as Laske suggests, the machine acts as the composer's alter ego, with the design process involving an interaction between the externalized knowledge of the machine, and the internal knowledge of the artist. We can understand software development in this approach as the gradual transfer of knowledge from one source to the other via processes of automation and artificial intelligence. Instead of seeing this as a case of "machines taking over", we can identify it as a partnership where what is uniquely human -- such as the ability to ascribe meaning to complex patterns -- emerges more clearly.
Convolution: linking the time and frequency domains
To return to the microsound domain, we can note that another fundamental principle linking the time and frequency domains is illustrated by the technique of convolution. Convolution has been a standard topic in engineering and computing science for some time, but only since the early 1990s has it been widely available to computer music composers, thanks largely to the theoretical descriptions by Curtis Roads (1996), and the SoundHack software of Tom Erbe that made this technique accessible.
Convolving two waveforms in the time domain means that you are multiplying their spectra (i.e. frequency content) in the frequency domain. By multiplying the spectra we mean that any frequency that is strong in both signals will be very strong in the convolved signal, and conversely any frequency that is weak in either input signal will be weak in the output signal. In practice, a relatively simple application of convolution is where we have the impulse response of a space. This is obtained by recording a short burst of a broadband signal as it is processed by the reverberant characteristics of the space. When we convolve any dry signal with that impulse response, the result is that the sound appears to have been recorded in that space. In other words, it has been processed by the frequency response of the space similar to how that process would work in the actual space. In fact, convolution in this example is simply a mathematical description of what happens when any sound is coloured by the acoustic space within which it occurs, which is in fact true of all sounds in all spaces except an anechoic chamber. The convolved sound will also appear to be at the same distance as in the original recording of the impulse. If we convolve a sound twice with the same impulse response, it will appear to be twice as far away. For instance, my recent work Temple is based on processing singing voices with the impulse response of a cathedral in Italy, that of San Bartolomeo in Busetto, which is available on Howard Fredrics' Worldwide Soundspaces website [and in the Impulseverb option in Peak for OSX]. Since firing a gun to produce the impulse would have been intrusive in such a space, the more acceptable approach was to break a balloon and record the response of the space.
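Both claims, that convolution in time multiplies the spectra and that an impulse response imprints itself on a dry signal, can be verified on a toy example. The Python sketch below uses direct convolution and a naive discrete Fourier transform; the short "dry" signal and the three-echo impulse response are invented values for illustration, not measurements of any real space.

```python
import cmath


def convolve(a, b):
    """Direct (time-domain) convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def dft(x, n):
    """Naive n-point discrete Fourier transform of x, zero-padded to length n."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(padded))
        for k in range(n)
    ]


dry = [1.0, 0.5, -0.25, 0.0, 0.75]                        # hypothetical dry signal
impulse_response = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25]   # hypothetical: direct sound plus two echoes

wet = convolve(dry, impulse_response)

# Compare the spectrum of the convolved signal with the product of the two spectra.
n = len(wet)
spectrum_wet = dft(wet, n)
spectrum_product = [p * q for p, q in zip(dft(dry, n), dft(impulse_response, n))]
max_error = max(abs(p - q) for p, q in zip(spectrum_wet, spectrum_product))
print("largest difference between the two spectra:", max_error)
```

The difference between the two spectra should be at the level of floating-point rounding. In practice one would use a fast Fourier transform rather than these quadratic-time loops, which are written out here only to keep each step visible.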
In fact, one can convolve any sound with another, not just an impulse response. In that case, we are filtering the first sound through the spectrum of the second, such that any frequencies the two sounds have in common will be emphasized. A particular case is where we convolve the sound with itself, thereby guaranteeing maximum correlation between the two sources. In this case, prominent frequencies will be exaggerated and frequencies with little energy will be attenuated. However, the output duration of a convolved signal is the sum of the durations of the two inputs. With reverberation we expect the reverberated signal to be longer than the original, but this extension and the resultant smearing of attack transients also occurs when we convolve a sound with itself, or with another sound. Transients are smoothed out, and the overall sound is lengthened (by a factor of two in the case of convolving a sound with itself). When we convolve this stretched version with the impulse response of a space, the result appears to be half way between the original and the reverberant version, a ghostly version of the sound, so to speak. In Temple, both the original sound and the version convolved with itself are convolved with the impulse response of the cathedral and synchronized to begin together, thereby producing a trailing after-image within the reverberant sound.
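The effect of convolving a sound with itself can be checked numerically in the same spirit. In the Python sketch below, the test signal (a strong and a weak sine component over 64 samples), the chosen frequencies and the function names are all assumptions made for the illustration.

```python
import cmath
import math


def convolve(a, b):
    """Direct (time-domain) convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def magnitude_at(signal, cycles_per_sample):
    """Magnitude of the discrete-time Fourier transform at one frequency."""
    w = -2j * math.pi * cycles_per_sample
    return abs(sum(x * cmath.exp(w * i) for i, x in enumerate(signal)))


N = 64
strong, weak = 4 / N, 10 / N  # frequencies in cycles per sample
signal = [
    math.sin(2.0 * math.pi * strong * i) + 0.1 * math.sin(2.0 * math.pi * weak * i)
    for i in range(N)
]

doubled = convolve(signal, signal)  # the sound convolved with itself

ratio_before = magnitude_at(signal, strong) / magnitude_at(signal, weak)
ratio_after = magnitude_at(doubled, strong) / magnitude_at(doubled, weak)
print("length before and after:", len(signal), len(doubled))
print("strong-to-weak ratio before and after:", ratio_before, ratio_after)
```

The output is 2N - 1 samples long, so effectively twice the original duration, and the ratio between the strong and weak components is squared, which is the sense in which prominent frequencies are exaggerated and weak ones attenuated.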
The long history of science serving art is clearly continuing into the present era, with one of the most intriguing pathways involving the quantum level of microsound, or what might be called "the final frontier" of acoustic and musical research. Art in the service of science has fewer, but still significant, contributions such as the artistic visualization and musical sonification of databases. The computer is central to both types of process. However, as in any close relationship, each of the partners may be changed by the encounter. In fact, we could analyze artist-machine experience along the lines of whether the technology plays a servant role in merely assisting in the production process (what I refer to as computer-realized composition), or whether it participates in a manner that changes the artistic vocabulary, process and ultimate result. This latter type of process ranges from the interactive partnership that might be called computer-assisted composition, through to a fully automated, rule-based computer-composed type of work. I have suggested here that when the computer is used to control complexity that results in emergent forms, the role of the artist is profoundly changed, with many possible scenarios: guide, experimenter, designer, visionary, poet, to name but a few. In my own work, I have experienced elements of all of these roles, but what sums them all up is my role of relating the inner complexity of the micro domain soundworld to the outer complexity of the real world in all of its natural, human and social dimensions. It is a journey that I find particularly inspiring.
References
Clarke, M. (1996) Composing at the intersection of time and frequency, Organised Sound, 1(2).
Gabor, D. (1947) Acoustical quanta and the theory of hearing, Nature, 159(4044), 591-594.
Laske, O. (1990) The computer as the artist's alter ego, Leonardo, 23(1), 53-66.
Roads, C. (1996) The Computer Music Tutorial, MIT Press.
Roads, C. (2001) Microsound, MIT Press.
Truax, B. (2000) The aesthetics of computer music: a questionable concept reconsidered, Organised Sound, 5(3), 119-126.
Truax, B. (1999) Composition and diffusion: space in sound in space, Organised Sound, 3(2), 141-146.
Truax, B. (1994) The inner and outer complexity of music, Perspectives of New Music, 32(1), 176-193.
Truax, B. (1994) Discovering inner complexity: Time-shifting and transposition with a real-time granulation technique, Computer Music Journal, 18(2), 38-48 (sound sheet examples in 18(1)).
Truax, B. (1992) Composing with time-shifted environmental sound, Leonardo Music Journal, 2(1), 37-40.
Truax, B. (1992) Musical creativity and complexity at the threshold of the 21st century, Interface, 21(1), 29-42.
Truax, B. (1988) Real-time granular synthesis with a digital signal processor, Computer Music Journal, 12(2), 14-26.