Sound Examples of Text/Voice Processing
Granular Time Stretching
A. Using a short fixed sample
Even a very short sound sample, of the duration of a phoneme (ca. 200 ms), can yield interesting results when overlapped with many other grains derived from the sample. Note that the grains need to be smoothed with an envelope. If the envelope is nearly rectangular and/or shorter than about 25 ms, its shape and duration may broaden the spectrum. Smoother envelopes (for example, with ramps lasting 1/4 to 1/2 of the overall grain duration) and longer grain durations (try 40-50 ms) will let the original timbre of the sound come through clearly. Small but important variations in the grain can be obtained by using different "offsets", i.e. distances from the start of the sample. Moving the offset is similar to "magnifying" a sound as if under a microscope, that is, hearing its details in slow motion. Because the samples in the grain are in the same order as in the original, there will be no change in pitch, as long as they are played back at the same sampling rate. Repeating every sample or skipping every other sample will lower or raise the pitch by an octave, respectively.
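To make the envelope, offset and octave-transposition ideas concrete, here is a minimal sketch in Python (standard library only). It assumes a mono sample stored as a list of floats. The function names, the raised-cosine ramp shape and the test tone are illustrative assumptions, not the method of any particular program. The ramps are set to 1/4 of the grain duration, which is one reading of the guideline above.

```python
import math

def grain_envelope(length, ramp):
    """Envelope with raised-cosine attack and decay ramps of `ramp` samples each
    (ramp should be at most length // 2) and a flat middle at gain 1.0.
    ramp=0 gives a rectangular window."""
    env = []
    for i in range(length):
        if ramp > 0 and i < ramp:
            env.append(0.5 - 0.5 * math.cos(math.pi * i / ramp))                 # attack
        elif ramp > 0 and i >= length - ramp:
            env.append(0.5 - 0.5 * math.cos(math.pi * (length - 1 - i) / ramp))  # decay
        else:
            env.append(1.0)                                                      # sustain
    return env

def make_grain(sample, offset, length, ramp):
    """Copy `length` samples starting `offset` samples into the source. The samples
    keep their original order, so the pitch is unchanged at the same sampling rate."""
    segment = sample[offset:offset + length]
    env = grain_envelope(len(segment), ramp)
    return [s * g for s, g in zip(segment, env)]

def octave_down(grain):
    """Repeat every sample: twice as long and an octave lower at the same rate."""
    return [s for s in grain for _ in range(2)]

def octave_up(grain):
    """Skip every other sample: half as long and an octave higher."""
    return grain[::2]

# Usage: a 200 ms, 220 Hz sine stands in for a recorded phoneme.
fs = 44100
sample = [math.sin(2 * math.pi * 220 * n / fs) for n in range(int(0.2 * fs))]
grain_len = int(0.045 * fs)                                  # 45 ms grain
grain = make_grain(sample, offset=1000, length=grain_len, ramp=grain_len // 4)
lower, higher = octave_down(grain), octave_up(grain)
```

Setting `ramp=0` produces the nearly rectangular envelope that broadens the spectrum. Stepping `offset` through the source from one grain to the next gives the "microscope" effect described above.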
Opening of the Wings of Nike using male and female phonemes (fixed samples), at normal pitch and transposed up or down an octave
Song of Songs: "I sat down"; then using diphthong "ow", stretched 60:1, 100:1, and harmonized 4:3, 4:2, 4:1, 4:6
The Shaman Ascending: repeated grains derived from a sung note, 150 ms duration followed by a 50 ms delay with small changes in offset
The same with 100 ms grain duration followed by a 50 ms delay
B. Using a longer sound sample
Time stretching of longer sound samples can be achieved effectively with a granular technique. Sound editing software also uses overlapping windows to stretch a sound slightly without changing its pitch. With longer stretches, however (e.g. more than 5 times the original duration), the periodic repetition of the windowed grains causes a modulation effect. In the examples below, this effect is avoided by using random variations in grain duration and/or delay times, and multiple streams of grains, each with its own duration and offset. The result is, hopefully, an interesting texture rather than the transparent time stretch that editing software aims for. The trick is to detach the progress of the current time position (slowing it from real time to an arbitrary speed) from the grain itself. The grain's samples remain in the same order as in the original sample, so there is no pitch change. Note that in this case the offset draws samples from the recent past, or even from anywhere in the entire sample held in memory, much like a delay line. Micro-delays will blur the sound, whereas large offsets will appear to scramble the entire sound.
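The decoupling of the time pointer from the grains can be sketched as follows. This is a hypothetical illustration, not the algorithm used in MacPod or in the examples on this page. The Hann window, the uniform random ranges, the fixed number of streams and all parameter names are assumptions.

```python
import math
import random

def hann_window(n):
    """Hann window of n samples (zero at both ends)."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def granular_stretch(sample, stretch, fs, n_streams=4, grain_ms=(30.0, 60.0),
                     delay_ms=(5.0, 20.0), max_offset_ms=50.0, seed=0):
    """Stretch `sample` by `stretch` (10 means 10:1) without changing its pitch.

    A time pointer advances through the source at 1/stretch of real time,
    independently of the grains. Each stream repeatedly takes a grain starting a
    random offset behind the pointer (reaching into the recent past, like a
    delay line), windows it, adds it to the output, then waits a random delay.
    Random grain durations and delays avoid periodic repetition.
    """
    rng = random.Random(seed)
    out_len = int(len(sample) * stretch)
    out = [0.0] * out_len
    for _ in range(n_streams):
        t = 0                                              # output time of this stream (samples)
        while t < out_len:
            glen = min(len(sample), max(2, int(rng.uniform(*grain_ms) * fs / 1000)))
            pointer = int(t / stretch)                     # slowed-down position in the source
            offset = int(rng.uniform(0.0, max_offset_ms) * fs / 1000)
            start = max(0, min(pointer - offset, len(sample) - glen))
            for i, g in enumerate(hann_window(glen)):
                if t + i >= out_len:
                    break
                out[t + i] += sample[start + i] * g / n_streams
            t += glen + int(rng.uniform(*delay_ms) * fs / 1000)
    return out

# Usage: stretch a 100 ms, 200 Hz test tone 10:1.
fs = 8000
src = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 10)]
stretched = granular_stretch(src, stretch=10, fs=fs)
```

Each stream waits at least one grain length before starting its next grain, so grains within a stream never overlap. Dividing by `n_streams` therefore keeps the mix within the source's peak level. A small `max_offset_ms` blurs the sound, while larger values reach further into the past and scramble it more.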
Song of Songs: original reading by Thecla Schiphorst, "Return, return", at normal speed, then stretched 10:1
"Fair as the moon ... terrible" stretched to a maximum of 100:1 and harmonized
Song of Songs: "I am the rose of Sharon", original, then stretched +33% and harmonized at 4:3, 4:2, 4:1, then "rose of Sharon" stretched 50:1 and 100:1 with harmonization
Song of Songs: "The work of the hands ..." with variable stretching and harmonization 4:3, 4:2, 4:1
Song of Songs: monk singing and bell, granulated at normal speed, then stretched 20:1
Wings of Fire: "We cross continents ..." granulated with lower harmonic 4:3, stretched on last syllable ("ch")
Androgyne, Mon Amour: "Wolf's hour", repeated twice, granulated +50%, then long stretch, re-attack on "hour" and maximum stretch, plus 4th lower
Androgyne, Mon Amour: "Shaken up and scattered on the floor", granulated with grain duration 8 ms, 50 ms delays, then 340 ms, at 5:1 stretch
Karplus-Strong Resonator
The Karplus-Strong resonator is realized with a waveguide delay line that allows feedback and level controls. The algorithm is in fact a physical model of a string or a tube open at both ends (with a variation simulating a tube closed at one end). The resonant pitch is inversely related to the length of the delay line: longer delay lines produce lower pitches, as with strings and tubes. However, a high degree of feedback (called hyper-resonance) exceeds what would normally be possible with an actual string or tube. In some of the examples below, the hyper-resonance itself is excessive, but it works well when the sound is stretched and the resonances suggest a large space. The feedback loop also requires a low-pass filter, so that the sound eventually decays to the fundamental pitch of the resonator.
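A minimal Karplus-Strong style resonator can be sketched as a delay line whose output is averaged with the previous delayed sample (the low-pass filter) and fed back. This illustration rests on several assumptions and is not the resonator in the software mentioned on this page. The two-point average filter, the `input_level` and `feedback` parameter names, and the tuning calculation are hypothetical choices.

```python
from collections import deque

def ks_resonate(x, delay_len, feedback=0.99, input_level=1.0, tail=0):
    """Karplus-Strong style resonator (illustrative sketch):

        y[n] = input_level * x[n] + feedback * 0.5 * (y[n - D] + y[n - D - 1])

    The delay line of D = delay_len samples sets the pitch, roughly
    fs / (D + 0.5) Hz because the two-point average adds half a sample of delay,
    so a longer line gives a lower pitch. The average is a low-pass filter in the
    loop: upper partials lose more energy per pass than the fundamental, so the
    ringing decays toward the resonator's fundamental. `feedback` must stay
    below 1.0 for the output to die away. `tail` extra samples let the ringing
    continue after the input ends.
    """
    line = deque([0.0] * delay_len)     # holds y[n - D] ... y[n - 1]
    prev = 0.0                          # y[n - D - 1]
    out = []
    for n in range(len(x) + tail):
        inp = input_level * x[n] if n < len(x) else 0.0
        delayed = line.popleft()        # y[n - D]
        y = inp + feedback * 0.5 * (delayed + prev)
        prev = delayed
        line.append(y)
        out.append(y)
    return out

# Usage: tune the resonator near 220 Hz and excite it with a single impulse.
fs = 44100
D = round(fs / 220)                     # 200 samples
ring = ks_resonate([1.0], delay_len=D, feedback=0.995, tail=fs)
```

Raising `feedback` toward 1.0 makes the ringing last much longer. This is the sense in which the loop can exceed the damping of a real string or tube.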
Powers of Two: The Artist. Counter-tenor singing "L'homme armé" hyper-resonated
Powers of Two: The Artist. "L'homme armé" hyper-resonated and stretched
Powers of Two: The Artist. Italian poem "Io canto amor" resonated
Wings of Fire: "Your tongue speaks languages ... " resonated on C/G; second time with input lowered, feedback slowly raised
Androgyne, Mon Amour: "Androgyne, mon amour", granulated with low resonator, stretched 2.5:1, then 20:1
Androgyne, Mon Amour: "Even less would that be true", various double resonators in chordal tunings
Androgyne, Mon Amour: "Scent of thyme is cool and tender", double resonator, retuned, plus granulated with resonator, 2:1 stretch, then amplitude-correlated stretch
Androgyne, Mon Amour: "unclothed flesh", long stretch (amplitude correlated) with high resonator, strong feedback, re-attacked on consonants
Comb Filter
The comb filter is realized with a delay line, usually a short one, that combines the original signal with a delayed version of itself. This cancels the odd harmonics of the frequency whose period is twice the delay time, an effect called phasing. With feedback, the effect is to emphasize all of the harmonics of the frequency whose period equals the delay time (similar to the Karplus-Strong resonator). That frequency is an octave above the lowest of the odd harmonics cancelled in the simple case. The comb filter effect is perceived more strongly when the delay length is changing, when the feedback is stronger, and when the sound has a broadband spectrum. Each delayed signal is called a "tap", and a comb filter may have several taps.
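The frequencies involved can be made concrete for a single tap. For a delay of D samples at sampling rate fs, the feedforward comb y[n] = x[n] + x[n - D] cancels the odd multiples of fs / (2D). The feedback comb y[n] = x[n] + g y[n - D] peaks at the multiples of fs / D, an octave above fs / (2D). The sketch below (Python, standard library; function names and parameter values are illustrative assumptions) checks this for D = 100 samples at 44.1 kHz, i.e. cancellation at 220.5, 661.5, ... Hz and peaks at 441, 882, ... Hz.

```python
import cmath
import math

def comb_filter(x, delay, gain=1.0, feedback=0.0):
    """Single-tap comb filter:
    y[n] = x[n] + gain * x[n - delay] + feedback * y[n - delay]."""
    y = []
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(x[n] + gain * x_d + feedback * y_d)
    return y

def comb_magnitude(freq, delay, fs, gain=1.0, feedback=0.0):
    """|H(f)| of the same filter, H(z) = (1 + gain*z^-D) / (1 - feedback*z^-D)."""
    z_d = cmath.exp(-2j * math.pi * freq * delay / fs)    # z^-D on the unit circle
    return abs((1 + gain * z_d) / (1 - feedback * z_d))

fs, D = 44100, 100                      # fs / D = 441 Hz, fs / (2D) = 220.5 Hz

# Feedforward (the simple case): a 220.5 Hz tone is cancelled once the delay fills.
tone = [math.sin(2 * math.pi * 220.5 * n / fs) for n in range(1000)]
cancelled = comb_filter(tone, D)[D:]

# Feedback only (gain=0): strong peak at 441 Hz, dip at 220.5 Hz.
peak = comb_magnitude(441.0, D, fs, gain=0.0, feedback=0.9)
dip = comb_magnitude(220.5, D, fs, gain=0.0, feedback=0.9)
```

A multi-tap comb would simply add further delayed terms, each with its own delay and gain.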
Song of Songs: "My beloved spoke", processed with multi-tap comb and high-pass filter with stereo delay, then stretched
Convolution
Convolution of two signals multiplies their spectra, which means that strong frequencies in the spectrum become stronger and weak ones weaker. The duration of the result is the sum of the durations of the two signals involved. The most common use of convolution is to place a sound within a reverberant space when the impulse response of that space is available. The impulse response is the space's recorded response to a sudden broadband sound, such as a balloon bursting. However, any two sounds can be convolved, and a sound can even be convolved with itself (auto-convolution). In the latter case, the prominent frequencies of the sound dominate, weaker ones are eliminated, and the duration is doubled. When a recording of multiple sounds, such as a string of words, is convolved with itself or with another such string, every word is convolved with every other word, producing a complex rhythmic result. In the following table, the * symbol indicates convolution of the sounds in the first two columns.
Dry Sound | Impulse Response | Dry * Impulse Response | Auto-Convolved
Sue McGowan | Busetto Cathedral | McGowan * Busetto | McG * McG * Busetto
Derrick Christian low D | Busetto | DC low D * Busetto | DC low D * DC low D
Derrick Christian multiphonic | Busetto | DC multi * Busetto | DC multi * DC multi
Art text | Free text | Art * Free | Art * Art
"A Well" text | - | - | A Well * A Well
A Well * A Well | Well Splashes | Well * A Well * A Well | -
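Two properties from the convolution discussion, that spectra multiply and that durations add, can be checked numerically with a direct convolution. This is a minimal sketch for illustration only (Python, standard library; the names and signals are hypothetical). Practical convolution of sounds lasting seconds is normally done with FFTs rather than with this double loop.

```python
import cmath
import math

def convolve(a, b):
    """Direct linear convolution: every sample of `a` scales a copy of all of `b`,
    so the result has len(a) + len(b) - 1 samples (the durations add)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

def dft(x, n):
    """Naive n-point DFT of x, zero-padded to length n."""
    x = list(x) + [0.0] * (n - len(x))
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]

a = [1.0, 2.0, 3.0]
b = [0.5, -1.0]
ab = convolve(a, b)                     # [0.5, 0.0, -0.5, -3.0]
n = len(ab)

# Spectrum of the convolution equals the product of the two spectra.
product = [fa * fb for fa, fb in zip(dft(a, n), dft(b, n))]

# Auto-convolution: about twice the length, magnitude spectrum squared.
aa = convolve(a, a)
```

Squaring the magnitude spectrum is why auto-convolution favors prominent frequencies: a component 10 times stronger than another before auto-convolution is 100 times stronger after it. The double loop also shows why every word in one recording meets every word in the other.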
Note: Many of the above examples are put into their complete compositional context elsewhere on this DVD-ROM and its companions. See also the pages devoted to text processing in Wings of Fire and Androgyne, Mon Amour. The "art/free" texts, spoken by Christopher Gaze, are from the composer's work Prospero's Voyage; the "well" text (spoken by Thecla Schiphorst) is from Chalice Well, and the sung material is from Temple.
Recommended software for realizing these types of processing includes MacPod for granular time stretching with resonators, SoundHack for convolution, and GRM Tools for various other types of processing.