Notes on Production Techniques

Notes on Production Techniques with the PODX system

Sample-based processing with the PODX system, using the DMX-1000, relies on retrieving samples stored on RL02 disks and transferring them via a DMA interface to the DMX, where they are processed in real time.

Disk Access

The samples are stored in 5k blocks on the disk, and 1024 such blocks are available. A sample sequence is identified by two numbers, such as 100/5 which means the sequence starts at block 100 and lasts 5 blocks (approx. 1 second). Unless otherwise specified, the sequence repeats as a loop. In order to smooth the transition where the end of the loop joins the beginning, a software fade-out and fade-in is usually requested, lasting anywhere from 10 to 100 ms. Therefore, even very short sound sequences (minimum 1/5 sec) can be joined without an audible click.

Given the fast access time of these (historical) disks (i.e. over 50 kHz in sequential or random access), it is possible and often useful to access disk sequences other than with a simple loop. The PODX software allows randomly accessed blocks within a given range, with the duration of each random portion being incremented or decremented in real time (minimum 1 block, and 1 block increments). In addition, a group of "presets" can be defined for the disk loops, that is, a set of pre-defined sample sequences, identified by a letter. These can be called back at any time and allowed to loop. An additional command requests an immediate switch to the new loop, rather than waiting until the end of the current loop. A sequence of such presets (max. 16) can also be specified and recalled at any time. The sequence itself will repeat at its conclusion.

A more dynamic disk access procedure is the variable loop. Either the start block number or the block duration, or both, can be requested to increment or decrement by a variable integer (INC). A simple example is +100 / 5. With an INC of 1, the disk sequence plays 5 blocks starting at block 100, then 5 blocks starting at block 101, and so on. A negative sign, as in -100 / 5, results in the start position going backwards (100, 99, etc.). In the case of 100 / +5, the start position stays the same and the block duration increments each time (5, 6, 7 ...). A particularly useful case is -100 / +5, in which the end point of the sample sequence stays the same but the start position gets earlier with each repetition, the sequence being 100/5, 99/6, 98/7, etc. If the INC value (which can be changed during realtime playback) is greater than 1, then each of the variables changes by that amount. The smoothing operation helps to keep the transitions free of audible clicks.
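The variable loop arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the numbering scheme, not the PODX code: the function name, its arguments and the fixed number of repetitions are hypothetical, and the behaviour at the limits of the disk (block 0, block 1023, or a duration below 1 block) is not modelled.

```python
def variable_loop(start, length, start_dir=0, length_dir=0, inc=1, repeats=4):
    """Return (start block, block count) pairs for successive repetitions.

    start_dir and length_dir are +1, -1 or 0, standing for the sign typed in
    front of each number; inc is the INC value applied to every signed field.
    """
    sequence = []
    for _ in range(repeats):
        sequence.append((start, length))
        start += start_dir * inc
        length += length_dir * inc
    return sequence


# "+100 / 5": the start position advances, the duration stays fixed.
print(variable_loop(100, 5, start_dir=+1))
# "-100 / +5": the start moves earlier while the end point stays fixed.
print(variable_loop(100, 5, start_dir=-1, length_dir=+1))
```

With an INC of 1 the second call gives 100/5, 99/6, 98/7 and so on, which is the sequence quoted in the text.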

Disk Speed

The speed of disk playback, and hence the resulting sampling rate, is controlled in real time by a single number called the speed value. This value is literally the instruction number in the DMX-1000 microcode sequence where the Halt command is inserted, after which the code repeats. The higher the position within the 256 code limit, the slower the output to the DAC. The formula for calculating the sampling rate FS from the speed position is:

FS = 5,000,000 / (SPEED + 4)

which reflects the 5 MHz clock rate of the DMX. Therefore, a speed value of 163 gives approximately 30 kHz (the normal sampling rate used for these pieces); 196 gives exactly 25 kHz, 246 gives 20 kHz, and 96 (the fastest speed) gives 50 kHz. The software allows the speed value to be typed in or changed incrementally, thereby producing a glissando up or down. Slowing down the microcode operation is accomplished by inserting NOPs (no-operation instructions) into the code. However, the maximum speed for a given microcode configuration will depend on how many instructions are needed for that processing. Simple processing may take fewer than 96 instructions, in which case the disk access time is the limiting factor. More complex processes such as granulation will take many more instructions. However, in all modes of granulation, the software allows the user to specify how many "voices", or simultaneous grain streams, are needed, and hence how many instructions are required. Fewer voices mean that the sampling rate can be higher, and more voices that it must be lower. In most cases, the number of voices that brings the sampling rate closest to 30 kHz is used.
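As a quick check of the speed formula, the following Python sketch converts between speed values and sampling rates. The function names are hypothetical, and the inverse conversion simply rounds to the nearest instruction position, which is an assumption about how a speed value would be chosen in practice.

```python
DMX_CLOCK_HZ = 5_000_000  # clock rate of the DMX-1000, as given in the text


def sampling_rate(speed):
    """Sampling rate in Hz for a given speed value (position of the Halt)."""
    return DMX_CLOCK_HZ / (speed + 4)


def speed_for_rate(fs_hz):
    """Nearest speed value for a desired sampling rate in Hz."""
    return round(DMX_CLOCK_HZ / fs_hz) - 4


for speed in (96, 163, 196, 246):
    print(speed, round(sampling_rate(speed)))

print(speed_for_rate(30_000))
```

A request for 30 kHz resolves to speed 163, whose actual rate is slightly under 30 kHz (about 29.94 kHz).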

Realtime Processing

The PLAYDK program, besides implementing all of the disk access options listed above, also allows various simple types of processing using the DMX microcode and its 4K memory. A chain of processes can be requested, as long as they fit within these limits. If the first process requested is "stereo", then the signal is split into two, with the right channel delayed from the left by a given number of samples (max. 511). The actual time delay will depend on the sampling rate. For instance, a maximum delay of 511 samples at speed 163 (approx. 30 kHz) creates a delay of about 17 ms. Note that very short delays can simulate binaural time delays. Once the signal is split into stereo, independent processes can be assigned to the left and right channels, something that is particularly useful with delay-line resonators and comb filters.
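Because the stereo delay is specified in samples, its duration changes with the speed value. A small Python sketch of the conversion follows; the function names are hypothetical, and the 0.5 ms figure in the second call is only an illustrative value for a short delay, not one taken from the production notes.

```python
DMX_CLOCK_HZ = 5_000_000  # clock rate of the DMX-1000


def delay_ms(samples, speed):
    """Duration in milliseconds of a delay of `samples` at a given speed value."""
    fs = DMX_CLOCK_HZ / (speed + 4)
    return 1000.0 * samples / fs


def samples_for_delay(ms, speed):
    """Nearest whole number of samples for a desired delay in milliseconds."""
    fs = DMX_CLOCK_HZ / (speed + 4)
    return round(ms * fs / 1000.0)


print(round(delay_ms(511, 163), 1))   # maximum delay at speed 163, about 17 ms
print(samples_for_delay(0.5, 163))    # a short delay expressed in samples
```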

A simple low-pass and high-pass filter can also be requested, with the cutoff being changed in realtime during playback. This filtered signal can then be passed to any of the delay-line based processes such as the resonators.

The process used most often in the pieces documented here is the "Karplus-Strong" resonator. A more detailed description of this process is included with each piece. In terms of the production documentation, the typical specification is KS = 305, or in the case of a stereo signal, KS = 305/181, with feedback (F) = 1000. This means that the length of the delay line is 305 samples (max. = 511), and in the stereo case, the left channel delay is 305 samples and that of the right channel is 181 samples. The frequency f of the resonator when the delay line is p samples, with a sampling rate of FS, can be predicted from the formula:

f = FS / (p + 1/2)

Given that the perceived pitch of the resonator often needs to be in tune with a live performer, delay lines that produce approximate tempered pitches are often used, such as with the following table based on a 30 kHz sampling rate:

Pitch   p + 1/2
C2      457.8
C#      432.7
D       407.9
D#      384.8
E       363.4
F       343.0
F#      323.7
G       305.5
G#      288.4
A       272.2
A#      257.0
B       242.4

The chosen delay line length is tuned by subtracting 0.5 from the table value and taking the nearest integer. To obtain higher octaves, either the exact value can be calculated from the formula, or a rough estimate can be made by dividing the above table value by 2 and subtracting 1. For instance, the octaves of A are approximately 543, 271, 135, 67 samples respectively. Care must be taken when changing the length of the delay line during playback. Requesting shorter lines (i.e. fewer samples) is usually safe and inaudible, but lengthening the line is tricky because unrelated samples are being brought into play, and often an audible click occurs. Each delay line has a choice of positive or negative feedback into the delay line, the former case (the default) producing all harmonics of the fundamental frequency described above, and the negative option producing only odd harmonics of that fundamental. Two delay lines in series (per stereo channel) can also be requested, though difficult to control.
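The relation between delay length and pitch can be explored with a short Python sketch. It assumes the sampling rate given by speed 163 and a reference of 110 Hz for the A in the table (equal temperament at A = 440 Hz); both the function names and that reference are assumptions made for illustration. The choice of the final integer length is left to the user, as in the text.

```python
DMX_CLOCK_HZ = 5_000_000  # clock rate of the DMX-1000


def period_in_samples(freq_hz, speed=163):
    """Return p + 1/2, the resonator period in samples, for a target frequency."""
    fs = DMX_CLOCK_HZ / (speed + 4)
    return fs / freq_hz


def resonator_frequency(p, speed=163):
    """Predicted resonant frequency in Hz for a delay line of p samples."""
    fs = DMX_CLOCK_HZ / (speed + 4)
    return fs / (p + 0.5)


# A at 110 Hz (assumed reference): compare with the table entry for A.
print(round(period_in_samples(110.0), 1))
# Frequency actually produced by one of the lengths quoted for A.
print(round(resonator_frequency(271), 2))
```

Since only whole-sample delay lengths are available, the resulting pitch is generally a few cents away from the tempered target, which is why the table values are described as approximate.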

The feedback parameter controls the degree of resonance and the length of the decay by determining how much signal is recirculated into the delay line. The maximum value, 1024, creates 100% recirculation which quickly leads to saturation and distortion. Values above 900 produce a significant amount of resonance, with values over 1000 having to be handled carefully to prevent distortion. This can be done through trial and error by changing the feedback manually, and/or by controlling the input level (0-127) of the signal. Although the input level is usually recorded on the original production notes, it is usually omitted here for clarity, or referred to only when its variation adds significantly to the processing. Usually it is set by trial and error to its maximum value that allows the entire sample sequence to be resonated with the desired feedback level. If the resulting input level is too low, then a higher level is used, and quickly lowered during the peak amplitude moments. A useful technique is to have a very low input level combined with a very high feedback level such that the resonated signal dominates the dry signal.
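The interaction of input level and feedback can be illustrated with a textbook recirculating delay line in Python. This is not the DMX microcode: the two-point average in the loop is the standard Karplus-Strong form (it accounts for the extra half sample in the period formula), and the linear scaling of the feedback as F/1024 and of the input level as level/127 are assumptions based on the ranges quoted above. The function name is hypothetical.

```python
def ks_resonate(signal, p, feedback=1000, input_level=127):
    """Pass a list of samples through a Karplus-Strong style resonator.

    p is the delay line length in samples; feedback (0-1024) and
    input_level (0-127) are scaled linearly, which is an assumption.
    """
    g = feedback / 1024.0
    gain_in = input_level / 127.0
    out = []
    for n, x in enumerate(signal):
        # Outputs from p and p + 1 samples ago (zero before the start).
        d1 = out[n - p] if n >= p else 0.0
        d2 = out[n - p - 1] if n >= p + 1 else 0.0
        # Recirculate the average of the two delayed outputs.
        out.append(gain_in * x + g * 0.5 * (d1 + d2))
    return out


impulse = [1.0] + [0.0] * 20
response = ks_resonate(impulse, p=5, feedback=1024)
print(response[:12])
```

Feeding a sustained signal instead of a single impulse shows why high feedback values need a reduced input level: the recirculated energy accumulates from one period to the next.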

Realtime control parameters in PLAYDK can also be "synch'd" so that they can be ramped simultaneously, either in direct or inverse correlation. This is particularly useful for simultaneously controlling input and feedback levels (usually inversely correlated to avoid distortion). Additional choices are to change the control parameter at the end of each loop, either incrementally or randomly.

Realtime Granulation

The details of granulation with sampled sounds are described elsewhere in the document and related publications. Here, some of the "shorthand" for the control parameters used in the production pages is elaborated.

The stretch factor is described as a ratio, e.g. 2:1. This refers literally to the number of milliseconds that the current time pointer is "frozen" (i.e. does not advance) and the number of milliseconds that the current time pointer subsequently advances. If we refer to this as the off:on ratio, then the amount of stretching can be calculated from the formula:

Time stretch factor = (off + on) / on

Therefore, 0:1 produces no stretching, 1:1 produces a 2-fold stretch, 2:1 produces a 3-fold stretch and so on. There is no upper limit to the amount of stretch. Small stretches (useful for text) between 1x and 2x can be achieved with ratios such as 1:4, 1:3, 1:2. These ratios can be typed in manually, recalled by a keystroke from a "preset", or correlated automatically with signal amplitude.

This latter approach is particularly useful for signals with active dynamic shapes. The software pre-scans the amplitudes in each disk block (hence the resolution of the amplitude following is about 5 values per second), and the user specifies what the maximum stretch factor will be, which will be put into effect during the block with the largest amplitude. The user can also choose between using the maximum +/- amplitude, or maximum + only (the former generally being more useful). Therefore, a typical specification such as "amp. correlated at max = 15" means that the signal amplitude is correlated with the stretch ratio such that the maximum amplitude is treated with the 15:1 ratio (i.e. a 16x stretch), and lower amplitudes proportionately lower. The user can revert to a manual control at any time.

The correlation can be deliberately offset by a given number of blocks. The main use for this is to avoid stretching a noisy attack, while giving maximum stretch to the subsequent section of the sound. However, other offsets are useful to change the overall amplitude curve of a stretched sound. Also, a minimum threshold value for the signal amplitude can be specified; when the signal falls below that value, the playback reverts to 0:1, i.e. no stretching.
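The stretch arithmetic and the amplitude correlation can be sketched in Python as follows. The stretch formula is the one given above; everything in the correlation function is an assumption made for illustration, including the function name, the linear mapping rounded to whole "off" values, and the direction of the block offset (each block is here controlled by the amplitude of the block `offset` positions earlier).

```python
def stretch_factor(off, on):
    """Time stretch factor for an off:on ratio."""
    return (off + on) / on


def correlated_ratios(block_amps, max_off, offset=0, threshold=0):
    """Return one (off, on) ratio per disk block, following block amplitude.

    The loudest block receives max_off:1 and quieter blocks proportionately
    less; blocks below the threshold revert to 0:1 (no stretching).
    """
    peak = max(block_amps)
    ratios = []
    for i in range(len(block_amps)):
        j = i - offset  # block whose amplitude controls block i (assumed direction)
        amp = block_amps[j] if 0 <= j < len(block_amps) else 0
        off = 0 if amp < threshold else round(max_off * amp / peak)
        ratios.append((off, 1))
    return ratios


print(stretch_factor(15, 1))                         # "max = 15" gives a 16x stretch
print(correlated_ratios([10, 40, 30], 15))           # follows the amplitude directly
print(correlated_ratios([10, 40, 30], 15, offset=1)) # stretch lags by one block
```

With an offset of one block, the loud second block is played with the small ratio derived from the first block, and the maximum stretch falls on the block that follows it.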

There are several granulation subroutines that use time stretching with optional other parameters. Each requires different sets of microcode, hence determining the number of simultaneous voices or "streams" that can be generated at a given sampling rate. This number of voices is sometimes referred to in the documentation. Frequently a Karplus-Strong delay line resonator is added prior to the granular stretching, resulting in 10 possible voices at 30 kHz. The KS parameters are as described above, except that the delay line can be a maximum of 1024 samples. Harmonics can also be added to individual granular streams according to a preset, or else specified manually. The system used treats the original pitch as harmonic number 4, thereby allowing 3 transpositions below the original (e.g. 3, 2, 1, producing the ratios 4:3, 2:1, 4:1 respectively), and more closely spaced ones above (e.g. 5, 6, 7, 8 which produce the pitch ratios 5:4, 3:2, 7:4, 2:1 respectively). In the production notes these are sometimes referred to by harmonic number or the interval it creates to the original, plus the number of voices which have this transposition. If not specified, the breakdown is usually half and half.
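The harmonic numbering can be checked with a few lines of Python. The function name is hypothetical; it returns the frequency ratio of a transposed stream relative to the original, taking the original as harmonic number 4 as described above.

```python
from fractions import Fraction


def transposition(harmonic, reference=4):
    """Frequency ratio of a given harmonic number relative to the original."""
    return Fraction(harmonic, reference)


for harmonic in range(1, 9):
    print(harmonic, transposition(harmonic))
```

For harmonics below 4 the result is a fraction less than 1 (3/4, 1/2, 1/4), i.e. the downward intervals that the text writes as 4:3, 2:1 and 4:1.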
