Film Sound 232 Theory
To see is to understand...
The biological imperative
What is natural?
Design concepts from Zaza
Design theory
Recent theory
Terminology for analysis
Essential points for analysis
Figures of speech as descriptions of sound
More recent theory
Doing the work


To see is to understand in this culture.
- We say "I see" when we understand.
- "A picture is worth a thousand words": the visual, the surface, is dominant.

-Technically we might say: in cases of intermodal discrepancy, the modality allocated the most attention tends to win most disputes in determining the final percept of an event/object. This is usually the visual.

Sound, however, actively shapes our interpretation of the visual: sound directs our attention (a biological survival imperative)

• It therefore has a visceral effect:

-one is startled by a sudden loud sound

-it informs us of dangers lurking beyond the range of vision (FORESHADOWING).

• A number of the limitations and characteristics of the hearing mechanism seem rooted in strong survival-based instinctive activity:

-our stereophonic hearing,

-our sensitivity to frequencies at 2,000 Hz,

-our dynamic range.
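As a rough numerical illustration of the frequency sensitivity mentioned above, the standard A-weighting curve can be evaluated at a few frequencies. A-weighting is an engineering approximation of how the ear's sensitivity varies with frequency; it is brought in here for illustration and is not something these notes define. The function name and the chosen frequencies are assumptions.

```python
import math

def a_weighting_db(freq_hz):
    """Approximate A-weighting gain in dB at freq_hz (standard analytic formula)."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalises the curve to roughly 0 dB at 1,000 Hz.
    return 20.0 * math.log10(numerator / denominator) + 2.00

# Illustrative frequencies only (hypothetical choice).
for freq in (100, 1000, 2000, 10000):
    print(f"{freq:>6} Hz: {a_weighting_db(freq):+.1f} dB")
```

The curve sits near 0 dB at 1,000 Hz, slightly above that near 2,000 Hz, and falls away steeply at low frequencies.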

• Sound informs the other senses:

-physical sensation; especially weather

-(insects) cicadas, heat/locale

-wind (invisible except for things that it moves of course)

-rain (often invisible in 16mm or TV)


-movement (every time a car moves on screen!)

-space the camera finds itself in

-with low freq. signals - actual physical sensation

-distance from the viewer in terms of level/reverberation ratio

• Also (to be discussed in more detail later):

-qualities of the space itself: in real life we learn to adjust to small differences in room tone. This unconscious process does not work when the sound is focused by selection: these small discrepancies are not tolerated. They become distracting, harming the carefully constructed screen reality

-that which is outside of the frame


• Sound creates the context for silence

-if film is projected without a sound track then the ambient sounds of the theatre create the acoustic context

-obviously a moment of silence in a film with a sound track can be as effective as a burst of sound in a "silent film", although a complete lack of sound (except the hiss of the blank track) is very rare.

• Sound has its own aesthetic form

What is "natural sound"?

• Virtually always manufactured rather than captured at the moment of filming

-imagine recording the scene with a single mic in the manner of a single lens. The mic is objective, while our hearing is a subjective experience shaped by attention mechanisms - thus the need to separate elements from the scene for manipulation.

• Selective as the lens: the difference between the ear & a microphone

-mixing (post processing) or, less often, mic perspective directs our attention: note Altman, where often the main characters' dialogue is not foregrounded

• Has a source of emanation but surrounds the listener

-as opposed to light from a screen, which can be shut out by closing the eyes

• Creates the continuity necessary in a disjointed medium of rapidly changing images - sutures

-creates the sense of a continuous and unbounded spatial environment by maintaining a constant and generally unobtrusive background ambiance

• We do not automatically identify a sound with its source

-sounds always pose the question of their source (the sound hermeneutic)

-this is both a problem to be considered at all times and also an asset which can be put to dramatic use: How often do we hear a sound for which no image can be conjured up or if we do imagine what it looks like - we are wrong?

• Hearing requires a greater event duration than vision for recognition

-thus: hearing is more selective and lazier than sight - therefore if sound is background (music) then it is least susceptible to rigorous judgement & most susceptible to effective manipulation.

Naturalism means producing the desired effect (Walter Murch)

Logic is created within the shot and holds only in relation to screen space and time, not real space and time.

Always: Listen to the sounds themselves, not the source

Sound in pictures often foreshadows the action

• This has a biological source

• It must always be taken into account.

Some Design Concepts from Zaza's Sound Design:
auditory perspective
The difference between the ear & a microphone

The acoustic analogue between long shot & close-up

Logic is created within the shot: this logic relates to screen space and time, not real space and time

apparent distance
Depth of field as defined by echo or resonance

Room tone is critical in film; we are rarely aware of it in real life
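
The idea of apparent distance can be made concrete with a small calculation. The sketch below assumes an idealised room: direct sound falls by about 6 dB per doubling of distance (inverse-square law) while the reverberant level stays roughly constant throughout the space. The function names and the -12 dB reverberant level are hypothetical values chosen only for illustration.

```python
import math

def direct_level_db(distance_m, level_at_1m_db=0.0):
    """Direct-sound level in dB, relative to the level measured at 1 m."""
    return level_at_1m_db - 20.0 * math.log10(distance_m)

def direct_to_reverb_db(distance_m, reverb_level_db=-12.0, level_at_1m_db=0.0):
    """Direct-to-reverberant ratio in dB, assuming a constant reverberant level."""
    return direct_level_db(distance_m, level_at_1m_db) - reverb_level_db

# Illustrative source distances in metres (hypothetical choice).
for d in (0.5, 1, 2, 4, 8):
    print(f"{d:>4} m: direct {direct_level_db(d):+6.1f} dB, "
          f"direct/reverb {direct_to_reverb_db(d):+6.1f} dB")
```

As the source moves away, both the level and the direct-to-reverberant ratio drop; a mix can imitate this by lowering level and adding reverberation.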

ambient silence
The unique and unrepeatable sound characteristic of a given space:

Note here that the lack of track will create hiss or will draw the attention to the sounds of the projection space itself

microphone characteristic
Alters the quality of the space
aural masking
Property of one sound inhibiting the separation or audibility of another, proximate sound: occurs in mixing
time compression
Duration shortened or lengthened through editing, dynamic change; changes the perception of time
the illusion of continuity
Sound perspective, or the apparent microphone position, may correspond to:

The position of the audience

A spectator in the shot

The Director’s whim

The sound is crucial to help decode the puzzle of an arbitrary selection and arrangement of shots.

ambient noise
The low frequency sound of a specific place at a specific time:

Felt in the theatre

If absent, we hear the hiss of the system or the sound of the theatre itself

It is necessary to have noise to create silence

Design Theory Preamble: The Difficulties of analyzing the soundtrack
V.I. Pudovkin
Sound for film should be understood as counterpoint: sound may be in opposition to what is on the screen if it presents the emotion the director intends the audience to feel or accept.

He would see as primitive the current practice of slavishly creating naturalistic soundtracks where every event has a crescendo of effects (especially in action films). Sound should be used to explain the content more deeply to the audience.

Siegfried Kracauer (Theory of Film)
1/ Synchronism vs. asynchronism

Sound that has a visually identifiable source vs. sound that does not

2/ Parallelism vs. counterpoint

Sound complementing the image vs. sound that carries a different meaning

3/ Actual vs. commentative sound

Sound arising from an identifiable source in the narrative vs. sound having no source in the narrative

Gorbman expands upon 3/:

1/ the characters may make a noise

2/ sound may originate outside the narrative structure (music themes or narrative v.o.)

3/ sounds may be imagined by characters but not actually heard by their ears

Claudia Gorbman:
Auditory depth of field

Sound presence can counterpoint a visual presence (visual long shot, audio close-up)

Support or exaggerate visual image (Thatcher library in Citizen Kane)

On/off screen sound

On/off track

Is sound directly perceived by the ear or inferred; such as when people converse behind a closed window or in a noisy setting

Auditory masking

Sound focus

As in cinema’s version of the cocktail party effect


Cinematic space of the location

Another Design Grid: (Zaza)
sound has a source that is visually identifiable
sound has no apparent image
sound complements an image
sound has meaning itself
sound has a narrative source
sound has no story basis, but a directorial attitude

Modern Theory

• Rick Altman & Mary Ann Doane

-view the evolution of sound technology as an ideologically determined progression toward self-effacement: removing all traces of sound work from the sound track and concealing the work of the apparatus.

• It is important to remember that we are seeing two completely separate processes (sound & image)

-which are synchronous only at the location and in the final projection - otherwise their construction is done separately - and the end product very rarely points this out...

-also important to note that we neither see the apparatus itself nor is its existence revealed, e.g. through a shaky camera (except for cinéma vérité or a p.o.v. effect); likewise the microphone is not revealed, except to give the effect of a "live pick-up" etc.

• Sound perception is usually bound up in image:

-the two are apprehended together, although the sound is often perceived through or in terms of the image and therefore acquires a secondary status.

-The image is directly motivated by the world and therefore possesses a wholeness that serves as further testimony to its integrity: it cannot be broken down into smaller elements, whereas the sound is broken up into dialogue, sfx, ambiences, music etc.

-important to note that as the image (now in analogue form) becomes readily digitized it will be easier and more common to have the image broken down into components and those components manipulated (although no doubt while maintaining the illusion of wholeness). Most movies with matte work (e.g. sci-fi, adventure etc.) do this as a matter of course. Another example is the isolated colour manipulation characteristic of music videos and now hip TV commercials.

-Sound is perceived as real only when it is tested by other senses - usually sight. In film one establishes sync with the character or sound-producing element early in the scene: once established, one can separate the two. This is a radio convention, which has been carefully maintained since the late twenties.

• What the sound track seeks to represent is the sound of the image, not that of the world. (Belton, p. 66)

-"What the sound track seeks to duplicate is the sound of an image, not that of the world. The evolution of sound technology and, again, that of studio recording, editing and mixing practice illustrate, to some degree, the quest for a sound track that captures an idealized reality, a world carefully filtered to eliminate sounds that fall outside of understanding or significance; every sound must signify. In other words the goal of sound technology in reproducing sound is to eliminate any noise that interferes with the transmission of meaningful sound." Belton, Technology of Film Sound p. 66

-The sound track does not undergo the same tests of verisimilitude to which the images are subjected. This is obvious: tires that screech on sand or virtually any time a car drives away, the pristine nature of the tracks (recently they have acquired dog barks and kids' screams, but usually the world is a pretty silent place), etc.

-Images gain their credibility in terms of their reference to objective reality; sounds, in their conformation to the images of objective reality, to a derivative reconstruction of that reality.

-The soundtrack is a world where every sound has significance - sounds which fall outside the realm of understanding or significance are filtered out. This is what is now so disturbing about "Godard's work" and is used to give the "documentary feel" to the work: Test-type commercials,

-to repeat: the goal/evolution of sound technology is to eliminate any noise (extraneous sound) which interferes with the transmission of meaningful sound.

-or as Walter Murch once described it: "the sound one hears in one's head."


diegesis:

• spatio-temporal universe referred to by the primary narration

•the denotative (i.e. strictly literal) material of film narrative, it includes, according to Christian Metz, not only the narration itself, but also the fictional space and time dimensions implied by the narrative. 

•narratively implied spatio-temporal world of the actions and the characters


• flashbacks, dreams, visions, and fantasy)

diegetic or source sound:

•source clearly identified in the story space




-functions to enhance the verisimilitude of the image

-may be off screen 

-may be asynchronous

• note: off screen sound "naturally" leads the cinema space to unfold beyond the current screen

• since many sounds are ambiguous out of context most sfx are diegetic - only music crosses the boundary effectively

diegetic or source music:

• arising from the primary narration

• high quality diegetic music can articulate space: opera in the concert hall etc. (reverberation)

-also, as an aside, there is a penchant in contemporary cinema for having an aria playing in rich people's homes (Someone to Watch Over Me, Wall Street), which leads one to ask: WHAT QUALITIES DO WE CULTURALLY ASSOCIATE WITH OPERA MUSIC? (WEALTH, PRIVILEGE, DEEP EMOTIONS I.E. TRAGEDY, INTELLECTUAL PURSUIT, ETC.)

• provides temporal continuity between cuts

• offers depth cues (loud = near, soft = far)

in narrative film source music functions primarily as sound

• creates irony very easily (often primary use)

non-diegetic or extra-diegetic:

• no clear source in the story space:

-music scoring

-continuity




-point of view

-narration

-are the character's thoughts diegetic? They clearly come from the story space but are often displaced in time (remembrance, storytelling, etc.)

-songs


Essential points in the analysis of a soundtrack:
narrative content
Sound supports the space-time reality (of the director)

Sound may imply a reality other than what is seen

audience perspective is determined by the sound focus

audience attention is determined by masking

off-track sound (sound is seen but not heard)

spatial content
screen space is defined by on-screen sound; the source is seen

screen space is implied by off-screen sound, the source is unseen

sound texture (reflections) defines the characteristics of the space

emotive content
sounds not heard by the characters

sounds not heard by the audience: sounds are inferred from the action or reaction of players (may provoke a comedic or horrific response)

aural hallucination of the character(s) used to punctuate actions or convey mental/physical movements not seen on the screen but operating in the spiritual domain (see Harvey)

aural allusions evoke imagery that must be understood as something a character perceives; the audience must therefore interpret it in terms of how the image further develops the plot, character or mood

musical/rhythmic content
summary of some musical functions: metaphor, symbol, rhythm/editorial pacer, atmosphere, reinforcer of "staged" realities, spatial definer, or for inflecting the narrative with emotive values via cultural music codes

important to note that sfx can be considered in musical terms: the timing or rhythm of sounds will help to identify them (if not seen) or lend credibility to the effect.

figures of speech (as descriptions of sound)

simile

contrast of two elements explicitly compared (scream -> train whistle)

metaphor

suggest a comparison by application of an element to an object or concept that it does not denote (scream from cheers)

irony

contrast of opposites through the use of the least expected or anticipated form (scream with laughter)

hyperbole

obvious exaggeration (scream with phone ringing)


(screaming chair)

metonymy

substitution of one related element for another (scream with screeching winch)

paradox

seemingly contradictory sound used over an entity that in reality may express a possible truth (scream over typewriter keys)

allegory

representation of an abstract or spiritual image through concrete or material forms (the actual scream in the movie "The Shout")

More Theory:

New technology very often brings with it the glow of neutrality or transparency

-this functions to prevent or delay ideological interrogation.

-we realize now that the recording of sound is a mediation which transforms and represents its object in an altered form.

-"There is clearly a difference between a filmed object or action (it is a photograph of the thing or act) and a recorded musical sound. For (the latter) is the sound itself. There is no ontological difference between hearing a violin in a concert hall and hearing it on a sound track in the movie theatre." Gerald Mast, Film/Cinema/Movie: A Theory of Experience NY, Harper & Row, 1977 p. 216

Is this in fact true?

-"Auditory aspects, providing that the recording is well done, undergo no appreciable loss in relation to the corresponding sound in the real world: in principle, nothing distinguishes a gunshot heard in a film from a gunshot heard on the street." Christian Metz, "Aural Objects", Yale French Studies, no. 60 "Cinema/Sound" 1980 p. 29

Again, this is in fact a controversial statement.

Several theorists take this stance, that the reproduced aural object is identical to the original.

-Sound is mechanical radiant energy that is transmitted by longitudinal pressure waves in a material medium (such as air). The materiality of a sonic event consists of this entire vibrating volume. This is also the case for reproduced sound. Now, whereas an object loses its three-dimensionality when represented in the photographic image, the recorded sound, considered as a volume of vibrating air waves, remains three-dimensional after mechanical mediation. Sound differs because it suffers no dimensional loss in the process.

But is this sufficient to suggest that there is no difference at all between them? Is representation only the domain of the visual? The truth is that most people can tell a synthesized copy from the original or live sound. Contrary to the earlier statements, even if the sound reproduction is flawless, the volume will be different in the copy and therefore the sound will be different: all symphonies are chamber music in the living room...

There exists the phenomenon of socially constructed auditory practices, which emphasize the similarity of these sounds despite their differences in order that they may be linked to a common source

-i.e. the gun in the theatre is recognized as a gun even though it would sound very different on the street.

-Also, like the gun (as an example), the cinema and radio construct sounds which we as listeners rarely hear in real life yet recognize: most weapons, large explosions, other people's lovemaking

Sound is tied to its space of occurrence by its three-dimensionality - even when mechanically reproduced

-thus sound contributes three-dimensionality & the auratic effect to the image.

Building the track

What do we want the audience to hear


What size space

Natural, staged or artificial location/state of mind


Color and shape of elements in the frame

Context: what are we trying to preserve, imply or invent

What individual elements are required to build the larger gestalt/sound event?

Terminology for Analysis:

discrete atmospherics
points of sound that make up the entire texture of background to foreground noises that fill in the pictorial space to make it seem real: insect noise, wind gusts, rustling leaves, etc.
un-vocalized idioms
complex guttural and nasal sounds that are not a part of speech but imply presence, tone or distance to the listener
aural intrusions
depth cues
range of sounds that provide depth cues or cues to the relative aural position of the on or off screen source. This can be a train whistle or a door slam
class of sounds that reveal planes of action (density of detail) or that isolate an important subject (level of intensity) from a group of subjects and their backgrounds. music combined with nonverbal gestures or sfx provide this focusing of attention.