The Aesthetic Experience of Sound

– staging of Auditory Spaces in 3D computer games

Morten Breinbjerg (University of Aarhus, Denmark)

The use of sound in (3D) computer games basically falls into two categories. Sound is used as an element in the design of the set and as a narrative device. As set design, sound stages the nature of the environment and brings it to life. As narrative, it brings us information that we can choose to, or perhaps need to, react to. In an ecological understanding of hearing, our detection of audible information affords us ways of responding to our environment. In this paper I will address both of these ways of using sound in computer games.

Since a game player is responsible for the unfolding of the game, his exploration of the virtual space laid out before him is pertinent. In this mood of exploration sound is important and contributes heavily to the aesthetic of the experience.

Let us start by looking at the opening scene of Half-life 2. At the start of the game we arrive in City 17 by train. The howling train whistle and the squeaking of the worn-down train wagons dominate the soundscape as we arrive at the central station. As we leave the train, the taped voice of Dr. Breen, who looks down on us from a big video screen high on the wall, bids us welcome. The click of the flying surveillance camera taking our picture and the quiet hum of its motors also reach our ears. (See Picture below)

From the tracks we walk to the arrival hall. The distant sound of the town traffic can be clearly heard, and the echoing voice of Dr. Breen underlines the scale of the huge classicist building. The distorted voices of the masked guards, who are there to secure our relocation, stand out, and their commands lead the way. The sound of radio communication between the guards and the stereotypically cool voice of a woman in the control center also regularly fill the air. (See Picture below)

As we pass the check point, an alarm goes off and a large metal door with squeaking hinges opens. At the harsh command of one of the guards, who later reveals himself to be an undercover agent for the resistance, we are ordered into another room. (See Picture below)

Following our friendly guard down a corridor and into the other room, we are still accompanied by the chanting voice of Dr. Breen, but this time without the echo. After revealing his identity, the guard helps us escape from the building through an open window.

The eye and the ear
It can be argued that the sound merely mimics the visual landscape and as such does not contribute anything beyond what can already be seen. In the scene just described, we are not surprised by the fact that the train in which we arrive is worn down or that the train station turns out to be a large building. This we can see for ourselves. Although the sound is just one element in the overall staging of the set, it does present a radically different way of experiencing the landscape than the eye does. To fully understand the power of sound we need to recognize the difference between the eye and the ear.

With our eyes we see the world in perspective, and as such we look only at a fragment of the world surrounding us. Our vision is selective, since we have to look in the direction of what we wish to see. Vision also places us at the periphery of our world. From here we look into our environment, finding ourselves at a distance from what we see. This distance is a necessary one, since we cannot see or survey things that are too close.

By contrast, our ears situate us in the middle of our environment. The sounding space is anthropocentric in nature. Literally speaking, we cannot close our ears the way we can close our eyes. Therefore sound, as a consequence of our spherical hearing, informs us about actions and sounding phenomena taking place outside our visual perspective. In other words, sound extends the visual space, and the form of our ears and the distance between them enable us to position the sounding object quite precisely in that space. Furthermore, the eye cannot see through objects. Since we do not have X-ray vision, the eye only reaches the surface of things. Sound, on the other hand, can be heard through solid objects like a wall, and it travels around corners because sound waves diffract. Consequently we are able to hear what is happening behind a closed door.

As formulated by Victor Zuckerkandl, “the eye discloses space to me in that it excludes me from it” while the ear “discloses space in that it lets me participate in it” [1, p. 291]. We also participate in space in that the sound moves us, not just in the psychological sense but also in the physical sense. Sound grasps the body and shakes it (so to speak). In short, sound immerses the listener in the world. It makes the environment come alive.

What do we hear?
Adopting an ecological approach to auditory perception, we must focus on the gamer's relation to the virtual environment (the animal-environment relation) and take an interest in how he adapts to his environment and what he can know of it when he only listens to it. To understand this is to understand our natural way of listening.

In our natural way of listening, sound is indexical. It points to the fact that a given event is taking place. Sound occurs only when materials interact [2]. The interaction is to be understood as a source-cause relation [3] in which a sounding system (the source) resonates as a consequence of a given action (the cause), as when a hammer hits a bell. Also, the sound, as Murray Schafer has pointed out, is the effect of a cause in a certain space, at a certain time [4]. What becomes important to the gamer is to know what produced the sound, where it comes from, and whether the event that provoked the sound poses some kind of danger to him. That is, what kind of actions, if any, should he undertake in order to respond to the situation? His ability to answer these questions is conditioned partly by his experience with and knowledge of his environment and partly by his capacity to detect the information about the source-cause interaction contained within the sound itself.

Surveying just a few aspects of the information contained within a sound, we must state initially that sound tells us a lot more than merely that it has been produced. Due to the nature of the interacting materials and the nature of the interaction itself, we are able to derive information about the source as well as the cause.

Take as our example the sound of the door closing as we follow our friendly guard into the next room. The resonant features of the door, which are traceable in both the dynamic and the spectral envelope, inform the gamer about the characteristics of its material. Is it made of metal or wood? Is it solid or fragile? Is it massive or hollow? The size of the door is also manifest in the sound. So is the action by which it closes. Does it close silently, in a normal way, or is it slammed behind us? Knowing that the sound emanates from a large metal door is not in itself important information in an ecological context. What counts is which actions this information affords the gamer. The sound of a door normally points to the fact that someone is entering or leaving the room, but the sound also informs him of a way out.

The sound of the space
Every sound informs us about characteristic features of the source-cause interaction from which it emanates. Not necessarily on a level where we are able to identify the source unambiguously, but often to the point where we can distinguish between human and non-human sounds, or confirm whether the resonant object is solid or fragile, hard or soft, large or small, hollow or massive, etc.

But sound does not only inform us about its source-cause relation. It also informs us about the space in which it takes place. Drawing on the theory of soundscape studies, I wish to outline different dimensions of space constituted by sound. The first is the “architectural space” that the gamer detects from the nature of the acoustics. The second is the “relational space” that the gamer experiences through the different locations and movements of the sound sources and/or through the movements of the avatar. The third is the “space as place”, that is, the space as a site-specific place. In this third dimension we can say that sound plays an important role in the staging of the “genius loci”.

Architectural space
By architectural space I refer to space as a quantitative and measurable phenomenon, such as an indoor environment. In a room, sound is reflected from the surfaces of the walls. The time that passes between the arrival of the direct signal and the arrival of the reflected signal at our ears indicates the size of the room. When sound is diffused it creates a sound field, and the complexity of the sound field depends on the number of reflections and the amount of energy in the reflected signals. If the surface materials within the room are reflective, the sound field will be complex and the room very reverberant. The construction of the room will also in some cases result in a distinctive acoustic space. Parabolic surfaces concentrate the energy at the focal point and thereby create a characteristic acoustic space, as in a dome or a tunnel.
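The relation between reflection delay and room size can be illustrated with a rough sketch. It assumes a speed of sound of 343 m/s (air at about 20 °C) and a single reflection; the function names and the example distances are hypothetical and are not taken from the game or its sound engine.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def reflection_delay(direct_path_m: float, reflected_path_m: float) -> float:
    """Seconds between the arrival of the direct and the reflected sound."""
    if reflected_path_m < direct_path_m:
        raise ValueError("a reflected path cannot be shorter than the direct path")
    return (reflected_path_m - direct_path_m) / SPEED_OF_SOUND


def extra_path_length(delay_s: float) -> float:
    """Extra distance in metres travelled by the reflection for a given delay."""
    return delay_s * SPEED_OF_SOUND


if __name__ == "__main__":
    # Hypothetical listener 2 m from the source, with one wall reflection
    # whose total path (source to wall to listener) is 36.3 m.
    delay = reflection_delay(2.0, 36.3)
    print(f"reflection arrives {delay * 1000:.0f} ms after the direct sound")
```

The longer the detour taken by the reflected sound, the later it arrives, which is why a long delay suggests distant walls and hence a large room.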

Since the architectural dimension of space is exposed only when sound is reflected from the surfaces of its walls, there needs to be a continuous or regular sound-producing event to reveal that dimension. Often this effect is obtained by the sound of dripping or running water (frequently used in cave-like environments) or the sound of footsteps or people talking, coughing, yelling, whistling, etc.

The space of the central station in which we arrive in Half-life 2 is a reverberant one and the scale of the building is continuously witnessed in the echoing voice of Dr. Breen.

Relational space
The relational space is the space indicated by the distance and position of sound sources surrounding the listener in a given moment.

Since we have two ears, one on each side of our head, we can detect even small variations in the position of a given sound source. Unless the sound is coming from directly in front of us or directly behind us, the delay between the sound reaching one ear and then the other indicates the position of the sound source. The filter effect of the head also plays a role in establishing the position of the sound source.
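As an illustration of this time delay, the following sketch uses a deliberately simple path-difference model: the two ears are treated as two points a fixed distance apart, and the filtering and shadowing effect of the head is ignored. The ear distance of 0.18 m and the speed of sound are assumed values, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius
EAR_DISTANCE = 0.18     # m, assumed distance between the two ears


def interaural_time_difference(azimuth_deg: float) -> float:
    """Delay in seconds between the two ears for a distant sound source.

    0 degrees is straight ahead, 90 degrees is directly to the right and
    180 degrees is directly behind. A positive result means that the
    sound reaches the right ear first.
    """
    return EAR_DISTANCE * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


if __name__ == "__main__":
    for angle in (0, 30, 90, 180):
        itd_ms = interaural_time_difference(angle) * 1000
        print(f"{angle:3d} degrees: {itd_ms:.3f} ms")
```

In this model the delay vanishes for sources directly in front of or behind the listener, which corresponds to the exception noted above, and it is largest for sources directly to one side.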

The sense of distance is related to the amplitude of the sound, since a sound becomes louder the closer its source is to the ear. The experience of distance is also related to the distribution of energy in the spectrum of the sound: high frequencies lose energy faster than low frequencies as the sound travels.
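The link between loudness and distance can be sketched under an idealized assumption: a point source in a free field, where the sound pressure falls in inverse proportion to the distance. Reflections and the frequency-dependent loss of energy are left out, and the function name is hypothetical.

```python
import math


def level_change_db(reference_distance_m: float, new_distance_m: float) -> float:
    """Change in sound pressure level when the listener moves from the
    reference distance to the new distance (negative means quieter).

    Assumes an idealized point source in a free field.
    """
    if reference_distance_m <= 0 or new_distance_m <= 0:
        raise ValueError("distances must be positive")
    return -20.0 * math.log10(new_distance_m / reference_distance_m)


if __name__ == "__main__":
    # Doubling the distance lowers the level by roughly 6 dB in this model.
    print(f"{level_change_db(1.0, 2.0):.2f} dB")
```

Under this assumption every doubling of the distance reduces the level by about 6 dB, and every halving raises it by the same amount.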

The direction of a moving sound is a function of the changes in the position and distance of the source. When the sound source is moving towards us, we hear the frequency of the sound raised, because the sound waves are compressed in front of the moving source. Once the sound source has passed and is moving away from us, the frequency drops. This phenomenon is called a Doppler shift.
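The size of this shift can be estimated with the standard Doppler formula for a moving source and a stationary listener. The sketch below assumes a speed of sound of 343 m/s and a source moving directly towards or away from the listener; the function name and the example values are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def doppler_frequency(source_hz: float, approach_speed_m_s: float) -> float:
    """Frequency heard by a stationary listener.

    A positive approach speed means the source moves towards the
    listener; a negative one means it moves away.
    """
    if abs(approach_speed_m_s) >= SPEED_OF_SOUND:
        raise ValueError("this model only holds below the speed of sound")
    return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - approach_speed_m_s)


if __name__ == "__main__":
    # A hypothetical 440 Hz hum approaching and then receding at 34.3 m/s.
    print(f"approaching: {doppler_frequency(440.0, 34.3):.1f} Hz")
    print(f"receding:    {doppler_frequency(440.0, -34.3):.1f} Hz")
```

The heard frequency lies above the emitted frequency while the source approaches and below it once the source recedes.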

Unlike architectural space, the relational space is not an objective, quantitative phenomenon that exists independently of the listener. It is a subjective space, since two people cannot be in the same place at the same time. Therefore the relational space is unique; it cannot be shared. Also, the relational space is dynamic, in contrast to the static character of the architectural space, since it changes as the listener moves around the landscape or as the sound sources move around the listener. We experience this in Half-life 2 in the stereo field of our headphones when we move the avatar around the building, getting closer to the people talking, or when the flying surveillance camera approaches us. The result is a constant change in the figure-background experience of the environment.

Space as place
While space is an abstract category, place is site-specific. A place can be identified and situated in a historical and geographical context. As a place, the space has a genius loci. In Half-life 2 the traffic sound heard from within the train station is site-specific. It signifies urban life just outside the station. The traffic sound is not just a faint hum. It is possible to hear specific sound sources such as the screeching of brakes, the sound of accelerating and decelerating tramcars, and car horns. As Michel Chion has pointed out, the sound of car horns is one of the most indicative sounds of the modern city [5]. Its aesthetic power rests in the fact that it signifies both the tempo of modern life and the chaotic feel of dense city traffic. Also, the echoing voice and the complex sound field, which I have already discussed, are stereotypical of places like a train station.

To outline the site-specific designation of sound we can make use of the analytical concepts of soundscape theory as formulated by Murray Schafer. When analyzing a soundscape, Schafer distinguishes between keynote sounds, soundmarks and sound signals. He also categorizes a soundscape as being either a hi-fi or a lo-fi environment [4].

In a low fidelity (lo-fi) environment the space is overcrowded with sound, either because many signals sound simultaneously or because the space in which the signals sound is very reverberant. The result is masking and a lack of clarity. The high fidelity (hi-fi) environment is the opposite. The sounds can be heard clearly without crowding or masking.

Keynote sounds Schafer defines as “those which are heard by a particular society continuously or frequently enough to form a background against which other sounds are perceived. Examples might be the sound of the sea for a maritime community or the internal sound of the combustion engine in the modern city.” [4, p. 272]

Soundmark, a term Schafer derives from landmark, refers to sounds that are unique or have a quality that is remarkable within the local environment, like church bells or foghorns. A sound signal is any sound “to which the attention is particularly directed. In soundscape studies sound signals are contrasted by keynote sounds, in much the same way as figure and ground are contrasted in visual perception” [4, p. 275].

The central station of Half-life 2 must be categorized as a lo-fi environment. Using the analytical categories of Murray Schafer, the harsh commands of the guards and the speaking voice of Dr. Breen are sound signals directing our attention, while the background city noise forms the keynote sound of the landscape. From this background noise, individual soundmarks emerge, such as the sound of car horns. Strangely enough, the typical soundmark of the train station, the voice announcing arriving and departing trains, is missing. Maybe this has to do with the fact that City 17 is not a place that people are supposed to leave again of their own free will.

The aesthetic aspect of sound
Until now I have stressed the informative power of sound by asking what we can know of our environment when we only listen to it, and which actions it affords us. This has been an investigation of what sound is able to denote. However, we should be careful not to favor the indexical nature of the sound event alone. When talking about aesthetics we must equally acknowledge the importance of the connotative aspect of sound. Sound does not only contribute to the aesthetic of the experience by making the world more readable. The ambiguous sound that escapes our reading can be equally important in building up the suspense of the game. Knowing that something is happening around the corner, without knowing precisely what it is, is most frightening.

Also, real-world sounds evoke memories and provoke images of a strong aesthetic value. This has to do with the connotative power of sound. The quiet drip of water in a dark cave brings forward images of a large, cold and moist cavern. In Half-life 2 the sound of the door mentioned earlier brings forward images of heavy prison doors, behind which the screams of tortured prisoners cannot be heard.

In the same way, the aesthetic aspect of the architectural space lies not only in the fact that it helps us to judge the size of the building or the material of the reflective surfaces. It has to do with the disorientating nature of the lo-fi environment as well. In such a space the interference of signals makes it difficult to locate the exact position and direction of the sound sources, and this has a strong emotional impact on the listener. The central station of Half-life 2 is a good example of this, since the reverberant room, along with the presumptuous voice of Dr. Breen and the high-volume sounds of radio communication, leaves the gamer disorientated and uncomfortable. In here he finds no rest.

The aesthetic aspect of the relational space is likewise not reducible to the measuring of the distance and position of the individual sound sources surrounding us. It has to do with intimacy, since sounds close to us enter our sphere of privacy. Talking to people who are close enough for us to hear the sounds of articulation, such as the breath or the click of the lips, is emotional, since it implies the nearness of bodies.

Similarly, the aesthetic aspect of the ‘space as place’ is not the designation of the place as an urban environment, but the connotations and images provoked by the sound events taking place. The sound of the alarm is not neutral in the sense that one alarm can be as good as another as long as we are able to recognize it as an alarm. The sound of the alarm at the check point in Half-life 2 is one that we connect with large nuclear plants or similar constructions. Its cold, grumpy and dry voice underlines the inhumanity of the place.

Sound contributes to the aesthetic of the play in that the ear offers ways of perceiving the world that the eye cannot. Sound immerses the player in the game and informs the gamer of the source-cause actions taking place, indicating the dimensions and the materiality of the sound source. Detecting this information affords the player ways of interacting with his environment. In particular, sound contributes heavily to the experience of the space, since it outlines three dimensions of space: the architectural, the relational and the site-specific (space as place).

But sound is not just indicative of the space and the actions taking place. Part of its aesthetic value lies in its connotative power. Sound evokes memories, provokes images and brings about strong emotional experiences, even when it does not make the world more readable, but rather more ambiguous and opaque.

As shown in the example of Half-life 2, sound underlines the setting of the scene. The central station is shown as a resonant space, and the trashy style of the train by which we arrive at City 17 is reflected in the sound. The click from the flying camera reminds us that we are kept under surveillance. As such, the sound is informative about the events taking place and their nature. In addition, sound characterizes the overall atmosphere of the environment. The scary sound of the masked guards, the lo-fi soundscape of the reverberant space and the sounds of urban life emphasize the unpleasantness of City 17.

References
[1] Zuckerkandl, Victor 1956. Sound and Symbol. London: Routledge & Kegan Paul.
[2] Gaver, William W. 1993. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception. Ecological Psychology, 5(1). Lawrence Erlbaum Associates.
[3] Smalley, Denis 1986. Spectro-morphology and Structuring Processes, in The Language of Electroacoustic Music, ed. Simon Emmerson. London: Macmillan Press.
[4] Schafer, R. Murray 1994. The Soundscape – the Tuning of the World. 2nd edition. Rochester, Vermont: Destiny Books.
[5] Chion, Michel 1994. Audio-Vision – Sound on Screen. New York: Columbia University Press.