Perception and Time Scale

Cognitive scientists describe memory as operating on three time scales: echoic, short term, and long term. Echoic memory operates on a very short time scale (up to about a second) where features are extracted from sensory impressions and events are fused together to form individual objects of perception. For example, consider the sound of a stick hitting a drum. It may seem as if the sound is a single object; in reality it is a complex pattern of pressure waves that impinge on the ear. Binding all the necessary impressions into a single entity requires considerable cognitive activity.

Similarly, the puff of air at the start of a flute note becomes bound to the (relatively) steady oscillations that follow. This new entity must interact with long term memory for the listener to "recognize" and name the sound, creating the illusion that it represents a familiar object: the flute sound. Moreover, because the flute generates more-or-less periodic sound waves of a certain kind that we recognize as pitch, the auditory system integrates this information and we "hear" the flute playing C as a single object of perception. In the same way, complex sense impressions such as those that represent phonemes in speech, simultaneous musical intervals, timbres (the sound of the guitar in contrast to that of the flute playing the same note), and the boundaries between such events are aggregated into coherent auditory events. Fig. 5 shows the approximate time scales at which various cognitive, perceptual, and musical events occur.

Figure 5: Different timescales cause different perceptions of the "same" phenomenon

After the disparate sensory impressions are bound into coherent objects of perception, these objects are themselves grouped together based on similarity or proximity. Short term memory is where patterns such as words, phrases, melodies, and rhythms are gathered into perceptual streams. Long term memory is where larger cognitive structures and conceptual categories are stored: abstract ideas, forms, language, poems, and songs. But long term memory is not a passive receptacle where short term memories retire. Rather, there is a constant interplay between short and long term memories. Whenever an object is present in short term memory, it activates similar objects from within long term storage; these are then recirculated in parallel with the new events.

There are also differences in the perception of events at different time scales that mirror these differences in cognition. For example, if a series of short clicks is played at a rate of 3 per second, they are heard as a series of short clicks. But if the same clicks are played at a rate of 100 per second, they are perceived as a buzzing tone with a definite pitch. Thus "pitch" is the name we give to this perception when the clicks recur at rates between 20 Hz and 20 kHz, while we call it "rhythm" when the interval between clicks is longer, between about 1/10 s and 3 s. There is even a different vocabulary to describe the rates of these phenomena: pitch is described as being low or high; rhythm is described as being slow or fast. See Chapter 4 for sound examples and further discussion. At yet longer time intervals, the clicks are heard as disconnected events. Thus rhythmic patterns may be conceived (as in the orderly succession of day and night) or perceived (as in a heartbeat, a dance, or a musical passage).
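The click experiment can be imitated numerically. The following is a minimal sketch, not the method used for the sound examples in Chapter 4: the function name `click_train` and the 44.1 kHz sample rate are assumptions chosen for illustration. It places a unit impulse at each click onset, so that the only difference between the "rhythmic" and the "pitched" signal is the spacing of the clicks.

```python
def click_train(rate_hz, duration_s, sample_rate=44100):
    """Return a list of samples with 1.0 at each click onset and 0.0 elsewhere.

    rate_hz     -- clicks per second
    duration_s  -- length of the signal in seconds
    sample_rate -- samples per second (44100 is an assumed, common value)
    """
    if rate_hz <= 0 or duration_s <= 0:
        raise ValueError("rate_hz and duration_s must be positive")
    n_samples = int(round(duration_s * sample_rate))
    samples = [0.0] * n_samples
    k = 0
    while True:
        # Onset of the k-th click, rounded to the nearest sample.
        onset = int(round(k * sample_rate / rate_hz))
        if onset >= n_samples:
            break
        samples[onset] = 1.0
        k += 1
    return samples


# Same clicks, two rates: 3 per second and 100 per second.
slow = click_train(3, 1.0)
fast = click_train(100, 1.0)

# Spacing between the first two clicks, in seconds.
slow_interval = slow.index(1.0, 1) / 44100
fast_interval = fast.index(1.0, 1) / 44100
```

Here `slow_interval` is one third of a second, inside the range described as rhythm, while `fast_interval` is 10 ms, which corresponds to a 100 Hz repetition rate inside the range described as pitch.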

In between the time scales associated with pitch and those associated with rhythm lies a region (called fluttering in Fig. 5) where sound is perceived in brief bursts. Rainsticks, bell trees, and ratchets, for example, produce sounds that occur faster than rhythm but slower than pitch. Similarly, drum rolls and rapid finger taps are too fast to be rhythmic but too slow to be pitched. Roughly speaking, pitched sounds occur on the same time scale as echoic memory and rhythmic perceptions are coincident with the time scales of short term memory.
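The approximate boundaries quoted in this section can be gathered into a small lookup. This is only a sketch under the stated assumptions: the thresholds (3 s, 1/10 s, 20 Hz, 20 kHz) are the rough figures given in the text, perception does not switch abruptly at these values, and the function name `perceptual_region` and the label "beyond pitch range" are hypothetical.

```python
def perceptual_region(rate_hz):
    """Map an event rate (events per second) to the region named in the text.

    The boundaries are approximate; real perception changes gradually.
    """
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    interval_s = 1.0 / rate_hz
    if interval_s > 3.0:
        return "disconnected events"   # intervals longer than about 3 s
    if interval_s >= 0.1:
        return "rhythm"                # intervals from about 1/10 s to 3 s
    if rate_hz < 20.0:
        return "fluttering"            # faster than rhythm, slower than pitch
    if rate_hz <= 20000.0:
        return "pitch"                 # about 20 Hz to 20 kHz
    return "beyond pitch range"        # hypothetical label, not from the text


regions = {rate: perceptual_region(rate) for rate in (0.2, 3, 15, 100)}
```

With these thresholds, the 3 per second clicks fall in the rhythm region and the 100 per second clicks fall in the pitch region, while a rate of 15 per second lands in the fluttering zone between them.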

Musical usage also reflects the disparity of time scales. The shortest isolated sounds are perceivable as clicks, and may have durations as short as a fraction of a millisecond. These are called "grains" of sound. To evoke a clear sense of pitch, a sound must last for at least about 100 ms, which is enough time to evoke impressions of pitch, loudness, and timbre. Such sensations are typically fused together to form sound objects, which are commonly called "notes" if they are played by an instrument or sung by a voice. Groups of notes cluster into phrases, and phrases coalesce into songs, or more generally, into performances that may last up to a few hours.

Finally, Fig. 5 shows the time scales at which various kinds of signal processing occur: from the single sample (at sampling rates from about 5 kHz to 200 kHz for audio), through filters (such as lowpass, bandpass, and highpass) and special effects such as flanging and phasing, to the rates at which vibrato and tremolo occur. These signal processing methods operate within the zone of event fusion and so affect the quality of a sound (its timbre, vibrato, spectral width, attack characteristics, etc.). Multi-tap delay line effects extend into the time scale dominated by short term memory and thus can change the perception of rhythmic events.
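To make the last point concrete, here is a minimal sketch of a multi-tap delay line. It is an illustrative implementation, not one taken from the text: the function name `multitap_delay`, the `(delay_seconds, gain)` tap format, and the example values are all assumptions. Each tap adds a delayed, scaled copy of the input to the output.

```python
def multitap_delay(signal, taps, sample_rate=44100):
    """Mix a signal with delayed, scaled copies of itself.

    signal      -- list of float samples
    taps        -- list of (delay_seconds, gain) pairs (assumed format)
    sample_rate -- samples per second
    """
    delays = []
    for delay_s, gain in taps:
        if delay_s < 0:
            raise ValueError("tap delays must be non-negative")
        delays.append((int(round(delay_s * sample_rate)), gain))
    max_delay = max((d for d, _ in delays), default=0)
    # Start from the dry signal, padded so the longest echo fits.
    out = list(signal) + [0.0] * max_delay
    for d, gain in delays:
        for i, x in enumerate(signal):
            out[i + d] += gain * x
    return out


# A single click followed by silence, at a low sample rate to keep it small.
click = [1.0] + [0.0] * 9
echoes = multitap_delay(click, [(0.25, 0.5), (0.5, 0.25)], sample_rate=1000)
```

In this example the taps sit 250 ms and 500 ms after the original click. Those intervals lie between about 1/10 s and 3 s, so the echoes would be expected to register as additional rhythmic events rather than as a change in timbre.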

The above discussion has focused on how the scale of time interacts with our cognitive, perceptual, and musical makeup. A different, though related, issue is how we perceive the flow of time. This depends on many factors: the emotional state of the observer, how attention is directed, familiarity with the events, etc. In addition, the perception of time depends on the nature of the events that fill the time: repetition and a regular pulse help time to pass quickly, while irregular noises or unchanging sounds tend to slow the perception of time. Issues of duration and time perception are explored more fully in Chapter 4.
