Beat Tracking

One interesting aspect of musical rhythm is the "beat," the steady foot-tapping pulse that drives music forward and provides the temporal framework on which the composition rests. Is the beat directly present in the signal, is it a perceptual construct, or does it require high-level cognitive processing?

Though it may be tempting to imagine that the beat really exists in the musical signal itself (because it is so conspicuous in our conception), it does not. For example, there may be a syncopated section where all the energy occurs "off" the beat. Or a song may momentarily stop and yet the beat continues even in the absence of sound. Something that can exist without sound cannot be in the signal!

Fig. 9 shows a number of the physical and perceptual terms associated with musical rhythm. The waveform depicts a bit more than five seconds of Scott Joplin's Maple Leaf Rag. The beats are shown above, aligned with the waveform. While several beats are clearly visible in the waveform (the final three, for instance, involve obvious amplitude fluctuations), many are not. The stretch between 30.5 sec and 33.5 sec is devoid of obvious amplitude changes, yet the beat goes on.

Figure 9: A few seconds of Joplin's Maple Leaf Rag illustrate a number of the terms associated with rhythm. The waveform is a direct representation of the physical pressure wave from which the feature vector is derived. Perceptual terms include the tatum ("temporal atom"), beat (or tactus), beat interval, and tempo. Cognitive terms include measures, time signatures, and musical notations, which correlate with (but are distinct from) their perceptual counterparts (e.g., the tatum corresponds to the sixteenth note while the beat corresponds to the eighth note). Perceived pulses typically align with the tatum (and/or beat), though they need not in all circumstances.

Comparing the waveform to the line of dots that represent the beat shows why it can be difficult to recover the beat directly from the waveform. Feature vectors may be helpful as an intermediate step; they are derived from waveforms but are designed to emphasize significant features of the sound. For example, the feature vector shown in Fig. 9 was constructed by calculating the short-term spectrum and measuring the change in the spectrum from one "instant" to the next. Large values (in either the positive or negative direction) indicate large changes. To the extent that the ear is sensitive to such spectral changes, the large values of the feature vector correspond to perceptible pulses. Clearly, choosing good feature vectors is an important but tricky business. The choice of feature vectors is discussed at great length in Chapter 4.
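As a rough illustration of this kind of feature vector, the sketch below computes a short-term spectrum frame by frame and sums the signed change in magnitude from one frame to the next. It is only one plausible reading of the description above, not the construction used for Fig. 9: the function name, the frame length, the hop size, the Hann window, and the choice to sum magnitude differences over all frequency bins are all assumptions made for the example, and the test signal is synthetic.

```python
import numpy as np

def spectral_change_feature(x, frame_len=1024, hop=512):
    """Signed frame-to-frame change in short-term spectral magnitude.

    Hypothetical helper: frame_len, hop, and the Hann window are
    illustrative choices, not values taken from the text.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    mags = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        mags[i] = np.abs(np.fft.rfft(frame))  # short-term magnitude spectrum
    # Difference between successive spectra, summed over frequency bins.
    # Positive values mean the spectrum grew; negative values mean it shrank.
    return np.diff(mags, axis=0).sum(axis=1)

# Synthetic test signal: half a second of silence, then a 440 Hz tone.
sr = 8000
n = np.arange(sr)
x = np.where(n >= 4000, np.sin(2 * np.pi * 440 * n / sr), 0.0)

feature = spectral_change_feature(x)
peak = int(np.argmax(np.abs(feature)))
# Approximate start time (in seconds) of the frame pair with the largest change.
print(peak, peak * 512 / sr)
```

The feature is exactly zero while the signal is silent and takes its largest values in the frames that straddle the onset of the tone, which is the sense in which large values mark candidate pulses.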

When such pulses occur at regular intervals, they tend to induce a perception of repetitiveness. For instance, the string of six pulses beginning at 28.5 sec coincides with six successive timepoints. In this case, the pulses occur at a rate faster than is comfortable for tapping the foot (i.e., faster than the beat), and the grid of approximately equally spaced timepoints that is aligned with the pulses is called the tatum (the regular repetition of the most rapid temporal unit in the piece). Thus the beat and the tatum are similar; they are both regular, repetitive perceptions of a steady flow. They exist because of, and persist despite, the moment-by-moment sound of the piece, which may or may not reinforce the regularity over any given duration. The difference is that the tatum is always the fastest such regular grid while the beat may be slower. Typical beat intervals are between about 300 and 700 ms.

The tatum is also typically the rate at which the fastest notes in a musical score are written; in this case, sixteenth notes. Because there are two tatum timepoints for each beat, the beat is represented in the score by the eighth notes, and the measure by a 2/4 time signature. These latter notions, involving the musical score, are clearly higher-level cognitive constructions, and such notations are discussed in Chapter 2. Successful beat tracking of the Maple Leaf Rag is demonstrated in the Maple Tap Rag. Similarly, successful beat tracking of Soul is demonstrated in Soul Tap.
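The relationship between the tatum grid, the beat grid, and the tempo can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the sample timepoints are hypothetical and are not measured from the Maple Leaf Rag. It keeps every second tatum timepoint to form a beat grid (two tatum timepoints per beat, as above) and converts a beat interval in milliseconds to beats per minute, so the typical range of 300 to 700 ms corresponds to tempos from 200 down to roughly 86 beats per minute.

```python
def beat_grid_from_tatum(tatum_times, tatums_per_beat=2, offset=0):
    """Keep every `tatums_per_beat`-th tatum timepoint, starting at `offset`.

    Hypothetical helper: `offset` selects which tatum timepoints carry the beat.
    """
    return list(tatum_times[offset::tatums_per_beat])

def tempo_bpm(beat_interval_ms):
    """Convert a beat interval in milliseconds to beats per minute."""
    return 60000.0 / beat_interval_ms

# Made-up tatum grid with timepoints every 0.25 s.
tatum = [0.0, 0.25, 0.5, 0.75, 1.0]
beats = beat_grid_from_tatum(tatum)  # beat timepoints every 0.5 s

print(beats)
print(tempo_bpm(300), tempo_bpm(700))
```

With a different `offset` the same tatum grid yields a beat grid shifted by one tatum, which is one reason the beat cannot be read off from the tatum alone.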
