Modulation Spectra: A New Tool for Analysis and Coding of Speech and Music

Abstract

Linear time-invariant systems and the Fourier transform are fundamental principles in electrical engineering. These principles allow frequency components to be represented and modified separately and independently. But the world is not linear and time-invariant. Speech and music production are two profoundly time-variant cases, and the time-varying dynamics of these signals are what give them almost all of their character and information content. While the statistical behavior of these dynamics has been modeled by hidden Markov and similar models, there has been no Fourier- or wavelet-like transform approach for representing and modifying the dynamics. We have therefore developed such a theory from an engineering perspective and, using terminology from speech research, call its key result the “modulation spectrum.” A modulation spectrum decomposes a signal into a two-dimensional representation of standard frequency versus frequency of the dynamics. There is strong evidence that this two-dimensional representation matches aspects of human auditory perception. Since this spectrum is invertible, even after modification, it can also be used to filter or modify signal dynamics. It also maps the most important dynamics into a small number of transform coefficients and is thus potentially useful for efficient coding. Modulation spectra could be applicable to a wide range of problems in acoustics, speech separation and enhancement for coding and hearing aids, image coding and modification, and video. Our work has so far focused on speech and music signals. For example, some of our demonstrations will show how, by filtering the dynamics, we can change the sound of a piano into the sound of a pleasant, non-synthetic-sounding new instrument. We will also show how modulation spectra can beneficially replace the conventional filterbanks used in audio coding. Speech separation and modification examples will also be demonstrated.
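
As a rough illustration of the two-dimensional representation described in the abstract, the following Python sketch computes one common approximation of a modulation spectrum: a short-time Fourier transform gives the envelope of each acoustic-frequency subband, and a second Fourier transform along time gives the frequency content of those envelopes. This is a generic magnitude-based sketch and not necessarily the theory presented in the talk. The function name `modulation_spectrum`, the parameters `frame_len` and `hop`, and the test signal are all assumptions chosen for illustration.

```python
import numpy as np

def modulation_spectrum(x, fs, frame_len=256, hop=64):
    """Hypothetical helper: magnitude modulation spectrum of a 1-D signal.

    Returns (acoustic_freqs, modulation_freqs, M), where M[i, j] is the
    strength of modulation frequency j in acoustic-frequency subband i.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # First transform: windowed frames -> acoustic-frequency bins.
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    envelopes = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_acoustic)
    # Remove each subband's mean so the constant term does not dominate.
    envelopes = envelopes - envelopes.mean(axis=0, keepdims=True)
    # Second transform: along time, per subband -> modulation-frequency bins.
    M = np.abs(np.fft.rfft(envelopes, axis=0)).T     # (n_acoustic, n_modulation)
    acoustic_freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    # Envelopes are sampled once per hop, so their sample period is hop / fs.
    modulation_freqs = np.fft.rfftfreq(n_frames, d=hop / fs)
    return acoustic_freqs, modulation_freqs, M

# Assumed test signal: a 1 kHz tone whose amplitude varies at 8 Hz.
fs = 8000
t = np.arange(2 * fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)

acoustic_freqs, modulation_freqs, M = modulation_spectrum(x, fs)
i, j = np.unravel_index(np.argmax(M), M.shape)
peak_acoustic = acoustic_freqs[i]      # standard (acoustic) frequency of the peak
peak_modulation = modulation_freqs[j]  # frequency of the dynamics at the peak
print(peak_acoustic, peak_modulation)
```

For this test signal the largest coefficient should sit at the 1 kHz acoustic bin and close to 8 Hz on the modulation axis, which illustrates how a simple dynamic can be concentrated in a few coefficients. Taking magnitudes discards phase, so this sketch is not invertible as written; the invertibility mentioned in the abstract would need a formulation that retains the information discarded here.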

Biography

Les Atlas received a Ph.D. in electrical engineering from Stanford University in 1984. He joined the University of Washington in 1984, where he is a Professor of Electrical Engineering. His research is in digital signal processing, with specializations in acoustic analysis, time-frequency representations, and signal recognition and coding. His research is supported by DARPA, the Office of Naval Research, the Army Research Lab, and the Washington Research Foundation. Dr. Atlas received a National Science Foundation Presidential Young Investigator Award and has received a Fulbright Research Award for the 2003-2004 academic year. He was General Chair of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, Chair of the IEEE Signal Processing Society Technical Committee on Theory and Methods, and a member of the Signal Processing Society’s Board of Governors.

Les Atlas

University of Washington Electrical Engineering

EEB 125

15 May 2003, 10:30am until 12:00pm