King Saud UniversityKSU Libraries Libraries Catalog

Author(s) Abdullah I. Al-Shoshan
Affiliation Department of Computer Science, College of Computer, Qassim University, Saudi Arabia shoshan@myway.com
Title Speech and Music Classification and Separation: A Review
Source Journal of King Saud University. Engineering Sciences. Volume 19, No 1. (2007/1427)
Abstract The classification and separation of speech and music signals have attracted attention by many researchers. The purpose of the classification process is needed to build two different libraries: speech library and music library, from a stream of sounds. However, the separation process is needed in a cocktail-party problem to separate speech from music and remove the undesired one. In this paper, a review of the existing classification and separation algorithms is presented and discussed. The classification algorithms will be divided into three categories: time-domain, frequency-domain, and time-frequency domain approaches. The time-domain approaches used in literature are: the zero-crossing rate (ZCR), the short-time energy (STE), the ZCR and the STE with positive derivative, with some of their modified versions, the variance of the roll-off, and the neural networks. The frequency-domain approaches are mainly based on: spectral centroid, variance of the spectral centroid, spectral flux, variance of the spectral flux, roll-off of the spectrum, cepstral residual, and the delta pitch. The time-frequency domain approaches have not been yet tested thoroughly in literature; so, the spectrogram and the evolutionary spectrum will be introduced. Also, some new algorithms dealing with music and speech separation and segregation processes will be presented.