SPEAKER IDENTIFICATION SYSTEM
SUMMARY FOR INDIVIDUAL CHAPTERS:
CHAPTER 1 - INTRODUCTION: This chapter introduces the concepts of speaker recognition and identification. The initial discussion highlights the uniqueness, applications and advantages of speaker recognition and speaker identification as methods of authentication, and compares them with other methods of authentication. The concepts of speech processing, speech recognition and speaker recognition are also introduced. The discussion of speech processing covers the characteristics of the speech signal, such as the different qualities of speech, the variations in the acoustic properties of the speech signal, and the stages involved in converting the human speech signal into a digital speech signal.
This digital speech signal is of main interest for the speaker authentication system. A spectrogram of a speech signal is shown to characterize how its energy is distributed over time and frequency. Speech recognition is described with a block diagram that details stages such as speech processing, the word detector and pattern matching. Later, the basic classification of speaker recognition systems is described with a diagram, and a clear demarcation is drawn between speaker identification and speaker verification systems. These basic concepts lay the foundation for the discussions in later chapters.
CHAPTER 2 - SPEAKER IDENTIFICATION: This chapter deals with the concept of speaker identification in detail.
Initially, the human speech production mechanism is described. Next, the types of speaker identification systems are explained according to their dependence on the spoken text. The two main types discussed are text-independent speaker identification and text-dependent speaker identification. After a short discussion of speech models, the speaker identification system is described in detail with a block diagram. Its stages, namely pre-emphasis filtering, analog-to-digital conversion, frame blocking, windowing and autocorrelation analysis, are discussed in detail.
Pre-emphasis filtering is treated for both cases: applied frame by frame to the speech signal sequence, and applied to the entire speech signal. The analog-to-digital converters used in practice are highlighted along with their specifications. The theory behind frame blocking is explained in detail with the aid of a diagram. The characteristics and performance parameters of various windowing methods are shown. Finally, the use of autocorrelation analysis for extracting the harmonic and formant properties of the speech signal is emphasized.
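The pre-processing stages summarized for Chapter 2 could be sketched roughly as follows. This is only an illustrative sketch, not the chapter's own implementation: the function names, the pre-emphasis coefficient of 0.95, the Hamming window, the frame length of 240 samples with a hop of 80 samples, and the 8 kHz test tone are all assumptions chosen for the example.

```python
import math


def pre_emphasize(signal, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def frame_blocks(signal, frame_len, hop):
    """Split the signal into overlapping frames; trailing samples that
    do not fill a complete frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame_len):
    """Hamming window coefficients (one of several possible window choices)."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def preprocess(signal, frame_len=240, hop=80, max_lag=12, alpha=0.95):
    """Pre-emphasize the whole signal, then frame, window and autocorrelate it."""
    emphasized = pre_emphasize(signal, alpha)
    window = hamming(frame_len)
    result = []
    for frame in frame_blocks(emphasized, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        result.append(autocorrelation(windowed, max_lag))
    return result


if __name__ == "__main__":
    # Hypothetical test input: a 100 Hz tone sampled at 8 kHz.
    tone = [math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(400)]
    frames = preprocess(tone)
    print(len(frames), "frames,", len(frames[0]), "autocorrelation lags each")
```

Here pre-emphasis is applied to the entire signal before framing; the frame-by-frame variant mentioned above would instead call `pre_emphasize` on each frame inside the loop.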
CHAPTER 3 - FEATURE EXTRACTION: This chapter describes methods of selecting and estimating the appropriate features of the speech signal. Linear prediction coefficients (LPC), Linear prediction cepstral coefficients (LPCC), Mel filter bank cepstral coefficients (MFCC), Bark filter bank cepstral coefficients (BFCC) and Uniform filter bank cepstral coefficients (UFCC) are treated in detail. It is shown that the linear prediction coefficients give information about the formant frequencies and bandwidths of the speech signal.
Nonetheless, LPCC is presented as a more suitable alternative to LPC. The cepstral coefficients other than the zeroth coefficient represent the features of the speech signal. In the Mel filter bank method, cepstral coefficients are calculated on the mel scale using triangular filters. This frequency mapping is treated in the chapter, and the cepstral coefficients are computed according to the equations given there. The chapter also outlines the advantages of MFCC when applied to GMMs and speaker identification systems. Another feature extraction method, BFCC, is also discussed; its performance is similar to that of MFCC.
Finally, UFCC is discussed. It performs less well than MFCC and BFCC, but it is still suitable for speaker identification because it gives uniform resolution at all frequencies. Thus the chapter gives a broad overview of the various methods of speech feature extraction.
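As an illustration of the mel-scale mapping and triangular filters mentioned for MFCC, a minimal sketch is given below. The chapter's own equations are not reproduced in this summary, so the sketch assumes one widely used mel formula, mel = 2595 log10(1 + f/700), and a cosine transform of the log filter-bank energies; the function names and the choice of 20 filters between 0 and 4000 Hz are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mel (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_points(num_filters, f_low, f_high):
    """Return num_filters + 2 frequencies (Hz) equally spaced on the mel scale.
    Filter i uses points i, i + 1, i + 2 as its left edge, center and right edge."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + i * step) for i in range(num_filters + 2)]


def triangular_weight(f_hz, left, center, right):
    """Weight of a triangular filter at f_hz: rises from 0 at left to 1 at center,
    then falls back to 0 at right."""
    if left < f_hz <= center:
        return (f_hz - left) / (center - left)
    if center < f_hz < right:
        return (right - f_hz) / (right - center)
    return 0.0


def cepstral_coefficients(log_energies, num_coeffs):
    """Cosine transform of log filter-bank energies:
    c[n] = sum_{k=1..K} log_energies[k-1] * cos(n * (k - 0.5) * pi / K), n = 0..num_coeffs-1."""
    K = len(log_energies)
    return [sum(e * math.cos(n * (k + 0.5) * math.pi / K)
                for k, e in enumerate(log_energies))
            for n in range(num_coeffs)]


if __name__ == "__main__":
    points = mel_points(20, 0.0, 4000.0)
    # Filters are narrow at low frequencies and wider at high frequencies.
    print([round(p) for p in points])
```

In this sketch a uniform filter bank (as in UFCC) would correspond to spacing the same points linearly in Hz instead of on the mel scale, which is why its resolution is the same at all frequencies.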