Documentation for matlab_speech_features

This is documentation for matlab_speech_features, a library for speech feature extraction. Code is available at https://github.com/jameslyons/matlab_speech_features. If you find any errors, feel free to make a pull request or leave a comment at the bottom of the page.

Download matlab_speech_features.zip

msf_mfcc - Mel Frequency Cepstral Coefficients

 function feat = msf_mfcc(speech,fs,varargin)

Given a speech signal, splits it into frames and computes Mel frequency cepstral coefficients (MFCCs) for each frame. For a tutorial on MFCCs, see the MFCC tutorial.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'nfilt' - the number of filters in the filterbank. Default: 26
'lowfreq' - the lowest filterbank edge. In Hz. Default: 0
'highfreq' - the highest filterbank edge. In Hz. Default: fs/2
'nfft' - the FFT size to use. Default: 512
'ncep' - the number of cepstral coefficients to use. Default: 13
'liftercoeff' - liftering coefficient, 0 is no lifter. Default: 22
'appendenergy' - if true, replaces the 0th cepstral coefficient with the log of the total frame energy. Default: true

Example usage:

 mfccs = msf_mfcc(signal,16000,'nfilt',40,'ncep',12);
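The 'liftercoeff' option rescales the cepstral coefficients of each frame. The following is a minimal sketch of one common choice, the HTK-style sinusoidal lifter. The formula is an assumption for illustration and is not taken from the msf_mfcc source; the `cepstra` matrix is placeholder data.

```matlab
% Sketch of HTK-style sinusoidal liftering (assumed formula, not from msf_mfcc).
ncep = 13;                  % number of cepstral coefficients per frame
L = 22;                     % liftering coefficient ('liftercoeff')
cepstra = ones(5, ncep);    % placeholder: 5 frames of dummy cepstra

if L > 0
    n = 0:ncep-1;
    lift = 1 + (L / 2) * sin(pi * n / L);   % 1-by-ncep weights, lift(1) is 1
else
    lift = ones(1, ncep);                   % 0 means no liftering
end

% Apply the same weights to every frame (row).
liftered = cepstra .* repmat(lift, size(cepstra, 1), 1);
```

With this formula the weights rise from 1 at the 0th coefficient to 1 + L/2 at index L/2, so higher-order coefficients, which are usually smaller in magnitude, are boosted.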

msf_lpc - Linear Prediction Coefficients

 function feat = msf_lpc(speech,fs,varargin)

Given a speech signal, splits it into frames and computes linear prediction coefficients for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'order' - the number of coefficients to return. Default: 12

Example usage:

 lpcs = msf_lpc(signal,16000,'order',10);
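Linear prediction coefficients for a single frame are often obtained with the autocorrelation method and the Levinson-Durbin recursion. The sketch below illustrates that technique under stated assumptions: the test frame is hypothetical, the prediction-error filter is written as A(z) = 1 + a1 z^-1 + ... + ap z^-p, and nothing here is taken from the msf_lpc source, whose sign convention and windowing may differ.

```matlab
% Levinson-Durbin sketch for one frame (illustrative; not taken from msf_lpc).
order = 4;                              % number of prediction coefficients
n = (0:99)';
frame = cos(0.3 * n) .* 0.97 .^ n;      % hypothetical test frame (damped cosine)

% Autocorrelation for lags 0..order; r(1) holds lag 0.
N = numel(frame);
r = zeros(order + 1, 1);
for lag = 0:order
    r(lag + 1) = sum(frame(1:N - lag) .* frame(1 + lag:N));
end

a = 1;                  % prediction-error filter [1 a1 ... ap], grown one order at a time
E = r(1);               % prediction error energy
k = zeros(order, 1);    % reflection coefficients, one per order
for i = 1:order
    acc = a * r(i + 1:-1:2);            % sum over j = 0..i-1 of a_j * r(i - j)
    k(i) = -acc / E;
    a_ext = [a, 0];
    a = a_ext + k(i) * fliplr(a_ext);   % order update of the filter
    E = E * (1 - k(i)^2);               % error energy shrinks at each order
end
lpcs = a(2:end);        % the 'order' coefficients a1..ap
```

The intermediate values `k` are the reflection coefficients of the same frame, which is why LPC and reflection-coefficient features are usually computed by the same recursion.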

msf_rc - Reflection Coefficients

 function feat = msf_rc(speech,fs,varargin)

Given a speech signal, splits it into frames and computes reflection coefficients for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'order' - the number of coefficients to return. Default: 12

Example usage:

 rcs = msf_rc(signal,16000,'order',10);

msf_logfb - Log Filterbank Energies

 function feat = msf_logfb(speech,fs,varargin)

Given a speech signal, splits it into frames and computes log filterbank energies for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'nfilt' - the number of filters in the filterbank. Default: 26
'lowfreq' - the lowest filterbank edge. In Hz. Default: 0
'highfreq' - the highest filterbank edge. In Hz. Default: fs/2
'nfft' - the FFT size to use. Default: 512

Example usage:

 logfbs = msf_logfb(signal,16000,'nfilt',40);

msf_filterbank - return a mel-spaced filterbank

 function fbank = msf_filterbank(nfilt,fs,lowfreq,highfreq,nfft)

Returns a mel-spaced filterbank for use with filterbank energies, MFCCs, SSCs etc.

nfilt - the number of filters in the filterbank.
fs - the sample rate in Hz, integer
lowfreq - the lowest filterbank edge. In Hz.
highfreq - the highest filterbank edge. In Hz.
nfft - the FFT size to use.

Example usage:

 fbank = msf_filterbank(26,16000,0,8000,512);
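To show what "mel-spaced" means, here is a minimal sketch that builds triangular filters whose edges are equally spaced on the mel scale. The mel formula (2595*log10(1 + f/700)), the mapping from frequency to FFT bin, and the unit peak height are common choices assumed for illustration; they are not taken from the msf_filterbank source.

```matlab
% Sketch of a mel-spaced triangular filterbank (assumed formulas, not from msf_filterbank).
nfilt = 26; fs = 16000; lowfreq = 0; highfreq = fs / 2; nfft = 512;

hz2mel = @(hz) 2595 * log10(1 + hz / 700);
mel2hz = @(mel) 700 * (10 .^ (mel / 2595) - 1);

% nfilt + 2 edge points, equally spaced in mel, then mapped to 0-based FFT bins.
melpts = linspace(hz2mel(lowfreq), hz2mel(highfreq), nfilt + 2);
bins = floor((nfft + 1) * mel2hz(melpts) / fs);

fbank = zeros(nfilt, nfft / 2 + 1);     % one row per filter
for m = 1:nfilt
    lo = bins(m); mid = bins(m + 1); hi = bins(m + 2);
    for b = lo:mid - 1                  % rising slope
        fbank(m, b + 1) = (b - lo) / (mid - lo);
    end
    for b = mid:hi - 1                  % falling slope, peak of 1 at mid
        fbank(m, b + 1) = (hi - b) / (hi - mid);
    end
end
```

Each row has nfft/2 + 1 columns, one per non-negative FFT frequency, so filterbank energies can be formed as a matrix product of a power spectrum with the transpose of `fbank`.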

msf_lsf - Line Spectral Frequencies

 function feat = msf_lsf(speech,fs,varargin)

Given a speech signal, splits it into frames and computes line spectral frequencies for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'order' - the number of coefficients to return. Default: 12

Example usage:

 lsfs = msf_lsf(signal,16000,'order',10);

msf_lar - Log Area Ratios

 function feat = msf_lar(speech,fs,varargin)

Given a speech signal, splits it into frames and computes log area ratios for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'order' - the number of coefficients to return. Default: 12

Example usage:

 lars = msf_lar(signal,16000,'order',10);
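Log area ratios are a nonlinear transform of reflection coefficients. A minimal sketch of the usual definition follows, assuming the convention lar = log((1 + k) / (1 - k)); some references use the reciprocal ratio, which only flips the sign, and the convention used by msf_lar is not stated here. The coefficient values are hypothetical.

```matlab
% Sketch: log area ratios from reflection coefficients (assumed sign convention).
k = [0.5, -0.2, 0.1];               % hypothetical reflection coefficients, |k| < 1
lar = log((1 + k) ./ (1 - k));      % equivalent to 2 * atanh(k)

% The transform is invertible, which makes a convenient sanity check.
k_back = tanh(lar / 2);
```

The transform stretches values near +1 and -1, where small changes in a reflection coefficient have a large effect on the spectrum.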

msf_framesig - break a signal into frames

 function win_frames = msf_framesig(signal, frame_len, frame_step, winfunc)

Takes a 1 by N signal and breaks it up into frames. Each frame starts frame_step samples after the start of the previous frame. Each frame is windowed by winfunc.

- to specify window, use e.g. @hamming, @(x)chebwin(x,30), @(x)ones(x,1), etc.

signal - the input signal, vector of audio samples
frame_len - length of window in samples.
frame_step - step between successive windows in samples.
winfunc - A function to be applied to each window.

Example usage with hamming window:

 frames = msf_framesig(speech, winlen*fs, winstep*fs, @(x)hamming(x));
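The framing step can be sketched in plain MATLAB as follows. This is an illustration under stated assumptions, not the msf_framesig source: window lengths are rounded to whole samples, the final partial frame is zero-padded, and the Hamming window is written out explicitly so the sketch needs no toolbox. The test tone is hypothetical.

```matlab
% Sketch of framing and windowing (assumes rounding and zero-padding; not from msf_framesig).
fs = 16000; winlen = 0.025; winstep = 0.01;
signal = sin(2 * pi * 440 * (0:fs - 1) / fs);   % hypothetical 1 second tone, 1-by-N

frame_len = round(winlen * fs);     % 400 samples
frame_step = round(winstep * fs);   % 160 samples
N = numel(signal);
if N <= frame_len
    num_frames = 1;
else
    num_frames = 1 + ceil((N - frame_len) / frame_step);
end

% Zero-pad so the last frame is complete.
padded_len = (num_frames - 1) * frame_step + frame_len;
padded = [signal, zeros(1, padded_len - N)];

% Row i of idx holds the sample indices of frame i.
idx = repmat(1:frame_len, num_frames, 1) + ...
      repmat((0:num_frames - 1)' * frame_step, 1, frame_len);
frames = padded(idx);

% Symmetric Hamming window applied to every frame.
win = 0.54 - 0.46 * cos(2 * pi * (0:frame_len - 1) / (frame_len - 1));
win_frames = frames .* repmat(win, num_frames, 1);
```

For a 1 second signal at 16 kHz this gives 99 frames of 400 samples, with consecutive frames overlapping by 240 samples.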

msf_lpcc - Linear Prediction Cepstral Coefficients

 function feat = msf_lpcc(speech,fs,varargin)

Given a speech signal, splits it into frames and computes linear prediction cepstral coefficients for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'order' - the number of coefficients to return. Default: 12

Example usage:

 lpccs = msf_lpcc(signal,16000,'order',10);

msf_ssc - Spectral Subband Centroids

 function feat = msf_ssc(speech,fs,varargin)

Given a speech signal, splits it into frames and computes spectral subband centroids for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'nfilt' - the number of filters in the filterbank. Default: 26
'lowfreq' - the lowest filterbank edge. In Hz. Default: 0
'highfreq' - the highest filterbank edge. In Hz. Default: fs/2
'nfft' - the FFT size to use. Default: 512

Example usage:

 sscs = msf_ssc(signal,16000,'nfilt',40);
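A spectral subband centroid is the power-weighted mean frequency within one filterbank channel. The sketch below shows that computation on toy data; the tiny power spectrum and the rectangular two-band filterbank are hypothetical, chosen so the result can be checked by hand, and the exact weighting used by msf_ssc is not confirmed here.

```matlab
% Sketch: spectral subband centroids on toy data (hypothetical values).
nfft = 8; fs = 8000;
freqs = (0:nfft / 2) * fs / nfft;   % bin centre frequencies: 0, 1000, ..., 4000 Hz
pspec = [0 1 3 0 0;                 % power spectrum, one row per frame
         1 1 1 1 1];
fbank = [1 1 1 0 0;                 % two toy rectangular subbands
         0 0 1 1 1];

% Per frame and subband: sum(f .* w .* P) / sum(w .* P).
num = (pspec .* repmat(freqs, size(pspec, 1), 1)) * fbank';
den = max(pspec * fbank', eps);     % guard against empty subbands
ssc = num ./ den;                   % frames-by-subbands, in Hz
```

For the first frame the lower subband has power 1 at 1000 Hz and 3 at 2000 Hz, so its centroid is 7000/4 = 1750 Hz.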

msf_powspec - Compute power spectrum of audio frames

 function pspec = msf_powspec(speech,fs,varargin)

Given a speech signal, splits it into frames and computes the power spectrum for each frame.

speech - the input speech signal, vector of speech samples
fs - the sample rate of 'speech', integer

optional arguments supported include the following 'name', value pairs from the 3rd argument on:

'winlen' - length of window in seconds. Default: 0.025 (25 milliseconds)
'winstep' - step between successive windows in seconds. Default: 0.01 (10 milliseconds)
'nfft' - the FFT size to use. Default: 512

Example usage:

 pspec = msf_powspec(signal,16000,'winlen',0.5);
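For a single frame, the power spectrum can be sketched with the built-in fft. The scaling by 1/nfft and keeping only the nfft/2 + 1 non-negative frequencies are assumptions made for illustration; the normalisation used by msf_powspec is not stated here. The 1 kHz test frame is hypothetical.

```matlab
% Sketch: power spectrum of one frame (assumed 1/nfft scaling; not from msf_powspec).
nfft = 512; fs = 16000;
frame = sin(2 * pi * 1000 * (0:399) / fs);  % hypothetical 400-sample frame, 1 kHz tone

spec = fft(frame, nfft);                    % zero-pads the frame to nfft samples
pspec = abs(spec(1:nfft / 2 + 1)) .^ 2 / nfft;

% Locate the strongest bin and convert its index to Hz.
[~, peak] = max(pspec);
peak_hz = (peak - 1) * fs / nfft;
```

Note that fft(frame, nfft) truncates a frame longer than nfft, so in this sketch the frame length should not exceed the FFT size.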
