To remove silence from captured sound, we used the algorithm described in
"A New Silence Removal and Endpoint Detection Algorithm for Speech and Speaker Recognition Applications".
Our actual system was written in Java, but we verified the performance of this algorithm in MATLAB.
Inputs and Output
[Figure: Before silence removal]
[Figure: After automatic silence removal]
The script first records 5 seconds of sound, removes the silence, and then plays back both the original and the silence-removed signal.
%silence removal and end point detection
THRESHOLD=0.3; % adjust value yourself
TIME=5;
%capture;
Fs = 11025;
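r = audiorecorder(Fs,16,1); recordblocking(r,TIME); y = getaudiodata(r); % stands in for wavrecord, which is unavailable in newer MATLAB releases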
%plot(y)
%wavplay(y,Fs);
samplePerFrame=floor(Fs/100);
bgSampleCount=floor(Fs/5); %according to formula, 1600 sample needed for 8 khz
%----------
%calculation of mean and std
bgSample=y(1:bgSampleCount); % first bgSampleCount samples, taken as background
meanVal=mean(bgSample);
sDev=std(bgSample);
%----------
%identify voiced or not for each value
voiced=abs(y-meanVal)/sDev > THRESHOLD; % 1 where a sample deviates from the background, 0 otherwise
% identify voiced or not for each frame
%discard insufficient samples of last frame
usefulSamples=length(y)-mod(length(y),samplePerFrame);
frameCount=usefulSamples/samplePerFrame;
voicedFrameCount=0;
for i=1:1:frameCount
    cVoiced=0;
    cUnVoiced=0;
    for j=i*samplePerFrame-samplePerFrame+1:1:(i*samplePerFrame)
        if(voiced(j)==1)
            cVoiced=(cVoiced+1);
        else
            cUnVoiced=cUnVoiced+1;
        end
    end
    %mark frame for voiced/unvoiced
    if(cVoiced>cUnVoiced)
        voicedFrameCount=voicedFrameCount+1;
        voicedUnvoiced(i)=1;
    else
        voicedUnvoiced(i)=0;
    end
end
silenceRemovedSignal=[];
%-----
for i=1:1:frameCount
    if(voicedUnvoiced(i)==1)
        for j=i*samplePerFrame-samplePerFrame+1:1:(i*samplePerFrame)
            silenceRemovedSignal= [silenceRemovedSignal y(j)];
        end
    end
end
%---display plot and play both sounds
figure; plot(y);
figure; plot(silenceRemovedSignal);
%%%play
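playblocking(audioplayer(y,Fs)); % stands in for wavplay, which is unavailable in newer MATLAB releases
playblocking(audioplayer(silenceRemovedSignal(:),Fs));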
NOTE: Don't forget to adjust the microphone level and sound boost feature to achieve good results.
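The same steps can also be written without explicit loops and tried on a signal that needs no microphone. The sketch below is only an illustration under stated assumptions: the synthetic test signal (a quiet 50 Hz hum, a louder 440 Hz tone, then the hum again), the 8 kHz sampling rate, and the THRESHOLD of 3 are all made up for this example and are not values from the paper or from the script above. As in the script, the first Fs/5 samples (200 ms, i.e. 1600 samples at 8 kHz or 2205 at 11025 Hz) are taken as background, and each frame is Fs/100 samples (10 ms) long.

```matlab
% Hypothetical test signal: 1 s quiet hum, 1 s louder tone, 1 s quiet hum.
% To use a file instead, something like [y,Fs] = audioread('speech.wav');
% y = y(:,1); should work (the file name is a placeholder).
Fs = 8000;
THRESHOLD = 3;                       % illustrative only; tune per recording
t = (0:Fs-1)'/Fs;
hum  = 0.01*sin(2*pi*50*t);          % stands in for background noise
tone = 0.5*sin(2*pi*440*t);          % stands in for speech
y = [hum; tone; hum];

samplePerFrame = floor(Fs/100);      % 10 ms per frame
bgSampleCount  = floor(Fs/5);        % first 200 ms used as background

% Background statistics and per-sample decision.
bgSample = y(1:bgSampleCount);
meanVal  = mean(bgSample);
sDev     = std(bgSample);
voiced   = abs(y - meanVal)/sDev > THRESHOLD;

% Drop the incomplete last frame, then put one frame in each column.
frameCount    = floor(length(y)/samplePerFrame);
usefulSamples = frameCount*samplePerFrame;
votes  = reshape(voiced(1:usefulSamples), samplePerFrame, frameCount);
frames = reshape(y(1:usefulSamples),      samplePerFrame, frameCount);

% A frame is kept when voiced samples outnumber unvoiced ones.
voicedFrame = sum(votes, 1) > samplePerFrame/2;
silenceRemovedSignal = reshape(frames(:, voicedFrame), 1, []);

fprintf('%d of %d frames kept\n', sum(voicedFrame), frameCount);
```

Because the decision compares each sample with the mean and standard deviation of the first 200 ms, a suitable THRESHOLD depends on how noisy that opening stretch is, so it usually has to be re-tuned for each recording or microphone setting.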
Thank you so much.
Wow. Excellent. Can you help me compare the efficiency of this algorithm with the STE and ZCR methods? Thank you.
@Lakshmi,
As shown in the research paper, the comparison of efficiency is as follows:
Phrases                  STE        ZCR-STE    Proposed Method
Combination lock number  77.9531%   70.3720%   83.5565%
Running text             50.8391%   50.1231%   59.7181%
Good work, dear. Thanks for sharing.
Hi Ganesh,
Thank you so much for sharing this code. It is brilliant and yet simple to understand. Would you mind sending me the paper entitled 'A New Silence Removal and Endpoint Detection Algorithm for Speech and Speaker Recognition Applications', since I don't have access to it? My email address is yusnita082@gmail.com. Also, why is your name not among the authors?
What about removing silence from the beginning and end of speech?
Please take a look at the code and the plot: this algorithm removes silence from every part of the speech.
It doesn't work, and I don't know why.
I'm trying to import a WAV file, but the threshold is different for each one. How can I adjust the threshold according to the WAV file?
And about line 10 of your code, "bgSampleCount=floor(Fs/5); %according to formula, 1600 sample needed for 8 khz":
where did you find the 5? Is it because of TIME = 5?
How do I define silence in a WAV audio file in MATLAB?