Empowering Speaker Verification with Deep Convolutional Neural Network Vectors
Abstract
This paper introduces a novel method for automatic speaker verification (ASV) using Convolutional Neural Networks (CNNs). Unlike traditional approaches that rely solely on spectrogram and waveform images, the proposed method, termed 'DeepConvVectors', dynamically captures speaker-specific features from speech signals. Segments of speech are transformed into specialized CNN filters to create DeepConvVectors, which encapsulate essential speaker characteristics. Experiments on the THUYG-20 SRE dataset show that the proposed method outperforms established methods, achieving an average Equal Error Rate (EER) of 0.99%. The approach offers a dynamic solution for precise speaker identification and demonstrates the potential of CNNs for ASV.
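For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (target trials rejected). The following is a minimal sketch of how it could be estimated from trial scores, not the evaluation code used in the experiments. The function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are illustrative assumptions; the scoring back end is not specified here.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER and its threshold from verification trial scores.

    Assumption: a higher score means the trial is more likely a target
    (same-speaker) trial, and a trial is accepted when score >= threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([target, impostor]))

    # False acceptance rate: fraction of impostor trials accepted.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: fraction of target trials rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their mean as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]


if __name__ == "__main__":
    # Toy scores invented for illustration only.
    toy_target = [0.9, 0.8, 0.7, 0.35]
    toy_impostor = [0.1, 0.2, 0.3, 0.4]
    eer, threshold = equal_error_rate(toy_target, toy_impostor)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

Sweeping only the observed scores keeps the sketch short. With few trials the two error curves may not cross exactly, which is why the mean of the two rates at the closest point is reported.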