Cross-Corpus Speech Emotion Recognition Using HuBERT Model, Speaker Embeddings, and Prosodic Features

Nasersharif, Babak; Naderi, Navid

doi:10.22034/tjee.2024.61783.4850

Document Type : Original Article

Authors

Babak Nasersharif ¹
Navid Naderi ²

¹ Computer Engineering Department, K. N. Toosi University of Technology

² Faculty of Computer Engineering, K. N. Toosi University of Technology

Abstract

This study investigates the challenges and methodologies of cross-corpus speech emotion recognition (CCSER), focusing on how well speech features generalize across different languages, speakers, and emotional contexts. We propose a novel SER system that combines the transformer blocks of the HuBERT model with speaker embeddings and prosodic features to improve feature extraction for emotion classification across datasets. Our approach addresses dataset variability through transfer learning, in particular unsupervised methods that adapt feature distributions without requiring labeled data from the target domain. Specifically, our transfer learning strategy uses a clustering method to select the most appropriate trained model for transfer from the source domain to the target domain. We evaluate the proposed model on several datasets, using IEMOCAP as the source domain, and extend the validation to emotional speech datasets in different languages, demonstrating the adaptability of our system. The results indicate significant improvements in emotion recognition accuracy over traditional methods, highlighting the effectiveness of integrating advanced self-supervised learning models and transfer learning strategies in CCSER tasks.
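
The two ideas in the abstract, fusing three feature streams and choosing a trained model for an unlabeled target corpus, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the abstract does not say how the streams are combined or which clustering method is used, so the sketch uses per-stream L2 normalization followed by concatenation, and a nearest-centroid assignment stands in for the clustering step. The function names (`fuse_features`, `select_source_model`) and the model names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def fuse_features(
    hubert_vec: Sequence[float],
    speaker_vec: Sequence[float],
    prosody_vec: Sequence[float],
) -> List[float]:
    """Concatenate three utterance-level feature vectors.

    Assumption: each stream is L2-normalized first so that no single
    stream dominates the fused vector because of its scale.
    """
    return (
        l2_normalize(hubert_vec)
        + l2_normalize(speaker_vec)
        + l2_normalize(prosody_vec)
    )


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_source_model(
    target_features: Sequence[Sequence[float]],
    centroids: Dict[str, Sequence[float]],
) -> str:
    """Pick the trained model whose centroid is closest to the target data.

    Assumption: every candidate model is summarized by one centroid in the
    fused feature space. No target labels are used, only the features.
    """
    target_mean = mean_vector(target_features)
    return min(
        centroids,
        key=lambda name: math.dist(target_mean, centroids[name]),
    )


# Hypothetical usage with toy numbers (real features would be far larger).
fused = fuse_features([1.0, 1.0, 1.0, 1.0], [3.0, 4.0], [2.0])
candidate_centroids = {"model_a": [0.0] * 7, "model_b": [1.0] * 7}
chosen = select_source_model([[0.9] * 7, [0.8] * 7], candidate_centroids)
```

In this toy setup the selection depends only on unlabeled target features, which matches the unsupervised setting the abstract describes; how the real system forms clusters and adapts the chosen model is not stated in the abstract.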

TABRIZ JOURNAL OF ELECTRICAL ENGINEERING

Articles in Press, Corrected Proof
Available Online from 07 September 2024
