Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification

山田, 誠; ヤマダ, マコト; YAMADA, Makoto

https://ir.soken.ac.jp/records/1675

Name / File	License	Action
Abstract and review summary (355.6 kB)
Full text (1.5 MB)

Item type

Thesis or Dissertation (1)

Publication date

2011-01-18

Title

Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification

Language

eng

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_46ec

Resource type

thesis

Author name

山田, 誠

Name reading (furigana)

ヤマダ, マコト

Author

YAMADA, Makoto

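Degree-granting institution

Institution name

総合研究大学院大学 (The Graduate University for Advanced Studies, SOKENDAI)

Degree name

Doctor of Philosophy (Statistical Science)

Diploma number

Description type

Other

Description

総研大甲第1333号

Graduate school

Value

School of Multidisciplinary Sciences

Department

Value

15 Department of Statistical Science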

Date of degree conferral

2010-03-24

Academic year of degree conferral

Value

2009

Abstract

Description type

Other

Description

Speaker identification is one of the key technologies for person identification in humanoid robots. In particular, when face information is not available, speaker identification is the only way to identify a person, so improving speaker identification performance is an important issue for person identification tasks.

There are four major issues in practical speaker identification for humanoid robots. First, humanoid robots should identify the speaker in real time with high identification rates. Kernel methods such as the support vector machine (SVM) and kernel logistic regression (KLR) are now popular for speaker identification tasks, and kernel-based systems outperform the conventional Gaussian mixture model (GMM) based system. However, kernel-based speaker identification systems are usually computationally intensive, which is undesirable for real-time implementation. To deal with this computational issue, in Chapter 3 we propose a computationally efficient method of approximating the sequence kernel. More specifically, we formulate the problem of approximating the sequence kernel as the problem of obtaining a pre-image in a reproducing kernel Hilbert space. The effectiveness of the proposed approximation is demonstrated in text-independent speaker identification experiments with 10 male speakers: our approach provides a significant reduction in computation time while keeping performance degradation moderate. Based on the proposed method, we develop a real-time kernel-based speaker identification system using Virtual Studio Technology (VST).

Second, speech features vary over time due to session-dependent variation, changes in the recording environment, and physical conditions and emotions. However, conventional kernel-based systems implicitly ignore these facts and simply assume that the training and test input distributions are the same at all times. To alleviate the influence of session-dependent variation, it is common to use several sessions of speaker utterance samples or to use cepstral mean normalization (CMN). However, gathering several sessions of speaker utterance data and assigning speaker IDs to the collected data is expensive in both time and cost and is therefore not realistic in practice. Moreover, it is not possible to remove session-dependent variation perfectly by CMN alone. Thus, in Chapter 4, we propose a novel semi-supervised speaker identification method that can alleviate the influence of non-stationarity such as session-dependent variation, changes in the recording environment, and physical conditions and emotions. We assume that the variation in voice quality follows the covariate shift model, in which only the voice feature distribution changes between the training and test phases. Our method consists of importance-weighted versions of kernel logistic regression and cross-validation and is theoretically shown to be capable of alleviating the influence of covariate shift, where the weight (a.k.a. the importance) is estimated from the training and test data using the Kullback-Leibler Importance Estimation Procedure (KLIEP). We show experimentally, through text-independent and text-dependent speaker identification simulations, that the proposed method is promising for dealing with variations in voice quality.

Third, humanoid robots should automatically detect an unknown speaker and add that speaker to the dictionary. The speaker detection task can thus be formulated as an outlier detection problem (i.e., the outliers are the unknown speakers). Since the outlier detection problem can be solved by comparing the log-likelihoods of the unknown speaker and the known speakers, the estimation accuracy of the log-likelihoods is important for improving speaker detection performance. Thus, in Chapter 5, we propose new importance (a.k.a. likelihood) estimation methods using Gaussian mixture models (GMMs) and mixtures of probabilistic principal component analyzers (PPCAs), where the proposed approach estimates the importance without going through density estimation. An advantage of the proposed methods is that covariance matrices or projection matrices can also be learned through an expectation-maximization procedure, so the proposed methods are expected to work well when the true importance function has high correlation. Through outlier detection experiments, we show the validity of the proposed approaches.

Fourth, humanoid robots move around in the world, and the surrounding environment, source positions, and source mixtures are constantly changing. In addition, speech overlaps occur frequently during conversation. Source separation techniques are therefore useful for improving speaker identification performance. To deal with these problems, in Chapter 6 we consider the problem of two-source signal separation with a two-microphone array, where a point source such as a speech signal is placed in front of the array, while no information is available about the other, interfering signal. We propose a simple and computationally efficient method for estimating the geometry and source type (point or diffuse) of the interference signal, which allows us to adaptively choose a suitable initialization scheme for the unmixing matrix. Our proposed method, noise adaptive optimization of matrix initialization (NAOMI), is shown to be effective through source separation and speaker identification simulations.
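As an informal illustration of the covariate shift setting summarized for Chapter 4, the standard textbook formulation of importance weighting and of KLIEP can be written as follows. This is a sketch under assumptions, not the thesis's own notation: the symbols $w$, $\theta$, $\lambda$, $\Omega$, $\alpha_l$, $c_l$, and the kernel $K$ are introduced here for illustration, and the exact regularizer and basis functions used in the thesis may differ.

```latex
% Covariate shift: the input density changes between training and test,
% while the conditional distribution of the label (speaker) stays the same.
\begin{align}
  p_{\mathrm{tr}}(y \mid x) &= p_{\mathrm{te}}(y \mid x),
  \qquad p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x), \\
  % Importance: ratio of test to training input densities.
  w(x) &= \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}, \\
  % Importance-weighted, regularized negative log-likelihood
  % (e.g. for a kernel logistic regression model p(y | x; theta)).
  \hat{\theta} &= \operatorname*{arg\,min}_{\theta}
  \left[ -\sum_{i=1}^{n_{\mathrm{tr}}} w\bigl(x_i^{\mathrm{tr}}\bigr)
  \log p\bigl(y_i^{\mathrm{tr}} \mid x_i^{\mathrm{tr}}; \theta\bigr)
  + \lambda\, \Omega(\theta) \right].
\end{align}

% KLIEP: model the importance directly as a kernel expansion and fit it by
% maximizing the test log-likelihood under a normalization constraint,
% without estimating either density separately.
\begin{align}
  \hat{w}(x) &= \sum_{l=1}^{b} \alpha_l \, K(x, c_l), \\
  \max_{\alpha_1, \dots, \alpha_b} \;
  & \sum_{j=1}^{n_{\mathrm{te}}} \log \hat{w}\bigl(x_j^{\mathrm{te}}\bigr)
  \quad \text{s.t.} \quad
  \frac{1}{n_{\mathrm{tr}}} \sum_{i=1}^{n_{\mathrm{tr}}}
  \hat{w}\bigl(x_i^{\mathrm{tr}}\bigr) = 1,
  \qquad \alpha_l \ge 0 .
\end{align}
```

Because the weights depend only on unlabeled inputs from both phases, test-session utterances do not need speaker labels, which is consistent with the semi-supervised setting described in the abstract.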

Holdings

Value

Available

Format

Description type

Other

Description

application/pdf


Versions

Ver.1

2023-06-20 15:57:18.956498
