Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification

https://ir.soken.ac.jp/records/1675
Name / File License Action
甲1333_要旨.pdf Abstract and examination summary (355.6 kB)
甲1333_本文.pdf Full text (1.5 MB)
Item type Thesis or Dissertation(1)
Publication date 2011-01-18
Title
Title Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification
Language en
Language
Language eng
Resource type
Resource type identifier http://purl.org/coar/resource_type/c_46ec
Resource type thesis
Author name 山田, 誠
Reading (furigana) ヤマダ, マコト
Author YAMADA, Makoto
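Degree-granting institution
Degree-granting institution name 総合研究大学院大学 (The Graduate University for Advanced Studies)
Degree name
Degree name Doctor (Statistical Science)
Degree number
Description type Other
Description 総研大甲第1333号
School
Value School of Multidisciplinary Sciences (複合科学研究科)
Department
Value 15 Department of Statistical Science (統計科学専攻)
Date of degree conferral
Date of degree conferral 2010-03-24
Academic year of degree conferral
Value 2009
Abstract
Description type Other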
Description Speaker identification is one of the key technologies for person identification in humanoid robots. In particular, when face information is not available, speaker identification is the only way to identify a person, so improving speaker identification performance is an important issue for person identification tasks.

There are four major issues in speaker identification for humanoid robots in practice. First, humanoid robots should identify the speaker in real time with high identification rates. Kernel methods such as the support vector machine (SVM) and kernel logistic regression (KLR) are currently popular for speaker identification tasks, and kernel-based systems outperform the conventional Gaussian mixture model (GMM) based system. However, kernel-based speaker identification systems are usually computationally intensive, which is not desirable for real-time implementation. To deal with this computational issue, in Chapter 3 we propose a computationally very efficient method of approximating the sequence kernel. More specifically, we formulate the problem of approximating the sequence kernel as the problem of obtaining a pre-image in a reproducing kernel Hilbert space. The effectiveness of the proposed approximation is demonstrated in text-independent speaker identification experiments with 10 male speakers: our approach provides a significant reduction in computation time while keeping performance degradation moderate. Based on the proposed method, we develop a real-time kernel-based speaker identification system using Virtual Studio Technology (VST).

Second, speech features vary over time due to session-dependent variation, changes in the recording environment, and physical conditions and emotions. However, conventional kernel-based systems implicitly ignore these facts and simply assume that the input probability distributions of the training and test datasets are the same at all times. To alleviate the influence of session-dependent variation, it is popular to use several sessions of speaker utterance samples or to use cepstral mean normalization (CMN). However, gathering several sessions of speaker utterance data and assigning the speaker ID to the collected data are expensive in both time and cost, and are therefore not realistic in practice. Moreover, it is not possible to perfectly remove the session-dependent variation by CMN alone. Thus, in Chapter 4, we propose a novel semi-supervised speaker identification method that can alleviate the influence of non-stationarity such as session-dependent variation, changes in the recording environment, and physical conditions and emotions. We assume that the voice quality variants follow the covariate shift model, where only the voice feature distribution changes between the training and test phases. Our method consists of weighted versions of kernel logistic regression and cross validation and is theoretically shown to be capable of alleviating the influence of covariate shift, where the weight (a.k.a. importance) is estimated from the training and test distributions using the Kullback-Leibler Importance Estimation Procedure (KLIEP). We show experimentally, through text-independent and text-dependent speaker identification simulations, that the proposed method is promising in dealing with variations in voice quality.

Third, humanoid robots should automatically detect an unknown speaker and add that speaker to the dictionary. The speaker detection task can thus be formulated as an outlier detection problem (i.e., the outliers are the unknown speakers). Since the outlier detection problem can be solved by comparing the log likelihoods of the unknown speaker and the known speakers, the estimation accuracy of the log likelihoods is an important issue for improving speaker detection performance. Thus, in Chapter 5, we propose a new importance (a.k.a. likelihood) estimation method using Gaussian mixture models (GMMs) and mixtures of probabilistic principal component analyzers (PPCAs), where the proposed approach estimates the importance without going through density estimation. An advantage of the proposed methods is that covariance matrices or projection matrices can also be learned through an expectation-maximization procedure, so the proposed method is expected to work well when the true importance function has high correlation. Through outlier detection experiments, we show the validity of the proposed approaches.

Fourth, humanoid robots move throughout the world, and the surrounding environment, source positions, and source mixtures are constantly changing. In addition, speech overlaps frequently occur during conversation. Thus, source separation techniques are useful for improving speaker identification performance. To deal with these problems, in Chapter 6, we consider the problem of two-source signal separation from a two-microphone array, where a point source such as a speech signal is placed in front of the two-microphone array, while no information is available about the other, interference signal. We propose a simple and computationally efficient method for estimating the geometry and source type (point or diffuse) of the interference signal, which allows us to adaptively choose a suitable unmixing matrix initialization scheme. Our proposed method, noise adaptive optimization of matrix initialization (NAOMI), is shown to be effective through source separation and speaker identification simulations.
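
The importance weighting described for Chapter 4 can be illustrated with a minimal sketch of KLIEP-style weight estimation. This is not the thesis's implementation: the function names, the fixed Gaussian kernel width `sigma`, the number of centers, the step size, and the iteration count are all assumptions chosen for illustration (in practice the kernel width would normally be selected by cross validation). The sketch models the importance as a non-negative combination of Gaussian kernels centered at test samples and maximizes the log-importance over the test samples, subject to the weights averaging to one over the training samples.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix between the rows of x and the rows of centers."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def kliep_weights(x_train, x_test, sigma=1.0, n_centers=50,
                  step=1e-3, n_iter=2000, seed=0):
    """Estimate importance weights p_test(x) / p_train(x) at the training points.

    Hypothetical helper: all parameter names and defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_test))
    # Kernel centers are a random subset of the test samples.
    centers = x_test[rng.choice(len(x_test), n_centers, replace=False)]

    a = gaussian_kernel(x_test, centers, sigma)                 # (n_test, n_centers)
    b = gaussian_kernel(x_train, centers, sigma).mean(axis=0)   # (n_centers,)

    alpha = np.ones(n_centers)
    alpha /= b @ alpha
    for _ in range(n_iter):
        # Gradient ascent step on the sum of log-importance over test samples.
        alpha = alpha + step * (a.T @ (1.0 / (a @ alpha)))
        # Restore the constraints: mean training weight of one, alpha >= 0.
        alpha = alpha + (1.0 - b @ alpha) * b / (b @ b)
        alpha = np.maximum(alpha, 0.0)
        alpha = alpha / (b @ alpha)

    # Importance evaluated at each training sample.
    return gaussian_kernel(x_train, centers, sigma) @ alpha
```

The returned weights could then be supplied as per-sample weights when fitting a classifier on the training data, so that training samples resembling the test condition count more in the loss; how the thesis combines the weights with kernel logistic regression and cross validation is described in its Chapter 4 and is not reproduced here.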
Holdings
Value Yes
Format
Description type Other
Description application/pdf
Versions

Ver.1 2023-06-20 15:57:18.956498