{"created":"2023-06-20T13:21:27.609803+00:00","id":1675,"links":{},"metadata":{"_buckets":{"deposit":"34d0e35d-b45c-41b5-83b5-cc124a651db9"},"_deposit":{"created_by":21,"id":"1675","owners":[21],"pid":{"revision_id":0,"type":"depid","value":"1675"},"status":"published"},"_oai":{"id":"oai:ir.soken.ac.jp:00001675","sets":["2:429:17"]},"author_link":["0","0","0"],"item_1_creator_2":{"attribute_name":"著者名","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"山田, 誠"}],"nameIdentifiers":[{}]}]},"item_1_creator_3":{"attribute_name":"フリガナ","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"ヤマダ, マコト"}],"nameIdentifiers":[{}]}]},"item_1_date_granted_11":{"attribute_name":"学位授与年月日","attribute_value_mlt":[{"subitem_dategranted":"2010-03-24"}]},"item_1_degree_grantor_5":{"attribute_name":"学位授与機関","attribute_value_mlt":[{"subitem_degreegrantor":[{"subitem_degreegrantor_name":"総合研究大学院大学"}]}]},"item_1_degree_name_6":{"attribute_name":"学位名","attribute_value_mlt":[{"subitem_degreename":"博士(統計科学)"}]},"item_1_description_12":{"attribute_name":"要旨","attribute_value_mlt":[{"subitem_description":"The speaker identification is one of the key technologies for person identification in
humanoid robots. In particular, when face information is not available, speaker
identification is the only way to identify a person; improving speaker identification
performance is therefore an important issue for person identification tasks.
 There are four major issues in speaker identification for humanoid robots in
practice. First, humanoid robots should identify the speaker in real time with high
identification rates. Kernel methods such as the support vector
machine (SVM) and kernel logistic regression (KLR) are currently popular for speaker
identification tasks, and kernel-based systems outperform the conventional Gaussian
mixture model (GMM) based system. However, kernel-based speaker
identification systems are usually computationally intensive, which is not
desirable for real-time implementation. To deal with this computational issue, we
propose in Chapter 3 a method of approximating the sequence kernel that is shown to be
computationally very efficient. More specifically, we formulate the problem
of approximating the sequence kernel as the problem of obtaining a pre-image in
a reproducing kernel Hilbert space. The effectiveness of the proposed approximation
is demonstrated in text-independent speaker identification experiments with 10 male
speakers: our approach provides a significant reduction in computation time while
performance degradation is kept moderate. Based on the proposed method, we develop
a real-time kernel-based speaker identification system using the Virtual Studio
Technology (VST).
 Second, speech features vary over time due to session-dependent variation,
changes in the recording environment, and physical conditions/emotions. However,
conventional kernel-based systems implicitly ignore these facts and simply
assume that the input probability distributions of the training and
test datasets are the same at all times. To alleviate the influence of session-dependent
variation, it is popular to use several sessions of speaker utterance samples or to use
cepstral mean normalization (CMN). However, gathering several sessions of speaker
utterance data and assigning the speaker ID to the collected data are expensive both
in time and cost and are therefore not realistic in practice. Moreover, it is not
possible to perfectly remove the session-dependent variation by CMN alone. Thus, in
Chapter 4, we propose a novel semi-supervised speaker identification method that
can alleviate the influence of non-stationarity such as session-dependent variation,
changes in the recording environment, and physical conditions/emotions. We assume
that the voice quality variations follow the covariate shift model, where only the voice
feature distribution changes between the training and test phases. Our method consists of
weighted versions of kernel logistic regression and cross-validation and is theoretically
shown to be capable of alleviating the influence of covariate shift, where the
weight (a.k.a. importance) is estimated from the training and test distributions using
the Kullback-Leibler Importance Estimation Procedure (KLIEP). We experimentally
show through text-independent/dependent speaker identification simulations that the
proposed method is promising in dealing with variations in voice quality.
 Third, humanoid robots should automatically detect an unknown
speaker and add that speaker to the dictionary. This speaker detection
task can be formulated as an outlier detection problem (i.e., the outliers are the
unknown speakers). Since the outlier detection problem can be solved by
comparing the log likelihoods of the unknown speaker and the registered speakers,
the estimation accuracy of the log likelihoods is an important issue for improving the
speaker detection performance. Thus, in Chapter 5, we propose a new importance
(a.k.a. likelihood) estimation method using Gaussian mixture models (GMMs) and
mixtures of probabilistic principal component analyzers (PPCAs), where the proposed approach
estimates the importance without going through density estimation. An advantage of
the proposed methods is that covariance matrices or projection matrices can also be
learned through an expectation-maximization procedure, so the proposed method is
expected to work well when the true importance function has high correlation. Through
experiments of outlier detection, we show the validity of the proposed approaches.
 Fourth, humanoid robots move throughout the world, and the surrounding
environment, source positions, and source mixtures are constantly changing. In
addition, speech overlaps occur frequently during conversation. Thus,
source separation techniques are useful for improving speaker identification
performance. To deal with these problems, in Chapter 6, we consider the problem of
two-source signal separation with a two-microphone array, where a point source such
as a speech signal is placed in front of the array, while no information
is available about the other, interfering signal. We propose a simple and
computationally efficient method for estimating the geometry and source type (point or diffuse)
of the interference signal, which allows us to adaptively choose a suitable unmixing
matrix initialization scheme. Our proposed method, noise adaptive optimization of
matrix initialization (NAOMI), is shown to be effective through source separation
and speaker identification simulations.","subitem_description_type":"Other"}]},"item_1_description_18":{"attribute_name":"フォーマット","attribute_value_mlt":[{"subitem_description":"application/pdf","subitem_description_type":"Other"}]},"item_1_description_7":{"attribute_name":"学位記番号","attribute_value_mlt":[{"subitem_description":"総研大甲第1333号","subitem_description_type":"Other"}]},"item_1_select_14":{"attribute_name":"所蔵","attribute_value_mlt":[{"subitem_select_item":"有"}]},"item_1_select_8":{"attribute_name":"研究科","attribute_value_mlt":[{"subitem_select_item":"複合科学研究科"}]},"item_1_select_9":{"attribute_name":"専攻","attribute_value_mlt":[{"subitem_select_item":"15 統計科学専攻"}]},"item_1_text_10":{"attribute_name":"学位授与年度","attribute_value_mlt":[{"subitem_text_value":"2009"}]},"item_creator":{"attribute_name":"著者","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"YAMADA, Makoto","creatorNameLang":"en"}],"nameIdentifiers":[{}]}]},"item_files":{"attribute_name":"ファイル情報","attribute_type":"file","attribute_value_mlt":[{"accessrole":"open_date","date":[{"dateType":"Available","dateValue":"2016-02-17"}],"displaytype":"simple","filename":"甲1333_要旨.pdf","filesize":[{"value":"355.6 kB"}],"format":"application/pdf","licensetype":"license_11","mimetype":"application/pdf","url":{"label":"要旨・審査要旨","url":"https://ir.soken.ac.jp/record/1675/files/甲1333_要旨.pdf"},"version_id":"292aa5dc-ef6d-44fe-b78d-8b5eb8134fcd"},{"accessrole":"open_date","date":[{"dateType":"Available","dateValue":"2016-02-17"}],"displaytype":"simple","filename":"甲1333_本文.pdf","filesize":[{"value":"1.5 
MB"}],"format":"application/pdf","licensetype":"license_11","mimetype":"application/pdf","url":{"label":"本文","url":"https://ir.soken.ac.jp/record/1675/files/甲1333_本文.pdf"},"version_id":"d215b245-f9b1-41c8-94de-89c1fb0c7c86"}]},"item_language":{"attribute_name":"言語","attribute_value_mlt":[{"subitem_language":"eng"}]},"item_resource_type":{"attribute_name":"資源タイプ","attribute_value_mlt":[{"resourcetype":"thesis","resourceuri":"http://purl.org/coar/resource_type/c_46ec"}]},"item_title":"Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification","item_titles":{"attribute_name":"タイトル","attribute_value_mlt":[{"subitem_title":"Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification"},{"subitem_title":"Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification","subitem_title_language":"en"}]},"item_type_id":"1","owner":"21","path":["17"],"pubdate":{"attribute_name":"公開日","attribute_value":"2011-01-18"},"publish_date":"2011-01-18","publish_status":"0","recid":"1675","relation_version_is_last":true,"title":["Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification"],"weko_creator_id":"21","weko_shared_id":-1},"updated":"2023-06-20T15:57:19.908071+00:00"}