{"created":"2023-06-20T13:21:27.609803+00:00","id":1675,"links":{},"metadata":{"_buckets":{"deposit":"34d0e35d-b45c-41b5-83b5-cc124a651db9"},"_deposit":{"created_by":21,"id":"1675","owners":[21],"pid":{"revision_id":0,"type":"depid","value":"1675"},"status":"published"},"_oai":{"id":"oai:ir.soken.ac.jp:00001675","sets":["2:429:17"]},"author_link":["0","0","0"],"item_1_creator_2":{"attribute_name":"著者名","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"山田, 誠"}],"nameIdentifiers":[{"nameIdentifier":"0","nameIdentifierScheme":"WEKO"}]}]},"item_1_creator_3":{"attribute_name":"フリガナ","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"ヤマダ, マコト"}],"nameIdentifiers":[{"nameIdentifier":"0","nameIdentifierScheme":"WEKO"}]}]},"item_1_date_granted_11":{"attribute_name":"学位授与年月日","attribute_value_mlt":[{"subitem_dategranted":"2010-03-24"}]},"item_1_degree_grantor_5":{"attribute_name":"学位授与機関","attribute_value_mlt":[{"subitem_degreegrantor":[{"subitem_degreegrantor_name":"総合研究大学院大学"}]}]},"item_1_degree_name_6":{"attribute_name":"学位名","attribute_value_mlt":[{"subitem_degreename":"博士（統計科学）"}]},"item_1_description_12":{"attribute_name":"要旨","attribute_value_mlt":[{"subitem_description":"Speaker identification is one of the key technologies for person identification in<br />humanoid robots. In particular, when face information is not available, speaker<br />identification is the only way to identify a person; thus, improving speaker identi-<br />fication performance is an important issue for person identification tasks.<br />　There are four major issues in speaker identification for humanoid robots in prac-<br />tice. First, humanoid robots should identify the speaker in real time with high<br />identification rates. 
These days, kernel methods such as the support vector<br />machine (SVM) and kernel logistic regression (KLR) are popular for speaker identifi-<br />cation tasks, and kernel-based systems outperform the conventional Gaussian<br />Mixture Model (GMM) based system. However, kernel-based speaker iden-<br />tification systems are usually computationally intensive, which is not<br />preferable for real-time implementation. To deal with the computational issue, in<br />Chapter 3 we propose a method of approximating the sequence kernel that is shown to be compu-<br />tationally very efficient. More specifically, we formulate the problem<br />of approximating the sequence kernel as the problem of obtaining a <i>pre-image</i> in<br />a reproducing kernel Hilbert space. The effectiveness of the proposed approximation<br />is demonstrated in text-independent speaker identification experiments with 10 male<br />speakers: our approach provides a significant reduction in computation time while per-<br />formance degradation is kept moderate. Based on the proposed method, we develop<br />a real-time kernel-based speaker identification system using the Virtual Studio Tech-<br />nology (VST).<br />　Second, the speech features vary over time due to session dependent variation,<br />the recording environment change, and physical conditions/emotions. However, con-<br />ventional kernel-based systems implicitly ignore these facts and simply<br />assume that the input probability distributions of the training and<br />test datasets are the same at any time. To alleviate the influence of session dependent<br />variation, it is popular to use several sessions of speaker utterance samples or to use<br /><i>cepstral mean normalization</i> (CMN). 
However, gathering several sessions of speaker<br />utterance data and assigning the speaker ID to the collected data are expensive both<br />in time and cost and therefore not realistic in practice. Moreover, it is not possi-<br />ble to perfectly remove the session dependent variation by CMN alone. Thus, in<br />Chapter 4, we propose a novel semi-supervised speaker identification method that<br />can alleviate the influence of non-stationarity such as session dependent variation,<br />the recording environment change, and physical conditions / emotions. We assume<br />that the voice quality variants follow the <i>covariate shift</i> model, where only the voice<br />feature distribution changes between the training and test phases. Our method consists of<br />weighted versions of kernel logistic regression and cross validation and is theoretically<br />shown to have the capability of alleviating the influence of covariate shift, where the<br />weight (a.k.a. importance) is estimated from the training and test distributions using<br />the Kullback-Leibler Importance Estimation Procedure (KLIEP). We experimentally<br />show through text-independent / dependent speaker identification simulations that the<br />proposed method is promising in dealing with variations in voice quality.<br />　Third, humanoid robots should automatically detect an unknown<br />speaker and add the unknown speaker to the dictionary. Thus, the speaker detec-<br />tion task can be formulated as an outlier detection problem (i.e., outliers can be the<br />unknown speakers). Since the outlier detection problem can be solved through the<br />comparison between the log likelihoods of the unknown speaker and the speakers,<br />the estimation accuracy of the log likelihoods is an important issue for improving<br />speaker detection performance. 
Thus, in Chapter 5, we propose a new importance<br />(a.k.a. likelihood) estimation method using Gaussian mixture models (GMMs) and<br />mixtures of probabilistic principal component analyzers (PPCAs), where the proposed approach esti-<br />mates the importance without going through density estimation. An advantage of<br />the proposed methods is that covariance matrices or projection matrices can also be<br />learned through an expectation-maximization procedure, so the proposed method is ex-<br />pected to work well when the true importance function has high correlation. Through<br />outlier detection experiments, we show the validity of the proposed approaches.<br />　Fourth, humanoid robots move throughout the world, and the surrounding<br />environment, source positions, and source mixtures are constantly changing. In ad-<br />dition, speech overlaps frequently occur during conversation. Thus,<br />source separation techniques are useful for improving speaker identification per-<br />formance. To deal with those problems, in Chapter 6, we consider the problem of<br />two-source signal separation from a two-microphone array, where a point source such<br />as a speech signal is placed in front of a two-microphone array, while no information<br />is available about another <i>interference</i> signal. We propose a simple and computation-<br />ally efficient method for estimating the geometry and source type (a point or diffuse)<br />of the interference signal, which allows us to adaptively choose a suitable unmixing<br />matrix initialization scheme. 
Our proposed method, <i>noise adaptive optimization of<br />matrix initialization</i> (NAOMI), is shown to be effective through source separation<br />and speaker identification simulations.","subitem_description_type":"Other"}]},"item_1_description_18":{"attribute_name":"フォーマット","attribute_value_mlt":[{"subitem_description":"application/pdf","subitem_description_type":"Other"}]},"item_1_description_7":{"attribute_name":"学位記番号","attribute_value_mlt":[{"subitem_description":"総研大甲第1333号","subitem_description_type":"Other"}]},"item_1_select_14":{"attribute_name":"所蔵","attribute_value_mlt":[{"subitem_select_item":"有"}]},"item_1_select_8":{"attribute_name":"研究科","attribute_value_mlt":[{"subitem_select_item":"複合科学研究科"}]},"item_1_select_9":{"attribute_name":"専攻","attribute_value_mlt":[{"subitem_select_item":"15 統計科学専攻"}]},"item_1_text_10":{"attribute_name":"学位授与年度","attribute_value_mlt":[{"subitem_text_value":"2009"}]},"item_creator":{"attribute_name":"著者","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"YAMADA, Makoto","creatorNameLang":"en"}],"nameIdentifiers":[{"nameIdentifier":"0","nameIdentifierScheme":"WEKO"}]}]},"item_files":{"attribute_name":"ファイル情報","attribute_type":"file","attribute_value_mlt":[{"accessrole":"open_date","date":[{"dateType":"Available","dateValue":"2016-02-17"}],"displaytype":"simple","filename":"甲1333_要旨.pdf","filesize":[{"value":"355.6 kB"}],"format":"application/pdf","licensetype":"license_11","mimetype":"application/pdf","url":{"label":"要旨・審査要旨","url":"https://ir.soken.ac.jp/record/1675/files/甲1333_要旨.pdf"},"version_id":"292aa5dc-ef6d-44fe-b78d-8b5eb8134fcd"},{"accessrole":"open_date","date":[{"dateType":"Available","dateValue":"2016-02-17"}],"displaytype":"simple","filename":"甲1333_本文.pdf","filesize":[{"value":"1.5 
MB"}],"format":"application/pdf","licensetype":"license_11","mimetype":"application/pdf","url":{"label":"本文","url":"https://ir.soken.ac.jp/record/1675/files/甲1333_本文.pdf"},"version_id":"d215b245-f9b1-41c8-94de-89c1fb0c7c86"}]},"item_language":{"attribute_name":"言語","attribute_value_mlt":[{"subitem_language":"eng"}]},"item_resource_type":{"attribute_name":"資源タイプ","attribute_value_mlt":[{"resourcetype":"thesis","resourceuri":"http://purl.org/coar/resource_type/c_46ec"}]},"item_title":"Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification","item_titles":{"attribute_name":"タイトル","attribute_value_mlt":[{"subitem_title":"Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification"},{"subitem_title":"Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification","subitem_title_language":"en"}]},"item_type_id":"1","owner":"21","path":["17"],"pubdate":{"attribute_name":"公開日","attribute_value":"2011-01-18"},"publish_date":"2011-01-18","publish_status":"0","recid":"1675","relation_version_is_last":true,"title":["Kernel Methods and Frequency Domain Independent Component Analysis for Robust Speaker Identification"],"weko_creator_id":"21","weko_shared_id":-1},"updated":"2023-06-20T15:57:19.908071+00:00"}