Rapid Behavior Adaptation for Human Centered Robots Through Demonstration
https://ir.soken.ac.jp/records/2486
https://ir.soken.ac.jp/records/2486287c4737-9d9a-4865-8a03-bc61ab1e7a87
Name / File | License | Action |
---|---|---|
Abstract and examination summary (395.9 kB)
|
||
Full text (2.2 MB)
|
Item type | Thesis or Dissertation(1) | |||||
---|---|---|---|---|---|---|
Publication date | 2012-01-04 | |||||
Title | ||||||
Title | Rapid Behavior Adaptation for Human Centered Robots Through Demonstration | |||||
Title | ||||||
Title | Rapid Behavior Adaptation for Human Centered Robots Through Demonstration | |||||
Language | en | |||||
Language | ||||||
Language | eng | |||||
Resource type | ||||||
Resource type identifier | http://purl.org/coar/resource_type/c_46ec | |||||
Resource type | thesis | |||||
Author name | Tareeq, Saifuddin | |||||
Name reading (furigana) | タリーク, サイフディン | |||||
Author | TAREEQ, Saifuddin | |||||
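Degree-granting institution | ||||||
Degree-granting institution name | The Graduate University for Advanced Studies, SOKENDAI (総合研究大学院大学) | |||||
Degree name | ||||||
Degree name | Doctor (Informatics) | |||||
Degree certificate number | ||||||
Description type | Other | |||||
Description | 総研大甲第1427号 (SOKENDAI Kō No. 1427) | |||||
Graduate school | ||||||
Value | School of Multidisciplinary Sciences | |||||
Department | ||||||
Value | 17 Department of Informatics | |||||
Date of degree conferral | ||||||
Date of degree conferral | 2011-03-24 | |||||
Academic year of degree conferral | ||||||
Value | 2010 | |||||
Abstract | ||||||
Description type | Other | |||||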
Description | Robots have proven to be powerful tools in the predictable environments of factories and manufacturing plants. However, they have been far less successful in human environments, which are characterized by a higher degree of uncertainty and change. Each response of today's industrial robots has to be programmed in advance. This approach is ill suited to robots in human environments, which require a vast amount of knowledge and the specification of a wide set of behaviors for successful performance. Typically, robots in human environments are placed in very restricted worlds, because the environment can then be controlled. If a robot is taken into an unknown home, that approach no longer holds. Moreover, when the user or environment changes frequently, the robotic system should be able to adapt rapidly to the new user or environment in order to take the correct action. This has introduced the need for robotic systems that can adapt to the user and environment in an engaging way by using their observed sensory information. The recent trend in robotics is to develop a new generation of robots that are capable of adapting to new users, interacting with users, and participating in our daily lives. Adaptive behavior plays an important role in assisting different users with different needs. Therefore, such robots should be able to adapt rapidly to user preference and user policy, and should have the interaction skills to communicate with the user. In this thesis, the user's preference denotes variation in the user's behavior decision even though identical sensor readings are observed, and the user's policy is defined as the mapping from observation to action. The problem of learning a policy, a task representation mapping from world states to actions, lies at the heart of many robotic applications. One approach to acquiring a task policy is learning from demonstration, an interactive learning approach based on human-robot interaction that provides an intuitive interface for robot programming. 
In this approach, a teacher performs demonstrations of the desired behavior for the robot. The robot records the demonstrations, typically as state-to-action mappings, and learns a policy that imitates the teacher's behavior. Learning from demonstration is an incremental online learning process in which the robot begins with no knowledge about the task and acquires training data until a fully autonomous policy representing the complete task is learned. If the user changes his or her preference or policy, the system should adapt to the new preference or policy rapidly. This thesis contributes an interactive approach to demonstration learning that enables the robot to adapt rapidly to user preference or policy. The proposed algorithms enable the robot to identify the need for, and to request, demonstrations for specific parts of the state space, based on confidence thresholds characterizing the uncertainty of the learned policy. In our evaluation, we show that this approach significantly reduces the number of demonstrations and can rapidly follow user preference or policy. Demonstrations provide the robot with a dataset of state-action pairs representing examples of the desired behavior. The robot's goal is to use this information to adapt a policy that enables it to select an action based upon its current world state. Our policy should map from the robot's state to a discrete set of action primitives. Because of the interactive nature of learning from demonstration, policy adaptation must occur in real time. The state-action mapping represented by a policy is typically complex. One reason for this complexity is that the desired observation-action mapping is unknown. A second reason is the difficulty of policy adaptation in real-world environments. Traditional approaches to robot control model the domain dynamics and derive policies using mathematical models. 
Though theoretically well-founded, these approaches depend heavily upon the accuracy of the model. Not only does this model require considerable expertise to develop, but approximations such as linearization are often introduced for computational tractability, thereby degrading performance. Other approaches, such as reinforcement learning, guide policy learning by providing reward feedback about the desirability of visiting particular states. Defining a function that provides these rewards, however, is known to be a difficult problem that also requires considerable expertise to address. Furthermore, building the policy requires gathering information by visiting states to receive rewards, which is nontrivial for a mobile robot learner executing actual actions in the real world. We chose a Bayesian network for rapid policy adaptation because it can represent the degree of confidence in a behavior decision as a probability and can provide a confidence even with a small number of observations. A Bayesian network is also suitable for online interactive learning. This thesis presents a Bayesian-network-based framework to address rapid behavior adaptation. The performance of Bayesian learning strongly depends on the quality of the demonstration dataset. When the dataset contains significant data, learning succeeds. However, it is difficult to judge data to be insignificant, because the point at which data become insignificant for the learning process is not known a priori. We propose a method for evaluating the significance of data based on the change in the degree of confidence. Data that produce only a small change in the degree of confidence contribute little to learning and are therefore evaluated as insignificant. To evaluate the significance of a demonstration, the experience data are assigned to distribution parameters. The distribution represents not only the event probability among behaviors, but also the degree of confidence in the output probability. 
The system calculates the degree of confidence by integrating the area around the peak of the distribution after each demonstration. The change between two consecutive degrees of confidence can be regarded as the importance of the observation to the learning process. When the change in the degree of confidence between two consecutive time steps is small, the situation is regarded as familiar: the experience data are considered insignificant for learning and are discarded. In contrast, when the robot detects a large change in the degree of confidence between two consecutive time steps, the situation is considered unfamiliar: the experience data are considered significant for learning and are accepted. With this significance evaluation method, we introduce multiple rapid behavior adaptation algorithms that enable the robot to evaluate demonstrations based on the change in the degree of confidence. The rapid adaptation algorithm enables the robot to evaluate demonstrations in real time as it interacts with the user. |
|||||
Holdings | ||||||
Value | Yes | |||||
Format | ||||||
Description type | Other | |||||
Description | application/pdf |
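
The significance test described in the abstract (accept a demonstration only when it changes the degree of confidence noticeably) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a single observation state with two candidate behaviors, replaces the Bayesian network with one Beta distribution, and takes the "area around the peak" to be the probability mass within a fixed window around the mode. The names `degree_of_confidence`, `SignificanceFilter`, `half_width`, and `min_change`, as well as the numeric defaults, are hypothetical choices made for illustration only.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def peak(a, b):
    """Mode of Beta(a, b) for a, b >= 1; 0.5 for the flat Beta(1, 1)."""
    if a + b <= 2:
        return 0.5
    return (a - 1) / (a + b - 2)


def degree_of_confidence(a, b, half_width=0.1, steps=1000):
    """Probability mass within +/- half_width of the peak (midpoint rule)."""
    centre = peak(a, b)
    lo = max(0.0, centre - half_width)
    hi = min(1.0, centre + half_width)
    h = (hi - lo) / steps
    return sum(beta_pdf(lo + (i + 0.5) * h, a, b) for i in range(steps)) * h


class SignificanceFilter:
    """Keeps a demonstration only if it shifts the degree of confidence enough."""

    def __init__(self, a=2.0, b=2.0, min_change=0.02):
        # a and b are assumed prior pseudo-counts for behaviors 1 and 0.
        self.a, self.b, self.min_change = a, b, min_change

    def observe(self, action):
        """Offer one demonstration (action 0 or 1); return True if accepted."""
        before = degree_of_confidence(self.a, self.b)
        new_a = self.a + (1 if action == 1 else 0)
        new_b = self.b + (1 if action == 0 else 0)
        after = degree_of_confidence(new_a, new_b)
        if abs(after - before) < self.min_change:
            return False  # familiar situation: discard the data
        self.a, self.b = new_a, new_b  # unfamiliar situation: accept the data
        return True


if __name__ == "__main__":
    learner = SignificanceFilter()
    accepted = sum(learner.observe(1) for _ in range(50))
    print("accepted demonstrations:", accepted)
    print("degree of confidence:", round(degree_of_confidence(learner.a, learner.b), 3))
```

In this sketch, repeated identical demonstrations stop being accepted once they no longer move the degree of confidence by at least `min_change`, which mirrors the "familiar" case in the abstract. The window width and threshold would need to be chosen per task.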