2024-06-18T08:50:15Z
https://ir.soken.ac.jp/oai
oai:ir.soken.ac.jp:00001503
2023-06-20T15:59:03Z
2:429:19
AUTOMATIC EXTRACTION OF LOGICALLY CONSISTENT ONTOLOGIES FROM TEXT CORPORA
McCRAE, John Philip
マックレー, ジョン フィリップ
The Graduate University for Advanced Studies (総合研究大学院大学)
Doctor of Philosophy (Informatics)
Ontologies provide a structured description of the concepts and terminology used in a particular domain and provide valuable knowledge for a range of natural language processing applications. However, for many domains and languages ontologies do not exist, and manual creation is a difficult and resource-intensive process. As such, automatic methods to extract, expand or aid the construction of these resources are of significant interest.<br />There are a number of methods for extracting semantic information about how terms are related from raw text, most notably the approach of Hearst [1992], who used <i>patterns</i> to extract hypernym information. This method was manual, and it is not clear how to generate patterns automatically, since they are specific to a given relationship and domain. I present a novel method for developing patterns based on the use of alignments between patterns. Alignment works well as it is closely related to the concept of a <i>join-set</i> of patterns, which minimally generalise over-fitting patterns. I show that join-sets can be viewed as a reduction of the search space of patterns that results in no loss of accuracy. I then show that the results can be combined by a <i>support vector machine</i> to obtain a classifier that decides whether a pair of terms is related. I applied this to several data sets and conclude that this method produces a precise result with reasonable recall.<br />The system I developed, like many semantic relation systems, produces only a binary decision of whether a term pair is related. Ontologies have a structure that limits the forms of networks they represent. As relation extraction is generally noisy and incomplete, it is unlikely that the extracted relations will match the structure of the ontology. As such, I represent the structure of the ontology as a set of logical statements, and form a consistent ontology by finding the network that is closest to the relation extraction system's output and consistent with these restrictions. This gives a novel <i>NP-hard</i> optimisation problem, for which I develop several algorithms. I present simple greedy approaches and branch and bound approaches, which my results show are not sufficient for this problem. I then use resolution to show how this problem can be stated as an <i>integer programming problem</i>, which can be efficiently solved by relaxing it to a <i>linear programming problem</i>. I show that this approach can efficiently solve the problem, and furthermore that, when applied to the output of the relation extraction system, it improves the quality of the extraction as well as converting it to an ontological structure.
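The consistency step described in the abstract can be illustrated with a toy sketch. It is not the thesis's algorithm: it uses exhaustive search rather than the greedy, branch and bound, or integer programming methods named above, and everything in it is an assumption made for illustration, including the two constraints (antisymmetry and transitivity of a hypernym relation), the absolute-difference distance, the function names `is_consistent` and `closest_consistent`, and the classifier scores.

```python
from itertools import permutations, product


def is_consistent(edges, terms):
    """Check two illustrative constraints on a hypernym relation (assumed here)."""
    # Antisymmetry: a term pair may not be related in both directions.
    for a, b in edges:
        if (b, a) in edges:
            return False
    # Transitivity: a -> b and b -> c together require a -> c.
    for a, b in edges:
        for c in terms:
            if (b, c) in edges and (a, c) not in edges:
                return False
    return True


def closest_consistent(scores, terms):
    """Exhaustively find the consistent edge set closest to the scores.

    `scores` maps ordered term pairs to a hypothetical classifier
    confidence in [0, 1]; missing pairs count as 0. The distance is the
    sum of |decision - score| over all ordered pairs. The search is
    exponential in the number of pairs, so it only suits tiny inputs.
    """
    pairs = list(permutations(terms, 2))
    best, best_cost = None, float("inf")
    for bits in product([0, 1], repeat=len(pairs)):
        edges = {p for p, bit in zip(pairs, bits) if bit}
        if not is_consistent(edges, terms):
            continue
        cost = sum(abs(bit - scores.get(p, 0.0)) for p, bit in zip(pairs, bits))
        if cost < best_cost:
            best, best_cost = edges, cost
    return best, best_cost


# Made-up scores: thresholding at 0.5 would keep only the first two pairs,
# which is not a transitive relation.
terms = ["dog", "mammal", "animal"]
scores = {
    ("dog", "mammal"): 0.9,
    ("mammal", "animal"): 0.8,
    ("dog", "animal"): 0.3,
    ("animal", "dog"): 0.4,
}
network, cost = closest_consistent(scores, terms)
print(sorted(network), round(cost, 2))
```

On these made-up scores the closest consistent network adds the low-scoring pair (dog, animal) so that transitivity holds, which shows in miniature how enforcing structure can also correct the extraction.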
application/pdf
総研大甲第1288号
thesis
2009-09-30
https://ir.soken.ac.jp/record/1503/files/甲1288_要旨.pdf
https://ir.soken.ac.jp/record/1503/files/甲1288_本文.pdf
eng