2024-10-11T14:49:49Z
https://ir.soken.ac.jp/oai
oai:ir.soken.ac.jp:00001503
2023-06-20T15:59:03Z
2:429:19
AUTOMATIC EXTRACTION OF LOGICALLY CONSISTENT ONTOLOGIES FROM TEXT CORPORA
McCRAE, John Philip
0
マックレー, ジョン フィリップ
0
McCRAE, John Philip
0
The Graduate University for Advanced Studies (総合研究大学院大学)
Doctor of Philosophy (Informatics)
Ontologies provide a structured description of the concepts and terminology used in a particular domain and provide valuable knowledge for a range of natural language processing applications. However, for many domains and languages ontologies do not exist, and manual creation is a difficult and resource-intensive process. As such, automatic methods to extract, expand or aid the construction of these resources are of significant interest.
There are a number of methods for extracting semantic information about how terms are related from raw text, most notably the approach of Hearst [1992], who used patterns to extract hypernym information. This method was manual and it is not clear how to automatically generate patterns, which are specific to a given relationship and domain. I present a novel method for developing patterns based on the use of alignments between patterns. Alignment works well as it is closely related to the concept of a join-set of patterns, which minimally generalise over-fitting patterns. I show that join-sets can be viewed as a reduction of the search space of patterns, while resulting in no loss of accuracy. I then show the results can be combined by a support vector machine to obtain a classifier, which can decide if a pair of terms is related. I applied this to several data sets and conclude that this method produces a precise result, with reasonable recall.
The system I developed, like many semantic relation systems, produces only a binary decision of whether a term pair is related. Ontologies have a structure that limits the forms of networks they represent. As the relation extraction is generally noisy and incomplete, it is unlikely that the extracted relations will match the structure of the ontology. As such I represent the structure of the ontology as a set of logical statements, and form a consistent ontology by finding the network closest to the relation extraction system's output that is consistent with these restrictions. This gives a novel NP-hard optimisation problem, for which I develop several algorithms. I present simple greedy approaches, and branch and bound approaches, which my results show are not sufficient for this problem. I then use resolution to show how this problem can be stated as an integer programming problem, which can be efficiently solved by relaxing it to a linear programming problem. I show that this result can efficiently solve the problem, and furthermore when applied to the result of the relation extraction system, this improves the quality of the extraction as well as converting it to an ontological structure.
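The "closest consistent network" problem described in the abstract can be illustrated at toy scale. The following is a minimal sketch and not the method of the thesis: it uses exhaustive search instead of integer or linear programming, and the constraints (asymmetry and transitivity of a hypernym relation), the distance measure (sum of absolute differences), the function names, the terms and the scores are all assumptions chosen for illustration.

```python
from itertools import permutations, product


def is_consistent(edges, terms):
    """Check two assumed logical restrictions on a hypernym network."""
    # Asymmetry: a relation and its inverse cannot both hold.
    if any((b, a) in edges for (a, b) in edges):
        return False
    # Transitivity: a->b and b->c together require a->c.
    for a, b, c in permutations(terms, 3):
        if (a, b) in edges and (b, c) in edges and (a, c) not in edges:
            return False
    return True


def closest_consistent_network(scores, terms):
    """Brute-force the consistent edge set nearest to the extractor scores.

    `scores` maps ordered term pairs to a confidence in [0, 1]; missing
    pairs are treated as 0.0. This enumerates every subset of pairs, so it
    is only feasible for a handful of terms.
    """
    pairs = list(permutations(terms, 2))
    best_edges, best_cost = None, float("inf")
    for choice in product([0, 1], repeat=len(pairs)):
        edges = {p for p, keep in zip(pairs, choice) if keep}
        if not is_consistent(edges, terms):
            continue
        # Distance between the 0/1 network and the noisy scores.
        cost = sum(abs(keep - scores.get(p, 0.0))
                   for p, keep in zip(pairs, choice))
        if cost < best_cost:
            best_edges, best_cost = edges, cost
    return best_edges, best_cost


# Hypothetical extractor output: thresholding at 0.5 would keep the first
# two pairs only, which violates transitivity.
terms = ["cat", "mammal", "animal"]
scores = {
    ("cat", "mammal"): 0.9,
    ("mammal", "animal"): 0.8,
    ("cat", "animal"): 0.4,
}
edges, cost = closest_consistent_network(scores, terms)
print(sorted(edges), round(cost, 2))
```

In this toy case the nearest consistent network adds the low-scoring pair ("cat", "animal") rather than dropping one of the two confident pairs, which mirrors the abstract's claim that enforcing the structure can improve the extraction. The search is exponential in the number of term pairs, which is why the thesis turns to integer programming and its linear relaxation.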
application/pdf
総研大甲第1288号
thesis
2009-09-30
application/pdf
application/pdf
https://ir.soken.ac.jp/record/1503/files/甲1288_要旨.pdf
https://ir.soken.ac.jp/record/1503/files/甲1288_本文.pdf
https://ir.soken.ac.jp/records/1503
eng