2024-07-27T00:22:49Z https://ir.soken.ac.jp/oai

oai:ir.soken.ac.jp:00001503 2023-06-20T15:59:03Z 2:429:19

AUTOMATIC EXTRACTION OF LOGICALLY CONSISTENT ONTOLOGIES FROM TEXT CORPORA
McCRAE, John Philip (マックレー, ジョンフィリップ)
The Graduate University for Advanced Studies, Doctor of Philosophy (Informatics)

Ontologies provide a structured description of the concepts and terminology used in a particular domain and provide valuable knowledge for a range of natural language processing applications. However, for many domains and languages, ontologies do not exist, and manual creation is a difficult and resource-intensive process. As such, automatic methods to extract, expand or aid the construction of these resources are of significant interest.

There are a number of methods for extracting semantic information about how terms are related from raw text, most notably the approach of Hearst [1992], who used patterns to extract hypernym information. That method relied on manually written patterns, and it is not clear how to automatically generate patterns that are specific to a given relationship and domain. I present a novel method for developing patterns based on the use of alignments between patterns. Alignment works well as it is closely related to the concept of a join-set of patterns, which minimally generalise over-fitting patterns. I show that join-sets can be viewed as a reduction of the search space of patterns that results in no loss of accuracy. I then show that the results can be combined by a support vector machine to obtain a classifier that can decide whether a pair of terms is related. I applied this to several data sets and conclude that this method produces a precise result, with reasonable recall.

The system I developed, like many semantic relation systems, produces only a binary decision of whether a term pair is related. Ontologies have a structure that limits the forms of network they can represent. As relation extraction is generally noisy and incomplete, it is unlikely that the extracted relations will match the structure of the ontology.
As such, I represent the structure of the ontology as a set of logical statements, and form a consistent ontology by finding the network closest to the relation extraction system's output that is consistent with these restrictions. This gives a novel NP-hard optimisation problem, for which I develop several algorithms. I present simple greedy approaches and branch-and-bound approaches, which my results show are not sufficient for this problem. I then use resolution to show how this problem can be stated as an integer programming problem, which can be efficiently solved by relaxing it to a linear programming problem. I show that this approach can efficiently solve the problem and, furthermore, that when applied to the output of the relation extraction system it improves the quality of the extraction as well as converting it to an ontological structure.

application/pdf SOKENDAI Kou No. 1288 thesis 2009-09-30 application/pdf application/pdf https://ir.soken.ac.jp/record/1503/files/甲1288_要旨.pdf https://ir.soken.ac.jp/record/1503/files/甲1288_本文.pdf eng
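
The consistency step described in the abstract (finding the network closest to the extractor's output that satisfies a set of logical restrictions) can be illustrated with a toy example. The sketch below is not the thesis's method: it uses plain exhaustive search, which is only feasible for a handful of terms, whereas the abstract describes greedy, branch-and-bound and relaxed integer programming approaches. The function names, the choice of transitivity and antisymmetry as the restrictions, the absolute-difference cost and the example scores are all assumptions made for illustration.

```python
from itertools import combinations, permutations


def is_consistent(edges, terms):
    """Check two assumed restrictions on a hypernym relation:
    antisymmetry (never both a->b and b->a) and transitivity."""
    for a, b in edges:
        if (b, a) in edges:
            return False
    for a, b in edges:
        for c in terms:
            if (b, c) in edges and a != c and (a, c) not in edges:
                return False
    return True


def closest_consistent(terms, scores):
    """Exhaustively search every set of directed term pairs and return
    the consistent one with the lowest total disagreement with `scores`.

    `scores` maps (term, term) pairs to a confidence in [0, 1];
    missing pairs are treated as 0. The cost of a candidate is the sum
    of |x - score| over all pairs, where x is 1 if the pair is kept.
    """
    pairs = list(permutations(terms, 2))
    best_edges, best_cost = None, float("inf")
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            edges = set(subset)
            if not is_consistent(edges, terms):
                continue
            cost = sum(
                (1 - scores.get(p, 0.0)) if p in edges else scores.get(p, 0.0)
                for p in pairs
            )
            if cost < best_cost:
                best_edges, best_cost = edges, cost
    return best_edges, best_cost


# Hypothetical extractor output: thresholding at 0.5 would keep only
# dog->mammal and mammal->animal, which violates transitivity.
terms = ["dog", "mammal", "animal"]
scores = {
    ("dog", "mammal"): 0.9,
    ("mammal", "animal"): 0.8,
    ("dog", "animal"): 0.3,
    ("animal", "dog"): 0.2,
}
edges, cost = closest_consistent(terms, scores)
print(sorted(edges), round(cost, 3))
```

In this toy case the cheapest consistent network adds the low-scoring pair dog->animal rather than dropping one of the two high-scoring pairs, which mirrors the abstract's claim that enforcing structure can improve the extraction as well as repair it. The search visits every subset of ordered pairs, so it grows exponentially with the number of terms.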