Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases

藤田, 悦郎; フジタ, エツロウ; FUJITA, Etsuro

WEKO3

lat lon distance

[[sub_check.contents]]

[[sub_radio.contents]]

Field does not validate

[[sub_attr.contents]]　

インデックスツリー

アイテム

Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases

https://ir.soken.ac.jp/records/4082

名前 / ファイル	ライセンス	アクション
要旨・審査要旨 (341.8 kB)

Item type

学位論文 / Thesis or Dissertation(1)

公開日

2013-11-20

タイトル

Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases

タイトル

Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases

言語

eng

資源タイプ

資源タイプ識別子

http://purl.org/coar/resource_type/c_46ec

資源タイプ

thesis

著者名

藤田, 悦郎

フリガナ

フジタ, エツロウ

著者

FUJITA, Etsuro

学位授与機関

学位授与機関名

総合研究大学院大学

学位名

博士（情報学）

学位記番号

内容記述タイプ

Other

内容記述

総研大甲第1602号

研究科

値

複合科学研究科

専攻

値

17 情報学専攻

学位授与年月日

2013-03-22

学位授与年度

値

2012

要旨

内容記述タイプ

Other

内容記述

With the successful adoption of link analysis techniques such as PageRank
and web spam filtering, current web search engines support navigational
search well, where a user is looking for a particular web resource that
the user has in mind. However, such engines do not necessarily support
informational search well, where a user is looking for information about
a certain topic that might be on diverse web resources. This is because
a user often forms an informational query by a few keywords that does
not necessarily model the user information need well while such engines
search web documents basically based on conjunctive Boolean searching
using the submitted keywords. Informational search would be better han-
dled by a web search engine based on an information retrieval (IR) model
combined with automatic query expansion. Moreover, the realization of
such an engine requires a method to process the IR model efficiently. So
in this thesis, we propose new top-k document retrieval algorithms that
efficiently process long queries generated by automatic query expansion,
by introducing a simple additional data structure called “query-term-by-
document binary matrix,” which indicates which document contains which
query term. We show on the basis of theoretical analysis that our algo-
rithms not only find the top-k documents exactly but also have a desirable
property on processing cost as described below. Furthermore, we show
on the basis of empirical evaluation using the TREC GOV2 collection that
our algorithms achieve considerable performance gains over existing al-
gorithms especially when the number of query terms gets larger, yielding
speedup of up to a factor of about 2 over existing algorithms for top-100
document retrieval for 64-term queries. Then, we extend our algorithms
for supporting proximity search to take advantage of the structured nature
of web documents, and show that the extended versions of our algorithms
are still exact for finding the top-k documents and desirable on process-
ing cost. The proposed algorithms presented in this thesis are applicable
not only to web search but also to other areas such as enterprise search.
The novel contribution of this thesis is summarized as follows: (a) The
proposal of new top-k document retrieval algorithms that efficiently pro-
cess long queries generated by automatic query expansion, by introducing
a simple additional data structure called query-term-by-document binary
matrix. (b) The theoretical analysis on the proposed algorithms. We show
that our algorithms not only find the top-k documents exactly but also have
the desirable property on processing cost. (c) The empirical evaluation
of the proposed algorithms. We demonstrate that our algorithms achieve
considerable performance gains over existing algorithms. And (d) The
extension of the above algorithms for supporting proximity search. The
algorithms proposed in this thesis efficiently process an IR model com-
bined with automatic query expansion and/or proximity search, by intro-
ducing a simple additional data structure called query-term-by-document
binary matrix. Due to the simplicity of our method using query-term-by-
document binary matrix, our method is also applicable to other IR tech-
niques. We believe that our method paves the way for practical use of vari-
ous IR techniques, including an IR model combined with automatic query
expansion and/or proximity search, in large-scale text databases such as
the web, which has been considered difficult because of their inefficiency.

所蔵

値

有

戻る

views

See details

	Views

Versions

Ver.1

2023-06-20 15:15:47.389472

Show All versions

Cite as

エクスポート

OAI-PMH

JPCOAR 2.0
JPCOAR 1.0
DublinCore
DDI

Other Formats

JSON
BIBTEX

インデックスリンク

インデックスツリー

アイテム

Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases

× 藤田, 悦郎

× フジタ, エツロウ

× FUJITA, Etsuro

Versions

Share

Cite as

エクスポート