ログイン
言語:

WEKO3

  • トップ
  • ランキング
To
lat lon distance
To

Field does not validate



インデックスリンク

インデックスツリー

メールアドレスを入力してください。

WEKO

One fine body…

WEKO

One fine body…

アイテム

  1. 020 学位論文
  2. 複合科学研究科
  3. 17 情報学専攻

Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases

https://ir.soken.ac.jp/records/4082
https://ir.soken.ac.jp/records/4082
b2b499f2-06ca-44ef-a207-d1205e09e206
名前 / ファイル ライセンス アクション
甲1602_要旨.pdf 要旨・審査要旨 (341.8 kB)
Item type 学位論文 / Thesis or Dissertation(1)
公開日 2013-11-20
タイトル
タイトル Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases
タイトル
タイトル Efficient Retrieval of Highly Ranked Documents for Informational Search on Large Scale Text Databases
言語 en
言語
言語 eng
資源タイプ
資源タイプ識別子 http://purl.org/coar/resource_type/c_46ec
資源タイプ thesis
著者名 藤田, 悦郎

× 藤田, 悦郎

藤田, 悦郎

Search repository
フリガナ フジタ, エツロウ

× フジタ, エツロウ

フジタ, エツロウ

Search repository
著者 FUJITA, Etsuro

× FUJITA, Etsuro

en FUJITA, Etsuro

Search repository
学位授与機関
学位授与機関名 総合研究大学院大学
学位名
学位名 博士(情報学)
学位記番号
内容記述タイプ Other
内容記述 総研大甲第1602号
研究科
値 複合科学研究科
専攻
値 17 情報学専攻
学位授与年月日
学位授与年月日 2013-03-22
学位授与年度
値 2012
要旨
内容記述タイプ Other
内容記述 With the successful adoption of link analysis techniques such as PageRank
and web spam filtering, current web search engines support navigational
search well, where a user is looking for a particular web resource that
the user has in mind. However, such engines do not necessarily support
informational search well, where a user is looking for information about
a certain topic that might be on diverse web resources. This is because
a user often forms an informational query by a few keywords that does
not necessarily model the user information need well while such engines
search web documents basically based on conjunctive Boolean searching
using the submitted keywords. Informational search would be better han-
dled by a web search engine based on an information retrieval (IR) model
combined with automatic query expansion. Moreover, the realization of
such an engine requires a method to process the IR model efficiently. So
in this thesis, we propose new top-k document retrieval algorithms that
efficiently process long queries generated by automatic query expansion,
by introducing a simple additional data structure called “query-term-by-
document binary matrix,” which indicates which document contains which
query term. We show on the basis of theoretical analysis that our algo-
rithms not only find the top-k documents exactly but also have a desirable
property on processing cost as described below. Furthermore, we show
on the basis of empirical evaluation using the TREC GOV2 collection that
our algorithms achieve considerable performance gains over existing al-
gorithms especially when the number of query terms gets larger, yielding
speedup of up to a factor of about 2 over existing algorithms for top-100
document retrieval for 64-term queries. Then, we extend our algorithms
for supporting proximity search to take advantage of the structured nature
of web documents, and show that the extended versions of our algorithms
are still exact for finding the top-k documents and desirable on process-
ing cost. The proposed algorithms presented in this thesis are applicable
not only to web search but also to other areas such as enterprise search.
The novel contribution of this thesis is summarized as follows: (a) The
proposal of new top-k document retrieval algorithms that efficiently pro-
cess long queries generated by automatic query expansion, by introducing
a simple additional data structure called query-term-by-document binary
matrix. (b) The theoretical analysis on the proposed algorithms. We show
that our algorithms not only find the top-k documents exactly but also have
the desirable property on processing cost. (c) The empirical evaluation
of the proposed algorithms. We demonstrate that our algorithms achieve
considerable performance gains over existing algorithms. And (d) The
extension of the above algorithms for supporting proximity search. The
algorithms proposed in this thesis efficiently process an IR model com-
bined with automatic query expansion and/or proximity search, by intro-
ducing a simple additional data structure called query-term-by-document
binary matrix. Due to the simplicity of our method using query-term-by-
document binary matrix, our method is also applicable to other IR tech-
niques. We believe that our method paves the way for practical use of vari-
ous IR techniques, including an IR model combined with automatic query
expansion and/or proximity search, in large-scale text databases such as
the web, which has been considered difficult because of their inefficiency.
所蔵
値 有
戻る
0
views
See details
Views

Versions

Ver.1 2023-06-20 15:15:47.389472
Show All versions

Share

Mendeley Twitter Facebook Print Addthis

Cite as

エクスポート

OAI-PMH
  • OAI-PMH JPCOAR 2.0
  • OAI-PMH JPCOAR 1.0
  • OAI-PMH DublinCore
  • OAI-PMH DDI
Other Formats
  • JSON
  • BIBTEX

Confirm


Powered by WEKO3


Powered by WEKO3