Bioinformatics strategy for unveiling hidden genome signatures and biodiversity (ゲノム塩基配列に潜む生物種の個性の情報学的探索と生物進化多様性の研究)

IKEMURA, Toshimichi; 池村, 淑道

https://ir.soken.ac.jp/records/3245

Item type

KAKENHI report / Reports of Grants-in-Aid for Scientific Research (1)

Publication date

2011-11-20

Title (Japanese)

ゲノム塩基配列に潜む生物種の個性の情報学的探索と生物進化多様性の研究

Title (English)

Bioinformatics strategy for unveiling hidden genome signatures and biodiversity

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18ws

Resource type

research report

Access rights

metadata only access

Access rights URI

http://purl.org/coar/access_right/c_14cb

Author

池村, 淑道

Author name (English)

IKEMURA, Toshimichi

KAKENHI researcher number

Description type

Other

Description

50025475 | http://rns.nii.ac.jp/d/nr/1000050025475

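Research field

Evolutionary biology

Research category

Grant-in-Aid for Scientific Research (C)

Research period

Description type

Other

Description

FY2004 to FY2005

Abstract

Description type

Abstract

Description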

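The self-organizing map (SOM), an unsupervised neural network algorithm, can efficiently capture both the overall picture and the local details of large data sets and visualize them in two dimensions. We performed SOM analysis of tri- to pentanucleotide frequencies on all 10-kb and 100-kb sequence fragments derived from the genomes of about 300 species in total, comprising eukaryotes whose genome sequencing is well advanced and prokaryotes whose genomes have been completely sequenced, and identified the oligonucleotide occurrence patterns that characterize each genome. SOM analysis of tetra- and pentanucleotide frequencies requires long computation times, but access to the Earth Simulator made large-scale SOM feasible. When human and mouse sequence fragments of about 1 kb were analyzed by SOM, the transcriptional regulatory regions upstream of genes, the 5' and 3' UTRs, the CDSs, and the introns tended to separate clearly from one another, even within a single genome. Each of these groups also tended to split into multiple clusters. This provides a novel informatics method for searching for candidate signal sequences related to function.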
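The input to such an analysis is one oligonucleotide frequency vector per sequence fragment. The following is a minimal sketch of that step only, not the report's code: the function names `fragment` and `kmer_frequencies` are hypothetical, and it assumes overlapping windows, non-overlapping fragments, and no merging of reverse-complementary words, none of which the abstract specifies.

```python
from itertools import product


def fragment(seq, size):
    """Split a sequence into consecutive, non-overlapping fragments of `size` bases.

    A trailing remainder shorter than `size` is dropped (an assumption).
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def kmer_frequencies(seq, k=4):
    """Return normalized k-mer frequencies in a fixed A/C/G/T word order.

    Windows overlap by k-1 bases; windows containing symbols other than
    A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in words]


# One 256-dimensional tetranucleotide vector per fragment of a toy sequence.
toy_sequence = "ACGTTGCAAGCTAGCTAGGATCCGATCGATTACG" * 10
vectors = [kmer_frequencies(f, k=4) for f in fragment(toy_sequence, 100)]
```

With k = 3, 4, or 5 the vectors have 64, 256, or 1024 dimensions, which is why the longer words are much more expensive to analyze at genome scale.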

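The oligonucleotides that characterize each organism's genome not only reflect the nature of mutation and its repair mechanisms; they also relate to the fact that oligonucleotides corresponding to functionally important signals have characteristic frequencies and distributions. For example, signal sequences that bind transcription factors stably and with high sequence dependence tended to occur less frequently than expected from random sequences. The number of genomes that have been sequenced but for which other molecular biological data are scarce is increasing rapidly, so substituting in silico analysis for experimental studies is becoming essential. SOM, which can infer important signals from nucleotide sequence alone, fits this purpose. If SOM findings are accumulated sufficiently for genomes whose molecular biology is well studied, they can serve as base knowledge for the in silico search for candidate signal sequences in newly sequenced genomes. The work also made possible the phylogenetic estimation of genome fragments derived from hard-to-culture environmental microorganisms and the analysis of biodiversity.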
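The comparison of an oligonucleotide's frequency with the level expected from random sequence can be illustrated with an observed/expected ratio. This is a sketch under a stated assumption: the abstract does not say which random model was used, so the example takes the simplest one, in which the expected count follows from the mononucleotide composition of the same sequence. The function name `observed_expected_ratio` is hypothetical.

```python
from collections import Counter


def observed_expected_ratio(seq, word):
    """Ratio of the observed count of `word` to its expected count.

    Assumed null model: bases are independent and drawn with the
    mononucleotide frequencies of `seq` itself. Returns None when the
    ratio is undefined (sequence too short, or expected count of zero).
    """
    seq, word = seq.upper(), word.upper()
    k = len(word)
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return None
    # Count overlapping occurrences of the word.
    observed = sum(1 for i in range(n_windows) if seq[i:i + k] == word)
    base_counts = Counter(seq)
    probability = 1.0
    for base in word:
        probability *= base_counts[base] / len(seq)
    expected = probability * n_windows
    return observed / expected if expected else None


# A ratio below 1 means the word is rarer than the null model predicts,
# a ratio above 1 means it is over-represented.
ratio = observed_expected_ratio("ACGT" * 25, "ACGT")
```

A higher-order null model (for example one based on dinucleotide composition) would give different expected counts, so the choice of model should be reported alongside any ratio.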

Novel tools are needed for comprehensive interspecies comparison of the massive amounts of genomic sequence now available. The Self-Organizing Map (SOM), an unsupervised neural network algorithm, is an effective tool for clustering and visualizing high-dimensional complex data on a single map. Building on batch-learning SOM, we modified the conventional SOM for genome informatics so that the learning process and the resulting map are independent of the order of data input. We generated SOMs for tri- and tetranucleotide frequencies in 10- and 100-kb sequence fragments from 38 eukaryotes for which almost complete genome sequences are available. The SOM recognized species-specific characteristics (key combinations of oligonucleotide frequencies) in the genomic sequences, permitting species-specific classification of the sequences without any information regarding the species. We also generated a SOM for tetranucleotide frequencies in 1-kb sequence fragments from the human genome and found that sequences from four functional categories (5' and 3' UTRs, CDSs, and introns) were classified primarily according to category. Because its classification and visualization power is very high, SOM is an efficient and powerful tool for extracting a wide range of genome information.
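The order independence of batch learning comes from updating every node only after all input vectors have been assigned to their best-matching nodes. The sketch below is a generic batch-learning SOM in pure Python, not the authors' implementation: the function names, the Gaussian neighborhood, the shrinking schedule for `sigma`, and the initialization from sorted data are all assumptions made for illustration.

```python
import math
from itertools import product


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (ties: lowest index)."""
    best, best_d = 0, float("inf")
    for n, w in enumerate(weights):
        d = sum((a - b) ** 2 for a, b in zip(w, x))
        if d < best_d:
            best, best_d = n, d
    return best


def batch_som(data, rows, cols, epochs=20, sigma_end=0.5):
    """Train a rows x cols batch-learning SOM on a list of equal-length vectors."""
    dim, n_nodes = len(data[0]), rows * cols
    coords = list(product(range(rows), range(cols)))
    # Assumption: start from evenly spaced picks of the sorted data, so the
    # initial map does not depend on input order (the report's initialization
    # is not described in the abstract).
    ordered = sorted(data)
    weights = [list(ordered[(n * (len(ordered) - 1)) // max(n_nodes - 1, 1)])
               for n in range(n_nodes)]
    sigma_start = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Neighborhood radius shrinks geometrically from sigma_start to sigma_end.
        frac = epoch / max(epochs - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** frac
        # Step 1: assign every vector to its best-matching unit, weights frozen.
        bmus = [best_matching_unit(weights, x) for x in data]
        # Step 2: each node becomes the neighborhood-weighted mean of all vectors.
        updated = []
        for node in range(n_nodes):
            h = [math.exp(-((coords[b][0] - coords[node][0]) ** 2
                            + (coords[b][1] - coords[node][1]) ** 2)
                          / (2.0 * sigma * sigma)) for b in bmus]
            total = math.fsum(h)
            if total == 0.0:  # no vector close enough to influence this node
                updated.append(weights[node])
                continue
            updated.append([math.fsum(hi * x[k] for hi, x in zip(h, data)) / total
                            for k in range(dim)])
        weights = updated
    return weights
```

Because each epoch is a sum over all vectors and `math.fsum` is insensitive to summation order, shuffling the input leaves the trained map unchanged, whereas a conventional online SOM updates after every vector and therefore depends on presentation order.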

A SOM constructed with oligonucleotide frequencies in 10-kb sequences from the human genome identified oligonucleotides whose frequencies were characteristically biased from the random occurrence level, and 10-kb sequences rich in these biased oligonucleotides were self-organized on the map. Because these oligonucleotides often corresponded to functional signal sequences (e.g., binding sites for transcription factors) or their constituent elements, we categorized the occurrence patterns and frequencies of such pentanucleotides in the human genome that are thought to regulate transcription. SOM analysis depends only on oligonucleotide frequencies and is thus applicable even to sequenced genomes with little additional experimental data. Experimental data were required to locate transcription start sites (TSSs), but in most cases not to locate the start sites of protein-coding sequences. Once known signal sequences from various species with sufficient experimental data have been characterized systematically, an in silico method of signal sequence prediction can be developed for a wide range of species. Recently, we have developed a novel bioinformatics tool for phylogenetic classification of genomic sequence fragments derived from mixtures of uncultured microorganisms in environmental and clinical samples.

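Description

Description type

Other

Description

Grant-in-Aid for Scientific Research, research results report
Principal investigator: 池村淑道 [Professor, Hayama Center for Advanced Studies, The Graduate University for Advanced Studies]
Co-investigators: 深川竜郎 [Associate Professor, Molecular Genetics, National Institute of Genetics]
阿部貴志 [Research Associate, Center for Information Biology and DDBJ, National Institute of Genetics]
洞田慎一 [Senior Researcher, The Graduate University for Advanced Studies]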

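KAKENHI field and subfield

Evolutionary biology

Date of issue

Date

2006-05

Date type

Created

Research project number

Description type

Other

Description

16570190

Reporting fiscal year

FY2004 to FY2005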

Versions

Ver.1

2023-06-20 15:30:04.656556
