HUM-MOLGEN: Genetic News

A Computational Approach To Extracting New Information From MEDLINE

May 6, 2001 20:54

Mining the biobibliome

The National Library of Medicine's MEDLINE citation database, the world's largest searchable collection of biomedical literature, has accumulated more than 11 million titles and abstracts since 1966, drawn from articles published in over 4,000 relevant journals. As a tool for retrieving information about a particular gene or protein, it is unsurpassed. As a tool for discovering new connections between particular genes and biological processes, it is also arguably the world's most underexploited (in silico) repository of data. A group led by Eivind Hovig (of The Norwegian Radium Hospital, Oslo, Norway) has now outlined a computational approach to extracting new information from this massive archive (Nature Genetics, Vol. 28, No. 1, 01 May 2001). Conducted on a large scale across the entire database, the analysis generates networks of related genes that reveal previously unknown aspects of biology.

A similar, if small-scale, approach was published last year by Benjamin Stapley and Gerald Benoit (of the University of Kentucky), who coined the term "biobibliometrics." The basic assumption is that genes mentioned in the same abstract are likely to have a biological relationship. By analogy to global approaches to understanding the genome, transcriptome and proteome, Hovig and colleagues have now searched the titles and abstracts of over 10 million MEDLINE citations (the 'biobibliome') to produce a "gene-to-gene co-citation network" for 13,712 known human genes. By annotating this network with biological attributes such as medical subject heading (MeSH) terms, the authors have identified meaningful biological relationships between sets of genes that, though subsequently validated by experiment, had not been predicted. The computational tools to carry out these analyses have been deposited in a publicly available database called PubGene (www.PubGene.org). PubGene provides an opportunity to harvest at least some of the collective wisdom, as yet unrealized, that has been produced by thousands of scientists over the last 35 years.
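The co-citation idea can be illustrated with a minimal Python sketch. It is not the PubGene implementation: the function name `build_cocitation_network`, the whole-word matching rule, and the four toy sentences (written for illustration, not taken from MEDLINE) are all assumptions made here. Two genes are linked whenever both symbols appear in the same text, and the edge weight counts how many texts mention the pair.

```python
import re
from collections import Counter
from itertools import combinations


def build_cocitation_network(abstracts, gene_symbols):
    """Count, for each unordered gene pair, the texts that mention both genes.

    Hypothetical helper for illustration; matching is a simple
    case-sensitive whole-symbol search, which is an assumption.
    """
    # One pattern per symbol; the lookarounds stop "TP53" matching inside "TP53BP1".
    patterns = {
        g: re.compile(r"(?<![A-Za-z0-9])" + re.escape(g) + r"(?![A-Za-z0-9])")
        for g in gene_symbols
    }
    edges = Counter()
    for text in abstracts:
        # Sorting gives every pair a canonical (alphabetical) order.
        found = sorted(g for g, p in patterns.items() if p.search(text))
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


# Toy input: illustrative sentences, not real MEDLINE records.
abstracts = [
    "TP53 and MDM2 form a feedback loop.",
    "MDM2 amplification inactivates TP53 in sarcomas.",
    "BRCA1 interacts with TP53 after DNA damage.",
    "BRCA1 in DNA repair.",
]
network = build_cocitation_network(abstracts, ["TP53", "MDM2", "BRCA1"])

for (a, b), weight in network.most_common():
    print(f"{a} -- {b}: co-cited in {weight} text(s)")
```

In this toy run the MDM2/TP53 edge has weight 2 and the BRCA1/TP53 edge has weight 1, while BRCA1 and MDM2 remain unlinked. A real analysis would additionally need gene-name synonym handling and annotation of the edges, for example with MeSH terms.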

Though powerful, the method of Hovig and colleagues is limited by the difficulty of dealing rationally and systematically with the flood of information entering the literature. Many of these problems are long-standing concerns, including inconsistencies in nomenclature, the inaccessibility of the full text of most published articles, and the sheer complexity of biology itself. These issues are discussed in an accompanying News & Views article by Daniel Masys (of the University of California, San Diego), and in this month's Nature Genetics editorial.

CONTACT:

Dr. Eivind Hovig
The Norwegian Radium Hospital
Oslo - Norway
Telephone: +47 2293-5416
Fax: +47 2252-2421
Email: ehovig@radium.uio.no

Dr. Daniel Masys
University of California San Diego
La Jolla, California - USA
Telephone: +1 858-534-6573
Email: dmasys@ucsd.edu

(C) Nature Genetics press release.

Message posted by: Trevor M. D'Souza
