Computational Methods for Protein Structure Prediction & Modeling V1 - Xu Xu and Liang

  • Saadedin
    Thread Author
    Administrator
    • Sep 2018 

    Preface

    An ultimate goal of modern biology is to understand how the genetic blueprint of cells (genotype) determines the structure, function, and behavior of a living organism (phenotype). At the center of this scientific endeavor is characterizing the biochemical and cellular roles of proteins, the working molecules of the machinery of life. A key to understanding protein function is knowledge of the folded structures proteins adopt in the cell, because these structures provide the basis for studying protein functions and functional mechanisms at the molecular level.



    Researchers working on structure determination have traditionally selected individual proteins for their functional importance in a biological process or pathway of particular interest. Major research organizations often have their own protein X-ray crystallography and/or nuclear magnetic resonance (NMR) facilities, which typically determine structures at a rate of a few to dozens per year. Recognizing the widening gap between the rate of protein identification (through DNA sequencing and the bioinformatic identification of potential genes) and the rate of protein structure determination, government funding agencies in the United States, Europe, and Japan have launched a number of large scientific initiatives in the past few years to solve protein structures en masse, an effort called structural genomics. Several structural genomics centers (factory-like facilities) have been established that promise to produce solved protein structures in a fashion similar to DNA sequencing. These efforts, together with the growth of the research community and substantial gains in the ease of structure determination driven by a new generation of technologies such as synchrotron radiation sources and high-resolution NMR, have accelerated the rate of protein structure determination over the past decade. As of January 2006, the Protein Data Bank (PDB) contained ∼34,500 protein structures.



    The role of structure in the biological sciences has grown considerably since the advent of systems biology and the increased emphasis on understanding molecular mechanisms, from basic biology to clinical medicine. Just as every geneticist or cell biologist in the 1990s needed to obtain the sequence of the gene whose product or function they were studying, biologists in this century will increasingly need to know the structure of the gene product for their research programs. One can anticipate that the rate of structure determination will continue to grow. However, the high cost and technical demands of structure determination mean that it will remain difficult to obtain experimental structures for more than a small fraction of the proteins of interest to biologists. In contrast, DNA sequencing output has routinely doubled for a couple of decades. The genome projects have produced 100 gigabytes of DNA data in GenBank, and as the cost of sequencing continues to drop and its rate continues to accelerate, the scientific community anticipates a day when every individual has the genes of their interest, and the genomes of all related major organisms, sequenced.



    Structure determination of proteins began before nucleic acids could be sequenced, which now appears almost ironic. As microchemistry technologies continue to mature, ever more powerful DNA sequencing instruments, new methods for preparing suitable quantities of DNA, and cheaper, higher-throughput sequencing have enabled a revolution in the biological and biomedical sciences, but they have also left structure determination far behind. As sequencing capacity matured in the last few decades of the twentieth century, DNA sequences outnumbered protein structures by 10-fold, then 100-fold, and now there is a 1000-fold difference between the number of genes in GenBank and the number of structures in the PDB. This order-of-magnitude gap is about to jump again in the era of metagenomics, as analyses of communities of largely unculturable organisms in their natural states come to dominate sequence production. The J. Craig Venter Institute’s Sargasso Sea experiment and other early metagenomics experiments at least doubled the number of known open reading frames (ORFs) and potential genes, while the more recent Global Ocean Sampling (GOS) expedition data multiplied that number by roughly another 10-fold, probably more. The rate of discovery of novel genes, and correspondingly of novel proteins, has not leveled off, since nearly half of new microbial genomes turn out to be novel. Furthermore, in the metagenomics data, new protein families are discovered in direct proportion to the rate of gene (ORF) discovery.
