
Bioinformatics: Sequence and Genome Analysis

I first came across this book at a bioinformatics course I attended at NCSU last year. I've been reading books on bioinformatics since 1997, and I was a little skeptical about this one; I thought it was "just another bioinformatics book". I was wrong. It has some really outstanding features I'd like to highlight.

Contents

  • Preface
  • Chapters:
    1. Historical Introduction and Overview
    2. Collecting and Storing Sequences in the Laboratory
    3. Alignment of Pairs of Sequences
    4. Multiple Sequence Alignment
    5. Prediction of RNA Secondary Structure
    6. Phylogenetic Prediction
    7. Database Searching for Similar Sequences
    8. Gene Prediction
    9. Protein Classification and Structure Prediction
    10. Genome Analysis
  • Glossary
  • Index

Strong points

Paradoxically, one of the book's merits is something it leaves out: there is no tutorial on how to surf the Web or how to download files using FTP. You are expected to already have these basic skills.

Bioinformatics is a moving target, and up-to-date information is a must. This book meets this requirement very well: its coverage of the eMOTIF method of motif analysis, HMMER, PHI-BLAST, and the CASP3 contest shows how current it is.

The book is OS-agnostic; you won't find a whole chapter devoted to "bioinformatics software for the Macintosh". The most representative programs in each area are introduced regardless of the OS they run on. The author seems to take for granted that every research lab has at least one *nix box, a Windows machine, and a good Internet connection (some of the programs are Web applications). One package is inexplicably missing: EMBOSS, the European Molecular Biology Open Software Suite, is not even mentioned, though admittedly the tools and databases used in the U.S. are well known to differ from those used in Europe.

The book has everything you really need to know. The underlying algorithms and assumptions, and some limitations on their use, are clearly explained. No particular program's usage is explained[1]; if you need that, read the fine manual. Why waste printed pages on man/help output when it's easy to get? The main algorithms are thoroughly explained without resorting to complex mathematical formulas. If you really want to dig into the mathematical details, the book has plenty of references.
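To give a flavour of the kind of algorithm the book explains, here is a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch approach), the topic of chapter 3. It is not taken from the book: the function name is hypothetical, the match, mismatch, and gap values are arbitrary assumptions, and real programs use substitution matrices and affine gap penalties instead of a single linear gap value.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The default scoring values are illustrative assumptions only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # a[i-1] aligned to a gap in b
            left = score[i][j - 1] + gap    # b[j-1] aligned to a gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

For example, `global_alignment_score("ACGT", "AGT")` returns 1: the best alignment pairs A-A, G-G, and T-T (+3) and puts C against one gap (-2). Only the score is computed here; recovering the alignment itself would need a traceback through the same table.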

The extensive tables provide Web site links and references to selected resources. They deserve an index of their own, and they turn this book into a valuable reference manual.

Numerous flowcharts work as a procedure manual, or as a cheat sheet for an exam (at least for a glance at "the big picture"). I would venture to say that no other bioinformatics book has such a feature. This is very important, since most bioinformatics work consists of managing a continuous flow of data and deciding what to do next with it. As a programmer, I particularly love this feature.

Protein analysis is usually a weak point in books that don't deal directly with the topic. Chapter 9, "Protein Classification and Structure Prediction", devotes almost 100 pages to methods for getting more structural information about your protein than you thought possible. The figures in this chapter are very well chosen, and good illustrations are essential for understanding complex 3D patterns like these.

The explanations are so conceptual that you can learn the fundamentals of Bayesian statistics in just three pages. The level is appropriate, but it's not a book you can take to the beach and casually read while sunbathing. It is college-level material, so you have to take your time. This does not mean that the author makes topics unnecessarily difficult; he just treats them the way they deserve to be treated.
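As a small worked example of the Bayesian reasoning mentioned above, here is Bayes' theorem for a yes/no hypothesis H given observed data D. This is a generic sketch, not the book's presentation; the function name is hypothetical and the probabilities in the example are made up purely for illustration. Exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction


def posterior(prior: Fraction, p_data_given_h: Fraction,
              p_data_given_not_h: Fraction) -> Fraction:
    """Bayes' theorem for a binary hypothesis H:

    P(H|D) = P(D|H) P(H) / [P(D|H) P(H) + P(D|not H) P(not H)]
    """
    # Total probability of the data, summed over H and not H.
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    if evidence == 0:
        raise ValueError("the observed data has zero probability under both hypotheses")
    return p_data_given_h * prior / evidence


if __name__ == "__main__":
    # Made-up numbers: rare hypothesis, fairly reliable evidence.
    print(posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20)))
```

With a prior of 1/100, P(D|H) = 9/10 and P(D|not H) = 1/20, the function returns 2/13 (about 0.15): even fairly strong evidence leaves the posterior low when the prior is small, which is the kind of intuition a conceptual treatment aims to build.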

Drawbacks

In the chapter on sequencing and databases (chapter 2), the author should have mentioned options for local databases. When referring to ACEDB, for example, he should have pointed out that it is not only a public database but also a program that lets you query your own database locally.

Unfortunately, the chapters rarely include a section on applications of the methods and techniques they describe. The only chapter with an application section is chapter 5, "Prediction of RNA Secondary Structure", and that section is incomplete and placed at the end. It is incomplete because it does not mention one of the hottest applications of RNA secondary structure prediction: ribozyme design. Ribozymes are RNA enzymes capable of cutting specific sites in RNA, which could be very useful against RNA viruses such as HIV, SARS, etc. In my opinion, such a section should be at the beginning of each chapter to motivate the reader. I also believe that the author should not deal with such hard topics without mentioning their applications. Would you learn assembler if you didn't know that programs written in it run faster and have low-level access to the hardware?

Biologists looking for a reference manual to help them with their daily work in a wet lab will be disappointed, because there is nothing about primer design or about indel- or SNP-based probe searching.

Programmers will miss a chapter or two covering basic concepts of molecular biology. Still, you can't rely on a single book chapter for all the biology you need to work in bioinformatics. If you want to be serious about this, my recommended course of action would be to read a beginner's book like Biology by Helena Curtis and/or take a 4-to-6-month introductory biology course at your local college.

Bioinformaticsonline.com

A couple of lines about the associated Web site:

There is a sticker on the inner front cover with an access code that works as a ticket to Bioinformaticsonline.com. The site has "supplemental information" that wouldn't fit in the book, such as an in-depth explanation of how to set gap opening and gap extension scores in different programs. There is also a set of very useful links and, for each chapter, problem sets for classroom use (though the solutions provided are bare answers, with no explanation or justification).
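The gap-setting issue mentioned above comes from affine gap penalties, where a gap of length k costs an opening penalty plus extension penalties. Programs differ on whether the opening penalty already covers the first gap position, so the same pair of numbers can mean different things in different tools. A minimal sketch of both conventions follows; the function name, its parameters, and the values in the example are hypothetical, and you should check each program's own documentation for which convention it uses.

```python
def affine_gap_cost(length: int, open_penalty: float, extend_penalty: float,
                    open_includes_first: bool = True) -> float:
    """Total penalty for one gap of `length` positions (hypothetical helper).

    open_includes_first=True:  open + extend * (length - 1)
    open_includes_first=False: open + extend * length
    """
    if length <= 0:
        return 0.0  # no gap, no penalty
    if open_includes_first:
        # The opening penalty pays for the first position; each extra one costs `extend`.
        return open_penalty + extend_penalty * (length - 1)
    # The opening penalty is a one-off charge on top of a per-position extension cost.
    return open_penalty + extend_penalty * length


if __name__ == "__main__":
    for k in (1, 3):
        print(k, affine_gap_cost(k, 10.0, 0.5), affine_gap_cost(k, 10.0, 0.5, False))
```

With an opening penalty of 10 and an extension penalty of 0.5, a 3-position gap costs 11.0 under the first convention and 11.5 under the second. To get identical results, the opening value under the second convention must be one extension penalty lower (9.5 here).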

Conclusions

The author, a professor of bioinformatics at the University of Arizona, focuses on teaching the methods of sequence and structure analysis. He accomplishes his goal pretty well. I highly recommend this book, not for a wet lab scientist or for a hardcore programmer with no biology background, but for anyone who wants in-depth knowledge of what is behind bioinformatics software. You may end up wanting more, but I'm sure you won't regret buying this book.

1. BLAST and ClustalX are exceptions, since they are heavily used in this field.

Recent comments

05 Aug 2005 07:47 saminathan

Query
Hi Sebastian,

I am a software programmer/engineer with a master's degree in chemistry (from India). I would like to switch my career to bioinformatics. Where should I start? I should have a very good knowledge of biochemistry, since I have a master's degree in chemistry.

Would you recommend a starting path for getting into a bioinformatics career?

Thanks and Regards

Saminathan.

01 Jul 2004 02:13 AndrewCates

Purchase URL for UK readers
Just to add: the link above is for US purchases. If anyone in the UK wants to buy the book, an alternative is WHSmith (http://www.whsmith.co.uk//whs/Go.asp?isbn=0879696087&shop=26985)

I should add that I don't work for WHSmith or anything, but I do buy stuff online from them.

BozMo (http://catesfamily.org.uk/)

26 Nov 2003 04:50 Kopivs

Re: bioinformatics
I'm not only a non-biologist but also a non-programmer; however, I've taken an interest in bioinformatics on the basis of a hunch.

The hunch is that biology and biotechnology are a truly immense area with an overabundance of information, which could be useful if it could be correlated in efficient ways. That's also what happens daily on the Internet, so this predicament is shared by other areas too.

Trouble is, in biology and medicine there are also large information gaps, and filling them could in some cases save lives and relieve suffering. This imperative means that, in terms of knowledge and intelligence processing, bioinformatics has got to be a leader, like, by obligation.

So, if you are trying to convert raw information into knowledge and intelligence, bioinformatics is the place to look for models.

Moreover, biology probably suffers from a much lower percentage of disinformation, due to the serious nature of its applications, while in enterprise IT and telecoms, well, disinformation is pretty much expected.

Correct me if I'm way off course ...

17 Sep 2003 11:35 srlasky

Re: bioinformatics

> I haven't met a competent computer
> literate biologist EVER ... nor have i
> met a vice-versa competent programmer
> that could understand fundamentals of
> biology .. and it's applications/theory
> through medicine ...

It would be easy to suggest that you are simply looking in the wrong places for people with these cross-disciplinary skills, because there are plenty of them where I work (the Institute for Systems Biology). However, in general you are right, and this is going to be a huge problem if it isn't addressed.

Luckily, several institutions such as the ISB, and now several Universities like MIT, Harvard, UCB, UCSD, and others are seriously looking at changing what it takes to get a degree in biology, and these new programs will stress the advantages that can be gained through strong interdisciplinary studies.

I think that you will see a great change in what your average biologist looks like in the coming years.

srlasky

06 Aug 2003 06:11 sbassi

Re: bioinformatics
Hello,

> what do you do with bioinformatics ...


I do a lot of things:

1- Look for biological meaning in newly found DNA sequences. The company I work for has a DNA sequencer; when we get a new sequence (almost every day), I have to analyse it using very specific bioinformatics software to search for function and mutations.

2- Design primers for PCR analysis. That is useful for people in the lab.

3- Maintain DNA and protein sequence databases.

> what CAN a computer programmer DO w/
> bioinformatics ...
> and ... is the same context... what CAN
> a biologist DO w/ a computer program ...


There are a lot of things that could be done. To get an insight into that, I suggest two websites: www.bioinformatics.org and bioinformatica.info (the last one is my own :)


> I haven't met a competent computer
> literate biologist EVER ... nor have i
> met a vice-versa competent programmer
> that could understand fundamentals of
> biology .. and it's applications/theory
> through medicine ...


It's very hard to find people with proficiency in both bio and CompSci. But look at the people who made EMBOSS (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/) or the people behind BioPerl, Biopython, BioJava and other Bio* projects. They are very skilled in both areas.
