UCL Discovery Stage

Sensitive and rapid methods for comparing and searching biological sequence data

Hatwell, John Norman; (2001) Sensitive and rapid methods for comparing and searching biological sequence data. Masters thesis (M.Phil), UCL (University College London). Green open access

Text: Sensitive_and_rapid_methods_fo.pdf (6MB)

Abstract

Sequence database searching is a key tool in current bioinformatics. To improve accuracy, sequence database searches are often performed iteratively, taking the results of one search as the input for the next. The aim of this approach is to progressively isolate increasingly distant relatives of the original query sequence. In practice this method works well when it is supervised by an 'expert eye' that can judge when an alignment is good and when sequences should be excluded from it, but attempts to automate this process have proven difficult. At present PSI-BLAST is one of the few effective attempts, but a misalignment of sequences or the wrongful inclusion of a sequence will still rapidly destroy the specificity of the probe, making incorrect matches more likely. By combining the search program Quest, which can search a database using full-length multiple sequence alignments, with independent sequence alignment and assessment programs, we have been able to reduce the occurrence of this problem. We use a multiple alignment package to generate an accurate alignment of all hits returned by Quest. Sequences that do not appear to 'fit' with the rest of the alignment are automatically removed by a separate alignment assessment program, Mulfil. The resulting alignment is fed back to Quest for the next iteration. This scheme has been shown to generate results significantly better than those of PSI-BLAST: whilst the total number of correct homologues identified did not increase, the number of incorrect ones dropped significantly. In addition, further work demonstrated that results of equally good quality are possible without the use of multiple alignment or profile searching. The Cascade-and-Cluster scheme uses intermediate sequences and a simple clustering procedure, and produces results almost as sensitive and selective as those of our previous scheme whilst running up to ten times faster.
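The iterative loop described in the abstract (search, assess how well each hit fits the family, drop misfits, feed the survivors back in) can be illustrated with a minimal toy sketch. It does not use Quest, Mulfil, a multiple alignment package or real sequence scores: the functions `search`, `filter_misfits` and `iterative_search`, the `threshold` and `fit` parameters, and all similarity values are hypothetical stand-ins. The "fit" test used here (mean similarity of a hit to the other members) is an assumption for illustration, not the criterion Mulfil actually applies.

```python
from statistics import mean

# Hypothetical toy scores: symmetric similarities keyed by unordered pairs.
Scores = dict[frozenset, float]


def sim(scores: Scores, a: str, b: str) -> float:
    """Similarity of two sequences; pairs that were never scored count as 0.0."""
    return scores.get(frozenset((a, b)), 0.0)


def search(members: set[str], db: list[str], scores: Scores, threshold: float) -> set[str]:
    """Stand-in for a profile search: a database sequence is a hit if it
    scores at least `threshold` against any current family member."""
    return {s for s in db if any(sim(scores, s, m) >= threshold for m in members)}


def mean_fit(seq: str, group: set[str], scores: Scores) -> float:
    """Mean similarity of `seq` to every other member of `group`."""
    return mean(sim(scores, seq, o) for o in group if o != seq)


def filter_misfits(query: str, hits: set[str], scores: Scores, fit: float) -> set[str]:
    """Stand-in for an alignment-assessment step: repeatedly drop the hit that
    fits the group worst until every remaining hit has mean similarity >= `fit`.
    The query itself is never removed."""
    kept = set(hits) | {query}
    while len(kept) > 1:
        # sorted() makes tie-breaking deterministic across runs
        worst = min(sorted(kept - {query}), key=lambda s: mean_fit(s, kept, scores))
        if mean_fit(worst, kept, scores) >= fit:
            break
        kept.remove(worst)
    return kept


def iterative_search(query: str, db: list[str], scores: Scores,
                     threshold: float, fit: float, max_iter: int = 10) -> set[str]:
    """Search, filter, and feed the surviving family back in as the next query
    until the family stops changing or `max_iter` rounds have run."""
    members = {query}
    for _ in range(max_iter):
        hits = search(members, db, scores, threshold)
        new_members = filter_misfits(query, hits, scores, fit)
        if new_members == members:
            break  # converged: the family no longer changes
        members = new_members
    return members


# Invented example: Q is the query, A/B/C are "true" relatives, X is a
# spurious hit reachable only through a moderate C-X score.
scores = {frozenset(p): v for p, v in [
    (("Q", "A"), 0.8), (("Q", "B"), 0.6), (("A", "B"), 0.7), (("A", "C"), 0.6),
    (("B", "C"), 0.65), (("Q", "C"), 0.3), (("C", "X"), 0.5), (("B", "X"), 0.1),
]}
db = ["A", "B", "C", "X"]
print(sorted(iterative_search("Q", db, scores, threshold=0.5, fit=0.3)))  # ['A', 'B', 'C', 'Q']
print(sorted(iterative_search("Q", db, scores, threshold=0.5, fit=0.0)))  # ['A', 'B', 'C', 'Q', 'X']
```

With the filter effectively disabled (`fit=0.0`) the spurious sequence X is pulled in via C and then kept, which mirrors in miniature the loss of specificity the abstract attributes to unsupervised iteration; with a non-trivial `fit` it is removed because it matches the rest of the family poorly.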

Type: Thesis (Masters)
Qualification: M.Phil
Title: Sensitive and rapid methods for comparing and searching biological sequence data
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
Keywords: Biological sciences; Biological sequence data
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10097929
Downloads since deposit: 1,148
