Hatwell, John Norman;
(2001)
Sensitive and rapid methods for comparing and searching biological sequence data.
Masters thesis (M.Phil), UCL (University College London).
Abstract
Sequence database searching is a key tool in current bioinformatics. To improve accuracy, sequence database searches are often performed iteratively, taking the results of one search as input for the next. The aim of this approach is to progressively isolate increasingly distant relatives of the original query sequence. In practice this method works well when it is supervised by an 'expert eye' that can determine when an alignment is good and when sequences should be excluded from it, but attempts to automate this process have proven difficult. At present PSI-BLAST is one of the few effective attempts, but a misalignment of sequences or the wrongful inclusion of a sequence will still rapidly destroy the specificity of the probe, making incorrect matches more likely. By combining the search program Quest, which is capable of searching a database using full-length multiple sequence alignments, with independent sequence alignment and assessment programs, we have been able to reduce the occurrence of this problem. We use a multiple alignment package to generate an accurate alignment of all hits generated by the Quest program. Sequences that do not appear to 'fit' with the rest of the alignment are automatically removed by the separate alignment assessment program Mulfil. The resulting alignment is fed back to Quest for the next iteration. This scheme has been shown to generate results significantly better than those of PSI-BLAST: whilst the total number of correct homologues identified was not increased, the number of incorrect ones dropped significantly. In addition, further work demonstrated that results of equally good quality are possible without the use of multiple alignment or profile searching. The Cascade-and-Cluster scheme uses intermediate sequences and a simple clustering procedure, and produces results almost as sensitive and selective as those of our previous scheme whilst running up to ten times faster.
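The iterative scheme described in the abstract (search, align all hits, remove sequences that do not fit, feed the alignment back) can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the thesis's implementation: `search` and `align` are hypothetical callables standing in for Quest and the multiple alignment package, and `filter_outliers` uses a mean pairwise-identity cutoff, an assumed criterion that is not necessarily the one Mulfil applies.

```python
from __future__ import annotations

from statistics import mean


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_outliers(alignment: dict[str, str], min_mean_identity: float = 0.3) -> dict[str, str]:
    """Drop rows whose mean identity to every other row falls below the cutoff.

    Assumed stand-in for an alignment-assessment step; the threshold is illustrative.
    """
    kept = {}
    for name, row in alignment.items():
        others = [pairwise_identity(row, r) for n, r in alignment.items() if n != name]
        if not others or mean(others) >= min_mean_identity:
            kept[name] = row
    return kept


def iterate_search(query: str, search, align, max_rounds: int = 5) -> dict[str, str]:
    """Search -> align -> filter -> feed back, until the set of members stops changing.

    `search(alignment)` (hypothetical) returns {name: sequence} of database hits;
    `align(sequences)` (hypothetical) returns {name: aligned_row} with equal-length rows.
    """
    alignment = {"query": query}
    for _ in range(max_rounds):
        hits = search(alignment)
        aligned = align({**hits, "query": query})
        filtered = filter_outliers(aligned)
        filtered["query"] = aligned["query"]  # never discard the original query
        converged = filtered.keys() == alignment.keys()
        alignment = filtered
        if converged:
            break
    return alignment
```

With a toy database in which every hit is already aligned, the unrelated sequence is removed in the first round and the loop stops once the member set is stable:

```python
db = {"b": "ACDEFGHIK", "c": "ACDEFGHIR", "x": "WWWWWWWWW"}
result = iterate_search("ACDEFGHIK", lambda aln: dict(db), lambda seqs: dict(seqs))
print(sorted(result))  # ['b', 'c', 'query']
```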
Field | Value |
---|---|
Type: | Thesis (Masters) |
Qualification: | M.Phil |
Title: | Sensitive and rapid methods for comparing and searching biological sequence data |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. |
Keywords: | Biological sciences; Biological sequence data |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10097929 |