Models to be considered
A summary on models of evolution: ftp://statgen.ncsu.edu/pub/thorne/mypapers/thornecurrop.pdf
Modeling sequence evolution: http://www.springerlink.com/content/g7p0382462jl1816/fulltext.pdf
1. Insertions and deletions (ID)
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1087829&blobtype=pdf
http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.1000172
The authors use an indel model. I have only skimmed the article and still need to read it properly.
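As a starting point for the ID model, here is a minimal sketch of an indel simulator. It assumes a fixed per-site probability of starting an insertion or a deletion and geometrically distributed indel lengths; these assumptions are not taken from the papers above. The names `simulate_indels` and `geometric_length` and all parameters and defaults are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def geometric_length(rng, p_extend):
    """Draw an indel length >= 1; each extra residue is added with probability p_extend."""
    length = 1
    while rng.random() < p_extend:
        length += 1
    return length

def simulate_indels(seq, p_ins=0.01, p_del=0.01, p_extend=0.5, rng=None):
    """Return a copy of seq with random insertions and deletions applied.

    Assumed (hypothetical) model: at each site a deletion starts with
    probability p_del; otherwise the residue is kept and, with probability
    p_ins, followed by an insertion of uniformly random residues.
    """
    rng = rng if rng is not None else random.Random()
    out = []
    i = 0
    while i < len(seq):
        if rng.random() < p_del:
            # Skip the deleted residues.
            i += geometric_length(rng, p_extend)
            continue
        out.append(seq[i])
        if rng.random() < p_ins:
            n = geometric_length(rng, p_extend)
            out.extend(rng.choice(AMINO_ACIDS) for _ in range(n))
        i += 1
    return "".join(out)
```

Passing a seeded `random.Random` as `rng` makes the simulated sequences reproducible, which should help when comparing HMMs trained on different augmented sets.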
2. Amino acid replacements (AAR)
- We have already implemented an amino acid replacement model based on BLOSUM62.
- We should also look at the Dayhoff model.
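For reference, replacement probabilities can be recovered from a log-odds matrix such as BLOSUM62 by inverting the scoring formula. Below is a minimal sketch, assuming scores in half-bit units (as in BLOSUM62), so that P(j | i) is proportional to q_j * 2^(s_ij / 2). This is not necessarily how our existing implementation works. The three-letter alphabet, the scores, and the background frequencies are made-up placeholders, and the function names are hypothetical.

```python
import random

def replacement_probs(scores, background, residue):
    """Conditional probabilities P(j | residue) from half-bit log-odds scores."""
    weights = {j: background[j] * 2 ** (scores[residue][j] / 2) for j in background}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def mutate(seq, scores, background, rng=None):
    """Redraw every residue of seq from its conditional replacement distribution."""
    rng = rng if rng is not None else random.Random()
    out = []
    for residue in seq:
        probs = replacement_probs(scores, background, residue)
        candidates = list(probs)
        weights = [probs[j] for j in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
    return "".join(out)

# Toy placeholder data (NOT real BLOSUM62 values or real background frequencies).
TOY_SCORES = {
    "A": {"A": 4, "L": -1, "K": -1},
    "L": {"A": -1, "L": 4, "K": -2},
    "K": {"A": -1, "L": -2, "K": 5},
}
TOY_BACKGROUND = {"A": 0.4, "L": 0.4, "K": 0.2}

print(mutate("ALKALK", TOY_SCORES, TOY_BACKGROUND, rng=random.Random(1)))
```

Because each residue may be redrawn as itself, one pass corresponds roughly to the level of divergence encoded in the matrix; a Dayhoff-style matrix could be plugged in the same way.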
3. Variation in evolutionary process based on the sites (VEP)
I think the idea is to let the rate of evolution vary from site to site, depending on the amino acid residue at each position in the protein. I have not found an implementable model for this yet.
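One candidate worth checking is the widely used approach of drawing an independent rate multiplier for each site from a gamma distribution with mean 1, where a smaller shape parameter gives stronger rate heterogeneity. Below is a minimal sketch, assuming this is an acceptable reading of VEP. The function names, the `alpha` and `base_rate` parameters, and the uniform replacement step are hypothetical placeholders; a real version would plug in the AAR model instead.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_rates(n_sites, alpha, rng):
    """Per-site rate multipliers from a gamma distribution with mean 1."""
    # gammavariate(shape, scale): scale = 1/alpha gives mean alpha * (1/alpha) = 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def mutate_with_site_rates(seq, alpha=0.5, base_rate=0.1, rng=None):
    """Mutate seq so that fast sites change more often than slow sites."""
    rng = rng if rng is not None else random.Random()
    rates = site_rates(len(seq), alpha, rng)
    out = []
    for residue, rate in zip(seq, rates):
        # Probability of at least one substitution event at this site,
        # assuming events arrive as a Poisson process with rate base_rate * rate.
        p_change = 1.0 - math.exp(-base_rate * rate)
        if rng.random() < p_change:
            # Placeholder: uniform draw, which may pick the same residue again.
            residue = rng.choice(AMINO_ACIDS)
        out.append(residue)
    return "".join(out)
```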
4. Codon Based Models (CBM)
- A review on codon based models: http://bib.oxfordjournals.org/cgi/reprint/bbn049v1
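Many codon-based models define the substitution rate between two codons from whether the change is a transition or a transversion and whether it is synonymous or nonsynonymous. Below is a minimal sketch in that spirit, assuming equal codon frequencies, the standard genetic code, and single-nucleotide changes only. The parameters `kappa` (transition/transversion ratio) and `omega` (nonsynonymous/synonymous ratio), their defaults, and the function name `codon_rate` are illustrative choices, not taken from the review.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def codon_rate(src, dst, kappa=2.0, omega=0.5):
    """Relative substitution rate from codon src to codon dst."""
    if GENETIC_CODE[src] == "*" or GENETIC_CODE[dst] == "*":
        return 0.0  # no changes to or from stop codons
    diffs = [(a, b) for a, b in zip(src, dst) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes are allowed
    rate = 1.0
    if frozenset(diffs[0]) in TRANSITIONS:
        rate *= kappa  # transition (A<->G or C<->T)
    if GENETIC_CODE[src] != GENETIC_CODE[dst]:
        rate *= omega  # nonsynonymous change
    return rate
```

For protein-level training data, codon sequences simulated this way would still need to be translated with `GENETIC_CODE` before being added to the training set.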
Evaluation of the models
Dataset
- Use the SCOP based dataset
Approach
- We want to augment the training set with simulated sequences generated from four evolutionary models (ID, AAR, VEP and CBM).
- We have several parameters to consider:
- ratio of original to simulated sequences
- performance of the original HMMs, i.e. HMMs trained without simulated sequences. It may be possible to build an HMM with 90% accuracy for one SCOP superfamily but with only 40% accuracy for another
- proportion of sequences with each model of evolution
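The ratio and proportion parameters above can be made concrete with a small helper that turns them into sequence counts per model. This is a minimal sketch; the function name `plan_augmentation`, its arguments, and the rounding rule are assumptions for illustration only.

```python
def plan_augmentation(n_original, simulated_per_original, proportions):
    """Return how many simulated sequences to generate with each model.

    n_original: number of original training sequences
    simulated_per_original: simulated sequences per original sequence
    proportions: relative weight of each model, e.g. {"ID": 1, "AAR": 2}
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    n_simulated = round(n_original * simulated_per_original)
    # Floor each share, then hand leftover sequences to the largest shares.
    counts = {m: int(n_simulated * p / total) for m, p in proportions.items()}
    leftover = n_simulated - sum(counts.values())
    for model in sorted(proportions, key=proportions.get, reverse=True)[:leftover]:
        counts[model] += 1
    return counts

plan = plan_augmentation(100, 2.0, {"ID": 1, "AAR": 1, "VEP": 1, "CBM": 1})
print(plan)
```

Running this per SCOP superfamily keeps the ratio of original to simulated sequences fixed even when superfamilies differ in size.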