SUBoptimal Weighted AlIgnment (SUBWAI)
SUBWAI is a protein structure prediction program based on a threading strategy. It includes SPAD, an error estimator, and is provided as supplementary material for the paper:
Estimating Quality of Template-Based Protein Models by Alignment Stability. Hao Chen and Daisuke Kihara.
Copyright of this distribution belongs to Hao Chen & Daisuke Kihara. It is free for academic non-profit institutions. Commercial entities and government research labs should contact us (firstname.lastname@example.org) for permission to use this distribution. Redistribution of any files in this package without our permission is prohibited.
The current version of SUBWAI is packed in a tar file that contains the following files:
1aab-d1aab-1_20_1_1.seq, 1aab-d1aab-1_20_1_1.fa, 1aab-d1aab-1_20_1_1.sable, 1hme-d1hme-1_20_1_1.seq, 1hme-d1hme-1_20_1_1.fa, 1hme-d1hme-1_20_1_1.dssp, Readme.txt, example.pl, msa-1.pl, msasub-6.cpp, subwai, confidence.dat, confidence1.dat, confidence2-test.dat, confidence3.dat, output1-test.dat, pathmatrix-test.dat, sw.dat, 1aab-d1aab-1_20_1_1.profile, 1hme-d1hme-1_20_1_1.profile.
You can extract this package with gzip and tar under Linux. Note: you must adjust the settings in the script files to match your local environment before they will work, so please read the Usage section and example.pl carefully. If you have any questions or suggestions, feel free to contact us (email@example.com). All rights reserved. This package will be updated based on user feedback.
1. After downloading the tar file, extract its contents into your working directory for SUBWAI:
>gzip -d SPAD.tar.gz
>tar -xf SPAD.tar
2. Check your working directory. You will find two FASTA sequence files named 1aab-d1aab-1_20_1_1.seq and 1hme-d1hme-1_20_1_1.seq. Here I will show you how to get the threading alignments between these two sequences and how to calculate the SPAD (SuboPtimal Alignment Diversity) value for the alignment. For your own work, replace the example sequence files with your own sequence files. You may need to adjust the following commands slightly to make the program work properly in your local environment.
3. Generate the profile for each example sequence file with PSIBLAST:
>/bio/liger3d6/CASP7/BLAST/blastpgp -d ../nr/nr -i 1aab-d1aab-1_20_1_1.seq -o 1aab-d1aab-1_20_1_1.profile -m 6 -j 5 -e 0.002 -h 0.002
>/bio/liger3d6/CASP7/BLAST/blastpgp -d ../nr/nr -i 1hme-d1hme-1_20_1_1.seq -o 1hme-d1hme-1_20_1_1.profile -m 6 -j 5 -e 0.002 -h 0.002
Replace the blastpgp path with your own PSIBLAST path. Do not change "-m 6 -j 5 -e 0.002 -h 0.002"; otherwise msa-1.pl in the next step might not work.
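As a convenience, the two PSIBLAST commands above could be generated by a small shell sketch such as the one below. The variable names BLASTPGP and NR_DB and the function psiblast_cmd are illustrative only and are not part of this package. The sketch just prints the commands so you can check the paths before running them.

```shell
#!/bin/sh
# Illustrative helper, not part of the SUBWAI package.
# BLASTPGP and NR_DB are assumed names: point them at your own
# blastpgp executable and nr database.
BLASTPGP="${BLASTPGP:-blastpgp}"
NR_DB="${NR_DB:-../nr/nr}"

# Print the blastpgp command for one .seq file.
# The options after -o are the fixed ones required by msa-1.pl.
psiblast_cmd() {
    seq_file="$1"
    base="${seq_file%.seq}"   # strip the .seq extension
    echo "$BLASTPGP -d $NR_DB -i $seq_file -o $base.profile -m 6 -j 5 -e 0.002 -h 0.002"
}

# Print one command per example sequence file.
for s in 1aab-d1aab-1_20_1_1.seq 1hme-d1hme-1_20_1_1.seq; do
    psiblast_cmd "$s"
done
```

Once the printed commands look right for your environment, they can be run by piping the output of the script to sh.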
4. Generate the amino acid frequency from the profile:
You will get the frequency files named 1aab-d1aab-1_20_1_1.fa and 1hme-d1hme-1_20_1_1.fa from these two commands, respectively.
5. Generate the SABLE secondary structure prediction for the target sequence (here I take 1aab-d1aab-1_20_1_1.seq as the target sequence, so its structure is treated as unknown and its secondary structure has to be predicted):
>cp 1aab-d1aab-1_20_1_1.seq /bio/liger3d6/CASP7/sable_distr/data.seq
Copy 1aab-d1aab-1_20_1_1.seq to your SABLE directory as shown above, using your own SABLE path. Then change your current directory to the SABLE directory and run:
SABLE will then generate two output files: OUT_SABLE_graph and OUT_SABLE_res. Copy OUT_SABLE_graph to your SUBWAI directory:
>cp /bio/liger3d6/CASP7/sable_distr/OUT_SABLE_graph /bio/liger3d8/chen177/test/time/1aab-d1aab-1_20_1_1.sable
Don't forget to change your current directory back to your SUBWAI directory.
6. Generate the secondary structure description file for the template sequence (1hme-d1hme-1_20_1_1.seq is the template sequence here. Its structure information is provided by 1hme-d1hme-1_20_1_1.pdb in this package):
>/bio/liger3/chen177/SubOptimal/dsspcmbi 1hme-d1hme-1_20_1_1.pdb 1hme-d1hme-1_20_1_1.dssp
Change the path here to point to your own DSSP installation.
7. Compile SUBWAI:
>g++ msasub-6.cpp -o subwai
8. Now that all required files are in place, run SUBWAI to get the threading alignment and the SPAD:
>./subwai -q 1aab-d1aab-1_20_1_1.seq -t 1hme-d1hme-1_20_1_1.seq -pq 1aab-d1aab-1_20_1_1.fa -pt 1hme-d1hme-1_20_1_1.fa -sq 1aab-d1aab-1_20_1_1.sable -st 1hme-d1hme-1_20_1_1.dssp -n test
The SUBWAI command has the following format:
subwai -q (target sequence) -t (template sequence) -pq (target frequency) -pt (template frequency) -sq (target SABLE prediction) -st (template DSSP output) -n (name affix)
  -q   The target sequence filename
  -t   The template sequence filename
  -pq  The frequency filename of the target
  -pt  The frequency filename of the template
  -sq  The SABLE prediction filename of the target
  -st  The DSSP output of the template
  -n   An arbitrary name affix for the output filenames. It does not affect the result and can be any word.
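Because all six input files share the base names of the target and the template, the command line can be assembled from those two names. The following is only a sketch under that naming assumption. The functions subwai_cmd and missing_inputs are hypothetical helpers, not part of this package, and the sketch passes the template .fa file to -pt, following the option description above.

```shell
#!/bin/sh
# Illustrative helpers, not part of the SUBWAI package.
# Assumption: inputs are named <base>.seq, <base>.fa, <base>.sable
# (target) and <base>.seq, <base>.fa, <base>.dssp (template).

# Print the names of the input files that do not exist yet.
missing_inputs() {
    q="$1"; t="$2"
    for f in "$q.seq" "$t.seq" "$q.fa" "$t.fa" "$q.sable" "$t.dssp"; do
        [ -f "$f" ] || echo "$f"
    done
}

# Print the subwai command for a target base name, a template base
# name and an output affix.
subwai_cmd() {
    q="$1"; t="$2"; tag="$3"
    echo "./subwai -q $q.seq -t $t.seq -pq $q.fa -pt $t.fa -sq $q.sable -st $t.dssp -n $tag"
}

# Example with the files shipped in this package.
missing_inputs 1aab-d1aab-1_20_1_1 1hme-d1hme-1_20_1_1
subwai_cmd 1aab-d1aab-1_20_1_1 1hme-d1hme-1_20_1_1 test
```

If missing_inputs prints nothing, every input file is present and the printed subwai command can be run as is.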
All input files, intermediate files and output files mentioned here are provided in this tar package, so you can check your results against them.
The software mentioned here is available from the following links:
SUBWAI will generate many output files, including output1.dat, pathmatrix.dat, confidence.dat, confidence1.dat, confidence2.dat, confidence3.dat and sw.dat.
The file output1.dat is a plain-text file that records all optimal and suboptimal target-template alignments. In this file, the first alignment is the best alignment, the second alignment has the second-highest alignment score, and so forth. The file confidence2.dat contains the SPAD value for each residue of the target sequence in plain-text format: the first entry from the top is for the first residue, the second entry for the second residue, and so on.
The other outputs are for development convenience and are not needed by end users.
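To read the SPAD values alongside residue positions, the file can be numbered line by line. The sketch below assumes one SPAD value per line in residue order, which is an interpretation of the description above and not a documented format. The file name confidence2-test.dat is the example file listed in this package, and spad_by_residue and top_spad are hypothetical helper names.

```shell
#!/bin/sh
# Illustrative helpers, not part of the SUBWAI package.
# Assumption: the SPAD file holds one value per line, in residue order.

# Print "residue_index<TAB>value" for every non-empty line.
spad_by_residue() {
    awk 'NF { n++; printf "%d\t%s\n", n, $0 }' "$1"
}

# Print the N residues with the largest SPAD values, largest first.
top_spad() {
    spad_by_residue "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k2,2nr |
        head -n "$2"
}

# Example file name taken from the package file list.
SPAD_FILE="confidence2-test.dat"
if [ -f "$SPAD_FILE" ]; then
    spad_by_residue "$SPAD_FILE"
fi
```

For example, top_spad confidence2-test.dat 10 would list the ten positions with the largest values, provided the file follows the assumed one-value-per-line layout.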
For technical issues, please contact:
Hao Chen, Ph.D. candidate
Department of Biology, Purdue University
West Lafayette, IN, 47907, USA
For permission to use this distribution, please contact:
Daisuke Kihara, Assistant Professor
Department of Biology and Department of Computer Science, Purdue University
West Lafayette, IN, 47907, USA