Algorithm
GenPortrait is designed to view the "portrait of a genome". These portraits show prominent fractal-like patterns that are specific to each genome. The pattern of a genome is quite different from that of a random sequence, and similar species show similar patterns. The method counts the frequencies of short DNA sequences of length n in an input genome and stores them in a 2D matrix. The matrix can then be visualized in gray scale or in a color scale.
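The counting step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GenPortrait's own code: the window moves one base at a time, lowercase input is treated as uppercase, and any window containing a character other than A, C, G, or T (for example N) is skipped. The function name `count_oligos` is hypothetical.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def count_oligos(seq: str, n: int) -> Counter:
    """Count every overlapping oligonucleotide of length n in seq.

    Assumptions (not taken from GenPortrait): the window moves one base at a
    time, lowercase bases are treated as uppercase, and windows containing
    any non-ACGT character are skipped.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - n + 1):
        oligo = seq[i:i + n]
        if set(oligo) <= VALID_BASES:
            counts[oligo] += 1
    return counts


if __name__ == "__main__":
    # ACGTAC has five windows of length 2: AC, CG, GT, TA, AC
    print(count_oligos("ACGTAC", 2))
```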
For example, when the oligonucleotide length is set to 2, the frequencies of AA, AC, AG, AT, ..., and TT are counted by sliding a window of size 2 along the sequence. This results in a total of 16 counts, which are stored in a 2D matrix as follows:
| AA | AC | CA | CC |
| AT | AG | CT | CG |
| TA | TC | GA | GC |
| TT | TG | GT | GG |
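The table suggests a recursive quadrant rule: A occupies the top-left quadrant, C the top-right, T the bottom-left, and G the bottom-right, with the first base choosing the large quadrant and the second base choosing the sub-quadrant within it. The sketch below assumes this rule extends to longer oligonucleotides (one row bit and one column bit per base); that extension is inferred from the 2-base table, not confirmed by the text, and `oligo_to_cell` and `build_matrix` are hypothetical names.

```python
from itertools import product

# Quadrant (row bit, column bit) of each base, read off the 4 x 4 table above:
# A = top-left, C = top-right, T = bottom-left, G = bottom-right.
QUADRANT = {"A": (0, 0), "C": (0, 1), "T": (1, 0), "G": (1, 1)}


def oligo_to_cell(oligo: str) -> tuple[int, int]:
    """Return the (row, column) of an oligonucleotide in a 2^n x 2^n matrix.

    Assumption: the first base selects a quadrant, the next base a
    sub-quadrant inside it, and so on, so each base adds one bit to the
    row index and one bit to the column index.
    """
    row = col = 0
    for base in oligo:
        r, c = QUADRANT[base]
        row = row * 2 + r
        col = col * 2 + c
    return row, col


def build_matrix(counts: dict[str, int], n: int) -> list[list[int]]:
    """Place counts of length-n oligonucleotides into a 2^n x 2^n matrix."""
    size = 2 ** n
    matrix = [[0] * size for _ in range(size)]
    for oligo, count in counts.items():
        if len(oligo) == n:
            row, col = oligo_to_cell(oligo)
            matrix[row][col] += count
    return matrix


if __name__ == "__main__":
    # Rebuild the 2-base layout as labels to check the mapping against the table.
    labels = [[""] * 4 for _ in range(4)]
    for a, b in product("ACGT", repeat=2):
        row, col = oligo_to_cell(a + b)
        labels[row][col] = a + b
    for line in labels:
        print(" | ".join(line))
```

Running the demo prints the four rows of the table above, which is a quick check that the assumed rule reproduces the documented 2-base layout.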
Five color scales are available to visualize the matrix: JET, HSV, COOL, SPRING, and gray scale.
(Figure: color bars of the GRAY, JET, HSV, COOL, and SPRING schemes.)
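The gray-scale case can be sketched as a rescaling of counts to 8-bit gray levels. The text does not say how GenPortrait scales counts, so this sketch assumes plain linear min-max scaling with higher counts drawn brighter, and writes the result as a plain-text PGM (P2) image; `to_gray` and `to_pgm` are hypothetical names. The color schemes (JET, HSV, COOL, SPRING) would map the same scaled value to a color instead of a gray level and are not sketched here.

```python
def to_gray(matrix: list[list[int]]) -> list[list[int]]:
    """Linearly rescale counts to gray levels 0..255 (minimum -> 0, maximum -> 255).

    Assumption: linear min-max scaling with higher counts brighter; the
    scaling GenPortrait itself uses is not documented in the text.
    """
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat portrait has no contrast; draw it as uniform black.
        return [[0] * len(row) for row in matrix]
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


def to_pgm(gray: list[list[int]]) -> str:
    """Serialize gray levels as a plain-text (P2) PGM image with maxval 255."""
    height, width = len(gray), len(gray[0])
    lines = ["P2", f"{width} {height}", "255"]
    lines += [" ".join(str(v) for v in row) for row in gray]
    return "\n".join(lines) + "\n"
```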
The pictures below are portraits of E. coli (128 × 128) generated with oligonucleotide length 7 (2^7 = 128), shown in four color scales.
See Examples for more portraits.
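The matrix size follows from the number of oligonucleotides: there are 4^n oligonucleotides of length n, and 4^n = (2^n)^2, so they fill a square matrix with side 2^n. For the E. coli portraits with n = 7:

```latex
4^{n} = \left(2^{n}\right)^{2}
\qquad\Longrightarrow\qquad
n = 7:\quad 4^{7} = 16384 = 128 \times 128
```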
(Figure: E. coli portraits in the HSV, JET, COOL, and SPRING color scales.)
The distance between two portraits is based on the sum of the differences in the frequency of each oligonucleotide. First, the oligonucleotide counts in each portrait are normalized by dividing them by the average count in that portrait. Next, the absolute difference between the normalized counts of the same oligonucleotide in the two portraits is computed, and all of these differences are summed. The distance may differ slightly for different oligonucleotide lengths.
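The distance described above amounts to d(A, B) = Σ_i |a_i / mean(a) − b_i / mean(b)|, summed over all oligonucleotides i. A minimal Python sketch, assuming both portraits are matrices of the same size built with the same layout (so the same cell holds the same oligonucleotide in both); `normalize` and `portrait_distance` are hypothetical names.

```python
def normalize(portrait: list[list[float]]) -> list[float]:
    """Flatten a portrait and divide every count by the portrait's mean count."""
    values = [v for row in portrait for v in row]
    if not values:
        raise ValueError("empty portrait")
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("portrait has no counts")
    return [v / mean for v in values]


def portrait_distance(a: list[list[float]], b: list[list[float]]) -> float:
    """Sum of absolute differences between mean-normalized counts of two portraits.

    Assumption: cell i of a and cell i of b hold the same oligonucleotide.
    """
    na, nb = normalize(a), normalize(b)
    if len(na) != len(nb):
        raise ValueError("portraits must use the same oligonucleotide length")
    return sum(abs(x - y) for x, y in zip(na, nb))
```

Because each portrait is divided by its own mean, multiplying all counts by a constant does not change the distance; two genomes of different lengths with the same oligonucleotide proportions are at distance 0.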
GenPortrait Database
We maintain a database of genome portraits. The genome sequences are the RefSeq sequences specified in the KEGG organism list, downloaded from the NCBI FTP site. Only files named NC_*.fna (most of which are complete genomes) are included; RNA sequences and sequences shorter than 30 kb are omitted. Currently the database contains 618 genomes.
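The inclusion rules above can be expressed as a small filter. This is a hypothetical sketch, not the script used to build the database: it assumes "30 kb" means 30,000 bases and matches file names case-sensitively; the RNA exclusion is not modelled because the text does not say how RNA sequences are identified.

```python
from fnmatch import fnmatchcase

# Assumption: "30 kb" is read as 30,000 bases.
MIN_LENGTH = 30_000


def include_in_database(filename: str, length: int) -> bool:
    """Hypothetical check mirroring the stated rules for the portrait database.

    Keeps files named NC_*.fna whose sequence is at least MIN_LENGTH bases.
    The RNA exclusion rule is not modelled because its criterion is not stated.
    """
    return fnmatchcase(filename, "NC_*.fna") and length >= MIN_LENGTH
```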
You can search this database with your input sequence.
Multiple sequences in a file
If an input file contains multiple FASTA-format sequences, oligonucleotide frequencies are counted within each sequence, and the server still generates portraits. Note that the sequences are not concatenated, so no oligonucleotide spanning the junction between two sequences is counted. This may be useful for capturing the characteristics of a population of fragment sequences, such as those from a metagenomics project.
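Counting without concatenation can be sketched as follows: each FASTA record is scanned on its own, so a window never straddles two records. The sketch assumes the per-record counts are summed into one table for the portrait (the text does not say how they are combined), skips windows with non-ACGT characters, and uses hypothetical names `parse_fasta` and `count_oligos_per_record`.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def parse_fasta(text: str) -> list[str]:
    """Return the sequences of a multi-FASTA string, one uppercase string per record."""
    records: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; store the previous one if it had bases.
            if current:
                records.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        records.append("".join(current))
    return records


def count_oligos_per_record(text: str, n: int) -> Counter:
    """Count length-n oligonucleotides within each record and sum the counts.

    Assumption: per-record counts are summed into one table. Windows never
    cross record boundaries, and windows with non-ACGT characters are skipped.
    """
    total: Counter = Counter()
    for seq in parse_fasta(text):
        for i in range(len(seq) - n + 1):
            oligo = seq[i:i + n]
            if set(oligo) <= VALID_BASES:
                total[oligo] += 1
    return total
```

For the two records AAC and GTT, this counts AA, AC, GT, and TT; concatenating them into AACGTT would also count the junction oligonucleotide CG.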
What you can do
- Make a portrait of your sequence file (here).
- Generate portraits of multiple files and compute the distances between them: first submit the sequences from Home, then go to Comparison to compute the distances.
- Compare the portrait of an input genome against our genome database: submit the sequence from Home and press the "Query" button that appears below the portrait.
- View precomputed portraits of 618 genomes (Examples).
- Download the source code, the executable program, and example genome sequences (Downloads).
- Contact us about analyzing a large number of genomes or about collaboration :) (Contact Us)
Tutorial
If you don't have a nucleotide sequence to analyze, you can select one from Downloads and save it to your local disk, then upload it from Home to run GenPortrait. Alternatively, you can view example portraits from our database under the Examples menu, and query a selected portrait against the database to find genomes with similar portraits.