EMD 1.0 (Ensemble Motif Discovery) Algorithm


What is EMD?

EMD (Ensemble Motif Discovery) is an ensemble (consensus) algorithm that identifies one or more frequent motifs among multiple sequences. The basic idea is to combine motif predictions from multiple runs of multiple component algorithms into consensus motifs, which EMD reports as its prediction. The current version includes five component algorithms: AlignACE, BioProspector, MotifSampler, MDScan, and MEME. The first three are stochastic algorithms, while the last two are deterministic. EMD 1.0 is a parallel program: it requires a PBS (Portable Batch System) installation to run the component motif discovery programs on multiple input sequences in parallel, which greatly speeds up the whole procedure. Please cite reference [1] in your publication if you use EMD.

References:
[1] Hu J, Yang YD, Kihara D. EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences. BMC Bioinformatics. 2006 Jul 13;7:342.
[2] Hu J, Li B, Kihara D. Limitations and potentials of current motif discovery algorithms. Nucleic Acids Res. 2005 Sep 2;33(15):4899-913.


How to install EMD?

1) Download EMD.tar.gz.
2) Uncompress EMD.tar.gz into the installation directory.
3) Add the EMD directory to the system path (the PATH environment variable).
4) Add the EMD directory to the Perl library path (e.g., the PERL5LIB environment variable).
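
In a Bourne-style shell, the steps above might look like the sketch below. Several details are assumptions rather than facts from this README: the parent directory for the installation ($HOME by default, overridable through the hypothetical EMD_PARENT variable), and the guess that the archive unpacks into a top-level directory named EMD/. PERL5LIB is the standard environment variable that Perl searches for additional library directories.

```shell
# Assumed parent directory for the installation (not specified by this
# README); set EMD_PARENT beforehand to install somewhere else.
EMD_PARENT="${EMD_PARENT:-$HOME}"

# Assumption: EMD.tar.gz extracts into a top-level directory named EMD/.
EMD_HOME="$EMD_PARENT/EMD"

# Unpack the archive if it is present in the current directory.
if [ -f EMD.tar.gz ]; then
    mkdir -p "$EMD_PARENT"
    tar -xzf EMD.tar.gz -C "$EMD_PARENT"
fi

# Put the EMD scripts on the command search path.
export PATH="$EMD_HOME:$PATH"

# Let Perl find library files located in the EMD directory, keeping any
# PERL5LIB entries that were already set.
export PERL5LIB="$EMD_HOME${PERL5LIB:+:$PERL5LIB}"
```

To make the settings persistent, the two export lines (with EMD_HOME written out) can be added to a shell startup file such as ~/.bashrc.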


How to run EMD?

emdrunPX.pl runs all the component algorithms on the input data sets (in ./step5regulon). The number of component algorithms, the number of runs of each algorithm, and their command-line options are all specified in a configuration file, e.g., emd.cfg.
emdMotif.pl combines the results from the component algorithms and outputs the consensus result.

In the EMD directory:
1) $ mkdir test
2) $ cd test
3) $ mkdir step5regulon
4) Create input files in step5regulon/ (e.g., $ cp ../input/Ada.txt step5regulon/)
5) $ cp ../runmotif.sh .    # template PBS job file
6) $ cp ../emd.cfg .        # configuration file of the EMD algorithm
7) $ cp ../bg_seq/*.bg .    # background files of the component algorithms
8) $ emdrunPX.pl -f emd.cfg -w 15 -n 5
9) $ emdMotif.pl -f step5regulon/Ada.txt -c emd.cfg -n 5


Example Output

Motif 0:
GACTTGTAAACCTAA 0 21 15
GACTTGTAAACCAAA 1 20 15
TTACAAGTCTACACC 1 51 15

Motif 1:
ATTCGGTGTAGACTT 0 11 15
TTTACAAGTCGATTA 0 51 15
TTTAGGTTTACAAGT 1 44 15

Motif 2:
AGACTTGTAAACCTA 0 20 15
CGACTTGTAAACCAA 1 19 15
TTTACAAGTCTACAC 1 50 15

Motif 3:
ACTTGTAAACCTAAA 0 22 15
ACTTGTAAACCAAAT 1 21 15
AAGTCTACACCGAAT 1 55 15

Motif 4:
CTTGTAAACCTAAAT 0 23 15
AACCAAATTGAAAAG 1 28 15
AAGTCTACACCGAAT 1 55 15
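
As a quick sanity check on output of this form, the sketch below counts the site lines listed under each "Motif N:" header. It relies only on the layout visible in the example (a header line followed by one line per site) and does not interpret the numeric columns. The file name emd_output.txt and the function name count_sites are hypothetical, and the sketch assumes the output has been saved to a file; the sample written here is the first two motifs of the example above.

```shell
# Hypothetical file holding saved EMD output (the name is an assumption);
# the content is copied from the first two motifs of the example.
cat > emd_output.txt <<'EOF'
Motif 0:
GACTTGTAAACCTAA 0 21 15
GACTTGTAAACCAAA 1 20 15
TTACAAGTCTACACC 1 51 15

Motif 1:
ATTCGGTGTAGACTT 0 11 15
TTTACAAGTCGATTA 0 51 15
TTTAGGTTTACAAGT 1 44 15
EOF

# Print "<motif header> <number of site lines>" for every motif block.
count_sites() {
    awk '
        /^Motif [0-9]+:/ {              # a new motif block starts
            if (name != "") print name, n
            name = $1 " " $2; n = 0
            next
        }
        NF > 0 { n++ }                  # every non-blank line is one site
        END { if (name != "") print name, n }
    ' "$1"
}

count_sites emd_output.txt
```

For the sample above this prints "Motif 0: 3" and "Motif 1: 3", one per line.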


Contact Information

Daisuke Kihara (Principal Investigator)
Lilly Bld. B235
Department of Biological Sciences
Purdue University
West Lafayette, IN, 47906
Tel: 765-494-2744
Email: dkihara@purdue.edu

Yifeng Yang (Graduate Student)
Lilly Bld. B207
Department of Biological Sciences
Purdue University
West Lafayette, IN, 47906
Tel: 765-494-6766
Email: yang41@purdue.edu