MAINMAST

MAINMAST is a de novo modeling protocol that builds an entire protein 3D model directly from a near-atomic resolution EM map.
MAINMAST is fully automated and generates reliable initial C-alpha models, which can then be used to construct full atomic models. This de novo modeling method has several advantages: (1) it does not require reference structures; (2) it does not require manual intervention; (3) it generates a pool of candidate models.

Introduction

MAINMAST protocol

The MAINMAST protocol consists of four main steps:

(1) Identify local dense points (LDPs) with the mean shift algorithm;

(2) Connect all LDPs with a minimum spanning tree (MST);

(3) Refine the tree structure with a tabu search algorithm;

(4) Thread the sequence onto the longest path.

Tutorial

Protocol

MAINMAST is a de novo modeling method for EM maps of near-atomic resolution (better than 4.0 angstrom).

The MAINMAST protocol consists of four main steps: (1) identify local dense points (LDPs) in an EM map with the mean shift clustering algorithm; (2) connect all LDPs with a minimum spanning tree (MST); (3) refine the tree structure with a tabu search algorithm; (4) thread the sequence onto the longest path.

The MAINMAST program performs steps (1)-(3).

The ThreadCA program performs the final step (4): it threads the amino acid sequence onto the longest path.

Flow Chart of MAINMAST

1. Identifying local dense points

The grid points in an EM map are clustered with a non-parametric clustering algorithm (mean shift). After clustering, the representative point of each cluster is called a local dense point (LDP).
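The clustering idea can be illustrated with a minimal sketch. This is not the MAINMAST source code: the function names (mean_shift, merge_modes), the Gaussian kernel width sigma, the convergence tolerance and the merge distance are all assumptions chosen for illustration. Each point is shifted repeatedly to the density-weighted mean of its surroundings, and the converged positions that lie close together are merged into one representative point.

```python
import math

def mean_shift(points, weights, sigma=1.0, tol=1e-4, max_iter=200):
    """Shift each 3D point uphill until it settles on a local density maximum.

    points  : list of (x, y, z) tuples (e.g. grid point coordinates)
    weights : positive density values, one per point (assumed > 0)
    """
    modes = []
    for start in points:
        pos = tuple(start)
        for _ in range(max_iter):
            total = 0.0
            acc = [0.0, 0.0, 0.0]
            for p, w in zip(points, weights):
                # Gaussian kernel weighted by the density value of the point
                k = w * math.exp(-math.dist(pos, p) ** 2 / (2.0 * sigma ** 2))
                total += k
                for axis in range(3):
                    acc[axis] += k * p[axis]
            new_pos = tuple(a / total for a in acc)
            moved = math.dist(new_pos, pos)
            pos = new_pos
            if moved < tol:
                break
        modes.append(pos)
    return modes

def merge_modes(modes, merge_dist=0.5):
    """Keep one representative for every group of converged points closer than merge_dist."""
    representatives = []
    for m in modes:
        if all(math.dist(m, r) >= merge_dist for r in representatives):
            representatives.append(m)
    return representatives

# Toy input: two well-separated blobs of three points each (hypothetical data)
toy_points = [(0, 0, 0), (0.5, 0, 0), (0, 0.5, 0),
              (10, 0, 0), (10.5, 0, 0), (10, 0.5, 0)]
toy_weights = [1.0] * len(toy_points)
toy_ldps = merge_modes(mean_shift(toy_points, toy_weights))
```

In this toy case each blob collapses to a single representative point, which plays the role of an LDP.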

Map and LDPs

2. Connect all LDPs by Minimum Spanning Tree

Minimum Spanning Tree is a graph structure that connects all vertices with the minimum total length of edges.
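As an illustration of the definition above, the following sketch builds a minimum spanning tree over a small set of 3D points with Prim's algorithm. It is only an example under stated assumptions: the function name minimum_spanning_tree is hypothetical, every pair of points is treated as a possible edge, and the edge length is the Euclidean distance. MAINMAST itself may restrict which edges are considered.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges connecting all points.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex providing that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update the connection cost of the remaining vertices
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges

# Toy input (hypothetical LDP coordinates)
toy_edges = minimum_spanning_tree([(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 3, 0)])
toy_total = sum(length for _, _, length in toy_edges)
```

A tree over n points always has n - 1 edges; here the four toy points are joined by three edges.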

Map and LDPs

3. Refine Tree structure by Tabu Search algorithm

The tree structure is further improved for finding the protein main-chain path. The initial tree structure (MST) is refined in an iterative procedure using a tabu search. A tabu search attempts to explore a large search space by keeping a list of moves that are forbidden.
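The general idea of a tabu search can be sketched as follows. This is a generic illustration, not the tree refinement implemented in MAINMAST: the function tabu_search, the toy cost table and the neighbor function are assumptions. In this sketch the tabu list stores recently visited states, so the search is forced to move on even when every allowed neighbor is worse than the current state.

```python
from collections import deque

def tabu_search(initial, neighbors, cost, n_rounds=100, tabu_size=10):
    """Generic tabu search that minimizes cost(state).

    neighbors(state) returns candidate states; recently visited states
    are kept in a fixed-size tabu list and may not be revisited.
    """
    current = initial
    best, best_cost = initial, cost(initial)
    tabu = deque([initial], maxlen=tabu_size)
    for _ in range(n_rounds):
        candidates = [(cost(s), s) for s in neighbors(current) if s not in tabu]
        if not candidates:
            break  # every neighbor is forbidden
        c, current = min(candidates, key=lambda t: t[0])
        tabu.append(current)  # accept the move even if it is worse
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost

# Toy problem: states are positions 0..6, with a local minimum at position 1
toy_costs = [5, 3, 4, 6, 2, 1, 7]

def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(toy_costs)]

toy_result = tabu_search(0, toy_neighbors, lambda i: toy_costs[i],
                         n_rounds=20, tabu_size=10)
# toy_result is (5, 1): the search walks past the local minimum at position 1
```

A plain greedy descent would stop at position 1 (cost 3). Because the way back is forbidden, the tabu search continues through worse states and reaches position 5 (cost 1).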

Predicted path

4. Thread sequence on the longest path

The longest path of a tree is aligned with the amino acid sequence using the Smith-Waterman Dynamic Programming algorithm.
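For readers unfamiliar with the Smith-Waterman algorithm, here is a minimal sketch of local alignment by dynamic programming. It aligns two plain strings with a simple match/mismatch/gap scheme, which is an assumption made for illustration only; the scoring terms that ThreadCA actually uses to match residues to path positions are not reproduced here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of sequences a and b.

    Returns (best score, aligned part of a, aligned part of b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, which makes the alignment local
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

# Toy input: a short fragment located inside a longer string
toy_alignment = smith_waterman("MQIV", "AAMQIVAA")
```

In the toy case the four-letter fragment is found inside the longer string with a score of 8 (four matches at 2 points each).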

Predicted C-alpha model

Commands

The MAINMAST protocol consists of two commands: MAINMAST and ThreadCA.

MAINMAST

This command identifies local dense points (LDPs) in an EM map. The LDPs are then connected by a minimum spanning tree (MST), and the MST is refined by a tabu search algorithm.

Usage: MAINMAST -m [MAP file (situs format)] (option)
Option ver2.0:
-Tree : Show MSTree mode
-Graph : Graph mode
---Parameters in MeanShift----
-gw [f] : bandwidth of the gaussian filter def=2.0, sigma = 0.5*[float]
-Dkeep [f] : Keep edge where distance < [f] def=0.5
-t [f] : Threshold of density values. def=0.0
-allow [f] : Max shift distance < [f] def=10.0
-filter [f]: Filter of representative points def=0.1
-merge [f]: After MeanShifting, merge d<[f] def=0.5
---Parameters in Tabu-search----
-Nround [i]: Number of Iterations def=5000
-Nnb [i]: Number of Neighborss def=30
-Ntb [i]: Size of tabu-list def=100
-Rlocal [f]: Radius of Local MST def=10
-Const [f]: Constraint of total length of edge def=1.01,Total(Tree) <[f]*Total(MST)

ThreadCA

ThreadCA determines the direction of the protein chain by threading the amino acid sequence of the protein onto the longest path in the refined tree graph.

Usage: ThreadCA -i [OUT file from MAINMAST] -a [20AA.param] -spd [*.spd3] (option)
Option ver1.0:
-i [file] : Result file of MAINMAST
-a [file] : 20AA.param
-spd [file] : Result of SPIDER2
-fw [f] : Filter width def=1.0
-Ab [f] : Average length of CA-CA Bond def=3.5
-Wb [f] : Weight of Bond score def=0.9
-r : Reverse mode, reverse mainchain order
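ThreadCA works on the longest path in the refined tree. How such a path can be found in a tree is sketched below with the classic two-pass search: find the vertex farthest from an arbitrary start, then the vertex farthest from that one. This is an illustration only. The function names, the (i, j, length) edge format, and the choice to measure path length as the sum of edge lengths are assumptions, not a description of the MAINMAST code.

```python
from collections import defaultdict

def farthest(adj, start):
    """Return (farthest vertex, its distance, parent map) for a walk from start."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:       # a tree has no cycles, so each vertex is reached once
                dist[v] = dist[u] + w
                parent[v] = u
                stack.append(v)
    end = max(dist, key=dist.get)
    return end, dist[end], parent

def longest_path(edges):
    """Longest path in a tree given as (i, j, length) edges.

    Returns (list of vertices along the path, total length).
    """
    if not edges:
        return [], 0.0
    adj = defaultdict(list)
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    a, _, _ = farthest(adj, edges[0][0])   # first pass: one end of the path
    b, length, parent = farthest(adj, a)   # second pass: the other end
    path = []
    node = b
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], length

# Toy tree: a chain 0-1-2-3 with a short side branch 1-4 (hypothetical data)
toy_path, toy_length = longest_path([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 3.0), (1, 4, 0.5)])
```

In the toy tree the side branch is ignored and the path runs along the chain between vertices 0 and 3.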

Examples

Example1: Simulated map at 5.0 angstrom resolution

(Optional) Prepare a map file (1yfq.situs) as an input from a PDB file (1yfq.pdb)

Generate a simulated map from 1yfq.pdb with e2pdb2mrc.py from the EMAN2 package.

e2pdb2mrc.py 1yfq.pdb 1yfq.mrc --res 5.0

Convert the MRC format file to SITUS format with map2map from the SITUS package.

echo 2|map2map 1yfq.mrc 1yfq.situs

Density Map, 1yfq.mrc

Trace main-chain from MAP

Trace the main chain from the map (1yfq.situs) with MAINMAST. The predicted main-chain paths are saved to path.pdb.

MAINMAST -m 1yfq.situs -t 9 -filter 0.3 -Dkeep 1.0 -Ntb 10 -Rlocal 5 -Nlocal 50 -Nround 50 > path.pdb

(Optional) Visualize the main-chain path in PyMOL. bondmk.pl makes the PyMOL script.

bondmk.pl path.pdb > tmp
pymol -u tmp

Predicted Main-chain Path

Thread the amino acid sequence on the longest path

ThreadCA requires an output file (*.spd3) from SPIDER2. Predict the secondary structure with SPIDER2:

run_local.sh 1yfq.seq

Thread the sequence on the main-chain path (required: path.pdb, 20AA.param and 1yfq.spd3).

../ThreadCA -i path.pdb -a ./20AA.param -spd 1yfq.spd3 -fw 1.3 -Ab 3.3 -Wb 0.9 >CA.pdb
../ThreadCA -i path.pdb -a ./20AA.param -spd 1yfq.spd3 -fw 1.3 -Ab 3.3 -Wb 0.9 -r >CA_r.pdb

Compare CA.pdb and CA_r.pdb by their threading scores. In this case, CA.pdb has a better (higher) threading score than CA_r.pdb.

Threading Score of CA.pdb is 108.092064

MODEL           1   108.092064      Wh=   1.00000000      We=  0.8
  
ATOM      1  CA  MET A   1      52.643  15.257  36.319  1.00  1.00
ATOM      2  CA  GLN A   2      54.566  17.208  34.977  1.00  1.00
ATOM      3  CA  ILE A   3      56.169  17.272  32.172  1.00  1.00
ATOM      4  CA  VAL A   4      57.479  18.569  30.500  1.00  1.00
  

Threading Score of CA_r.pdb is -13.4560547

 MODEL           1  -13.4560547      Wh=   1.29999995      We=  0.8
  
ATOM      1  CA  MET A   1      40.354  14.523  39.657  1.00  1.00
ATOM      2  CA  GLN A   2      41.278  16.075  39.879  1.00  1.00
ATOM      3  CA  ILE A   3      42.087  16.872  38.689  1.00  1.00
ATOM      4  CA  VAL A   4      44.132  18.503  37.508  1.00  1.00
  
Predicted CA model (CA.pdb)

Predicted CA model (CA_r.pdb)

Green: CA model, Red: 1yfq.pdb

(Optional) Visualize the Minimum Spanning Tree

Generate the MST on the EM map. bondtree.pl makes the PyMOL script.

MAINMAST -m 1yfq.situs -t 9 -filter 0.3 -Dkeep 1.0 -Ntb 10 -Rlocal 5 -Nlocal 50 -Nround 50 -Tree > tree.pdb

bondtree.pl tree.pdb > tmp
pymol -u tmp

Minimum Spanning Tree

(Optional) Visualize all possible paths

Generate all possible connections (edges) on the EM map. bondtree.pl makes the PyMOL script.

MAINMAST -m 1yfq.situs -t 9 -filter 0.3 -Dkeep 1.0 -Ntb 10 -Rlocal 5 -Nlocal 50 -Nround 50 -Graph > graph.pdb

bondtree.pl graph.pdb > tmp
pymol -u tmp

All edges

Example2: A segmented map from EMD-6374

(Optional) Prepare a map file (6374.situs)

Prepare a segmented map (6374.mrc) with Chimera.
Convert the MRC format file to SITUS format with map2map from the SITUS package.

echo 2|map2map 6374.mrc 6374.situs

Density Map, 6374.mrc

Trace main-chain from MAP

Trace the main chain from the map (6374.situs) with MAINMAST. The predicted main-chain paths are saved to path.pdb.

MAINMAST -m 6374.situs -t 1.00 -filter 0.3 -Rlocal 10 > path.pdb

(Optional) Visualize the main-chain path in PyMOL. bondmk.pl makes the PyMOL script.

bondmk.pl path.pdb > tmp
pymol -u tmp

Predicted Main-chain Path

Thread the amino acid sequence on the longest path

ThreadCA requires an output file (*.spd3) from SPIDER2. Predict the secondary structure with SPIDER2:

run_local.sh 6374.seq

Thread the sequence on the main-chain path (required: path.pdb, 20AA.param and 6374.spd3).

ThreadCA -i path.pdb -a ./20AA.param -spd 6374.spd3 -fw 1.4 -Ab 3.4 -Wb 0.9 > CA.pdb

ThreadCA -i path.pdb -a ./20AA.param -spd 6374.spd3 -fw 1.4 -Ab 3.4 -Wb 0.9 -r > CA_r.pdb

Compare CA.pdb and CA_r.pdb by their threading scores. In this case, CA_r.pdb has a better (higher) threading score than CA.pdb.

Threading Score of CA.pdb is -30.2037010

 MODEL           9  -30.2037010      Wh=   1.10000002      We=   1.0
ATOM      1  CA  MET A   1    -106.996 -28.676 264.784  1.00  1.00
ATOM      2  CA  LEU A   2    -105.789 -31.728 266.138  1.00  1.00
ATOM      3  CA  GLN A   3    -106.515 -31.803 269.158  1.00  1.00
ATOM      4  CA  GLN A   4    -104.319 -32.575 272.131  1.00  1.00
  

Threading Score of CA_r.pdb is 7.75097609E-02

 MODEL           1   7.75097609E-02  Wh=   1.20000005      We=   1.0  
  
ATOM      1  CA  MET A   1    -106.996 -28.676 264.784  1.00  1.00
ATOM      2  CA  LEU A   2    -105.789 -31.728 266.138  1.00  1.00
ATOM      3  CA  GLN A   3    -106.515 -31.803 269.158  1.00  1.00
ATOM      4  CA  GLN A   4    -104.319 -32.575 272.131  1.00  1.00
  
Predicted CA model (CA.pdb)

Predicted CA model (CA_r.pdb)

Green: CA model, Red: 1yfq.pdb

(Optional) Visualize the Minimum Spanning Tree

Generate the MST on the EM map. bondtree.pl makes the PyMOL script.

MAINMAST -t 1.00 -filter 0.3 -Rlocal 10 -Tree -m 6374.situs > tree.pdb

bondtree.pl tree.pdb > tmp
pymol -u tmp

Minimum Spanning Tree

(Optional) Visualize all possible paths

Generate all possible connections (edges) on the EM map. bondtree.pl makes the PyMOL script.

MAINMAST -t 1.00 -filter 0.3 -Rlocal 10 -Graph -m 6374.situs > graph.pdb

bondtree.pl graph.pdb > tmp
pymol -u tmp

All edges

Download

Free Server (Recommended): https://em.kiharalab.org/algorithm/mainmast

Download the latest version of MAINMAST programs from MAINMAST.tgz

  • The threading part of MAINMAST requires SPIDER2 (a sequence-based secondary structure prediction program).
    Download from Github
  • Chimera plugin now available at Github

Updates of MAINMAST programs:

  • 2017 3/1 Released Version 1.0
  • 2019 1/17 Added FAQs
  • 2019 6/12 Added Screen-shots to FAQs
  • 2019 7/1 Chimera Plugin made available at Github




Tech Specs


CPU: >=4 cores
Memory: >=20 GB
GPU: not required.

Installations

Extract the files from MAINMAST.tgz

tar zxvf MAINMAST.tgz

Compile from source code

cd MAINMAST
gfortran MAINMAST.f -O3 -fbounds-check -o MAINMAST -mcmodel=medium
gfortran ThreadCA.f -O3 -fbounds-check -o ThreadCA -mcmodel=medium




Model Refinement Script

Refinement_command.py is an integrated model refinement script that automatically sets up the files and parameters for model refinement.
Before using this script, please install the following packages:
  • Rosetta Instruction
  • MDFF Tutorial
  • NAMD Tutorial
  • VMD Tutorial


Available methods:
  • Rosetta relax
  • MDFF

Usage:

python3 Refinement_command.py [MAP file] [Model file]
options:
--Method=relax : execute Rosetta relax protocol
--Method=MDFF : execute MDFF refinement protocol

Example: Rosetta Relax protocol

python3 Refinement_command.py --OutPath ./rosetta_data --Method=relax MAP.mrc Init.pdb

Example: MDFF

python3 Refinement_command.py --OutPath ./mdff_data --Method=MDFF MAP.mrc Init.pdb




MAINMAST program on cluster machines

To run MAINMAST calculations with various parameters in parallel on a computer cluster, you can use Slurm job files.
To generate the job files automatically, please use the script job_file_generator.py as follows:

python3 job_file_generator.py [MAINMAST Program Path] [EM MAP file]

This command generates Slurm submission files with different parameters (see the MAINMAST paper).
You can submit all job files with the sbatch command as follows:

for i in *.sub; do sbatch -n12 -A name "$i"; done




FAQ

For MainMast FAQ click here


License

© 2017 Genki Terashi, Daisuke Kihara and Purdue University

MAINMAST is free software for academic and non-commercial users.
It is released under the terms of the GNU General Public License Ver.3 (https://www.gnu.org/licenses/gpl-3.0.en.html).
Commercial users, please contact dkihara@purdue.edu for alternate licensing.



Reference