Multi-LZerD: Multiple protein docking for asymmetric complexes

Introduction

Until now, most protein docking prediction methods have focused on pairwise docking. A handful of methods exist for predicting multimeric complexes, but almost all of them assume specific properties such as homomericity or symmetry. Because a substantial number of diverse multimeric complexes exist in a cell, there is an urgent need for a multiple protein docking prediction method that does not target a specific type of complex. We have developed a novel computational multiple protein docking algorithm, Multi-LZerD, that builds models of multimeric complexes by effectively reusing pairwise docking predictions of component proteins.

Overview of Multi-LZerD

Multi-LZerD can be applied when users need to generate models of complexes of more than two proteins. In the first phase, pairwise docking predictions are generated with LZerD for every possible pair of protein units. Then, a set of random graphs is created to represent the different ways in which the units can bind to each other to form the larger multimeric complex. For each pairwise connection in a graph, a docking pose between the two proteins involved is selected randomly from the LZerD predictions of the first phase. The graphs are then improved iteratively by a Genetic Algorithm (GA) that uses a physics-based fitness function. Finally, after a configurable number of iterations, a refinement step is applied to the structures. In evaluation, Multi-LZerD handled different graph topologies and outperformed the only method that could be directly compared to it, particularly in unbound docking cases.
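
The first phase docks every pair of units, so n units require n(n-1)/2 pairwise runs. The sketch below only enumerates those pairs for a comma-separated unit list. list_pairs is a hypothetical helper, not part of the distribution, and the "X-Y" naming is an assumption based on the pairwise file names that appear in the sample log (B-G.out, B-P.out, G-P.out).

```shell
#!/bin/sh
# List every unordered pair of units from a comma-separated list.
# Hypothetical helper for illustration; not shipped with Multi-LZerD.
# The function body runs in a subshell so the IFS change stays local.
list_pairs() (
    IFS=,
    set -- $1              # split "B,G,P" into positional parameters B G P
    while [ "$#" -gt 1 ]; do
        first=$1
        shift
        for other in "$@"; do
            printf '%s-%s\n' "$first" "$other"
        done
    done
)

list_pairs "B,G,P"
```

For three units this prints three pairs; a four-unit complex would already need six pairwise docking runs, which is why the pairwise phase tends to dominate the total running time.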

Download Multi-LZerD

The current version of Multi-LZerD is available as a tar.gz file and can be downloaded here. Execution instructions are given in the README file provided. The dataset used for evaluation can be downloaded here. Both bound and unbound cases are included and a detailed description is provided in the README file.

Physics-based scoring function

The physics-based scoring function is provided as a standalone program here. The README file explains how to generate the score for a PDB file that contains two or more interacting units.

Executing Multi-LZerD with multiple threads

On a single thread, Multi-LZerD may take a prohibitively long time to run. The Multi-LZerD executable will automatically try to take advantage of multithreading if your system supports it. To manually control the number of threads Multi-LZerD attempts to use, set the OMP_NUM_THREADS environment variable. For example, if you are using bash as your shell and want to use 24 threads, you can run export OMP_NUM_THREADS=24 before running Multi-LZerD.
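
As a minimal sketch of the setting described above, the snippet below keeps an existing OMP_NUM_THREADS value and otherwise falls back to the number of online processors. The getconf query is an assumption about your system (it is available on most Linux and macOS installations); the fallback of 1 is used if the query fails.

```shell
#!/bin/sh
# Respect an existing OMP_NUM_THREADS; otherwise use the number of online
# processors reported by getconf, or 1 if that query is unavailable.
if [ -z "${OMP_NUM_THREADS:-}" ]; then
    OMP_NUM_THREADS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
fi
export OMP_NUM_THREADS
echo "Multi-LZerD will be allowed up to $OMP_NUM_THREADS threads"
# ./run.sh   # start the pipeline from this same shell so it inherits the value
```

On a shared machine it may be preferable to set a smaller fixed value by hand rather than claim every processor.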


Tutorial

General Information

Download and save the tar files of the program in your preferred directory. The files can be unpacked using the commands:

  • gunzip multilzerddistribution.tar.gz
  • tar -xvf multilzerddistribution.tar
This creates a directory containing the following files:
• A-sample.pdb • charmm_atom_types.in • multilzerd_cluster
• B-sample.pdb • charmm_params.in • multilzerd_create_pdb
• C-sample.pdb • create_lzerd_decoys.pl • multilzerd_pairwise_cluster.pl
• GETPOINTS • hbplus • multilzerd_refine
• LZD32 • lzerd_cluster • multilzerd_sort_decoys.pl
• LZerD • mark_sur • run.sh
• README • modify_hbplus • uniCHARMM
• addhydrogens.pl • multilzerd
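
To confirm that unpacking succeeded, the listing above can be turned into a quick check. This is a sketch only: check_distribution is a hypothetical helper, not part of the distribution, and the directory to check is passed as an argument because the name of the unpacked directory is not stated here.

```shell
#!/bin/sh
# Report any file from the listing above that is missing from a directory.
# Hypothetical helper; the file names are copied from the listing.
check_distribution() {
    dir=$1
    missing=0
    for f in A-sample.pdb B-sample.pdb C-sample.pdb GETPOINTS LZD32 LZerD \
             README addhydrogens.pl charmm_atom_types.in charmm_params.in \
             create_lzerd_decoys.pl hbplus lzerd_cluster mark_sur \
             modify_hbplus multilzerd multilzerd_cluster \
             multilzerd_create_pdb multilzerd_pairwise_cluster.pl \
             multilzerd_refine multilzerd_sort_decoys.pl run.sh uniCHARMM
    do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}

# Check the current directory; replace "." with the unpacked directory.
check_distribution . || echo "some distribution files are missing"
```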

The hbplus binary in the distribution is compiled for 32-bit x86. If it does not run on your system, for example under Windows Subsystem for Linux, the source code may be available at https://github.com/mmravic314/bin/blob/master/HBplus .

The defaults in “run.sh” are “basename: ‘sample’ ” and “units: ‘A, B, C’ ”. Make sure to edit “run.sh” and change these to match the files you are using. For example, for the protein 1A0R we would have:

  • basename='1A0R'
  • units='B,G,P'
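
Before starting a long run, it can help to confirm that one PDB file per unit is present. The sketch below assumes the naming pattern <unit>-<basename>.pdb, inferred from A-sample.pdb in the distribution and B-1A0R.pdb in the sample log; check_inputs is a hypothetical helper, not part of the distribution.

```shell
#!/bin/sh
# Check that one PDB file named <unit>-<basename>.pdb exists per unit.
# Hypothetical helper; the naming pattern is an assumption (see above).
# The function body runs in a subshell so the IFS change stays local.
check_inputs() (
    basename=$1
    status=0
    IFS=,
    for unit in $2; do
        if [ -f "$unit-$basename.pdb" ]; then
            echo "found:   $unit-$basename.pdb"
        else
            echo "missing: $unit-$basename.pdb"
            status=1
        fi
    done
    exit "$status"
)

check_inputs 1A0R "B,G,P" || echo "fix the inputs before starting run.sh"
```
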
To reduce running time, the parameters for Multi-LZerD can be lowered from 200 to a smaller value such as 50 or 100:

  • ./multilzerd --pdbid $basename --chains $units -o $basename --generations 50 --population 50 --clashes 300 --cluster $cluster_threshold --weights all
Change the “Create PDBs” command correspondingly:

  • ./multilzerd_create_pdb $basename.ga.out ./ 1 50 decoy

It is recommended to prefix the command with nohup, to avoid “timeout errors”, and to redirect terminal output to a log file, for example:

nohup ./run.sh >& log.txt &
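
The ">&" redirection above is understood by bash and csh-style shells. In a plain POSIX shell, the same effect can be written as "> log.txt 2>&1". The sketch below wraps that form in a hypothetical helper, start_detached, which also records the process ID so the run can be followed or stopped later.

```shell
#!/bin/sh
# Start a script under nohup, send stdout and stderr to a log file, and
# remember the process ID. Hypothetical helper, not part of Multi-LZerD.
start_detached() {
    nohup "$1" > "$2" 2>&1 &
    DETACHED_PID=$!
}

# Typical use from the unpacked directory (left commented out here):
# start_detached ./run.sh log.txt
# tail -f log.txt          # follow progress
# kill "$DETACHED_PID"     # stop the run if needed
```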

Program Output

When the program starts running, it first reads in the PDB files and writes “.pdb.h” files, for example:

Adding hydrogens...
B-1A0R.pdb
Writing data to B-1A0R.pdb.h

Next, it creates pairwise predictions with LZerD, first generating several intermediate files required for each protein unit, and then performing the actual docking for each pair of units.

Generating intermediate files for B-1A0R
Executing mark_sur...
Calculating surfaces with GETPOINTS...
Original: 2009 Reduced: 1645
Calculating Zernike descriptors with LZD32...
.
.
.
REC: 1645 LIG: 647
Calculating 3DZD correlations ...
Creating basis for ligand ...
Creating basis for receptor ...
SASA calculation ...
Voting ...
VDOCK done in a total of 18963 seconds
.
.
.

Then, it filters and clusters the pairwise predictions:

Filtering pairwise predictions
Sun Jan 27 19:30:29 EST 2013
Sorting B-G.out...
Sorting B-P.out...
Sorting G-P.out...
Finished sorting
Clustering pairwise predictions

Finally, it runs Multi-LZerD, creates the final decoy PDBs and refines them:

Multi-LZerD...
Mon Jan 28 11:43:26 EST 2013
Creating PDB files for final decoys
Mon Jan 28 12:07:07 EST 2013
Refinement...
Mon Jan 28 12:07:10 EST 2013

For each refinement, it outputs the series of energy improvements made by the Monte Carlo algorithm, followed by the final “Energy of Complex”. The energy is a linear combination of terms, so it has no particular unit of measure. For example:

Analyzing prediction #1

Move accepted: 1503.99 -> 1389.93
Move accepted: 1389.93 -> 1097.28
Move accepted: 1097.28 -> 844.727
Move accepted: 844.727 -> 780.315
Move accepted: 780.315 -> 510.9
...
Energy of Complex: 373.506
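
With many predictions in one log, it can be convenient to pull out only the final energies. The sketch below relies on the two line patterns shown in the sample output ("Analyzing prediction #N" and "Energy of Complex: X") and assumes they start at the beginning of a line. rank_predictions is a hypothetical helper, and sorting in ascending order assumes that lower energy is better, which is suggested by the accepted moves all decreasing the energy.

```shell
#!/bin/sh
# Print "<prediction number> <final energy>" for each refined prediction in
# a log file, sorted by ascending energy. Hypothetical helper; the patterns
# are taken from the sample log lines above.
rank_predictions() {
    awk '
        /^Analyzing prediction #/ { id = $3; sub(/^#/, "", id) }
        /^Energy of Complex:/     { print id, $NF }
    ' "$1" | sort -k2,2n
}

# Run on log.txt if it exists in the current directory.
if [ -f log.txt ]; then
    rank_predictions log.txt
fi
```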

   
Copyright © 2024 KIHARA Bioinformatics LABORATORY, PURDUE University