Supplemental Material For:

Benchmarking of Structure Refinement Methods for Protein Complex Models

Jacob Verburgt1 and Daisuke Kihara1,2,3,*

1 Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
2 Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
3 Purdue University Center for Cancer Research, Purdue University, West Lafayette, IN, 47907, USA
* Corresponding Author

Supplemental Datasets:

ZDOCK Derived Benchmark Dataset:

The primary benchmark set used in this work was derived directly from the ZDOCK Benchmark set. The ZDOCK set contains four structures per target: an unbound ligand, an unbound receptor, a bound ligand, and a bound receptor. The benchmark is distributed such that the coordinates of the bound subunits are oriented identically to their complex structure, and the unbound subunits are superimposed onto their respective bound subunits. Our dataset creates optimally oriented "unbound" complexes by combining the superimposed unbound ligand and receptor subunits and removing waters, ligands, and other non-protein atoms. These unbound complexes are saved in the dataset as "XXXX_c_u.pdb", where XXXX is the PDB ID. The final dataset can be downloaded here.
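The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the script used to build the dataset. It assumes that non-protein atoms can be removed by keeping only ATOM records (waters and small-molecule ligands are normally stored as HETATM records), and the function names and the two demo records are hypothetical.

```python
def protein_atom_lines(pdb_text):
    """Return only the ATOM records of a PDB file given as text.

    Assumption: waters, ligands, and ions are HETATM records, so
    keeping ATOM records alone removes them.
    """
    return [line for line in pdb_text.splitlines() if line.startswith("ATOM")]


def build_unbound_complex(receptor_text, ligand_text):
    """Concatenate superimposed unbound receptor and ligand into one model."""
    lines = protein_atom_lines(receptor_text)
    lines.append("TER")
    lines += protein_atom_lines(ligand_text)
    lines.append("TER")
    lines.append("END")
    return "\n".join(lines) + "\n"


def output_name(pdb_id):
    """File name in the "XXXX_c_u.pdb" form used by the dataset."""
    return f"{pdb_id}_c_u.pdb"


# Hypothetical two-record inputs, one protein atom and one hetero atom each.
receptor_demo = (
    "ATOM      1  CA  ALA A   1      11.104  13.207   2.100  1.00 20.00           C\n"
    "HETATM    2  O   HOH A 101      15.000  10.000   5.000  1.00 30.00           O\n"
)
ligand_demo = (
    "ATOM      1  CA  GLY B   1      21.000  23.000  12.000  1.00 20.00           C\n"
    "HETATM    2 ZN    ZN B 201      25.000  20.000  15.000  1.00 30.00          ZN\n"
)

complex_text = build_unbound_complex(receptor_demo, ligand_demo)
print(complex_text)
```

Note that a filter of this kind also drops modified residues that are stored as HETATM records (for example selenomethionine), so a real pipeline may need to treat those separately.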

From the complete ZDOCK Benchmark of 230 targets, 18 targets were removed because they contain multiple ligand chains, which is incompatible with the standard "ligand to receptor" model used within CAPRI. The ZDOCK PDB IDs of these targets are "1AKJ", "1BJ1", "1DE4", "1EER", "1EXB", "1EZU", "1GP2", "1I9R", "1JMO", "1K74", "1N2C", "1QFW", "2HMI", "3EO1", "3HMX", "4FQI", "4GXU", and "9QFW".

An additional 8 targets were removed from the dataset because superimposition of the unbound ligand and receptor structures onto the complex led to entanglement of the chains. The ZDOCK PDB IDs for these targets are "1BGX", "1H1V", "1IRA", "1R8S", "1Y64", "2OT3", "3AAD", and "4GAM".
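For readers who want to reproduce the target selection from a local copy of the ZDOCK Benchmark, the two exclusion lists can be applied as a simple filter. The sketch below is illustrative only: the function name and the example input are hypothetical, and the remaining-target count assumes that both lists are drawn from the 230 benchmark targets.

```python
# Targets removed for containing multiple ligand chains (18 IDs).
MULTI_LIGAND_CHAIN = {
    "1AKJ", "1BJ1", "1DE4", "1EER", "1EXB", "1EZU", "1GP2", "1I9R", "1JMO",
    "1K74", "1N2C", "1QFW", "2HMI", "3EO1", "3HMX", "4FQI", "4GXU", "9QFW",
}

# Targets removed because superimposition entangled the chains (8 IDs).
ENTANGLED = {"1BGX", "1H1V", "1IRA", "1R8S", "1Y64", "2OT3", "3AAD", "4GAM"}

EXCLUDED = MULTI_LIGAND_CHAIN | ENTANGLED


def keep_targets(all_ids):
    """Return the sorted IDs that are on neither exclusion list."""
    return sorted(i.upper() for i in all_ids if i.upper() not in EXCLUDED)


# Assumption: both lists are subsets of the 230 ZDOCK Benchmark targets.
remaining = 230 - len(EXCLUDED)
print(len(EXCLUDED), remaining)

# Hypothetical input: two excluded IDs and one placeholder ID on neither list.
print(keep_targets(["1AKJ", "1BGX", "XXXX"]))
```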

This dataset is also made openly available in Zenodo at

CAPRI Model Dataset:

The CAPRI dataset, which is derived from CAPRI rounds 38-45, cannot be distributed directly due to CAPRI guidelines, but can be derived from the "Scoring round" models on the CAPRI Website.

The targets considered were T122-T125, T131-T133, and T136, as these targets contained globular protein ligands and receptors. Please contact us directly if you have any further questions about this dataset.


