The large number of uncharacterized protein structures
highlights the need for the development of computational methods for
annotating proteins using the tertiary structure. These
also include function annotation methods by means of
characterizing protein local surfaces. In order to
facilitate structure-based protein annotation, 3D-SURFER offers
a web-based platform for rapid protein surface analysis and
comparison. The server integrates various methods to
assist in the high throughput screening and visualization
of protein surface comparisons. These methods are
discussed in detail below.
Integrative Web Interface:
3D-SURFER integrates global surface shape similarity-based search using 3D Zernike Descriptors, and local structure analysis using VisGrid and LIGSITEcsc. The
results obtained using these methods can be seamlessly
visualized in a single intuitive user interface.
3D Zernike Descriptors :
3D Zernike Descriptors (3DZD) are utilized for the
efficient comparison of protein surfaces. The descriptor
is a combination of coefficients calculated from a well
defined set of orthogonal 3D basis polynomials that
approximate a given 3D function (a grid of a discretized
surface). 3DZD has various desirable properties when
applied to protein surfaces:
- Rotational invariance: Prior
structural alignment is not required for protein
- Compactness: The protein surface can
be compactly represented as a feature vector with only
121 numbers (called invariants). Comparisons of these
vectors can be performed by calculating Euclidean distance in very quick time, thus allowing for rapid
- Hierarchical Resolution: Invariants
of lower resolution are also part of the higher
resolution. For example, the first 12 numbers among the
121 invariants represent the same protein at a lower
3DZD extraction procedure:
- Voxelization: The protein surface
triangulation/mesh is extracted using the MSROLL program
in Molecular Surface Package version 3.9.3 [Connolly M,
1983]. The mesh is then discretized to form a cubic grid.
- 3D Zernike transformation: The 3DZD
program [Novotni M. and Klein R, 2003] takes the cubic
grid as input and generates 3DZDs (the 121 invariants).
- The combinatorial extension (CE) method is used to compare and align protein structures.
It breaks each structure in the query set into a series of fragments that it then attempts
to reassemble into a complete alignment. A series of pairwise combinations of fragments
called aligned fragment pairs, or AFPs, are used to define a similarity matrix through which
an optimal path is generated to identify the final alignment. Only AFPs that meet given criteria
for local similarity are included in the matrix as a means of reducing the necessary search
space and thereby increasing efficiency. Pairwise aligment can be performed via the
web at http://cl.sdsc.edu/ce.html.
- The VisGrid algorithm facilitates the characterization
of local geometric features of protein surfaces in an
interactive manner using various features provided by the
visibility criterion. The visibility is defined as the
fraction of visible directions from a target position on
a protein surface. A pocket or a hollow is recognized as
a cluster of positions with a small visibility. A large
protrusion in a protein structure is recognized as a
pocket in the negative image of the structure. While
existing methods restrict themselves to locating pockets
with potential ligand binding site behavior, VisGrid can
also focus on the dominant geometric features in the
protein structure by identifying large protrusions,
hollow and flat regions on the surface.
- The above figure illustrates various examples of
protein surface cavities (Blue), protrusions (red), and
flat regions (green) identified by the visibility
criterion using VisGrid.
- LIGSITEcsc is an algorithm for the automatic identification of pockets on protein surface using the Connolly surface and the degree of conservation.
- The procedure to identify pockets is as follows: First, the protein is projected onto a 3D grid step size of 1.0 Angstrom. Second, grid points are labelled as protein, surface, or solvent. A sequence of grid points, which starts and ends with surface grid points and which has solvent grid points in between, is called a surface-solvent-surface event. LIGSITEcsc scans the x, y, z directions and four cubic diagonals for such surface-solvent-surface events. If a solvent grid point is part of at least five surface-solvent-surface events, it is marked as pocket. Finally, all pocket grid points are clustered according to their spatial proximity, i.e., if a pocket grid point is within 3.0 Angstrom to a pocket grid point cluster, it is added to this cluster. Otherwise, it becomes a new cluster. Next, the clusters are ranked by the number of grid points in the cluster. The top three clusters are retained and re-ranked according to the degree of conservation of the involved surface residues.
Query entry IDs:
The input data is a protein structure, which will be compared against a user-specified dataset
of the entire PDB database. The input structure is provided by entering its identification (ID) code or
by uploading a PDB format file to the server.
The ID code of an input protein structure is named based on
the PDB ID of the protein. If an entire structure (e.g. a protein complex) in a PDB file is chosen for input,
the ID is the same as the PDB code (e.g. 7tim). However, some subunits are clustered as separated as different
entries (e.g. 12e8-C01 with Chain H, L and 12e8-C02 with Chain M, P), if the clusters are far apart from each other
with a minimum distance at 4.5 Angstrom. A chain in a PDB file can be specified by adding a hyphen and
the chain ID following the PDB code, e.g. 7tim-A. A domain of a chain can be specified by further
adding a domain ID that is defined by CATH, (e.g. 7tim-A-01). The composition of the complex (i.e. chains in 12e8-C02)
and domain (i.e. residues range) entries are shown when users move the mouse on the ids below the pictures
in the resulting page.
Please click to download Chain composition of complexes
and Residue id range of domains
Basic features available through 3D-SURFER:
- Viewing surface comparison results
- Comparisons are performed by calculating the
Euclidean distance (the square root of the sum of the
squares of the differences between corresponding
values) between the Zernike feature vectors (121 scalar
values) representing the proteins. In the 3D-Surfer
results, this is shown after label, "EucD:" .
- Viewing surface analysis results
- The Jmol applet can be used to rotate the query
structure and color the surface by cavity, protrusion,
and flatness. Clicking on the buttons called
"Cavity", "Protrusion", or
"Flat" will render the surface in three
different colors based on their rank in terms of
geometric visibility: Red (1st), Green (2nd), and Blue
(3rd). Also shown are the volumes (in cubic Angstrom) and
surface areas (in square Angstrom) of the convex hull formed
from the atom coordinates of the residues identified by VisGrid.
- Rotatable protein surface figures
- Protein surfaces can be rotated by moving the mouse
over each of the images of the results. The images
will spin 360° along both the X and Y axes to give a
complete view of the protein surface.
- Structure alignment calculations and
- Structure-based alignments of the proteins can be
obtained by using the Combinatorial Extension (CE)
program. To execute CE, check "RMSD:" box of
proteins in comparison to your query structure.
If the calculation was possible, the RMSD value and the coverage (the number of aligned amino acids divided by the length of the query entry) will be
displayed,and a new button will appear; if this button
is clicked, the visualization of the CE alignment will
appear on the left panel.
- Viewing CATH codes
- CATH codes for each of the results can also be
viewed next to the "CATH: " section.
- CATH code filtering
- It is common that the results returned are very
similar, in terms of the CATH codes they have. If the
user wants, for example, to get results that are
different in terms of the first two levels (specify
CATH filter as "CA"), then the query will
avoid returning repeated results for structures that
share the first two levels. In other words, if two
structures have CATH codes 3.40.390.10 and 188.8.131.520,
only one of them would be returned, because 3.40 is
- Length filtering
- When "Residue Length Filter" is enabled,
the results returned will be similar in terms of the
number of residues that each structure has. Two
structures are considered similar if the size of one
with respect to the other is between 0.57 and 1.75
times the size of the other one.
- PDB Link
- Each reported result displays the corresponding PDB
ID and is directly linked to the PDB website.
- Zernike Invariants
- The 121 Zernike Invariants (or Zernike Descriptors),
that characterize each structure are displayed in text
and graphic forms, below the molecule visualization
Download 3DZD files of PDB:
The version includes the 3DZD of chains stored in 3D-Surfer V1.0 by April 2009
Please click to download
- Lee Sael, Bin Li, David La, Yi Fang, Karthik Ramani,
Raif Rustamov and Daisuke Kihara. Fast protein tertiary
structure retrieval based on global surface shape
similarity. Proteins: Structure, Function, and
- Bin Li, Srinivasan Turuvekere, Manish Agrawal, David
La, Karthik Ramani and Daisuke Kihara. Characterization
of local geometry of protein surfaces with the Visibility
Criterion. Proteins: Structure, Function, and
- Connolly ML. Solvent-accessible surfaces of proteins
and nucleic acids. Science 1983;221(4612):709-713.
- Novotni M, Klein R. 3D Zernike descriptors for content
based shape retrieval.ACM Symposium on Solid and
Physical Modeling, Proceedings of the eighth ACM
symposium on Solid modeling and Applications 2003;216-225.
- Huang B, Schroeder M. LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation.BMC Struct Biol2006;6:19.