Introduction

The steady increase in the number of electron microscopy maps deposited in the EMDataBank calls for search tools that can compare map isosurfaces and help reveal the relationships between them. EM-SURFER provides a web-based infrastructure for rapidly comparing 3D EM maps based on isosurfaces derived from author-recommended contour levels. The features available to users are described in detail below.

Integrative Web Interface:

EM-SURFER provides an easy-to-use search interface in which the isosurface shape of a target EM map is compared against the EMDataBank, using 3D Zernike Descriptors as an efficient shape descriptor. The user's query can be compared using several surface representations, and the search can take into account volume differences between the target and the maps in the database. Results are displayed on a search results page. Clicking one of the retrieved maps invokes a subsequent search that uses the clicked map as the query, which makes it possible to navigate further among 3D EM maps.

3D Zernike Descriptors

3D Zernike Descriptors (3DZD) are utilized for the efficient comparison of EM map isosurfaces. The descriptor is a combination of coefficients calculated from a set of orthogonal 3D basis polynomials that approximate a given 3D function (e.g. a grid representing the EM map). 3DZD has various desirable properties when applied to EM maps:

  • Rotational invariance: Prior structural alignment is not required for map comparisons.
  • Compactness: An individual surface can be compactly represented as a feature vector of only 121 numbers (called invariants). Two such vectors can be compared quickly by calculating their Euclidean distance, which enables rapid shape retrieval. Furthermore, concatenating the vectors that describe the same map at different contour levels creates richer descriptors (242 or 363 numbers in EM-SURFER).
  • Hierarchical Resolution: Invariants of lower resolution are also part of the higher-resolution descriptor. For example, the first 12 of the 121 invariants represent a coarse-grained version of the surface, while the complete set of 121 invariants provides the best approximation of the surface that the polynomials can give.

For more technical details about the use of the 3DZD for EM map comparison, please refer to our previous work [Sael L. and Kihara D., 2010].
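
The compactness and hierarchical-resolution properties above can be illustrated with a short Python sketch. It is not EM-SURFER code: the vectors are random placeholders standing in for real 3DZD invariants, and the helper names (`concatenate`, `coarse`) are hypothetical. Only the sizes (121, 363, and the 12-number coarse prefix) come from the text.

```python
import math
import random

N_INVARIANTS = 121  # invariants per isosurface, as stated in the text


def concatenate(*descriptors):
    """Join descriptors of the same map at different contour levels."""
    combined = []
    for descriptor in descriptors:
        combined.extend(descriptor)
    return combined


def coarse(descriptor, n=12):
    """Keep only the first n invariants (a coarse-grained view of the surface)."""
    return descriptor[:n]


# Placeholder data: random numbers, NOT real Zernike invariants.
rng = random.Random(0)


def placeholder_descriptor():
    return [rng.random() for _ in range(N_INVARIANTS)]


# Three contour levels per map, one 121-number vector each.
map_a = [placeholder_descriptor() for _ in range(3)]
map_b = [placeholder_descriptor() for _ in range(3)]

# Because the invariants are rotation-invariant, the vectors are compared
# directly, with no prior alignment of the two maps.
d_single = math.dist(map_a[0], map_b[0])                      # 121 numbers
d_full = math.dist(concatenate(*map_a), concatenate(*map_b))  # 363 numbers
d_coarse = math.dist(coarse(map_a[0]), coarse(map_b[0]))      # first 12 numbers

print(len(concatenate(*map_a)), d_coarse, d_single, d_full)
```

Since each comparison is a single distance between short vectors, scanning a whole database of precomputed descriptors stays cheap.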

3DZD Computation Procedure

  1. Isosurface generation: Three isosurfaces at different density levels are created for each EM map in the database. One comes from the author-recommended contour level, as specified in the EMDataBank. In addition, two higher isovalues are used to represent regions that are closer to the core of the molecules. For the 1/3 Core Contour, the contour level is increased by 1/3 * (max density - recommended contour). The 2/3 Core Contour follows the same idea, with the factor changed from 1/3 to 2/3.
  2. 3D Zernike transformation: The 3DZD program [Novotni M. and Klein R., 2003] takes a cubic grid as input and generates a 3DZD (121 invariants) for each surface representation.
  3. Zernike invariant combinations: The previous step creates three vectors of size 121. The various shape representation alternatives offered by EM-SURFER arise from a combination of these. For example, EMDB contour+1/3+2/3 uses a descriptor vector of size 363, by concatenating all three vectors. On the other hand, EMDB contour uses only the 121 invariants from the author-recommended contour value.
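
Steps 1 and 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the EM-SURFER implementation: the core contours are read as the recommended level raised by 1/3 or 2/3 of (max density - recommended contour), the function names and dictionary keys are hypothetical, and the per-contour vectors are dummy values where the output of the 3DZD program would go.

```python
def core_contour_levels(recommended, max_density):
    """Return the three contour levels used to build isosurfaces.

    Assumption: each core contour equals the recommended level plus a
    fraction of the gap between the maximum density and that level.
    """
    gap = max_density - recommended
    return {
        "EMDB contour": recommended,
        "1/3 core": recommended + gap / 3,
        "2/3 core": recommended + 2 * gap / 3,
    }


def build_descriptor(per_contour, contours):
    """Concatenate the 121-number vectors of the selected contours, in order."""
    vector = []
    for name in contours:
        vector.extend(per_contour[name])
    return vector


levels = core_contour_levels(recommended=1.0, max_density=4.0)

# Dummy per-contour descriptors; real values would come from the 3DZD program.
per_contour = {name: [0.0] * 121 for name in levels}

# "EMDB contour" representation: 121 invariants.
single = build_descriptor(per_contour, ["EMDB contour"])
# "EMDB contour+1/3+2/3" representation: 363 invariants.
combined = build_descriptor(per_contour, ["EMDB contour", "1/3 core", "2/3 core"])

print(levels, len(single), len(combined))
```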

Search Results by EM-SURFER

A search against all EMDB entries can be performed from the search page.

  • In Step 1, choose the contour shape representation. The default is the author-recommended contour level, but users can choose any of the other three options.
  • In Step 2, enter the EMDB entry ID or upload an EM map file. To find an ID by text search (for example, by protein name), use the EMDB search page.
    Upload troubleshooting: Although EM-SURFER has several checks and repair mechanisms for processing uploaded files, errors can still occur, for example because of network connectivity problems during upload or incorrect formatting of the uploaded EM file. If you experience a problem uploading your map, please contact dkihara@purdue.edu.
  • In Step 3, a volume filter is provided. It is on by default. When the filter is on, a search retrieves only EM maps that have a volume similar to that of the query.

The figure below explains the search result page:

EM-SURFER Search Result Page



  • Query entry ID
    • The ID is a unique 4-digit accession number for each EMDB entry; it may also be written with the prefix "EMD-".
  • Name of the query entry
    • The description of the entry is extracted from the entry's XML file in the EMDB. If the description is longer than 40 characters, the first 40 characters followed by an ellipsis (...) are shown, and the full text is presented in a popup box.
  • Figure of the query entry
    • A figure of an isosurface of the entry is shown. The figure is provided by the EMDB.
  • Zernike descriptors that characterize the query entry
    • The Zernike invariants (or Zernike descriptors) that characterize each EM isosurface are displayed in both text and graphical form.
  • List of retrieved entries for the query
    • The EM surfaces retrieved from the database are shown in the panel below the query. They are ranked by the Euclidean distance between their 3DZDs and that of the query entry. By default, the 20 most similar surfaces are displayed.
    • Clicking on the figure of a retrieved entry invokes a new search against the whole EMDB, using the clicked entry as the query. This allows iterative navigation among similar EM isosurfaces.
  • Dissimilarity of the shape of the retrieved entry to the query
    • The dissimilarity of two EM isosurfaces is quantified by the Euclidean distance (the square root of the sum of the squared differences between corresponding values) between the 3DZD vector of the retrieved isosurface and that of the query.
    • The smaller the EucD value, the more similar the shapes of the two EM map isosurfaces.
    • Empirically, related entries have a Euclidean distance of less than 10.0.
  • Ratio of the volume of the retrieved entry to the query
    • The ratio is defined as the volume of the retrieved EM entry in the database divided by the volume of the query entry.
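
The quantities on the result page can be sketched in Python as follows. This is a minimal illustration, not the server's code: the `Entry` class, the toy entries and their IDs, and especially the `ratio_range` tolerance of the volume filter are assumptions, since the page only states that the filter keeps maps of similar volume. The top-20 default, the 40-character name limit, and the definitions of the distance and the volume ratio follow the text.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    emdb_id: str
    name: str
    descriptor: List[float]
    volume: float


def display_name(name, limit=40):
    """Show at most `limit` characters, followed by an ellipsis if truncated."""
    return name if len(name) <= limit else name[:limit] + "..."


def search(query, database, top_n=20, volume_filter=True, ratio_range=(0.8, 1.2)):
    """Rank entries by Euclidean distance of their descriptors to the query.

    ratio_range is a hypothetical tolerance for the volume filter.
    Returns (emdb_id, distance, volume_ratio) tuples, most similar first.
    """
    low, high = ratio_range
    hits = []
    for entry in database:
        # Volume ratio: retrieved entry's volume divided by the query's volume.
        ratio = entry.volume / query.volume
        if volume_filter and not (low <= ratio <= high):
            continue
        distance = math.dist(query.descriptor, entry.descriptor)
        hits.append((entry.emdb_id, distance, ratio))
    hits.sort(key=lambda hit: hit[1])  # smaller distance = more similar shape
    return hits[:top_n]


# Toy data with made-up IDs and 2-number descriptors (real ones have 121+).
query = Entry("EMD-0000", "query map", [0.0, 0.0], 100.0)
database = [
    Entry("EMD-0001", "map one", [3.0, 4.0], 110.0),
    Entry("EMD-0002", "map two", [0.0, 2.0], 300.0),
    Entry("EMD-0003", "map three", [0.0, 1.0], 90.0),
]

for emdb_id, distance, ratio in search(query, database):
    print(emdb_id, round(distance, 2), round(ratio, 2))
```

With the filter on, the entry whose volume is three times that of the query is dropped before ranking; calling `search(query, database, volume_filter=False)` ranks all three entries by distance alone.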

Job Submission in Batch Mode

To benchmark EM-SURFER by submitting a large number of queries, users can use the batch mode page.

  • When submitting a batch of queries to EM-SURFER, users can either type a custom list of EMDB IDs or upload a separate file with those IDs.
  • As with the submission of a single entry, users select an isosurface representation and specify the volume filter.
  • The extra step in batch mode is to select the size of the retrieved list for each query.

Taking EMD-1884 as an example, the figure below shows the list of the 20 most similar isosurfaces in the EMDB retrieved for it.

EM-SURFER Batch Mode Search Result Page
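
When preparing a custom ID list for batch mode, it can help to normalize the IDs first. The Python sketch below is a hypothetical pre-processing helper, not part of EM-SURFER. It assumes 4-digit IDs with an optional "EMD-" prefix, as described above, and assumes that IDs are separated by whitespace, commas, or semicolons; the separators the server actually accepts are not specified here.

```python
import re

# 4-digit accession number with an optional "EMD-" prefix (case-insensitive).
_ID_PATTERN = re.compile(r"^(?:EMD-)?(\d{4})$", re.IGNORECASE)


def normalize_emdb_id(token):
    """Return the ID in the form 'EMD-NNNN', or raise ValueError."""
    match = _ID_PATTERN.match(token.strip())
    if match is None:
        raise ValueError(f"not a valid EMDB ID: {token!r}")
    return "EMD-" + match.group(1)


def parse_id_list(text):
    """Split a block of text into unique, normalized IDs, keeping input order."""
    ids = []
    seen = set()
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        emdb_id = normalize_emdb_id(token)
        if emdb_id not in seen:
            seen.add(emdb_id)
            ids.append(emdb_id)
    return ids


print(parse_id_list("1884, EMD-5001\nemd-1884"))
```

Rejecting malformed IDs before submission avoids spending a batch run on queries that cannot match any entry.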

References

  1. Sael L, Kihara D. Protein surface representation for application to comparing low-resolution protein structure data. BMC Bioinformatics 2010;11:S2.
  2. Novotni M, Klein R. 3D Zernike descriptors for content based shape retrieval. ACM Symposium on Solid and Physical Modeling, Proceedings of the eighth ACM symposium on Solid modeling and Applications 2003;216-225.
 
   
Copyright © 2017 KIHARA Bioinformatics LABORATORY, PURDUE University