Emap2sec identifies the secondary structures of proteins in cryo-EM maps in the intermediate resolution range (~5 to 10 Å).
Emap2sec uses a deep convolutional neural network as its core algorithm and assigns a secondary structure to each grid point in an EM map.
data_generate/map2train_fix [sample_mrc] [options] > [output_trimmap_filename]
INPUTS:
map2train_fix expects sample_mrc to be a valid filename. Supported file formats are Situs, CCP4, and MRC2000. Input may be gzipped. Format is deduced from FILE's extension.
OPTIONS: (Options marked with an asterisk (*) are for benchmarking only, i.e., when the underlying crystal structure is available.)
-c, --contour The isosurface level at which density values are generated. Use 0 for simulated maps and the author-recommended contour level for experimental EM maps. default=0.0
-g, --gzip Set this option to force reading input as gzipped. You can use gzip to compress a very large EM map and input the compressed file by setting this option.
-P* PDBFILE Input PDB file whose C-alpha (CA) atom positions are used.
-r* [float] This option assigns a true secondary structure label to each generated voxel, taken from the closest CA atom that lies within a sphere of radius r. These true labels can be compared to the secondary structures assigned by Emap2sec for benchmarking. default=3.0
-sstep [integer] This option sets the stride of the sliding cube used for input data generation. We recommend a value of 2, which slides the cube by 2 Å in each direction. Decreasing this value to 1 produces 8 times more data (a factor of 2 in each of the three directions) and therefore makes the run about 8 times slower, so lower this value with care. default=2
-vw [integer] This option sets the dimensions of the sliding cube used for input data generation. The edge length of the cube is calculated as 2*vw+1. We recommend a value of 5, which generates an input cube of size 11*11*11. Increase this value with care, because it enlarges the portion of the EM map that a single cube covers and also increases running time. default=5 (->11x11x11)
-gnorm Set this option to normalize density values of the sliding cube, used for input data generation, by global maximum density value. Set this option as -gnorm. default=true
-lnorm Set this option to normalize density values of the sliding cube, used for input data generation, by local maximum density value. Set this option as -lnorm. We recommend using -gnorm option. default=false
-h, --help, -?, /? Displays the list of above options.
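As a rough illustration of the -vw, -sstep, -gnorm, and -lnorm options above, the following Python sketch reproduces the arithmetic they describe: the cube edge length 2*vw+1, the change in data volume when the stride changes, and division by a global versus a local maximum density. The function names and the toy density values are hypothetical; this is not the map2train_fix implementation.

```python
def cube_edge(vw):
    """Edge length of the sliding cube: 2*vw + 1 (vw=5 gives 11)."""
    return 2 * vw + 1


def stride_data_factor(old_sstep, new_sstep):
    """Approximate change in the number of cubes when the stride changes.

    The cube slides along three axes, so the count scales with the cube
    of the stride ratio (going from 2 to 1 gives a factor of 8).
    """
    return (old_sstep / new_sstep) ** 3


def normalize(cube_values, max_density):
    """Divide every density value by the given maximum density."""
    if max_density <= 0:
        raise ValueError("max_density must be positive")
    return [v / max_density for v in cube_values]


# Toy data (assumed): densities of a whole map and of one cube taken from it.
map_values = [0.0, 0.2, 0.5, 1.0, 2.0]
cube_values = [0.2, 0.5, 1.0]

gnorm = normalize(cube_values, max(map_values))   # like -gnorm: global maximum
lnorm = normalize(cube_values, max(cube_values))  # like -lnorm: local maximum
```

With global normalization, cubes from weak regions keep small values relative to the rest of the map; with local normalization, every cube is rescaled so that its own maximum becomes 1.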
USAGE: ./map2train_fix protein.situs -c 2.75 > protein_trimmap
./map2train_fix protein.map -sstep 2 -r 3.0 -c 0.0 > protein_trimmap
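The -r option used in the second usage line labels a voxel from the closest CA atom within radius r. A minimal Python sketch of that nearest-atom rule is shown here, assuming CA atoms are given as coordinate/label pairs. The function name, the data layout, and the label strings are hypothetical and do not reflect the encoding that map2train_fix writes.

```python
import math


def label_voxel(voxel_xyz, ca_atoms, r=3.0):
    """Return the label of the closest CA atom within radius r, else None.

    ca_atoms is a list of ((x, y, z), label) pairs; the label values are
    placeholders, not the encoding used by map2train_fix.
    """
    best_label, best_dist = None, r
    for xyz, label in ca_atoms:
        dist = math.dist(voxel_xyz, xyz)
        if dist <= best_dist:
            best_label, best_dist = label, dist
    return best_label


# Two hypothetical CA atoms, 5 units apart along x.
ca_atoms = [((0.0, 0.0, 0.0), "helix"), ((5.0, 0.0, 0.0), "sheet")]

near_first = label_voxel((1.0, 0.0, 0.0), ca_atoms)   # closest atom is the first
near_second = label_voxel((4.0, 0.0, 0.0), ca_atoms)  # closest atom is the second
unlabeled = label_voxel((2.5, 3.0, 0.0), ca_atoms)    # no atom within r
```

Voxels with no CA atom inside the sphere stay unlabeled, which is why a larger r labels more voxels at the cost of looser assignments.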
./stride -f[output_stride_filename] [sample_pdb]
INPUT: Specify the name of your pdb file in place of [sample_pdb].
OUTPUT: Specify a name for the output STRIDE file immediately after the -f option, with no space in between.
USAGE: ./stride -fprotein.stride protein.pdb
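If you call STRIDE from a script, the easy mistake is putting a space between -f and the output name. The sketch below builds the argument list so that the two are always joined; the helper name stride_command is hypothetical, and the ./stride path is taken from the usage line above.

```python
def stride_command(pdb_path, stride_out, stride_bin="./stride"):
    """Build the STRIDE argument list; -f and the output name are joined."""
    return [stride_bin, f"-f{stride_out}", pdb_path]


cmd = stride_command("protein.pdb", "protein.stride")
# To execute it, pass the list to subprocess.run(cmd, check=True).
```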
python data_generate/dataset.py [sample_trimmap] {sample_stride} [input_dataset_file] [ID]
INPUTS: The inputs to this script are the trimmap file, an optional STRIDE file, and an ID, which is a unique identifier of the map such as a SCOPe ID, EMID, etc.
OUTPUT: Specify a name for input dataset file in place of [input_dataset_file].
USAGE: python data_generate/dataset.py protein_trimmap protein_dataset protein_id
python data_generate/dataset.py protein_trimmap protein.stride protein_dataset protein_id
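The two usage forms differ only in whether the STRIDE file is present as the second argument. A small Python sketch of that argument order follows; dataset_command is a hypothetical helper, and only the positional order shown in the usage lines is assumed.

```python
def dataset_command(trimmap, dataset_out, map_id, stride_file=None):
    """Build the dataset.py argument list; the STRIDE file is optional."""
    cmd = ["python", "data_generate/dataset.py", trimmap]
    if stride_file is not None:
        # When given, the STRIDE file goes right after the trimmap file.
        cmd.append(stride_file)
    cmd += [dataset_out, map_id]
    return cmd


without_stride = dataset_command("protein_trimmap", "protein_dataset", "protein_id")
with_stride = dataset_command(
    "protein_trimmap", "protein_dataset", "protein_id", stride_file="protein.stride"
)
```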
python emap2sec/Emap2sec_[sim/exp].py [dataset_location_file]
INPUT: This program takes as input a file that contains the location of the input dataset. To test multiple files at a time, list one file location per line ("\n" delimited).
OUTPUT: This program writes two output files, one for each phase, which contain the output predictions along with the probability value for each prediction. Sample output files are provided in the GitHub link in the Downloads tab and are named outputP1_0 for Phase1 and outputP2_0 for Phase2.
USAGE: First run
echo [location of protein_dataset file] > dataset_location_file
to save the location of your protein dataset file in dataset_location_file. You can then run emap2sec/Emap2sec_[sim/exp].py as shown below.
python emap2sec/Emap2sec_[sim/exp].py dataset_location_file
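When you want to test several datasets at once, the location file simply holds one path per line. A minimal Python sketch that writes such a file is given below; write_location_file and the example dataset names are hypothetical, and the trailing newline mirrors what echo produces.

```python
import tempfile
from pathlib import Path


def write_location_file(dataset_paths, location_file):
    """Write one dataset path per line ("\\n" delimited) and return the path."""
    location_file = Path(location_file)
    text = "\n".join(str(p) for p in dataset_paths) + "\n"
    location_file.write_text(text)
    return location_file


# Demonstration in a temporary directory with two made-up dataset names.
with tempfile.TemporaryDirectory() as tmp:
    loc = write_location_file(
        ["./a_dataset", "./b_dataset"], Path(tmp) / "dataset_location_file"
    )
    written = loc.read_text()
```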
visual/Visual2.pl [sample_trimmap] [output_file] [OPTIONS] > out_fin.pdb
INPUT: This program takes as inputs the trimmap file generated in step 1 of input file generation and the output file of Emap2sec SS identification. You can visualize the Phase1 or Phase2 output by using the appropriate output file.
OPTIONS:
-p : Show predicted data (predicted secondary structures)
-n : Show native data (true secondary structures) [OPTIONAL - use when crystal structure information is available]
OUTPUT: This program outputs a PDB file that contains secondary structure assignments. A sample output file is provided in the GitHub link in the Downloads tab.
USAGE: visual/Visual2.pl protein_trimmap outputP1_0 -p > out_fin1.pdb
visual/Visual2.pl protein_trimmap outputP2_0 -p > out_fin2.pdb
After installing PyMOL, you can run the following from the command line in the PyMOL download directory,
pymol out_fin2.pdb
or open the PyMOL GUI and load visual.pdb. Then run "run pymol_script.py" from the PyMOL command line. This gives you the final clean secondary structure visualization.
e2pdb2mrc.py d1kafa_.pdb d1kafa.mrc --res 10.0
data_generate/map2train_fix d1kafa.mrc -r 3 -c 0 \
-sstep 2 > d1kafa_trimmap
python data_generate/dataset.py d1kafa_trimmap \
d1kafa_dataset d1kafa
echo ./d1kafa_dataset > test_dataset_location
python run_scripts/Emap2sec_sim.py test_dataset_location
visual/Visual2.pl d1kafa_trimmap outputP1_0 -p > out_fin.pdb
pymol out_fin.pdb
visual/Visual2.pl d1kafa_trimmap outputP2_0 -p > out_fin_L2.pdb
pymol out_fin_L2.pdb
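The simulated-map walkthrough above can also be written down as a dry-run pipeline that only prints the commands, which is handy for checking file names before running anything. This Python sketch is an illustration under the assumption that the intermediate files follow the [ID]_trimmap and [ID]_dataset naming used in the example; simulated_pipeline and render are hypothetical helpers, and the commands and flags are copied from the walkthrough.

```python
import shlex


def simulated_pipeline(pdb, map_id, res=10.0):
    """Return the walkthrough steps as (argv, stdout_path) pairs."""
    mrc = f"{map_id}.mrc"
    trimmap = f"{map_id}_trimmap"
    dataset = f"{map_id}_dataset"
    return [
        (["e2pdb2mrc.py", pdb, mrc, "--res", str(res)], None),
        (["data_generate/map2train_fix", mrc, "-r", "3", "-c", "0",
          "-sstep", "2"], trimmap),
        (["python", "data_generate/dataset.py", trimmap, dataset, map_id], None),
        (["echo", f"./{dataset}"], "test_dataset_location"),
        (["python", "run_scripts/Emap2sec_sim.py", "test_dataset_location"], None),
        (["visual/Visual2.pl", trimmap, "outputP1_0", "-p"], "out_fin.pdb"),
        (["visual/Visual2.pl", trimmap, "outputP2_0", "-p"], "out_fin_L2.pdb"),
    ]


def render(step):
    """Format one step as a shell line, with '>' for redirected output."""
    argv, stdout_path = step
    line = shlex.join(argv)
    if stdout_path:
        line += f" > {shlex.quote(stdout_path)}"
    return line


lines = [render(step) for step in simulated_pipeline("d1kafa_.pdb", "d1kafa")]
for line in lines:
    print(line)
```

To actually execute a step, each argv list could be passed to subprocess.run, with stdout pointed at an open file whenever a stdout_path is given.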
./stride -fd1kafa.stride d1kafa_.pdb
You can then follow the same steps as earlier to generate the out_fin_L2.pdb file and visualize it along with the crystal structure as follows,
python data_generate/dataset.py d1kafa_trimmap d1kafa.stride \
d1kafa_dataset d1kafa
pymol out_fin_L2.pdb d1kafa_.pdb
This visualization is shown in the figure captioned Crystal structure agreement.
data_generate/map2train_fix 8796.mrc -r 3 -c 0.033 -sstep 2 \
> 8796_trimmap
python data_generate/dataset.py 8796_trimmap 8796_dataset 8796
echo ./8796_dataset > test_dataset_location
python run_scripts/Emap2sec_exp.py test_dataset_location
visual/Visual2.pl 8796_trimmap outputP1_0 -p > out_fin.pdb
pymol out_fin.pdb
visual/Visual2.pl 8796_trimmap outputP2_0 -p > out_fin_L2.pdb
pymol out_fin_L2.pdb
./stride -f5wcb.stride 5wcb.pdb
You can then follow the same steps as earlier to generate the out_fin_L2.pdb file and visualize it along with the crystal structure as follows,
python data_generate/dataset.py 8796_trimmap 5wcb.stride \
8796_dataset 8796
pymol out_fin_L2.pdb 5wcb.pdb
This visualization is shown in the figure captioned Crystal structure agreement.
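Beyond visual inspection, agreement with the crystal structure can be summarized numerically once predicted and true labels are available for the same voxels. The Python sketch below computes overall and per-class agreement for two label sequences. It is an assumption-laden illustration: the single-letter labels and the agreement helper are hypothetical, and reading labels out of the Emap2sec output files is not shown because their layout is not described here.

```python
from collections import Counter


def agreement(predicted, native):
    """Overall and per-class fraction of voxels whose labels match."""
    if len(predicted) != len(native):
        raise ValueError("label sequences must have the same length")
    total = Counter(native)
    correct = Counter(n for p, n in zip(predicted, native) if p == n)
    overall = sum(correct.values()) / len(native) if native else 0.0
    per_class = {label: correct[label] / count for label, count in total.items()}
    return overall, per_class


# Hypothetical labels for six voxels (H, E, C are placeholder class names).
native = ["H", "H", "E", "E", "C", "C"]
predicted = ["H", "H", "E", "C", "C", "H"]

overall, per_class = agreement(predicted, native)
```

Voxels that have no true label (for example, those with no CA atom within the -r radius) would need to be dropped from both sequences before calling the function.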
1. First, create an account on Code Ocean using academic credentials and log in to your account.
2. Then, click on the links above and go to the desired Code Ocean capsule.
3. To make a reproducible run, i.e., to run the code on an example input provided by us, click on the "Reproducible Run" button in the top right corner. This starts running the code on our example, and the results are generated in the results folder at the bottom right after the execution is complete.
4. To run the code on an input of your choice, first go to our capsule and click on the "Edit Capsule" button in the top right corner. This makes a copy of our capsule which you can edit. Follow the instructions in the readme file of the respective Code Ocean capsule on how to upload and run your input in this copied capsule.
Other details specific to the respective capsules can be found in the readme files of the capsules. For more details on how to run a Code Ocean capsule, please visit: Code Ocean user documentation
Emap2sec is free software for academic and non-commercial users.
It is released under the terms of the GNU General Public License Ver.3 (https://www.gnu.org/licenses/gpl-3.0.en.html).
Commercial users please contact dkihara@purdue.edu for alternate licensing.
Citation of the following reference should be included in any publication that uses data or results generated by the Emap2sec program.
© 2024 KIHARA Bioinformatics LABORATORY, PURDUE University