From Opengenome.net
== [[단백질 구조 모델링|Protein Structure Modeling]]: A Practical Guide ==

[http://biome.ngic.re.kr/ProteinModelling/]

===Problem definition===
* Build 3D structural models of proteins of interest
** Initial input: amino acid sequence of the target protein
** Final output: its predicted 3D structure

===Method===
* Find one or more structural templates for the target protein via a sequence homology search against known structures (program: BLASTP or PSI-BLAST, database: PDB)
* Download the template structures in PDB format (web server: RCSB PDB)
* Align the amino acid sequence of the target protein with that of each template protein (program: CLUSTALW, output: PIR format)
* Build 3D structural models based on the multiple sequence alignment and the template 3D structure(s) (program: MODELLER, input: ATM, ALI, and TOP files)
** Manual build
*** Manually convert the file formats from PDB to ATM and from PIR to ALI
*** Write a TOP script file
*** Run MODELLER
** Automatic build (some PDB and PIR files are not converted into correct ATM and ALI files because of amino-acid sequence mismatches)
*** Perl script to generate ATM, ALI, and TOP files from PDB and PIR files: [[perl script for ATM|txt]], gz
*** Command-line arguments, in order: base_directory_path pir_file_name target_protein_id (exactly 4 characters) one_or_more_PDB_file_names (exactly 4-character prefix)
*** Run the Perl script (e.g. ../bin/makeModellerInput.pl /home/user/Protein3DModelling/KCIP KCIP.pir KCIP 1QJA.pdb)
*** Run MODELLER (e.g. mod7v7 KCIP.top)
** Example
*** ATM file: 5fd1.atm
*** ALI file: alignment.ali
*** TOP file: model-default.top
*** Command: mod7v7 model-default.top
*** Output PDB file (predicted model): 1fdx.B99990001.pdb
** MODELLER manual
*** PDF, HTML
* Display and compare the 3D structures of the model and the template proteins (program: RasMol)

=== Sample proteins ===
* Sample protein A: 14-3-3 protein gamma (Protein kinase C inhibitor protein-1; KCIP-1)
 >sw|P61981|143G_HUMAN 14-3-3 protein gamma (Protein kinase C inhibitor protein-1) (KCIP-1)
 VDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTSADGNEKKIEMVRAYREKIEKELEAVCQDVLSLLDNYLIKNCSETQYESKVFYLKMKGDYYRYLAEVATGEKRATVVESSEKAYSEAHEISKEHMQPTHPIRLGLALNYSVFYYEIQNAPEQACHLAKTAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDDDGGEGNN
* Sample protein B: GAK_HUMAN Cyclin G-associated kinase (GAKH)
 >sw|O14976|GAK_HUMAN Cyclin G-associated kinase
 MSLLQSALDFLAGPGSLGGASGRDQSDFVGQTVELGELRLRVRRVLAEGGFAFVYEAQDVGSGREYALKRLLSNEEEKNRAIIQEVCFMKKLSGHPNIVQFCSAASIGKEESDTGQAEFLLLTELCKGQLVEFLKKMESRGPLSCDTVLKIFYQTCRAVQHMHRQKPPIIHRDLKVENLLLSNQGTIKLCDFGSATTISHYPDYSWSAQRRALVEEEITRNTTPMYRTPEIIDLYSNFPIGEKQDIWALGCILYLLCFRQHPFEDGAKLRIVNGKYSIPPHDTQYTVFHSLIRAMLQVNPEERLSIAEVVHQLQEIAAARNVNPKSPITELLEQNGGYGSATLSRGPPPPVGPAGSGYSGGLALAEYDQPYGGFLDILRGGTERLFTNLKDTSSKVIQSVANYAKGDLDISYITSRIAVMSFPAEGVESALKNNIEDVRLFLDSKHPGHYAVYNLSPRTYRPSRFHNRVSECGWAARRAPHLHTLYNICRNMHAWLRQDHKNVCVVHCMDGRAASAVAVCSFLCFCRLFSTAEAAVYMFSMKRCPPGIWPSHKRYIEYMCDMVAEEPITPHSKPILVRAVVMTPVPLFSKQRSGCRPFCEVYVGDERVASTSQEYDKMRDFKIEDGKAVIPLGVTVQGDVLIVIYHARSTLGGRLQAKMASMKMFQIQFHTGFVPRNATTVKFAKYDLDACDIQEKYPDLFQVNLEVEVEPRDRPSREAPPWENSSMRGLNPKILFSSREEQQDILSKFGKPELPRQPGSTAQYDAGAGSPEAEPTDSDSPPSSSADASRFLHTLDWQEEKEAETGAENASSKESESALMEDRDESEVSDEGGSPISSEGQEPRADPEPPGLAAGLVQQDLVFEVETPAVLPEPVPQEDGVDLLGLHSEVGAGPAVPPQACKAPSSNTDLLSCLLGPPEAASQGPPEDLLSEDPLLLASPAPPLSVQSTPRGGPPAAADPFGPLLPSSGNNSQPCSNPDLFGEFLNSDSVTVPPSFPSAHSAPP
 PSCSADFLHLGDLPGEPSKMTASSSNPDLLGGWAAWTETAASAVAPTPATEGPLFSPGGQPAPCGSQASWTKSQNPDPFADLGDLSSGLQGSPAGFPPGGFIPKTATTPKGSSSWQTSRPPAQGASWPPQAKPPPKACTQPRPNYASNFSVIGAREERGVRAPSFAQKPKVSENDFEDLLSNQGFSSRSDKKGPKTIAEMRKQDLAKDTDPLKLKLLDWIEGKERNIRALLSTLHTVLWDGESRWTPVGMADLVAPEQVKKHYRRAVLAVHPDKAAGQPYEQHAKMIFMELNDAWSEFENQGSRPLF

=== Available Online Tool Servers ===
# BLASTP:
## [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=250&ALIGNMENT_VIEW=Pairwise&CDD_SEARCH=on&CLIENT=web&DATABASE=nr&DESCRIPTIONS=500&ENTREZ_QUERY=%28none%29&EXPECT=10&FILTER=L&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&I_THRESH=0.005&MATRIX_NAME=BLOSUM62&NCBI_GI=on&PAGE=Proteins&PROGRAM=blastp&SERVICE=plain&SET_DEFAULTS.x=41&SET_DEFAULTS.y=5&SHOW_OVERVIEW=on&END_OF_HTTPGET=Yes&SHOW_LINKOUT=yes&GET_SEQUENCE=yes NCBI BLASTP]
## [http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+Launch+-id+1uBRI1Ot9iE+-appl+BlastP+-launchFrom+top EBI SRS BLASTP]
# PSI-BLAST:
## [http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=250&ALIGNMENT_VIEW=Pairwise&CLIENT=web&COMPOSITION_BASED_STATISTICS=on&DATABASE=nr&CDD_SEARCH=on&DESCRIPTIONS=500&ENTREZ_QUERY=%28none%29&EXPECT= NCBI PSI-BLAST]
# CLUSTALW:
## [http://www.ebi.ac.uk/clustalw/ EBI ClustalW]
## [http://www.genebee.msu.su/clustal/basic.html Genebee ClustalW]
# SRS servers: NGIC SRS, public SRS servers
# Search engine: Google (search for other online servers yourself)
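
Before a sample sequence is pasted into one of the servers listed above, it can help to clean it up and wrap it as a FASTA record. The following is a minimal sketch in Python, not part of the original guide. The function name <code>to_fasta</code>, the 60-column line width, and the decision to reject anything outside the 20 standard one-letter codes are all assumptions made for illustration.

```python
# Hypothetical helper, not part of the original guide.
# Assumption: only the 20 standard amino-acid one-letter codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record for `sequence`, wrapped at `width` columns."""
    # Remove spaces and line breaks that creep in when copying from a web page.
    seq = "".join(sequence.split()).upper()
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"non-standard residue letters: {bad}")
    # Slice the sequence into fixed-width lines.
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{header}\n" + "\n".join(chunks) + "\n"


if __name__ == "__main__":
    # First 40 residues of sample protein A, used only as a short demonstration.
    fragment = "VDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSNEE"
    print(to_fasta("sw|P61981|143G_HUMAN (fragment)", fragment), end="")
```

Ambiguity codes such as X or B are rejected here on purpose; relax <code>STANDARD_AA</code> if the target server accepts them.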
===Available Online Databases ===
# PDB amino-acid FASTA file: NCBI BLAST DB (local: pdbaa.gz)
# PDB:
## [http://www.rcsb.org/pdb/ RCSB]
## [http://pdb.ccdc.cam.ac.uk/pdb/ UK PDB mirror]
## [http://pdb.protein.osaka-u.ac.jp/pdb/ Japan PDB mirror]

===Downloadable programs===
# BLAST: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/ (local: DOS)
# CLUSTALW: ftp://ftp.ebi.ac.uk/pub/software/dos/clustalw/ (local: DOS[WindowsXP])
# MODELLER: http://salilab.org/modeller/ (local: DOS)
# RasMol: http://www.openrasmol.org/ (local: Windows)
# Perl: http://www.perl.org/ (local: DOS[WindowsXP])
# Other utilities: ALZip.exe (for unzip, untar, and ungzip)

===Solution Example ===
# Found templates (PDB format)
## Sample protein A: [http://biome.ngic.re.kr/ProteinModelling/templates/1QJA.pdb 1QJA.pdb]
## Sample protein B: [http://biome.ngic.re.kr/ProteinModelling/templates/1N4C.pdb 1N4C.pdb]
----
# Multiple sequence alignments (PIR format)
## Sample protein A: [http://biome.ngic.re.kr/ProteinModelling/buildingModels/KCIP/modellerResult/KCIP.pir KCIP.pir]
## Sample protein B: [http://biome.ngic.re.kr/ProteinModelling/buildingModels/GAKH/modellerResult/GAKH.pir.ORG GAKH.pir.ORG]
----
* MODELLER input files
** Sample protein A (automatically generated):
*** [http://biome.ngic.re.kr/ProteinModelling/buildingModels/KCIP/modellerResult/1QJA.atm ATM]
*** [http://biome.ngic.re.kr/ProteinModelling/buildingModels/KCIP/modellerResult/KCIP.ali ALI]
*** [http://biome.ngic.re.kr/ProteinModelling/buildingModels/KCIP/modellerResult/KCIP.top TOP]
** Sample protein B (manually edited; C-terminal only):
*** [http://biome.ngic.re.kr/ProteinModelling/buildingModels/GAKH/modellerResult/1N4C.atm ATM]
*** [http://biome.ngic.re.kr/ProteinModelling/buildingModels/KCIP/modellerResult/KCIP.ali ALI]
*** [http://biome.ngic.re.kr/ProteinModelling/buildingModels/GAKH/modellerResult/GAKH.top TOP]
* MODELLER output files
** Sample protein A: [http://biome.ngic.re.kr/ProteinModelling/buildingModels/KCIP/modellerResult/KCIP.pdb KCIP.pdb]
** Sample protein B: [http://biome.ngic.re.kr/ProteinModelling/buildingModels/GAKH/modellerResult/GAKH.pdb GAKH.pdb]
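
The Method section notes that the automatic build can fail when the amino-acid sequence in a PDB file does not match the one in the PIR alignment. The sketch below, in Python, shows one possible pre-check for template files such as those listed above. It reads the residue sequence from the C-alpha ATOM records of a PDB file and reports the first position that differs from an expected sequence. It is not the guide's Perl script; the names <code>sequence_from_pdb</code> and <code>first_mismatch</code> are hypothetical, and the check assumes the standard fixed-column PDB ATOM layout and an ungapped expected sequence.

```python
import sys

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb(lines, chain=None):
    """Build a one-letter sequence from the CA atoms of ATOM records.

    Hypothetical helper: reads only the first model, optionally only one
    chain, and maps unknown residue names to 'X'.
    """
    seq = []
    seen = set()
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        chain_id = line[21]  # chain identifier, column 22
        if chain is not None and chain_id != chain:
            continue
        key = (chain_id, line[22:27])  # residue number plus insertion code
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        seq.append(THREE_TO_ONE.get(line[17:20], "X"))  # residue name, columns 18-20
    return "".join(seq)


def first_mismatch(a, b):
    """Return the 0-based index where a and b first differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    # Usage (assumed): python check_pdb_seq.py TEMPLATE.pdb EXPECTED_SEQUENCE
    if len(sys.argv) != 3:
        sys.exit("usage: check_pdb_seq.py PDB_FILE EXPECTED_SEQUENCE")
    with open(sys.argv[1]) as handle:
        observed = sequence_from_pdb(handle)
    position = first_mismatch(observed, sys.argv[2])
    print("sequences match" if position is None else f"first mismatch at index {position}")
```

Residues that are missing from the ATOM records (for example, disordered loops) show up as a mismatch at the first missing position, which is one plausible source of the conversion failures mentioned under Method.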