Protein modelling

From Opengenome.net
Revision as of 17:51, 12 May 2006 by 210.218.222.82

== A Practical Guide to Protein Structure Modelling ==


=== Problem definition ===
* Build 3D structural models of proteins of interest
** Initial input: amino acid sequence of the target protein
** Final output: its predicted 3D structure

===Method===
* Find one or more structural templates for the target protein with a sequence homology search against proteins of known structure (program: BLASTP or PSI-BLAST; database: PDB)
* Download the template structures in PDB format (web server: RCSB PDB)
* Align the amino acid sequence of the target protein with the sequence(s) of the template protein(s) (program: CLUSTALW; output: PIR format)
* Build 3D structural models from the multiple sequence alignment and the template 3D structure(s) (program: MODELLER; input: ATM, ALI, and TOP files)
** Manual build
*** Manually convert the template PDB files to ATM format and the PIR alignment to ALI format
*** Write a TOP script file
*** Run MODELLER
** Automatic build (note: some PDB and PIR files cannot be converted into correct ATM and ALI files because of amino acid sequence mismatches)
*** Perl script that generates the ATM, ALI, and TOP files from the PDB and PIR files: txt, gz
*** Command-line arguments, in order: base_directory_path pir_file_name target_protein_id (exactly 4 characters) one_or_more_PDB_file_names (each with an exactly 4-character prefix)
*** Run the Perl script (e.g. ../bin/makeModellerInput.pl /home/user/Protein3DModelling/KCIP KCIP.pir KCIP 1QJA.pdb)
*** Run MODELLER (e.g. mod7v7 KCIP.top)
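
The argument order above can be checked before the Perl script is called. This is a minimal sketch, assuming Python 3. The hypothetical helper `build_commands` only assembles the two command lines from the examples (`../bin/makeModellerInput.pl` and `mod7v7`) and rejects IDs that are not 4 characters long. The assumption that the generated TOP file is named `<target_protein_id>.top` is taken from the `KCIP.top` example, not from the script itself.

```python
import os
from typing import List


def build_commands(base_dir: str, pir_file: str, target_id: str,
                   pdb_files: List[str],
                   script: str = "../bin/makeModellerInput.pl") -> List[List[str]]:
    """Return [perl_command, modeller_command] as argument lists (hypothetical helper)."""
    # target_protein_id must be exactly 4 characters (e.g. KCIP)
    if len(target_id) != 4:
        raise ValueError(f"target protein ID must be 4 characters: {target_id!r}")
    if not pdb_files:
        raise ValueError("at least one template PDB file is required")
    for name in pdb_files:
        # each PDB file name needs an exactly 4-character prefix (e.g. 1QJA.pdb)
        prefix = os.path.splitext(os.path.basename(name))[0]
        if len(prefix) != 4:
            raise ValueError(f"PDB file prefix must be 4 characters: {name!r}")
    perl_cmd = [script, base_dir, pir_file, target_id, *pdb_files]
    # Assumption: the script writes <target_id>.top (as in the KCIP.top example)
    modeller_cmd = ["mod7v7", f"{target_id}.top"]
    return [perl_cmd, modeller_cmd]


if __name__ == "__main__":
    for cmd in build_commands("/home/user/Protein3DModelling/KCIP",
                              "KCIP.pir", "KCIP", ["1QJA.pdb"]):
        print(" ".join(cmd))
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` from the working directory used in the example. Argument lists avoid shell quoting problems with directory names.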

** Example
*** ATM file: 5fd1.atm
*** ALI file: alignment.ali
*** TOP file: model-default.top
*** Command: mod7v7 model-default.top
*** Output PDB file (predicted model): 1fdx.B99990001.pdb
** MODELLER manual
*** PDF, HTML
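
For the example above, the TOP script can also be generated instead of being written by hand. This is a minimal sketch, assuming Python 3. It writes a script modelled on the `model-default.top` file from the MODELLER 6/7 tutorial, using `INCLUDE`, `SET ALNFILE`, `SET KNOWNS`, `SET SEQUENCE`, `SET ATOM_FILES_DIRECTORY`, `SET STARTING_MODEL`/`ENDING_MODEL`, and `CALL ROUTINE = 'model'`. Check these keywords against the MODELLER manual for your version. The helper names `make_top_script` and `model_file_names` are hypothetical. The way several templates are listed is an assumption. The output-name pattern is inferred from the example file `1fdx.B99990001.pdb`.

```python
from typing import List, Sequence


def make_top_script(alnfile: str, knowns: Sequence[str], sequence: str,
                    atom_dir: str = "./", start: int = 1, end: int = 1) -> str:
    """Return TOP script text modelled on MODELLER's model-default.top (sketch)."""
    # Assumption: several templates are written as space-separated quoted codes
    knowns_str = " ".join(f"'{code}'" for code in knowns)
    lines = [
        "INCLUDE  # include the predefined TOP routines",
        f"SET ALNFILE = '{alnfile}'  # alignment file (ALI format)",
        f"SET KNOWNS = {knowns_str}  # template code(s)",
        f"SET SEQUENCE = '{sequence}'  # target code",
        f"SET ATOM_FILES_DIRECTORY = '{atom_dir}'  # directory with the .atm files",
        f"SET STARTING_MODEL = {start}  # index of the first model",
        f"SET ENDING_MODEL = {end}  # index of the last model",
        "CALL ROUTINE = 'model'  # run homology modelling",
    ]
    return "\n".join(lines) + "\n"


def model_file_names(sequence: str, start: int = 1, end: int = 1) -> List[str]:
    """Expected output file names, inferred from the example 1fdx.B99990001.pdb."""
    return [f"{sequence}.B9999{i:04d}.pdb" for i in range(start, end + 1)]


if __name__ == "__main__":
    print(make_top_script("alignment.ali", ["5fd1"], "1fdx"), end="")
    print(model_file_names("1fdx"))
```

Writing the returned text to `model-default.top` and running `mod7v7 model-default.top` should then correspond to the example above. Raising `end` asks for more models, which would be named `1fdx.B99990002.pdb` and so on under the inferred pattern.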

* Display and compare the 3D structures of the model and the template proteins (program: RasMol)
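
Before you display the model and template in RasMol, it can help to check which residues each file actually contains. The same check also exposes the amino acid sequence mismatches that make the automatic ATM/ALI conversion fail. This is a minimal sketch, assuming Python 3 and standard fixed-column PDB ATOM records: residue name in columns 18-20, chain ID in column 22, and residue number plus insertion code in columns 23-27. The helper names `pdb_sequence` and `first_mismatch` are hypothetical. HETATM records (e.g. selenomethionine) are ignored, and non-standard residue names become `X`.

```python
from typing import Dict, List, Set

# Standard three-letter to one-letter codes for the 20 amino acids
THREE_TO_ONE: Dict[str, str] = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_sequence(pdb_text: str, chain: str = "A") -> str:
    """One-letter sequence of the residues that have ATOM records in one chain."""
    seq: List[str] = []
    seen: Set[str] = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):  # skips HETATM, HEADER, etc.
            continue
        if line[21:22] != chain:  # column 22: chain ID
            continue
        res_id = line[22:27]  # columns 23-27: residue number + insertion code
        if res_id in seen:  # count each residue once, not once per atom
            continue
        seen.add(res_id)
        seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))  # columns 18-20
    return "".join(seq)


def first_mismatch(a: str, b: str) -> int:
    """Index of the first position where a and b differ, or -1 if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))
```

For example, assuming the template residues of interest are in chain A, you could compare `pdb_sequence(open("1QJA.pdb").read(), "A")` with the template row of `KCIP.pir` after removing its `-` gap characters. `first_mismatch` then reports where the two disagree.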

=== Sample proteins ===
* Sample protein A: 14-3-3 protein gamma (Protein kinase C inhibitor protein-1; KCIP-1)
 >sw|P61981|143G_HUMAN 14-3-3 protein gamma (Protein kinase C inhibitor protein-1) (KCIP-1)
 VDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSNEERNLLSVAYKNVVGARRSSWR
 VISSIEQKTSADGNEKKIEMVRAYREKIEKELEAVCQDVLSLLDNYLIKNCSETQYESKV
 FYLKMKGDYYRYLAEVATGEKRATVVESSEKAYSEAHEISKEHMQPTHPIRLGLALNYSV
 FYYEIQNAPEQACHLAKTAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDDD
 GGEGNN
* Sample protein B: GAK_HUMAN Cyclin G-associated kinase (GAKH)
 >sw|O14976|GAK_HUMAN Cyclin G-associated kinase
 MSLLQSALDFLAGPGSLGGASGRDQSDFVGQTVELGELRLRVRRVLAEGGFAFVYEAQDV
 GSGREYALKRLLSNEEEKNRAIIQEVCFMKKLSGHPNIVQFCSAASIGKEESDTGQAEFL
 LLTELCKGQLVEFLKKMESRGPLSCDTVLKIFYQTCRAVQHMHRQKPPIIHRDLKVENLL
 LSNQGTIKLCDFGSATTISHYPDYSWSAQRRALVEEEITRNTTPMYRTPEIIDLYSNFPI
 GEKQDIWALGCILYLLCFRQHPFEDGAKLRIVNGKYSIPPHDTQYTVFHSLIRAMLQVNP
 EERLSIAEVVHQLQEIAAARNVNPKSPITELLEQNGGYGSATLSRGPPPPVGPAGSGYSG
 GLALAEYDQPYGGFLDILRGGTERLFTNLKDTSSKVIQSVANYAKGDLDISYITSRIAVM
 SFPAEGVESALKNNIEDVRLFLDSKHPGHYAVYNLSPRTYRPSRFHNRVSECGWAARRAP
 HLHTLYNICRNMHAWLRQDHKNVCVVHCMDGRAASAVAVCSFLCFCRLFSTAEAAVYMFS
 MKRCPPGIWPSHKRYIEYMCDMVAEEPITPHSKPILVRAVVMTPVPLFSKQRSGCRPFCE
 VYVGDERVASTSQEYDKMRDFKIEDGKAVIPLGVTVQGDVLIVIYHARSTLGGRLQAKMA
 SMKMFQIQFHTGFVPRNATTVKFAKYDLDACDIQEKYPDLFQVNLEVEVEPRDRPSREAP
 PWENSSMRGLNPKILFSSREEQQDILSKFGKPELPRQPGSTAQYDAGAGSPEAEPTDSDS
 PPSSSADASRFLHTLDWQEEKEAETGAENASSKESESALMEDRDESEVSDEGGSPISSEG
 QEPRADPEPPGLAAGLVQQDLVFEVETPAVLPEPVPQEDGVDLLGLHSEVGAGPAVPPQA
 CKAPSSNTDLLSCLLGPPEAASQGPPEDLLSEDPLLLASPAPPLSVQSTPRGGPPAAADP
 FGPLLPSSGNNSQPCSNPDLFGEFLNSDSVTVPPSFPSAHSAPPPSCSADFLHLGDLPGE
 PSKMTASSSNPDLLGGWAAWTETAASAVAPTPATEGPLFSPGGQPAPCGSQASWTKSQNP
 DPFADLGDLSSGLQGSPAGFPPGGFIPKTATTPKGSSSWQTSRPPAQGASWPPQAKPPPK
 ACTQPRPNYASNFSVIGAREERGVRAPSFAQKPKVSENDFEDLLSNQGFSSRSDKKGPKT
 IAEMRKQDLAKDTDPLKLKLLDWIEGKERNIRALLSTLHTVLWDGESRWTPVGMADLVAP
 EQVKKHYRRAVLAVHPDKAAGQPYEQHAKMIFMELNDAWSEFENQGSRPLF
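
The sample sequences above are in FASTA format: a `>` header line followed by sequence lines. This is a minimal sketch of reading them, assuming Python 3. `parse_fasta` and `accession` are hypothetical helpers. Spaces inside sequence lines are removed, so the 60-residue blocks shown above can be pasted as they are.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each header (without '>') to its sequence; spaces and line breaks are dropped."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # store the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # sequence line belonging to the current record
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def accession(header: str) -> str:
    """Accession from a 'db|ACCESSION|ENTRY_NAME ...' header, else the first word."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]


if __name__ == "__main__":
    demo = ">sw|P61981|143G_HUMAN 14-3-3 protein gamma\nVDREQLVQKA RLAEQAERYD\n"
    for head, seq in parse_fasta(demo).items():
        print(accession(head), len(seq), seq)
```

The parsed sequence can be saved as a one-record FASTA file and used as the BLASTP or PSI-BLAST query in the template search step.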

=== Available Online Tool Servers ===
# BLASTP:
## NCBI BLASTP
## EBI SRS BLASTP
# PSI-BLAST:
## NCBI PSI-BLAST
# CLUSTALW:
## EBI ClustalW
## Genebee ClustalW
# SRS servers: NGIC SRS, Public SRS servers
# Search engine: Google (search for other online servers yourself)

===Available Online Databases ===
# PDB amino acid FASTA file: NCBI BLAST DB (local: pdbaa.gz)
# PDB:
## RCSB
## UK PDB mirror
## Japan PDB mirror

=== Downloadable programs ===
# BLAST: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/ (local: DOS)
# CLUSTALW: ftp://ftp.ebi.ac.uk/pub/software/dos/clustalw/ (local: DOS [Windows XP])
# MODELLER: http://salilab.org/modeller/ (local: DOS)
# RasMol: http://www.openrasmol.org/ (local: Windows)
# Perl: http://www.perl.org/ (local: DOS [Windows XP])
# Other utilities: ALZip.exe (for unzip, untar, and ungzip)
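
With a local copy of BLAST and the `pdbaa` FASTA file (after ungzipping `pdbaa.gz`), you can run the template search offline and then download the hits from RCSB. This is a minimal sketch, assuming Python 3 and the legacy NCBI `formatdb`, `blastall`, and `blastpgp` executables from the BLAST release directory. It only builds the command lines and the download URL. The query file name `KCIP.fasta`, the output names, and the E-value default are placeholders. The URL pattern `https://files.rcsb.org/download/<ID>.pdb` is the current RCSB file service, not necessarily the one available when this page was written.

```python
import urllib.request
from typing import List


def formatdb_command(fasta_db: str = "pdbaa") -> List[str]:
    """Index the local PDB FASTA file (-i input file, -p T = protein sequences)."""
    return ["formatdb", "-i", fasta_db, "-p", "T"]


def blastp_command(query: str, db: str = "pdbaa", out: str = "query.blast",
                   evalue: float = 1e-3) -> List[str]:
    """Legacy blastall BLASTP search: -p program, -d database, -i query, -e E-value, -o report."""
    return ["blastall", "-p", "blastp", "-d", db, "-i", query,
            "-e", str(evalue), "-o", out]


def psiblast_command(query: str, db: str = "pdbaa", out: str = "query.psiblast",
                     iterations: int = 3) -> List[str]:
    """Legacy blastpgp PSI-BLAST search; -j sets the maximum number of passes."""
    return ["blastpgp", "-i", query, "-d", db, "-j", str(iterations), "-o", out]


def rcsb_pdb_url(pdb_id: str) -> str:
    """Download URL of one entry in PDB format (current RCSB file service)."""
    if len(pdb_id) != 4:
        raise ValueError(f"PDB IDs have 4 characters: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id: str, dest: str) -> None:
    """Fetch one template structure (requires network access)."""
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), dest)


if __name__ == "__main__":
    print(" ".join(formatdb_command()))
    print(" ".join(blastp_command("KCIP.fasta")))
    print(rcsb_pdb_url("1qja"))
```

`formatdb` only needs to run once per copy of `pdbaa`. The BLAST or PSI-BLAST commands can then be repeated for each target, and `download_pdb("1QJA", "1QJA.pdb")` fetches a chosen template.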

===Solution Example ===
# Found templates (PDB format)
## Sample protein A: 1QJA.pdb
## Sample protein B: 1N4C.pdb
----
# Multiple sequence alignments (PIR format)
## Sample protein A: KCIP.pir
## Sample protein B: GAKH.pir.ORG
----
* MODELLER input files
** Sample protein A:
*** ATM, ALI, and TOP files (automatically generated)
** Sample protein B:
*** ATM, ALI, and TOP files (manually edited; C-terminal only)
* MODELLER output files
** Sample protein A: KCIP.pdb
** Sample protein B: GAKH.pdb
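
The ALI files used by MODELLER are PIR-format alignments in which the second line of each entry describes that entry. This is a minimal sketch of the PIR-to-ALI conversion, assuming Python 3 and the ten colon-separated description fields used in the MODELLER tutorial: `structureX` for a template with coordinates and `sequence` for the target. The helpers `ali_entry` and `write_ali` are hypothetical, and so are the residue ranges and toy sequences in the example. The real ranges must match the residues present in the template's ATM file.

```python
from typing import List


def ali_entry(code: str, aligned_seq: str, kind: str = "sequence",
              first: str = "", chain: str = "", last: str = "",
              width: int = 75) -> str:
    """Format one PIR/ALI entry: '>P1;code', a 10-field description line, sequence ending in '*'."""
    if kind not in ("sequence", "structureX"):
        raise ValueError(f"unsupported entry type: {kind!r}")
    # type:code:first residue:chain:last residue:chain:name:source:resolution:R-factor
    fields = [kind, code, first, chain, last, chain, "", "", "", ""]
    seq = aligned_seq.replace(" ", "").upper() + "*"  # '-' gaps are kept as-is
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">P1;{code}", ":".join(fields), *lines]) + "\n"


def write_ali(entries: List[str]) -> str:
    """Join entries with blank lines, as in a multi-entry alignment file."""
    return "\n".join(entries)


if __name__ == "__main__":
    # Hypothetical toy alignment: template 1QJA (chain A, residues 1-5) vs target KCIP
    print(write_ali([
        ali_entry("1QJA", "MDK-NE", kind="structureX", first="1", chain="A", last="5"),
        ali_entry("KCIP", "MVDREQ"),
    ]), end="")
```

The aligned rows themselves come from the CLUSTALW PIR output. Only the description line and the trailing `*` are added, which is why a template row that disagrees with the ATM file's residues causes the conversion problems mentioned in the Method section.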