
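Analysis of characteristic sequences of inteins and revision of their motifs

XIE Jun, HUANG Jingfei, SHI Xiufan, LIU Ciquan*
(Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223. *Corresponding author, Email: xiesangjun@)

Abstract  Since the first intein (Sce VMA) was discovered in 1990 in the Saccharomyces cerevisiae adenosine triphosphatase, the number of known inteins has grown steadily. Analyzing these new intein sequences is necessary for a correct understanding of their sequence features. By systematically searching nucleic acid and protein sequence databases, we collected 101 inteins, of which 69 contain the LAGLI-DADG homing endonuclease motif. Because typical inteins contain a homing endonuclease motif and make up the large majority of inteins, only these 69 typical inteins were analyzed. Their distribution was found to be distinctive both across species and across protein types. Multiple sequence alignment also revealed some new sequence features of inteins, and the existing intein motifs were revised.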

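Vol. 45, No. 23, December 2000. Brief Report

Keywords  intein, distribution, insertion site, sequence features

An intein (hereafter also "protein intron") is a polypeptide segment within a protein that excises itself from the precursor protein by self-splicing, while the flanking exteins are joined by a peptide bond [1]. Most inteins contain two motifs with the characteristic sequence LAGLI-DADG (the letters denote amino acid residues); the LAGLI-DADG motif defines the homing endonucleases. These enzymes are found both in inteins and in self-splicing RNA introns. They are DNA endonucleases that recognize and cleave the insertion site of the intein or RNA intron in the allele, thereby initiating transfer of the intein or RNA intron gene [2].

Since the first intein was discovered in the Saccharomyces cerevisiae ATPase [3,4], the number of inteins has kept increasing, but most of them have been inferred from gene sequences. Using the TBLASTN program (protein query against nucleotide sequences) on the BLAST server (. /BLAST/), we searched the NR database (which combines the GenBank [5], DDBJ [6], PDB [7] and EMBL [8] databases with redundant sequences removed) and obtained 69 intein sequences containing the LAGLI-DADG motif. Systematic analysis of these 69 inteins revealed features of their distribution and insertion sites. It also revealed some new features of intein sequences and led to a revision of the previously defined intein sequence motifs, thereby, for a correct understanding of intein sequence [...]

[...] gap penalty of 10, gap extension penalty of 0.05, delay of sequences with less than 20% similarity, the GONNET series as the weight matrix, and the default value for the gap extension penalty [...]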


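Judging from these 69 inteins, inteins occur in all three domains of life (eukaryotes, eubacteria and archaea), and in the eukaryotic nucleus [...] they are also found in metabolic enzymes such as the vacuolar adenosine triphosphatase, but more than 50% occur in [...] DNA replication and transcription [...] function and mode of action are very similar [11].

Analysis shows that, for these 69 inteins, the amino acid sequence upstream of the insertion site has no obvious features. Downstream, apart from the first amino acid always being Cys, Thr or Ser, there are no other common features either. This is consistent with the self-splicing information being contained entirely in the intein itself and in the first amino acid residue of the downstream extein [11~16].

2.2 Revision of intein motifs

We performed a multiple sequence alignment of these 69 typical inteins and found that the previously defined intein motifs are somewhat inaccurate. Compared with the intein database, the motifs in Table 1 show the following main changes: one amino acid residue is added at the end of Block A, and Block G is extended from the original 8 residues to 13 residues. In addition, one new motif is added: Block [...]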
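The insertion-site observation above (the first residue of the downstream extein is always Cys, Thr or Ser) can be checked mechanically. The following is a minimal sketch, not the authors' procedure: the function names, the 0-based coordinate convention and the toy precursor sequence are all assumptions made for illustration.

```python
# Residues the text reports at the first position of the downstream extein.
ALLOWED_PLUS_ONE = frozenset("CST")


def downstream_residue(precursor: str, intein_start: int, intein_len: int) -> str:
    """Return the first extein residue after the intein.

    intein_start is a 0-based index into the precursor (an assumed convention).
    """
    end = intein_start + intein_len
    if intein_start < 0 or intein_len <= 0 or end >= len(precursor):
        raise ValueError("intein must lie inside the precursor and be followed by an extein")
    return precursor[end]


def has_expected_plus_one(precursor: str, intein_start: int, intein_len: int) -> bool:
    """True if the residue following the intein is Cys, Ser or Thr."""
    return downstream_residue(precursor, intein_start, intein_len) in ALLOWED_PLUS_ONE


# Hypothetical toy precursor (made up, not a real protein):
# N-extein "MKAG" + intein "CLTGDHN" + C-extein "SGKT".
toy = "MKAG" + "CLTGDHN" + "SGKT"
print(downstream_residue(toy, 4, 7), has_expected_plus_one(toy, 4, 7))
```

Applied to real data, the intein coordinates would come from the database annotation of each precursor protein rather than from hand-written offsets.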

Table 1  Conserved sequences in inteins

No.  Name a)
Eucarya
1 Ceu ClpP  CLTSDHTVLTTR*G 13b) IPIADVTL 22 GVDLFVTPNHRMYV 73 FFGLWIANG 151 NKYLPDWV 230 ---------- STSERFANDVSRLALHAGT 281 PVYCLTGP 440 VQRNGKAVWTGNS 457
2 CIV RIR1  CVAPETMILTED*G 13 FPKDLEGK 23 GCTLSCTPEHKFII 68 THGFFCGDG 131 -------- WLAGYLDADG 204 CIHLDFLKRIQLLLIGMGV 237 RVDDTYCF 323 HAGVFNGILTGQC 340
3 Ctr VMA  CFTKGTQVMMAD*G 13 IEVGNKVM 28 LMDFTVSADHKLIL 78 LLGTWAGIG 210 VKSIPQHI 325 LIAGLVDAAG 345 TSFRHVARGLVKIAHSLGI 379 YYGITLAE 455 QFLLSNMALVHNC 472
4 Sce VMA  CFAKGTNVLMAD*G 13 IEVGNKVM 28 LLKFTCNATHELVV 83 LLGLWIGDG 219 VKNIPSFL 307 FLAGLIDSDG 327 TIHTSVRDGLVSLARSLGL 359 YYGITLSD

437 QFLLANQVVVHNC 455Eubacteria5 Aae RIR26 Dra RIR17 Mfl GyrACFIEGTEVLTKR*G 13 VDFRELRE 22 NWEVVATEGHEFIV 75 LLIVLQADG 149 TKFFDEWV 223 FVEELVKWDG 245 STKEKRNKDFVQALCALGG 279 KVYCVSVP 329 VRYKDSVCIAGNC 347CVTADTWVSTAF*G 13 RQVQDLIG 22 GYELRLTGNHQLLK 69 LLGSLIGDG 124 HKTLTDKV 202 VLQGLFDADG 222 SDLSLLKRAQRMLSRLGIM 257 AVYDVTVE 302 HAFDANGVLAHNC 367CVTGDALVRLPF*G 13 VRLRDVVA 23 GYEVTGTANHPLLC 79 HLGAFISEG 134 DKAVPEWL 212 FLQALFEGDG 232 TRSGRLAKDIQQMLLEFGV 266 PVYSLRVD 405 HSFLTNGFVSHNT 4228 Mgo GyrA9 Mka GyrA10 Mle GyrA11 Mle Pps112 Mle RecA13 Mma GyrA14 Mtu DnaB15 Mtu Pps116 Mtu RecA17 Rma DnaB18 Spb RIR119 Ssp DnaBArchaea20 Mja GF-6P21 Mja Helicase22 Mja Hyp-123 Mja Hyp-224 Mja IF225 Mja PEP26 Mja Pol-127 Mja Pol-228 Mja r-Gyr29 Mja RFC-130 Mja RFC-231 Mja RFC-332 Mja RNR-133 Mja RNR-234 Mja Rpol ACLTGDALVRLPF*G 13 MRIGDVAP 23 GYEVTGTSNHPVLC 79 LLGAFISEG 134 DKSVPEWL 212 FLQALFEGDG 232 TRSRQLAIDVQQMLLEFGV 266 PVYSLRVD 405 HAFLTNGFVSHNT 421CVTGDALVRLPF*G 13 MRIADVVP 23 GYEVTGTANHPLLC 79 LLGAFISEG 134 DTYVPEWM 212 FLQALFEGDG 232 TVSKQLAMDVQQMLLEFGV 266 PVYSLRVD 405 HAFLTNGFVSHNT 421CVSGNSLVRLLF*G 13 IRIGDIVT 23 GYEITGTSNHPLLC 79 LFGAFISGG 134 DKLVPDWL 212 FLQALFEGEG 232 TLSERLAADVQQMLLEFGV 266 PVFSLHVD 405 HSFLTNGFISHNT 421CLTADARINVKGKG 14 VSIADVQP 23 GRALEATGNHQFLV 72 LLGLWLGDG 151 TKRLPAWI 225 LIGGLVDADG 245 FASRELLEDVRQLAIGCGL 276 PTYDIQVV 372 ENFVANGIVAHNS 387CMNYSTRVTLAD*G 13 EKIGKIVN 23 GKSQFAATPNHLIR 83 VLGSLMGDG 123 -------- LQRAVYLGDG 194 FSLEEYLKALTPLVLAIWY 215 NRFDIEVE 351

HNYFVDGVMVHNS 366CCTGDALVRLPF*G 13 VRIGNFVP 23 GYEVTGTSNHPLLC 79 LLGAFISEG 134 DKSVPDWL 212 FLQALFEGGG 232 TRSRQLAVDVQQMLLEFGI 266 PVYSLRVD 405 HAFLTNGFVSHNT 421CLTASTRILRADTG 14 VAFGELMR 24 GREVEATGSHPFMK 77 SLARMIGDG 123 EKCVPEAV 206 FLRHLWSAGG 226 STSRRLIDDVAQLLLRVGI 261 HVFDGTVS 402 HNFVANGISLHNS 417CLPAGELITTAD*G 13 IRVGDFVT 28 ANAFSVTAEHPLLA 71 LAGYYLAEG 144 NKKLSDLL 223 LVDAYVNGDG 243 TTSRLWAFQLQSILARLGH 276 PVYNLDVE 345 DSYLAYGFAVHNC 360CLAEGTRIFDPVTG 14 HRIEDVVD 24 GAIVWATPDHKVLT 77 LLGYLIGDG 123 EKTIPNWF 201 LLFGLFESDG 223 TTSEQLAHQIHWLLLRFGV 257 RTFDLEVE

426 HTLVAEGVVVHNC 441CLAGDTLITLAD*G 13 VPIRELVS 23 GRSIRATANHRFLT 76 LLGHLIGDG 121 EKKVPALL 206 FLRHLWATDG 226 TSSYQLARDVQSLLLRLGI 263 EVFDLTVP 414 HNFVANDIIAHNS 429CVTGETLLLTEN*G 13 EKAADLYK 22 GYEIRATEWHKFYV 83 IMGIIAGDG 131 KTRVPEFI 219 YLSGLFQTDG 239 SIHYESLQDVQKLLLNMGV 274 DVYDTTQE 371 DHSLFNGIVTGNC 386CISGDSLISLASTG 14 VSIKDLLD 24 GRTIKATANHRFLT 77 LLGHLIGDG 122 EKFVPNQV 207 FLRHLWSTDG 227 TSSEKLAKDVQSLLLKLGI 263 EVFDLTVP 415 HNFVANDIIVHNS 430CLHPDTYVILPD*G 13 KKISEIDE 23 FSELITTGEHKLFV 72 IIGYIIGDG 215 NERTPEFV 291 YLRGIFDAEG 311 MTSKCFIKEIQFLLLRFGI 342 YVYDLEVE 416 HNFIGNLIINHNS 500CLNANTEILQES*G 13 RKITELNK 22 GLEITTTPNHIFLV 72 FIGYFIGDG 122 NKNIDAFC 197 LIAGLFDSDG 216 SISEKLVEQLQFVLLRFGI 247 YAYDIELP 483 HYIVANGFIVHNS 502CVPPDTLLILEN*G 13 IKVGDKVL 27 PEEIILTPEHPVYA 70 --------- RSRIPEKI 156 RLVGYFLSEG 173 TTSEILMNQLRLISLRLGF 293 FVYNLEVE 378 SSYVTVSGTLHNC 393CLTSNSKILTDD*G 13 IKLEKLKE 23 GRVLEGSKDHPVLT 76 LLGFAFGDG 156 IYKIPEWI 254 FLAGLFGADG 274 ENILEFLNEIKLLLAEFDI 317 KLYDVGIV 473 HNFIANSIVVHNC 489CLMPHEKVLTEY*G 13 IKIEDLFK 22 WHSITTTPEHPFLT 90 FAGVMFGDG 215 NIKIPQIL 286 FIKGYFDADG 306 SASKEFIEGLSILLLRFEI 337 YVYDLTTE 533

Block A          Block I      Block B           Block C       Block D      Block E            Block H            Block F         Block G     ANGFLHNFIANGIVVHNT 547CIEGDAKILTDR*G 13 LKMKEVYK 22 KDTIKITPDHKFPV 83 LGGAVLSDG 134 SRKIPSEI 196 LIAGFVDGDG 228 SSHIKKIEGLIVGLYRLGI 260 EVYNITVK 389

WTKHYTPIVVFNC 413CHPKGTKVVVKGKG 14 VKEGNYVL 28 VNGLKCTPNHKIPL 64 LIGILLAEG 135 -------- ILRGFFEGDG 235 TNNYDKIKFIASLLDRLGI 268 EVYDLTLE 355 PYYFANGILTHNS 370SILPDEYLTIIEED 14 VKIGEYID 26 GRTIKVTRGHSLFK 100 FLGFFVTRG 289 KKHIPEEL 364 ---------- AKDEKYLNQLMILFNLVGI 413 YVYDLSVE 462 ENFLVNNIYAHNS 477CLTPDTYVVLGD*G 13 VNAKERNV 29 NYELKATPDHCLLV 77 FAGLVLGDG 212 IFSLPESY 281 LIAGYFDTDG 294 SKRRDVLEKIGIYLNSIGI 333 YVYDLSIK 480 QNFISNGVISHNC 495CLTGDTKVIVNGEI 14 REIGEVIE 22 GRELKVTTYHPLLI 85 WLGYFIGDG 134 KVRIPKEI 211 FLRAYFDCDG 229 TASKEMAEDLVYALLRFGI 258 TIYDLHVP 532 FIGGNLPTILHNT 549SVSKDTPILVKIDG 14 TTFEELDK 26 GGYIELTGNHSIMM 91 MLGLYVAEG 162 NKRIPDII 251 FLKGLADGDS 271 SKSDNLLIDTVWLARISGI 301 FVYDVSVP 420 FFAGNVPILLHNS 437CLTGDAKITLPDER 14 IKIEDFIK 23 GREIEATGDHKFLT 78 LLGFIIGDG 232 GYNIPKWI 321 FLRGLFGADG 341 DKTLEFFEEVKKMLEEFEV 384 DVYDITCH 529 PSFIANGFVSHNC 544SLGRDELIFIKE*G 13 CKIGEAID 26 GTSIIVTEDHSLFN 103 FLGLFVAEG 178 NKNSPEFI 247 FLGGLISGDG 267 TTSEQLLGQLHLLLSDLGM 297 YVYDLSVE 438

NFITATGILCHNT 454SLPYDEKILIFENN 14 VKIGEFVE 26 GKKVRVTGDHSVFT 101 LIGAFLSEG 236 NKEIPSIL 314 LIKGYIDGDG 333 TTSETLRDTLCLALKILGI 368 YVYDISVE 517 FIGGEGFICLHNT 534SVDYAGFGGCFHPLLITSYCLAP21 HNF2528


CLLPDEKVVVPSVG 14 VTLKELFE 23 WHGVTVTPEHPFLT 91 LAGRKGNID 149 -------- FLRGYFEERS 173 VEARELVEPLSLALLRFGI 200 YVYDLTTE 381 HNFIANGILVHNT 395CFSGEETVVIRENG 14 LRLKDFVE 26 DYWFALTPDHKVYT 98 LMGILFNGG 158 DLKMPWWV 231 PSLFLAFLEG 244 NKNLPFFQELSWYLGLFGI 276 VTYNLTTE 320 GNLLANGLFVKNS 334CFPPTEEAVFKF*G 13 VKIATFEE 23 GRTIRTTPEHKFFV 97 LIGYFVSDG 227 VFKIPEGA 291 LLSGLFNGDG 320 STSKGLIRDILYLLASLGI 353 YVYDLEVE 439

SFVAADGILVSNC 456CLTGDAKVIANGEL 14 TTIGELVE 22 GRELKVTPYHPLLV 85 WLGYFFGNG 133 KDSIPEQA 196 FLRAYFDCNA 212 TAGKEIAEQISYALAGLGI 240 VIYDLHVP

483 FIGGNLPTVLHNT 500CVTGDTKVYTPDER 14 VKIRDFMN 23 GRIIEATGDHMFLT 78 VVGFILGDG 301 GYTVPEWI 388 FLRGLFGADG 408 ERTVEFLNDVADLLREFDV 451 DVYDVTCA 594 HSFISNGLVSHNC 609CIDGNAKIIFENEG 14 TTMAEMYE 26 GTKILTSPWHPFFV 103 LAGFIAGNG 150 ENGIPPKI 176 FITGLFDAEG 196 MVNKKLIEAVTHYLNSLGI 226 TFYDLTVD 382 LAGENGMIFVHNT 400CFTGDTRILTEK*G 13 IPEEIVHE 23 GYELKVTGDHKFLV 71 LLGIIYADG 141 NIRVPEAI 227 FLAGFFDGDG 247 SISREFIKEAQLLFLALGI 277 HVYDLQVE 424

HLLTGNGIYTSNS 439CVVGETRILTPE*G 13 LKAEELFK 22 DKQVKYETVHGNAV 69 VLGWFIGDG 157 EKRIPEIV 230 FLRGLFSADG 250 SKSRELLREVQDLLLLFGI 280 LVYDFTVP 368 HSYISNGFISHNC 383CVDGDTLVLTKEFG 14 IKIKDLYK 23 GRKIKVTPIHKLFT 93 FLGYLIADG 164 KLGVPRNK 227 FIKAYIMCDG 258 TASEEAAYGFSYLLAKLFI 289 EVYDVTTE 415 FIGGNMPTLLHNT 430CIDGKAKIIFENEG 14 TTMEEMYE 26 GTKILTSPWHPFFV 103 LAGFIAGDG 150 DNGIPPQI 231 FIAGLFDAEG 251 MVNKRLIEDVTHYLNALGI 281 TFYDLTVE 437 LAGENGMIFVHNT 455CVVGDTRILTPE*G 13 LKAEEIFS 22 EEKVEYETVHGKVL 69 VLGWFIGDG 157 EKRIPEIV 230 FLRGLFSADG 250 SKSRELLREVQDLLLLFGI 280 IVYDFTVP 368 HMYISNGFMSHNC 383CLHPDTLILTSQ*G 13 RKIKELSR 22 GTELYLTADHPVLV 72 LIGYLAGKG 176 -------- FLAGYYDATL 256 GLTLEALYKIKVYLQLLGI 277 KVYDLVVE 359 HNFIANGIAVHNC 374CLAPGTRVLTEH*G 13 LKIEEMPE 23 GKVIEGSEDHPVLT 80 ILGFALANG 159 HDSIPEWI 223 FLAGLFGANG 243 THSRELLNDVSRILEGFKV 277 KFYDIGVY 375 HNFIANGIVVHNC 391CLLPEERVILPDYG 14 ITLEELFN 23 WHSVAVTPEHPFLT 91 FAGTIFGRE 159 -------- FLRGFFDING 212 GAPHEVLEELSLILLRLGI 243 YVYDVTTE 431 HNFIANGILVHNT 445ALYDFSIIQLSN*G 13 VLIGDLVE 23 GNEVILTRSHPLFA 101 LAGVILGDG 202 EWDVPDIV 279 FIAGLFDADG 298 TKSENVARKIWYALQRLGI 327 YVYDLTVE 506 HNYVANGILVSNC 520CVSGDSKVLTEK*G 13 IRHLNSGM 24 GFEVKCTREHRFLT 71 LLGFWMASG 214 KLEVPPII 283 FLAGYFDGNG 303 AFNRKFAEGIRDILLQLGI 337 PVYLDVAY

456 GNYVVNGFVSKNS 476CFSGEEVIIVEK*G 13 VKLREFVE 26 LDYWLAVTPDHKVF 98 ILGALFSDG 215 KLELPWWI 297 FMDGLYSGDG 316 EKKLPFFEEIAWYLSFFGI 362 VTYNVTTE

461 GNLLANGLFVKNS 475SILPDEWLPIVENE 14 VKIGDFID 26 GRRIKITSGHSLFS 100 LLGYYISSG 289 GKRIPEFI 351 FLKGLNGNAE 371 TKSELLVNQLILLLNSIGV 395 YVYDLSVE 444 FLVGFGLLYAHNS 461CVTPDTLVSLSD*G 13 VENSEESL 29 GHEIKATPDHGLLV 75 GKGTLKGDK 168 -------- MIAGYFDASG 241 SKRGDILRMLSVYLYQIGI 269 KLYDFTTT 397 ENFISNGIVSHNC 411CLTGDTKVIAN**G 12 FELRELVE 22 GRELKVTPYHPLLV 85 WLGYFLGGG 132 NAHIPKEC 202 FLRAYFDCNG 218 TASKEMSQEIAYALAGFGI 247 IIYDLHVP

509 FIAGNMPTVVHNT 526CVVGDTRILTPE*G 13 LKIEDLFR 22 GYEIIATLDHKIMT 113 VLGWLIGDG 160 DKRVPEIV 233 FLRGLFTADG 253 SKSRELLRDVQDLLLLFGI 283 IVYDLTVP 371 HMYISNGFISHNC 386CVDGDTLVLTKEFG 14 IKIKELYE 23 GRKIKVTPIHRLFT 93 FLGYLMANG 160 RLGVPEDK 224 ---------- LASEEGAYELSYLFAKLGI 278 EVYDVTTE 361 FIGGNMPTLLHNT 377SILPEEWVPLIKNG 14 FRIGDFVD 26 GRKITITEGHSLFV 100 LLGYYVSEG 289 NKRVPEVI 364 FLEGYFIGDG 384 TKSELLVNGLVLLLNSLGV 414 YVYDLSVD 521 FLAGFGFLYAHNS 538CHPADTKVVVKGKG 14 VQEGDYVL 28 INGLKCTPNHKLPV 64 LAGILLAEG 126 VKEIMDNI 209 VLRGFFEGDG 226 TKNEWKIKLVSKLLSQLGI 259 KVYDLTLE 346 PYYFANGILTHNS 361SILPEEWLPVLEEG 14 VRIGELID 26 GRRIKITSGHSLFS 100 LLGYYVSEG 289 NKRIPEFV 364 FLEGYFIGDG 384 TKSELLANQLVLLLNSVGV 413 YVYDLSVE 520 FLVGFGLVYAHNS 537CHPADTKVIVKGKG 14 ISEVREGD 25 INGLKCTPNHKLPV 64 LAGIILAEG 126 VREIMDGI 209 VLRGFFEGDG 224 TNNEWKIEVVSKLLNKLGI 259 KVYDLTLE

346 PYYFANGILTHNS 361SVTGDTEVTIRRNG 14 VPIEKLFE 26 SWYLDVTEDHSLIG 92 LIGLLVGDG 155 NKAIPSFM 233 FLRGLFSADG 253 SNSLFTETKPNRYLEKESG 305 YVYDIEVE 375 HRFFANGILVHNT 390SILPNEWLPIIENG 14 VKIGEFIN 26 GRKINITAGHSLFT 100 LLGYYVSEG 290 NKRIPSVI 365 FLEAYFTGDG 385 TKSELLANQLVFLLNSLGI 415 YVYDLSVE 522

NFLGFGLLYAHNS 539SVSGESEIIIRQNG 14 VKIKDLFS 26 SWYIDVTEDHSLIG 92 LVGLIVGDG 156 RRKIPEFM 234 FLRGLFSADG 254 NIDADFLREVRKLLWIVGI 287 YVYDIEVE 376 HRFFANNILVHNT 391CHPADTKVIVKGKG 14 VKEGDYIL 28 INGLKCTPNHKVPV 64 LSGIILAEG 126 LKNIESLY 213 VLRGFFERDA 226 TNNKWKIDIVAKLLDSLGI 259 EVYDLTLE 346 PYYFANGILTHNS 361SILPNEWLPIIENG 14 VKIGEFID 26 GRKIHITRGHSLFT 100 FLGYYVSEG 290 NKRIPSII 365 FLRAYFVGDG 385 TKSELLANQLVFLLNSLGV 415 YVYDLSVE 522

G G.... LhG..hhaG ...h .G p.S..hh..h..LL..hGI rVYDLpVa hhHNp39 Pab IF240 Pab Lon41 Pab Moaa42 Pab RFC-143 Pab RFC-244 Pab RIR1-145 Pab RIR1-246 Pab RIR1-347 Pab VMA

48 Pfu RIR1-1

49 Pfu RIR1-2

50 Pfu TopA51 Pho Hyp-2

52 Pho IF2

53 Pho KlbA

54 Pho LHR

55 Pho Lon

56 Pho Pol I

57 Pho r-Gyr58 Pho RFC

59 Pho RIR

60 Pho VMA61 Psp GBD Pol

62 Psp KOD Pol-163 PsP KOD Pol-264 Tfu Pol-165 Tfu Pol-266 Tli Pol-1

67 Tli Pol-2

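68 Tsp TY Pol-1
69 Tsp TY Pol-2
Sequence consensus c)

a) Intein names follow reference [1]; b) the number to the right of each sequence gives the position of the last amino acid residue of that motif; c) symbols in the sequence consensus: uppercase letters denote conserved amino acid residues (standard one-letter code), p a polar residue, h a hydrophobic residue, a an acidic residue, b a basic residue, r an aromatic residue.

[...] provides a hydroxyl or thiol group and, through acyl transfer, takes over the upstream extein [11~16]. As Table 1 shows, this residue is also fairly conserved; so far only three residues, Thr, Ser and Cys, have been found at this position. The two middle residues are mostly Gln and Gly, each side of this pair is flanked by three hydrophobic residues, and the first residue is a fairly conserved His. Block A as previously defined had only 8 amino acid residues [1], but judging from these 69 inteins, the conservation of the preceding 5 residues is not low either relative to the other Blocks. Compared with the intein database, Block [...]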
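The consensus notation defined in footnote c) of Table 1 can be made concrete with a short sketch that tests an aligned segment against a consensus string. The paper names the residue classes but does not list their members, so the class memberships below are assumptions, as are the function names; the consensus fragments and test segments are taken from Table 1.

```python
# Assumed residue classes: the source names the classes (p, h, a, b, r)
# but does not say which amino acids belong to each one.
CLASSES = {
    "p": set("STNQCYHDEKR"),  # polar (assumed membership)
    "h": set("AVLIMFWYC"),    # hydrophobic (assumed membership)
    "a": set("DE"),           # acidic
    "b": set("KRH"),          # basic (assumed to include His)
    "r": set("FWY"),          # aromatic (assumed membership)
}


def position_matches(symbol: str, residue: str) -> bool:
    """Check one consensus symbol against one residue."""
    if symbol == ".":
        return True                    # unconstrained position
    if symbol.isupper():
        return symbol == residue       # conserved residue, exact match
    return residue in CLASSES[symbol]  # class symbol; unknown symbols raise KeyError


def mismatches(consensus: str, segment: str) -> list:
    """Return the 0-based positions where the segment violates the consensus."""
    if len(consensus) != len(segment):
        raise ValueError("consensus and segment must have equal length")
    return [i for i, (s, r) in enumerate(zip(consensus, segment))
            if not position_matches(s, r)]


# Block F consensus from Table 1 against two Block F segments from the table.
print(mismatches("rVYDLpVa", "YVYDLEVE"))
print(mismatches("rVYDLpVa", "PVYSLRVD"))
```

Returning mismatch positions rather than a yes/no answer makes it easy to see which columns of a block deviate in a given intein.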

[...] the first and second nucleotides are G and A. In the three-dimensional structure this residue lies close to the splicing reaction site; what role it plays in the self-splicing reaction remains to be investigated. In Block [...] features of the insertion site and other aspects; (3) through multiple sequence alignment, a new motif (Block [...] provides new information on structure and function.

Acknowledgements  This work was supported by the National Natural Science Foundation of China (Grant Nos. 39770418, 39570395).

References

1 Perler F B, Olsen G J, Adam E. Compilation and analysis of intein sequences. Nucleic Acids Res, 1997, 25(6): 1087~1093
2 Cooper A A, Stevens T H. Protein splicing: self-splicing of genetically mobile elements at the protein level. Trends Biochem Sci, 1995, 20(9): 351~356
3 Hirata R, Ohsumi Y, Nakano A, et al. Molecular structure of a gene, VMA1, encoding the catalytic subunit of H+-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. J Biol Chem, 1990, 265: 6726~6733
4 Kane P M, Yamashiro C T, Wolczyk D F, et al. Protein splicing converts the yeast TFP1 gene product to the 69-kD subunit of the vacuolar H+-adenosine triphosphatase. Science, 1990, 250: 651~657
5 Benson D A, Boguski M S, Lipman D J, et al. GenBank. Nucleic Acids Res, 1999, 27(1): 12~17
6 Yoshio T, Satoru M, Motonori O, et al. DNA Data Bank of Japan (DDBJ) in collaboration with mass sequencing teams. Nucleic Acids Res, 2000, 28: 24~26
7 Helen M B, Westbrook J, Feng Z, et al. The protein data bank. Nucleic Acids Res, 2000, 28: 235~242
8 Wendy B, Broek A, Camon E, et al. The EMBL nucleotide sequence database. Nucleic Acids Res, 2000, 28: 19~23
9 Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res, 1999, 27: 49~54
10 Thompson J D, Gibson T J, Plewniak F, et al. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res, 1997, 25(24): 4876~4882
11 Derbyshire V, Belfort M. Lightning strikes twice: intron-intein coincidence. Proc Natl Acad Sci USA, 1998, 95(4): 1356~1357
12 Shao Y, Xu M Q, Paulus H. Protein splicing: characterization of the aminosuccinimide residue at the carboxyl terminus of the excised intervening sequence. Biochemistry, 1995, 34: 10844~10850
13 Shao Y, Xu M Q, Paulus H. Protein splicing: evidence for an N-O acyl rearrangement as the initial step in the splicing process. Biochemistry, 1996, 35: 3810~3815
14 Chong S, Shao Y, Paulus H, et al. Protein splicing involving the Saccharomyces cerevisiae VMA intein. J Biol Chem, 1996, 271: 22159~22168
15 He Z, Crist M, Yen H, et al. Amino acid residues in both the protein splicing and endonuclease domains of the PI-SceI intein mediate DNA binding. J Biol Chem, 1998, 273(8): 4607~4615
16 Telenti A, Southworth M, Alcaide F, et al. The Mycobacterium xenopi GyrA protein splicing element: characterization of a minimal intein. J Bacteriol, 1997, 179(20): 6378~6382
17 Xu M, Comb D G, Paulus H, et al. Protein splicing: an analysis of the branched intermediate and its resolution by succinimide formation. EMBO J, 1994, 13: 5517~5522

(Received 2000-06-06; revised 2000-08-18)