I went to one of the sites that has the new sequence information (based on 'crowdsourcing' from various labs) on O104:H4 (I used the oh no sequences blog, the blog for the R&D section of era7 bioinformatics) and found the identifier code for the EprK protein (here's the link). Some of the data has been annotated based on sequence homology, and EprK was one of the genes identified. Using this code, I found the DNA sequence and copied it to the clipboard.
Then I went to the NCBI website (link) and pasted the DNA sequence into the search box to do a BLAST search of all microbial genomes that have been sequenced. There were dozens of hits, and nearly all of them were EprK sequences from various strains. I found the O157:H7 strain, and the alignment is impressive. More than 95% of the DNA bases are identical between the two, suggesting that the two proteins are very similar. I've included the BLAST results of my search below using O104:H4 EprK (Query, top line) and its alignment with O157:H7 EprK (Sbjct, bottom line). So, your drug probably works on the new strain too.
If you want the amino acid sequence of the O104:H4 strain, simply take the DNA sequence to ExPASy (link) and translate it. It actually took me a bit to get the protein sequence because there is an apparent frameshift in the O104:H4 sequence read. If you scroll down to my alignment and find the gap (highlighted in red, in the block starting at Query 481), you will see there is an extra adenine (an 'A' base) in the O104 sequence. This throws off the protein translation. I assume it is a mis-read in the O104 sequence (a common mistake when the sequencing machine reads through a string of the same base), so I deleted it when I translated from DNA to protein. The resulting amino acid sequence (pasted below) is very similar to EprK from other EHEC strains. I'll double check this and follow up with them.
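For anyone who would rather script the translation step than paste into a web tool, here is a minimal Python sketch. The function names (reverse_complement, delete_base, translate) are my own for illustration and are not part of any tool mentioned in this post. The reading frame used in the example (reverse complement, skipping the first two bases) is an assumption inferred from the fact that the Sbjct coordinates in the alignment run downward and that the last 32 bases of the Query translate this way to the first ten residues of the predicted protein.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def delete_base(dna, position):
    """Remove the base at a 1-based position (e.g. a suspected extra 'A')."""
    return dna[:position - 1] + dna[position:]


def translate(dna, offset=0):
    """Translate DNA to one-letter amino acids, starting at a 0-based offset.

    Unknown codons (e.g. containing 'N') become 'X'; stop codons become '*'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Last 32 bases of the Query sequence from the alignment in this post.
tail = "GCAGCCGCACAACAGCAGGATAAACAATAATA"
print(translate(reverse_complement(tail), offset=2))  # LLFILLLCGC
```

To redo the whole protein, you would paste in the full Query sequence, call delete_base on the suspected extra 'A' before translating, and compare the result against what ExPASy gives you.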
Anyhow, I don't think there is a structure for the EprK protein, but if there were, you could use the existing structure as a model and make the amino acid changes seen in the O104:H4 strain to give you a decent starting point for the structure-based design of new drugs.
Find a pathogenic protein of interest and try this yourself... it's not too hard. When the topic of EHEC comes up at the next party, you can impress your friends by saying you blasted several virulence factors and found them to be quite similar to (or different from) those of strains from previous outbreaks. I would do this myself but, oddly enough, I don't get invited to parties anymore. Anyhow, as a final disclaimer... although I have tried to be careful, please verify anything I have posted before use.
Query 1 GTTGAGGATGAATATAACTAATTGGATCATATATAATCTTTCTTAGGGCAAGATTCATAA
|||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
Sbjct 443403 GTTGAGGATGAATATAACTAATTGGAGCATATATAATCTTTCTTAGGGCAAGATTCATAA 443344
Query 61 CGCTCTCATATGTCTACTTAATTTTCAACCTGACTAAATTAGTTAGAATGGCCCTATACT
|| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 443343 CGTTCTCATATGTCTACTTAATTTTCAACCTGACTAAATTAGTTAGAATGGCCCTATACT 443284
Query 121 TCCATAACAGCCAGCAAGTCGCTACGGATATTAATGCAAGTAAGATAGAAACCGGCATAG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 443283 TCCATAACAGCCAGCAAGTCGCTACGGATATTAATGCAAGTAAGATAGAAACCGGCATAG 443224
Query 181 CCTTATCATAAGCAAAAACAGGTTCGCTAATTTCATATGTTGGTGCTTGCTCAATAATGT
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 443223 CCTTATCATAAGCAAAAACAGGTTCGCTAATTTCATATGTTGGTGCTTGCTCAATAATGT 443164
Query 241 CTCTTCGTTTTGACAATACAACAGAAATATTTTCATATTGTACGCTTGCAGAGCTATTAA
||||||||||||||||||||||||||||||| |||||||||||||||||| | |||||||
Sbjct 443163 CTCTTCGTTTTGACAATACAACAGAAATATTCTCATATTGTACGCTTGCAAAACTATTAA 443104
Query 301 CAATAAATCTCTTGATATCATTTATTTTTATTTCTGGGTTGATATCTTTTTCATATACTG
||||||||||||| || |||||||||||||||||||| ||||||||||||||||||||||
Sbjct 443103 CAATAAATCTCTTTATGTCATTTATTTTTATTTCTGGATTGATATCTTTTTCATATACTG 443044
Query 361 CAAGTACAGAAATATGAATTGGTAAAGCAGTTTTACCACTATCGCCATTATCAACATCGT
||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||
Sbjct 443043 CAAGTACAGAAATATGAATTGGTAAAGCAGTTTTACCACTATCGCCAGTATCAACATCGT 442984
Query 421 AACTAACATGTACTCTCGAAGAAATAATGCCATCCATAATTTTGAGAGATTGCTCTAACC
|||||||||||||||||||||||| ||| |||||||||||||||||||||||||||||||
Sbjct 442983 AACTAACATGTACTCTCGAAGAAACAATACCATCCATAATTTTGAGAGATTGCTCTAACC 442924
Query 481 GCTGCTCAATAGCAGAATATAGCCTTGCTTTTTCCGCTCGTGGAGATGAAAACGAGTGCA
||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
Sbjct 442923 GCTGCTCAATAGCAGAATATAGCCTTGCTTTTTCCGCTCGTGGAGATGAAA-CGAGTGCA 442865
Query 541 TCTGCAGGGAACATCTGCGATATTTGAATATCAGGCTTACCCGGTAGATTGTAGATTTTT
|||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
Sbjct 442864 TCTGCAGGGAACATCTGCGATATTTGAATATCAGGCTTACCCGGGAGATTGTAGATTTTT 442805
Query 601 AGCCAATCCACCGCAGAAGCAAAATCCGTTGGTTCGACAAATATTGAAAATCCTGTTTTG
||||||||||||||||||||||||||||||||||| ||| | ||||| ||||| ||||||
Sbjct 442804 AGCCAATCCACCGCAGAAGCAAAATCCGTTGGTTCAACATAGATTGAGAATCCAGTTTTG 442745
Query 661 CCTTGATCCTTCTTTTCAGCATTAATATTATGTCTTTGTAAAACAGCAAGGACATCATTA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 442744 CCTTGATCCTTCTTTTCAGCATTAATATTATGTCTTTGTAAAACAGCAAGGACATCATTA 442685
Query 721 GCTTGCTGTTGATCAAGATGGTTCAATAATTCCTGCTGCTTGCAGCCGCACAACAGCAGG
||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Sbjct 442684 GCTTGCTGTTGATCAAGATGGTTCAGTAATTCCTGCTGCTTGCAGCCGCACAACAGCAGG 442625
Query 781 ATAAACAATAATA 793
|||||||||||||
Sbjct 442624 ATAAACAATAATA 442612
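If you want to check the identity figure yourself rather than trust the BLAST summary, a rough Python sketch like the one below will do it. The helper names (percent_identity, mismatch_columns) are made up for this illustration. It assumes you have joined the Query lines into one string and the Sbjct lines into another, with '-' marking gaps, so that both strings have the same length. It counts identical columns over the alignment length; gap columns count as mismatches.

```python
def percent_identity(query, subject):
    """Percentage of alignment columns where both sequences carry the same base."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(q == s and q != "-" for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


def mismatch_columns(query, subject):
    """1-based alignment columns where the sequences differ (including gaps)."""
    return [
        i
        for i, (q, s) in enumerate(zip(query, subject), start=1)
        if q != s
    ]


# A 12-column stretch around the gap, taken from the block starting at Query 481.
query = "GATGAAAACGAG"
sbjct = "GATGAAA-CGAG"
print(mismatch_columns(query, sbjct))            # [8]
print(round(percent_identity(query, sbjct), 1))  # 91.7
```

Run on the full joined alignment, mismatch_columns also gives you a quick list of the positions worth looking at when you compare the two proteins.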
Predicted amino acid sequence for the O104:H4 EprK protein (corrected for the gap):
L L F I L L L C G C K Q Q E L L N H L D Q Q Q A N D V L A V L Q R H N I N A E K K D Q G K T G F S I F V E P T D F A S A V D W L K I Y N L P G K P D I Q I S Q M F P A D A L V S S P R A E K A R L Y S A I E Q R L E Q S L K I M D G I I S S R V H V S Y D V D N G D S G K T A L P I H I S V L A V Y E K D I N P E I K I N D I K R F I V N S S A S V Q Y E N I S V V L S K R R D I I E Q A P T Y E I S E P V F A Y D K A M P V S I L L A L I S V A T C W L L W K Y R A I L T N L V R L K I K