The Dark Laboratory: June 2011

Saturday, June 25, 2011

Preliminary functional annotation of O104:H4 genes/proteins by Era7 Bioinformatics

Just a quick follow-up to my last post... here is a pdf of a paper (link) from the Oh No Sequences group at Era7 Bioinformatics that lists the full functional annotation of O104:H4. It represents an amazing amount of work and is a great reference for anyone studying the O104:H4 strain. Major kudos to the ONS group!!

Monday, June 20, 2011

Using O104:H4 EHEC data... an example

I've had a few requests for an example of how to work with the new EHEC data. I agree it can be very overwhelming to have hundreds of megabytes of genomic data, so here is a fairly simple example of what one might do and what you might encounter. Suppose you had a drug (antibody, peptide, small molecule) and you knew it hit a protein called EprK. EprK is a protein of approximately 250 amino acids that is part of the Type III Secretion System (T3SS). The T3SS is the cell-surface protein complex that attaches the pathogenic bacteria to the host cells. Blocking proteins like EprK is one possible way to prevent EHEC pathogens from attacking normal cells and causing disease. Your drug works on other EHEC strains (such as O157:H7, the strain responsible for the 2006 outbreak in the US) but will it work on O104:H4? Testing it directly is the best way to know, but obtaining the new strain is likely to be very difficult. Another option is to go to the sequence data.

I went to one of the sites that has the new sequence information on O104:H4 (based on 'crowdsourcing' from various labs). I used the Oh No Sequences blog, the blog for the R&D section of Era7 Bioinformatics, and found the identifier code for the EprK protein (here's the link). Some of the data has been annotated based on sequence homology, and EprK is one of the proteins that has been identified. Using this code, I found the DNA sequence and copied it to the clipboard. Then I went to the NCBI website (link) and pasted the DNA sequence into the search box to do a BLAST search of all microbial genomes that have been sequenced. There were dozens of hits, and nearly all of them were EprK sequences from various strains.

I found the O157:H7 strain and the alignment is impressive. More than 95% of the DNA bases are identical between the two, suggesting that the two proteins are very similar. I've included the BLAST results of my search below using O104:H4 EprK (Query, top strand) and its alignment with O157:H7 EprK (Sbjct, bottom strand). So, your drug probably works on the new strain too.

If you want the amino acid sequence of the O104:H4 protein, simply take the DNA sequence to ExPASy (link) and translate it. It actually took me a bit to get the protein sequence because there is a frameshift in the O104:H4 sequence read. If you scroll down to my alignment and find the part highlighted in red (the gap, shown as '-' in the Sbjct line of the row that starts at Query 481), you will see there is an extra adenine (an 'A' base) in the O104 sequence. This throws off the protein translation. I assume it is a mis-read in the O104 sequence (a common mistake when the sequencing machine reads through a string of the same base) and deleted it when I translated from DNA to protein. The resulting amino acid sequence (pasted below) is very similar to EprK from other EHEC strains. I'll double check this and follow up with them.

Anyhow, I don't think there is a structure for the EprK protein, but if there were, you could use the existing structure as a model and make the amino acid changes seen in the O104:H4 strain to give you a decent starting point for the structure-based design of new drugs.

Find a pathogenic protein of interest and try this yourself... it's not too hard. When the topic of EHEC comes up at the next party, you can impress your friends by saying you blasted several virulence factors and found them to be quite similar/different from strains of previous outbreaks. I would do this myself but, oddly enough, I don't get invited to parties anymore. Anyhow, as a final disclaimer... although I have tried to be careful, please verify anything I have posted before use.

Query 1 GTTGAGGATGAATATAACTAATTGGATCATATATAATCTTTCTTAGGGCAAGATTCATAA

|||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||

Sbjct 443403 GTTGAGGATGAATATAACTAATTGGAGCATATATAATCTTTCTTAGGGCAAGATTCATAA 443344

Query 61 CGCTCTCATATGTCTACTTAATTTTCAACCTGACTAAATTAGTTAGAATGGCCCTATACT

|| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct 443343 CGTTCTCATATGTCTACTTAATTTTCAACCTGACTAAATTAGTTAGAATGGCCCTATACT 443284

Query 121 TCCATAACAGCCAGCAAGTCGCTACGGATATTAATGCAAGTAAGATAGAAACCGGCATAG

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct 443283 TCCATAACAGCCAGCAAGTCGCTACGGATATTAATGCAAGTAAGATAGAAACCGGCATAG 443224

Query 181 CCTTATCATAAGCAAAAACAGGTTCGCTAATTTCATATGTTGGTGCTTGCTCAATAATGT

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct 443223 CCTTATCATAAGCAAAAACAGGTTCGCTAATTTCATATGTTGGTGCTTGCTCAATAATGT 443164

Query 241 CTCTTCGTTTTGACAATACAACAGAAATATTTTCATATTGTACGCTTGCAGAGCTATTAA

||||||||||||||||||||||||||||||| |||||||||||||||||| | |||||||

Sbjct 443163 CTCTTCGTTTTGACAATACAACAGAAATATTCTCATATTGTACGCTTGCAAAACTATTAA 443104

Query 301 CAATAAATCTCTTGATATCATTTATTTTTATTTCTGGGTTGATATCTTTTTCATATACTG

||||||||||||| || |||||||||||||||||||| ||||||||||||||||||||||

Sbjct 443103 CAATAAATCTCTTTATGTCATTTATTTTTATTTCTGGATTGATATCTTTTTCATATACTG 443044

Query 361 CAAGTACAGAAATATGAATTGGTAAAGCAGTTTTACCACTATCGCCATTATCAACATCGT

||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||

Sbjct 443043 CAAGTACAGAAATATGAATTGGTAAAGCAGTTTTACCACTATCGCCAGTATCAACATCGT 442984

Query 421 AACTAACATGTACTCTCGAAGAAATAATGCCATCCATAATTTTGAGAGATTGCTCTAACC

|||||||||||||||||||||||| ||| |||||||||||||||||||||||||||||||

Sbjct 442983 AACTAACATGTACTCTCGAAGAAACAATACCATCCATAATTTTGAGAGATTGCTCTAACC 442924

Query 481 GCTGCTCAATAGCAGAATATAGCCTTGCTTTTTCCGCTCGTGGAGATGAAAACGAGTGCA

||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||

Sbjct 442923 GCTGCTCAATAGCAGAATATAGCCTTGCTTTTTCCGCTCGTGGAGATGAAA-CGAGTGCA 442865

Query 541 TCTGCAGGGAACATCTGCGATATTTGAATATCAGGCTTACCCGGTAGATTGTAGATTTTT

|||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||

Sbjct 442864 TCTGCAGGGAACATCTGCGATATTTGAATATCAGGCTTACCCGGGAGATTGTAGATTTTT 442805

Query 601 AGCCAATCCACCGCAGAAGCAAAATCCGTTGGTTCGACAAATATTGAAAATCCTGTTTTG

||||||||||||||||||||||||||||||||||| ||| | ||||| ||||| ||||||

Sbjct 442804 AGCCAATCCACCGCAGAAGCAAAATCCGTTGGTTCAACATAGATTGAGAATCCAGTTTTG 442745

Query 661 CCTTGATCCTTCTTTTCAGCATTAATATTATGTCTTTGTAAAACAGCAAGGACATCATTA

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct 442744 CCTTGATCCTTCTTTTCAGCATTAATATTATGTCTTTGTAAAACAGCAAGGACATCATTA 442685

Query 721 GCTTGCTGTTGATCAAGATGGTTCAATAATTCCTGCTGCTTGCAGCCGCACAACAGCAGG

||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||

Sbjct 442684 GCTTGCTGTTGATCAAGATGGTTCAGTAATTCCTGCTGCTTGCAGCCGCACAACAGCAGG 442625

Query 781 ATAAACAATAATA 793

|||||||||||||

Sbjct 442624 ATAAACAATAATA 442612
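The "more than 95% identical" figure can be checked straight from the alignment rows above. Here is a minimal Python sketch, assuming you paste a Query row and its Sbjct row as two equal-length strings. The function name `alignment_stats` is my own and is not part of BLAST. The two example rows are the ones starting at Query 481 (the row with the gap) and Query 601 (the row with the most mismatches).

```python
def alignment_stats(query_row, sbjct_row):
    """Count identities, mismatches and gaps in one pair of aligned rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    identities = mismatches = gaps = 0
    for q, s in zip(query_row.upper(), sbjct_row.upper()):
        if q == "-" or s == "-":
            gaps += 1          # a '-' in either row is an alignment gap
        elif q == s:
            identities += 1    # same base in both strains
        else:
            mismatches += 1    # different base
    return identities, mismatches, gaps


# Rows copied from the alignment (Query 481 and Query 601).
rows = [
    ("GCTGCTCAATAGCAGAATATAGCCTTGCTTTTTCCGCTCGTGGAGATGAAAACGAGTGCA",
     "GCTGCTCAATAGCAGAATATAGCCTTGCTTTTTCCGCTCGTGGAGATGAAA-CGAGTGCA"),
    ("AGCCAATCCACCGCAGAAGCAAAATCCGTTGGTTCGACAAATATTGAAAATCCTGTTTTG",
     "AGCCAATCCACCGCAGAAGCAAAATCCGTTGGTTCAACATAGATTGAGAATCCAGTTTTG"),
]

for query_row, sbjct_row in rows:
    identities, mismatches, gaps = alignment_stats(query_row, sbjct_row)
    percent = 100.0 * identities / len(query_row)
    print(identities, mismatches, gaps, round(percent, 1))
```

Running the same function over every row and summing the counts gives the overall identity for the whole alignment.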

Predicted amino acid sequence for the O104:H4 EprK protein (corrected for the gap):

L L F I L L L C G C K Q Q E L L N H L D Q Q Q A N D V L A V L Q R H N I N A E K K D Q G K T G F S I F V E P T D F A S A V D W L K I Y N L P G K P D I Q I S Q M F P A D A L V S S P R A E K A R L Y S A I E Q R L E Q S L K I M D G I I S S R V H V S Y D V D N G D S G K T A L P I H I S V L A V Y E K D I N P E I K I N D I K R F I V N S S A S V Q Y E N I S V V L S K R R D I I E Q A P T Y E I S E P V F A Y D K A M P V S I L L A L I S V A T C W L L W K Y R A I L T N L V R L K I K
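If you would rather do the translation step in a script than in a web tool, here is a minimal Python sketch using only the standard library. It is not how ExPASy works internally. It takes the last 38 bases of the Query strand (the end of the Query 721 row plus the Query 781 row), reverse-complements them, and translates with the standard codon table. The reverse complement and the frame offset of 2 are simply what reproduces the start of the amino acid sequence above. The extra 'A' inserted at the end of the example is artificial and only there to show what a frameshift does to everything downstream.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate DNA starting at offset `frame`; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


# Last 38 bases of the Query strand from the alignment.
query_tail = "CTGCTTGCAGCCGCACAACAGCAGGATAAACAATAATA"
coding = reverse_complement(query_tail)

protein = translate(coding, frame=2)
print(protein)  # LLFILLLCGCKQ

# Artificial frameshift: one extra base changes every codon after it.
shifted = coding[:20] + "A" + coding[20:]
print(translate(shifted, frame=2))
```

The first line printed matches the first twelve residues of the predicted sequence. The second shares only the residues before the inserted base, which is why the mis-read had to be removed before translating.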

Tuesday, June 14, 2011

Amazing new paper on Zombie Ants

As some of you may know, I’m a big fan of zombies. Not the feet-dragging, flesh-rotting stereotypes found in B-grade horror movies (although they can be cool too) but the ones found in real life. The ones that make you wonder whether human zombies are for real. The science behind these phenomena is fascinating but absolutely terrifying. Creatures that suddenly exhibit irrational behavior or complete odd and highly specific tasks. (Don’t worry, your girlfriend is (probably) not a zombie.) I already wrote a bit about T. gondii (link) but a recent article in the journal BMC Ecology (abstract) describes an even more horrifying example. Zombie ants. I’m thinking this would make a great sequel to A Bug’s Life.

Zombie ant with fruiting body

It starts with a simple fungal infection and before long the ant is no longer following the well-marked ant trails through the Thai rainforest. It starts staggering and has the occasional convulsion but instead of heading to rehab, it falls out of the tree and onto the forest floor. At solar noon, the ant stops its random stagger and makes a bee-line to a nearby sapling. It clamps its mandibles into a leaf (almost always a primary vein, under the leaf, facing NNW, about 25 cm high) and dies. Bizarre? Yes, but to the fungus it is all part of a diabolical plan (cue music). In order to reproduce, the fungus (Ophiocordyceps unilateralis) requires a very specific temperature and humidity. An environment not present in the canopy (where the ants are) but uniformly at about 25 cm from the forest floor. What’s an evil fungus to do? In order to get there, the fungus hijacks the ant and manipulates its brain by releasing various chemicals and poisons as well as making specific morphological changes to the mandibles. All of these activities are designed to get the ant out of the canopy, go to a specific environment, and have the ant remain attached there after death. Then the fungus sprouts a fruiting body out of the ant’s head to release spores. All in all, the amazing transformation from ant to fruiting body takes about 2-3 weeks. Many of the details are still a mystery but the Hughes paper begins to shed some light on this process. A process, incidentally, that is very ancient. Another paper by Hughes (abstract) describes fossils from the Tertiary Period (from about 50 million years ago) that bear mandible scars on primary veins of leaves. Could these be the echoes of ancient zombie ants? Could our own legends be the echoes of human zombies? I wouldn’t worry too much unless your spouse’s ‘honey-do’ list becomes very bizarre or your girlfriend’s new hat looks suspiciously like a fruiting body.

Thursday, June 9, 2011

Rapid characterization of the EHEC outbreak by “crowdsourcing”

Now we’ll be moving from papayas and fish to something a bit more sinister: EHEC O104:H4. That is the name of the E. coli responsible for the recent German outbreak. When a new outbreak begins sickening patients, researchers all around the world are mobilized to try and characterize the pathogen. Since new strains often have similarity to well-understood strains, one of the critical first steps is to sequence parts of the genome. When SARS was first flaring up in Asia, I worked for a sizable biotech company focused on developing drugs for viral diseases. Very early data suggested that the SARS virus may have a similar pathogenesis to related coronaviruses, particularly with regard to viral entry. However, we couldn’t design drugs to combat SARS until we had the sequence for that part of the genome. Once that became available, I used our in-house analysis software to design the initial set of lead drugs and we were off and running.

Sequencing the entire genome is extremely time consuming, but BGI (formerly known as the Beijing Genomics Institute) is utilizing ‘crowdsourcing’ to help assemble the EHEC genome faster (here’s the press release). Using open source software, Twitter feeds (@BGI_Events), and several sites for uploading data, they hope to pull together data from researchers around the world in an organized, efficient manner. Here’s the bioproject link for this work at NCBI (link). This exchange of data is great for biopunks because one can analyze the data almost in real time and there is a significant potential for finding interesting and important aspects of the EHEC strain, based on sequence similarities/differences with other strains. Mike the Mad Biologist had a blog post a couple days ago that offers a glimpse of the type of analysis people are doing (link). The more eyes there are on the data, the quicker the strain can be characterized and, as I have mentioned before, the potential of using ‘citizen scientists’ or ‘crowdsourcing’ for efforts of this type is enormous. With the advent of rapidly accessible data, and the power of online DNA analysis tools, the gap between the scientist and everybody else has never been smaller.

Tuesday, June 7, 2011

Using DNA 'barcodes' to combat fish fraud

Snapper fillets

Now that Memorial Day has passed, we tend to do a lot more grilling here in the Dark Lab. The weather here in SoCal falls into a predictable perfection and any given evening is perfect for throwing something on the grill. So, I head out to my local grocery store and look for a nice fish… snapper maybe? Looking at the package, it’s definitely a fish but is it really snapper? I can’t tell. In fact, studies show that up to 70% of fish sold as snapper is actually something else. The FDA tries to monitor fish but they are probably more focused on safety than on labeling accuracy. However, there has been a lot of press lately about mislabeling of fish. Last week, the New York Times ran an article (link) with some shocking statistics about how frequently fish are mislabeled. According to a report by the non-profit group Oceana (2.3 MB PDF here), for every three packs of fish you buy, one of them will be mislabeled.

Oceana references a number of scientific studies, including a paper by Wong and Hanner (abstract), who use a PCR-based approach to analyze the DNA sequences of fish in the marketplace. They found that some substitutions are obvious fraud. For example, fish labeled as red snapper (sold at $3 per pound) was actually redfish (that would cost 72 cents a pound). Fish labeled as white tuna sushi was actually tilapia. These are flagrant mistakes, and it is not at all clear whether this is done on purpose or is the product of the complex network of processors and middle-men that are required to bring a fish out of the sea and to your dinner plate. However, some mistakes are less apparent… for example, Atlantic halibut was labeled as Pacific halibut. No big deal, right? What if you knew that Atlantic halibut was endangered? Would you still buy it? This type of mislabeling suggests some fishermen may be catching more than their quota of threatened or endangered fish and packaging them as something else. Another recent article goes into more detail about the social and financial implications of fish fraud (abstract).

Want to know what fish you are buying? It’s a great DIYbio project. If your hackerspace has the ability to do DNA sequencing (or you can send sequencing samples via the hack shack) then checking your fish can be pretty easy. You will probably want to sequence several spots in the genome and will need sequencing primers for each (which are cheap and easy to design). Once you have the DNA sequence from your fish, you can use an online tool called BLAST (link) to search the sequence database for your sequence and it will tell you what species it is from. If you already know the expected sequence (from the primer design, for example), then you can simply align the expected sequence with your fish’s DNA and see if you get a perfect match. This method will give you a pretty good idea if you have the right fish as long as there are differences in the DNA sequence between the various species. Sometimes, they can be very similar.
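For the "align the expected sequence with your fish's DNA" step, here is a minimal Python sketch. It assumes the reference sequences and your sample read have already been trimmed to the same region and length, so no real alignment is done. The species names and all three sequences are made-up placeholders, not real barcode data, and `best_match` is a name I picked for illustration.

```python
def percent_identity(a, b):
    """Percent of positions that match between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_match(sample, references):
    """Score the sample against each reference and return the top hit."""
    scores = {
        name: percent_identity(sample, seq)
        for name, seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder data for illustration only (not real fish sequences).
references = {
    "species A (placeholder)": "ATGGCACTAAGCCTTCTAATTCGAGCAGAA",
    "species B (placeholder)": "ATGGCCCTAAGTCTCCTAATCCGAGCTGAA",
}
sample = "ATGGCCCTAAGTCTCCTAATTCGAGCTGAA"

best, scores = best_match(sample, references)
print(best)
for name, score in scores.items():
    print(name, round(score, 1))
```

A perfect match scores 100. When two candidate species score nearly the same, the region you sequenced probably cannot tell them apart, and another spot in the genome is needed.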

If you have access to a hackerspace with a PCR machine (and the reagents!) and a way to run an agarose (DNA) gel, there are several other options. You can do an AFLP analysis, which is a very sensitive way to look for polymorphisms (changes) in DNA. A recent paper by Maldini (abstract) outlines the approach and applies it to identifying fish. They claim that even closely related fish can be identified. Another PCR-based option is to amplify a gene using a species-specific primer. In this case, you see good amplification (i.e., a band on a gel) only when the DNA of that species is present. Two advantages with the PCR approach are that you don’t need much DNA and it doesn’t need to be all that pure (both are big advantages for the biohacker). One thing you will need is a set of PCR primers for the species of fish you are buying. I hope that someday these will also be readily available in any decent hackerspace, but until then, you will have to get them yourself. The Wong and Hanner paper has some primers listed and primers for key genes from the most common types of market fish are freely available on the internet. If they can’t be found directly, you can also design them from the fish’s genome. GenBank (link) has some of this information but another good source is the website for the Fish Barcode of Life (link). This great organization is trying to catalog all fish, including those we eat. Eventually, they will have links to the genome of every fish so you can use that for primer design. As an added DIYbio bonus, they are also looking for additional data from people like YOU! Not with the DNA sequencing, but with the development of range maps that indicate where the different species of fish are found. This is a great opportunity for all you fishermen out there (go here to see how you can report a sighting). It’s also a way for biopunks to make important contributions to this effort while doing a little home-based food surveillance.
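Before ordering species-specific primers, you can do a rough check on the computer to see whether they should give a band. The Python sketch below assumes exact primer matches only, which is stricter than real PCR, where a few mismatches are often tolerated. The templates and primers are invented placeholders, not published primers, and `predicted_product_size` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_product_size(template, forward_primer, reverse_primer):
    """Return the expected amplicon length, or None if a primer has no exact site."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer binds the top strand, so on the top strand
    # its site reads as the primer's reverse complement.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(site) - start


# Placeholder templates for two species (made up for illustration).
template_a = ("GGATCC" "ATGGCACTAAGCCTTCTAATTCGAGCAGAA"
              "CTGCAGTTTAAA" "CCCGGG")
template_b = ("GGATCC" "ATGGCCCTAAGTCTCCTAATCCGAGCTGAA"
              "CTGCAGTTTAAA" "CCCGGG")

forward_primer = "ATGGCACTAAGC"   # matches species A only
reverse_primer = "TTTAAACTGCAG"   # binds a site shared by both templates

print(predicted_product_size(template_a, forward_primer, reverse_primer))
print(predicted_product_size(template_b, forward_primer, reverse_primer))
```

A number means you would expect a band of about that many base pairs on the gel, and `None` means no band for that species with this primer pair.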

So, did you notice? The snapper picture is mislabeled... it's actually tilapia. At least you didn't pay 10 bucks for this blog post.

Wednesday, June 1, 2011

Biopunks in USA Today

There’s a pretty nice blurb on the DIYbio movement in today’s USA Today. It covers the basics and highlights a few of the controversies but doesn’t describe many practical applications. It touches on DNA sequencing and open access to lab equipment and basic molecular biology tools (expression vectors and strains, common reagents, etc.) and hints at one cool application (the blue yogurt). Nice plug for BioCurious (a hackerspace in the Bay Area) and the OpenPCR machine. They even talk about the risk of making “unstoppable Franken-microbes”. Not that we would ever do such a thing in the Dark Lab, but I do know of three fictional teen biopunks who had a basement experiment go horribly wrong…

You can read the USA Today article here but you’ll have to wait a while to read what happened to the teens.
