Protein aggregates such as amyloid β plaques and tau tangles are thought to be causative of Alzheimer’s disease, the most common form of dementia. Each of these two proteins can serve as a biomarker for the disease, meaning that its detection can help to characterize patients qualitatively (Do I have Alzheimer’s?) and quantitatively (How advanced is the disease already?). First of all, biomarkers are important to facilitate therapy. The idea is: the earlier a disease is diagnosed, the more damage can be prevented. Unfortunately, no drugs are currently available to cure Alzheimer’s disease. However, several options are available for patients and their families to decrease the disease burden, either pharmacologically, by psychotherapy, or by adapting the living environment in a way that prolongs autonomy and decreases stress. Here biomarkers can significantly decrease the disease burden, since an early adaptation to the disease avoids distress and misdiagnosis, and might even add several years that are experienced as ‘positive’. Secondly, biomarkers are important for the development of effective anti-Alzheimer’s drugs. Patients for clinical trials need to be identified early, so that the effects of drug candidates on disease progression can be judged appropriately. Both Alzheimer’s-linked proteins, amyloid β and tau, can be detected in patients as fluid and imaging biomarkers. Unfortunately, detecting either protein in the cerebrospinal fluid is a complicated procedure that is not risk-free. Imaging is not straightforward either and cannot be applied frequently in large patient groups.

Because of the high medical need, but current obstacles for efficient Alzheimer’s disease biomarkers, a recent publication by Oliver Preische, Matthias Jucker and many more scientists received a lot of attention. In this publication, they describe a protein that can be used as a blood-based biomarker for Alzheimer’s disease. Blood can of course be easily, safely and repeatedly drawn from large groups of patients. But what did the study find exactly? And how precise and accurate would this test be?

In this blog post, I will briefly describe some of the key methodologies, the principle of the test, some of the implications, and potential weaknesses. In case you are interested in the full details of the original paper, please check here.

First of all, I should mention that the biochemical basis for the test itself is not completely new. Similar tests, using the same neurofilament light (NfL) protein as a biomarker, have been developed before, for example for Huntington’s disease or Parkinson’s disease. NfL is a protein found in neurons. Once these neurons become damaged, NfL ‘leaks out’ and can be identified in the blood of patients by an antibody-based assay. Because NfL only indicates neuronal damage in general and is not specific to one type of disease, careful controls are necessary when using this test in patients. In the present study, the researchers used this existing test, but quite cleverly so; above all, they chose a study group that allowed a solid validation of the test.

So how did the researchers apply and validate the already known NfL-biomarker test in Alzheimer’s disease?

Patients with known Alzheimer’s disease mutations (but still without manifest disease) were compared to healthy members of their own families. The figure below shows what these data look like: each red dot represents a person with an Alzheimer’s-causing mutation and each blue dot represents a control person without this mutation. In people without an Alzheimer’s mutation you never know whether and when the disease will break out, which makes it very hard to use such people to test a new detection method. In the case of a well-studied mutation, it is quite clear when the disease will break out, and therefore, in a first step, the years up to the first onset of symptoms (= estimated years to symptom onset (EYO)) could be calculated.

The researchers determined the estimated years to symptom onset (EYO) for people carrying known genetic Alzheimer’s mutations (red dots) and compared these data to members of their families without the mutation (blue dots). Next they measured the amount of NfL protein in the blood of the two groups. The data show that the closer a person gets to the measurable outbreak of the disease, the higher the NfL levels become. Importantly, there is a difference between the two groups: people with an Alzheimer’s mutation have higher NfL values, especially in the early phase, when the disease has just started. Therefore, NfL could potentially be used as a biomarker for Alzheimer’s disease.
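To make the group comparison concrete, here is a minimal Python sketch with entirely made-up NfL values (the real numbers are in the paper’s figures); it only illustrates how a carrier-versus-control difference that grows toward symptom onset would look.

```python
from statistics import mean

# Hypothetical serum NfL values (pg/mL) binned by EYO; NOT data from the study.
carriers = {-10: [8, 9, 10], -5: [14, 16, 15], -2: [25, 28, 30], 0: [40, 45, 42]}
controls = {-10: [8, 9, 9],  -5: [9, 10, 10], -2: [10, 11, 10], 0: [11, 10, 12]}

def group_difference(eyo):
    """Mean carrier NfL minus mean control NfL in a given EYO bin."""
    return mean(carriers[eyo]) - mean(controls[eyo])

for eyo in sorted(carriers):
    print(f"EYO {eyo:+}: carrier-control difference {group_difference(eyo):.1f} pg/mL")
```

With these invented numbers the difference widens as EYO approaches zero, which is the qualitative pattern the study reports.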

In the second step, the actual biomarker test was carried out. The amount of the biomarker protein NfL was displayed as a function of the years until the onset of the disease, even though people are not yet sick (check the figure above).

There are however some open questions to this approach:

The theoretical assumptions concerning the duration until disease outbreak might not be very accurate in practice, and overall epidemiological data might not be accurate for a particular person in the data cloud.
It is also important to note that there are significant differences between the groups only from -2 EYO onwards (strong increase of the red compared to the blue curve). This means that the biomarker levels only become really different about two years before the onset of the disease. Whether two years are actually enough to treat the disease well or even prevent it is currently unclear. The test is therefore not suited for the long-term prediction of disease outbreak: NfL levels are simply not elevated enough very early on.

What is certain, however, is that this test will be very important for clinical trials of potential drugs. Perhaps surprisingly, one of the major problems with drug trials is that there is often no real way to show that a drug has prevented a disease. How would you ever know that you have prevented a disease in a particular person if it might never have broken out anyway?

This is where this novel NfL Alzheimer’s test comes into play. Now that we know that it works in principle, it can be applied to the majority of studies with patients who do not carry a clear Alzheimer’s mutation or in whom the disease breaks out later in life. Patients who test positive because they have another neurological disorder that also releases NfL as nerve cells die (for example, Parkinson’s disease or multiple sclerosis) need to be corrected for afterwards. Sadly, at the moment, there is no other option.

So in essence, the test described here is relatively sensitive (shortly before Alzheimer’s breaks out), but not very specific. A good test should be both. At the moment, however, this is technically not feasible because no more specific biomarkers are known yet. Despite this, and by using a clever study design, the researchers have now at least identified NfL as a sensitive biomarker that might help to find a long sought-after curative or anti-progressive Alzheimer’s drug.
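The sensitive-versus-specific distinction can be made explicit with a short Python sketch; the counts below are invented for illustration and do not come from the paper.

```python
# Sensitivity: fraction of truly sick people the test flags (true-positive rate).
# Specificity: fraction of truly healthy people the test clears (true-negative rate).
def sensitivity(true_pos, false_neg):
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    return true_neg / (true_neg + false_pos)

# Invented example: 90 of 100 patients test positive, but so do 30 of 100
# healthy controls (e.g. because NfL also rises in other neurological diseases).
print(sensitivity(90, 10))   # 0.9 -> sensitive
print(specificity(70, 30))   # 0.7 -> not very specific
```

A test like this catches most true cases close to onset, but a positive result alone cannot distinguish Alzheimer’s from other causes of neuronal damage.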

During my PhD we used multi-color live-cell and fixed-cell single-molecule imaging to light up the unknown fate of mRNA molecules during the stress response, which spans from the bright transcription site into the dark corners of the cytosol, where mRNAs can encounter P-bodies and stress granules.

You can learn more about this exciting work by reading the most recent publication of the Jeffrey Chao lab:

The following small summary on a protein complex called EJC was inspired by a lecture given by Hervé Le Hir (ENS, Paris) at the Friedrich Miescher Institute in Basel in November 2014:

After transcription an mRNA is processed, exported, stored or transported, translated and degraded. Several multimeric protein complexes carry out these tasks and readily transform the initially naked mRNA into a large messenger ribonucleoprotein (mRNP) complex. For a long time it was believed that these functional steps occur sequentially and relatively independently of each other. More recently, however, it has become clear that many events during the life of an mRNA leave permanent protein marks, which can influence the efficiency or occurrence of subsequent functional events and which depend on the sequence context. One of the first machineries that engages the transcript in the nucleus after transcription is the spliceosome. It splices introns out of the pre-mRNA molecule, thereby creating the mature mRNA. The splicing reaction, however, leaves a relatively stable mark on the newly created spliced mRNA: the Exon Junction Complex (EJC).

What is an EJC and where is it formed?

The EJC is a multiprotein complex that forms as a consequence of splicing upstream of exon-exon junctions. Although the EJC’s composition is dynamic, it contains four core proteins: the RNA helicase eukaryotic initiation factor 4A3 (eIF4A3), metastatic lymph node 51 (MLN51), and the heterodimer Magoh/Y14. eIF4A3 possesses two RecA-like domains which bind RNA in an ATP-dependent, “clamp-like” fashion. Magoh/Y14 seems to prevent conformational changes of eIF4A3, while the conserved SELOR domain of MLN51 also binds to the RNA and in addition further stabilizes the RecA clamps (1). This tetrameric core then serves as a platform for the binding of other factors that catalyze different regulatory processes during export, transport and translation of the mRNP. Using both fluorescence and electron microscopy approaches, it became possible to narrow down the assembly zone of the tetrameric EJC core to punctate nuclear regions termed perispeckles (the periphery of nuclear speckles). All EJC subunits are enriched and fully assembled in these structures, while MLN51, Magoh, and Y14 mutants fail to localize to the perispeckle region. Furthermore, perispeckles seem to contain polyA mRNAs and transcripts which are actively undergoing splicing (2). These nuclear compartments had earlier been described as storage and assembly sites for splicing factors, which highlights the possibility that EJC proteins join in a co- and post-splicing manner.

Which processes does the EJC catalyze?
Splicing of certain long-intron-containing mRNAs is affected by EJCs, and the complex also seems to be responsible for the catalysis of one form of alternative splicing. Furthermore, the EJC is implicated in mRNA transport and plays an important role during nonsense-mediated decay (NMD) of transcripts possessing a premature stop codon. When such an erroneous codon is present, some EJCs remain bound to the mRNA because they are not displaced by the progressing ribosome; they then become bound by the up-frameshift factors Upf1, Upf2, and Upf3. Together these proteins trigger mRNA decay (3). It has long been known that the presence of introns enhances the translation of a construct when compared to a similar construct that lacks introns. Another important task of EJCs therefore seems to be the enhancement of the translational efficiency of spliced mRNAs. This has mainly been demonstrated by artificially tethering all four EJC components to mRNAs in Xenopus oocytes (4). The molecular details of this process, however, remained elusive until recently.

How does the EJC influence translation?
The EJC has been described as the functional link between splicing and enhanced translation efficiency. Recently it emerged that the EJC component MLN51 might mediate this relationship by interacting with the translation initiation factor eIF3 (5). First of all, it was observed that overexpression of MLN51 enhances translation of spliced luciferase reporters versus identical non-spliced reporters. Furthermore, MLN51 also enhances translation if the remaining three EJC components are not present. Immunoprecipitations then showed that several translation initiation factors and ribosomal subunits can bind EJC components, but only MLN51 binds via its SELOR domain to the initiation factor eIF3. This interaction might stabilize the mRNP complex so that translation can initiate successfully. One problem, however, persists: several studies have described that the ribosome displaces the EJC from the mRNP complex during the first round of translation. Whether an upregulation of the first round of translation is sufficient to explain the observed positive effect of the EJC on translation efficiency is therefore still an open question. One explanation could be that EJCs increase the absolute pool of translated mRNAs via MLN51. Alternatively, MLN51 might increase the total number of initiating ribosomes on a single mRNA before the EJCs become displaced. It is also possible that MLN51 survives on the mRNA after displacement and is thereby able to initiate subsequent rounds of translation. This hypothesis seems plausible, since the other three EJC components are not required for increased translation efficiency. Since a large number of factors have been described that peripherally bind EJCs (1), the molecular mechanism of translation enhancement is likely to be more complex, and more functional interactions of MLN51 need to be identified.
The past years of research have, however, shown that the sequence context and all lifecycle steps of an mRNA are closely linked, and the EJC serves as an interesting example of the complexity of an mRNA’s life.

1. Le Hir H, Andersen GR. Structural insights into the exon junction complex. Curr Opin Struct Biol. 2008 Feb;18(1):112–9.
2. Daguenet E, Baguet A, Degot S, Schmidt U, Alpy F, Wendling C, et al. Perispeckles are major assembly sites for the exon junction core complex. Mol Biol Cell. 2012 May 1;23(9):1765–82.
3. Gehring NH, Kunz JB, Neu-Yilik G, Breit S, Viegas MH, Hentze MW, et al. Exon-junction complex components specify distinct routes of nonsense-mediated mRNA decay with differential cofactor requirements. Mol Cell. 2005 Oct 7;20(1):65–75.
4. Wiegand HL, Lu S, Cullen BR. Exon junction complexes mediate the enhancing effect of splicing on mRNA expression. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11327–32.
5. Chazal P-E, Daguenet E, Wendling C, Ulryck N, Tomasetto C, Sargueil B, et al. EJC core component MLN51 interacts with eIF3 and activates translation. Proc Natl Acad Sci. 2013 Apr 9;110(15):5903–8.

Remember to forget

April 1, 2014

Yes, forgetting is essential! In order not to overload your brain with “useless” information from the past, you need to be able to forget. But how does forgetting work? Synapses connect neurons in the brain, and it is thought that an altered neuronal structure (read: different wiring or less wiring) leads to forgetting. While a lot of time, money and careers are invested in the question of how synaptic networks are formed, it is much less clear how their complexity can actually decrease. Assuming that a reduced synaptic “landscape” is equivalent to the well-known process of forgetting, very little is known about this process. Although not the first of its kind, a recent paper addresses this issue and proposes a molecular mechanism which is mainly based on the regulation of the actin cytoskeleton via a post-transcriptional mechanism. And the evidence seems strong! The model organism used here is the worm C. elegans, which can actually be trained to avoid a certain taste because it was starved of food when it was in contact with that taste for the first time. Remembering and forgetting this Pavlovian training can then be used as a proxy for memory function. As already mentioned, the major player in the competition between memory formation and forgetting is the rate at which synapses are formed and degraded. A previously described, neuronally active protein called MSI-1 is proposed here to be responsible for the degradation part by inhibiting the translation of at least three mRNA types (arx-1, -2 and -3) whose protein products would normally form the Arp2/3 complex. This complex is normally responsible for remodeling the actin skeleton of synapses by inducing actin branching. MSI-1 therefore prevents Arp2/3 complex formation and thereby decreases the retention of synaptic structures. In other words: MSI-1 increases the tendency for synapses to disappear, which might be one factor in answering the question why we forget things.
This interplay is further supported by the authors’ finding that the deletion of the add-1 gene (responsible for actin capping and therefore stabilization) leads to memory loss. However, this phenotype could be reversed when msi-1 was deleted at the same time. Consequently, add-1 and msi-1 must both be involved in memory formation and retention, but with opposing functions.

An unresolved question, however, is how MSI-1 is “activated” to suppress arx mRNA translation. It is likely that forgetting is a neuronally regulated and controlled process, just like memory formation. The authors propose that the glutamate receptor GLR-1 might play a role in this process, because its expression is exclusively increased in the MSI-1-positive neurons during learning. At the same time, GLR-1 is also required for MSI-1 function and therefore for memory loss. How the upstream regulator GLR-1 can influence these two opposing events at the same time therefore remains an open question for future studies. Another interesting open question is the link between the AVA neurons, in which MSI-1 was predominantly found, and neurons in the gut of the worms, in which MSI-1 was also found. Can this link be explained by the food/starvation-related setup of the experiment? And do other forms of training/memory acquisition and the resulting forgetting mechanisms work differently? Furthermore, what are the effects of MSI-1 on the numerous other actin remodeling factors?

Despite these open questions, the paper presents compelling evidence for an additional molecular mechanism explaining neuronal information retention and loss. In summary, memories interestingly seem to be regulated in a balanced way that is deeply influenced by the synaptic actin skeleton, which is actively constructed, and passively degraded through the inhibition of its formation by the translational repressor MSI-1.

Time to focus

September 27, 2013

Life, from single molecules to entire populations, takes place in four dimensions: three spatial dimensions and, last but not least, the dimension of time. Interestingly, researchers ignored these hard realities for quite some time. During my PhD project on translational regulation within cells, we would like to master the four dimensions as well as we can. Live-cell imaging is a good method to monitor a single cell over time and to observe what is changing. However, live-cell imaging requires sharp and crisp images in order to be able to track single molecules over longer time spans. The biggest problem with conventional light microscopes is in fact the three spatial dimensions (x, y, z), because all the light from the specimen that you are observing is collected. This means that not only the light of a single plane (x, y dimensions) is collected (and later observed), but also the light originating from all other planes above or below (z dimension) (see also Figure 1). Collecting a lot of this so-called “out of focus” light leads to blurred pictures, which means that fine details cannot be distinguished from each other anymore. A powerful tool to circumvent this problem is a variation of classical light microscopy called CONFOCAL MICROSCOPY. Here, I would like to give a short introduction to this extremely powerful and widely used microscopy technique.


Figure 1: A cell that is observed under a microscope has three dimensions (x, y, z). However, the optics of a microscope dictate that only one z-plane can be “in focus” and not all planes at the same time. A standard microscope collects the light of all planes and therefore often produces blurred images when larger objects such as cells are observed.

In order to make sense of the confocal technique (con-focal = “having the same focus”) I would like to draw your attention to Figure 2. With the help of the steps 1 to 5 I will guide you through the figure. First of all, a confocal microscope needs a strong light source. This role is often fulfilled by a short-wavelength laser (Step 1). The laser light is then reflected at a 45° angle by a so-called dichroic mirror (Step 2). This special mirror reflects short wavelengths (such as the green excitation laser), but is permissive for longer wavelengths (such as the emitted red light). The reflected green laser light is focused by the objective lens onto the specimen. Unfortunately, it is impossible to focus the light on only one single z-plane. As a consequence, a number of z-planes are excited by the green light and, depending on the fluorescent molecule, emit light of longer wavelengths, here depicted as red, orange, and purple (Step 3). Part of this emitted light will later form the image that you can observe, but first it needs to travel to your eye: as explained above, the dichroic is permissive for the emitted longer-wavelength light. Therefore, the light originating from all z-planes can pass. Since the light originates from different planes, it also hits the so-called focal lens at different positions, resulting in different focal points of this light. And now a small aperture, called the pinhole, comes into play (Step 4): most of the light (depending on its origin) cannot pass this tiny opening because it is focused either in front of or behind the pinhole. The reason why a confocal microscope produces crisp images is that only light from a single z-plane is able to pass, since its focal point is exactly within the pinhole (in this example the red light). Consequently, this light can reach the detector (Step 5) where it is converted into a visible image.
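Why light from different z-planes focuses at different positions follows from the thin-lens equation, 1/f = 1/d_o + 1/d_i: each object plane has its own image distance, so only one plane’s focal point can coincide with the pinhole. A small Python sketch with arbitrary illustrative numbers (not real microscope values):

```python
# Thin-lens equation: 1/f = 1/d_o + 1/d_i  ->  d_i = 1 / (1/f - 1/d_o).
def image_distance(f, d_o):
    """Image distance for an object plane at distance d_o from a lens of focal length f."""
    return 1.0 / (1.0 / f - 1.0 / d_o)

f = 10.0                          # focal length (arbitrary units)
for d_o in (15.0, 20.0, 25.0):    # three z-planes of the specimen
    print(f"object plane at {d_o}: focuses at {image_distance(f, d_o):.1f}")
```

The three planes focus at clearly different distances; placing the pinhole at exactly one of them rejects the light from the other two.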


Figure 2: The setup of a confocal microscope can be described in five simple steps (see text). The pinhole is the central element because it blocks all “out of focus” light originating from non-desired z-planes.

Unfortunately, the seemingly simple confocal approach also has two important side effects. First of all, a lot of light is lost because it is blocked by the pinhole. This in turn requires a very strong light source, which can damage the sample if applied for a long time. To prevent this from happening, the specimen is scanned point by point in the x,y dimension. This leads us to the second side effect: scanning takes a lot of time, which is rather impractical if you want to observe a live cell. But both problems can be (partly) resolved by a variation of confocal microscopy called “spinning-disc confocal microscopy”.

More on this technique in my next post!

Traditionally, single-molecule experiments are performed in vitro and therefore in a reduced environment. Recently, it has become possible to combine this single-molecule accuracy with a living single cell and to observe what happens in real time (“live”). For biologists, the combination of these three technological ideas creates a lot of possibilities to answer a number of currently unanswered questions. I am very happy to be part of this adventure. In the following I would like to address some aspects of my work:

What am I doing?

Currently I am working on the intriguing and big question of how cells translate their DNA into protein. Interestingly, many important sub-questions of this problem remain unanswered, especially when focusing on the fate of mRNA molecules once they have left the nucleus and are present in the cytoplasm. Our focus is the quantification of the translation process in time and space, and the characterization of its steps and major molecular players. In order to elucidate what happens to mRNAs in the cellular context, we mark them with fluorescent proteins and apply single-cell and live-cell imaging. In addition, new labeling and detection technologies allow us to study mRNAs at the single-molecule level.

Why study translation live and in single cells?

The so-called central dogma of biology, namely the conversion of information stored in the DNA into proteins, has been dissected by a large number of scientists. However, in most traditional approaches the mRNAs, as the central information carriers, are isolated from large numbers of cells and therefore removed from their natural cellular context. This results in functional deficits and a loss of spatio-temporal information (“Why is this mRNA at this place in this cell at this time?”). In contrast, the combination of single-cell and live-cell imaging allows us to study the fate of mRNAs during translation in their physiological environment, over a longer period of time and with a minimum of disturbing factors. The use of single cells also allows us to detect differences between cells of the same kind (for example neurons or muscle cells). An organ represents a very heterogeneous environment, so cells have to differ in order to adapt to their local environments. Even 150 years ago, Charles Darwin noted that observable traits can vary widely within a species. Why couldn’t this also be the case for individual cells?

Why single-molecular accuracy?

Next to the advantages that live single-cell analysis has to offer, it is important to keep in mind that most biological processes can be reduced to the level of molecules. When, however, a larger number of molecules is observed (even within a single cell), this automatically leads to an averaging effect. A complicated biological process, like the translation of mRNA into protein, involving a number of molecules at specific stages, might therefore only be recognized as a single event with a “before” and “after”, without knowing what really happened in between. By visualizing single molecules it becomes possible to track their roles as puzzle pieces within the big picture.
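The averaging effect can be illustrated with a tiny, purely hypothetical example: a bulk measurement over two very different subpopulations of molecules reports one middle value that describes neither.

```python
from statistics import mean

# Hypothetical per-molecule translation rates (arbitrary units) for two
# subpopulations that a bulk assay cannot tell apart.
fast = [10, 11, 9]   # actively translated mRNAs
slow = [1, 2, 1]     # repressed mRNAs

bulk_average = mean(fast + slow)
print(bulk_average)  # one number that matches neither subpopulation
```

Only single-molecule measurements would reveal that there are in fact two behaviors hidden inside the one bulk value.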

Nice. And how is this done?

There are two major tools. The first one is a microscope (more specifically, a light microscope called a confocal spinning-disc microscope) to observe the single cell with its mRNA molecules. However, the resolution of a light microscope is limited to about 220 nm (1 nm = 1 m / 1,000,000,000). Even though an RNA molecule might be longer than that, it is also about 1,000 times thinner and therefore not directly resolvable. In order to still detect mRNAs, we label them with fluorescent proteins. The emitted light results in a so-called “diffraction-limited spot” which can be detected by the cameras of our microscope. For RNA labeling we apply the MS2 and PP7 systems, which use specific bacteriophage proteins that are in turn fused to fluorescent proteins and can bind to specific regions within the mRNA molecule of interest. Importantly, the MS2/PP7 labeling does not harm the biological processes within the observed cell. With this system it is also possible to label a single mRNA molecule in two colors (for example red and green). During the translation process, different parts of the mRNA are targeted by the translation machinery in a sequential manner, which influences the binding of the green and red proteins. The appearance of both colors at the same time (yellow), first green and then red, or the other way around, the speed at which this change occurs, and the location within the cell can tell us a lot about the translation process.
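The ~220 nm figure corresponds roughly to the Abbe diffraction limit, d = λ / (2·NA). The wavelength and numerical aperture below are plausible example values, not the exact specifications of our microscope.

```python
# Abbe diffraction limit: d = wavelength / (2 * NA).
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    return wavelength_nm / (2 * numerical_aperture)

# Example: green emission (~510 nm) and a high-NA oil-immersion objective.
print(round(abbe_limit_nm(510, 1.15)))  # ~222 nm
```

Anything smaller than this, such as a single labeled mRNA, appears as a diffraction-limited spot rather than a resolved shape.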

In case I could spark your interest for single-molecule live cell imaging please also see our website or check out the following three articles on mRNA labeling and detection:

  1. Hocine et al., Single-molecule analysis of gene expression using two-color RNA labeling in live yeast. Nat Methods. 2013 Feb;10(2):119-21.
  2. Wu et al., Fluorescence fluctuation spectroscopy enables quantitative imaging of single mRNAs in living cells. Biophys J. 2012 Jun 20;102(12):2936-44.
  3. Larson et al., Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science. 2011 Apr 22;332(6028):475-8.

More on

  • the Spinning-disc microscope
  • the MS2 and PP7 labeling systems
  • and “diffraction limited spots”

will follow later.

Since form follows function, the visualization of protein structures is vital for understanding biological complexity. There are several ways of producing images that are not just beautiful, but also address specific research questions and help to elucidate protein function. Here I briefly want to talk about the program PyMOL, which was originally created by Warren Lyford DeLano in 2000. My current project deals with the single-molecule characterization of bacterial translesion DNA polymerases. The polymerase I work with the most is Pol II, and it is therefore very interesting for me to picture this molecule in a way that lets me understand how Pol II interacts with its DNA substrate (polymerases replicate DNA). However, translesion polymerases such as Pol II are capable of binding not only regular DNA, but also damaged DNA, in order to prevent a stalling of the entire replication process, which would otherwise lead to cell death. But why is Pol II tolerant of DNA damage?

Now we are at the point where structural protein information is required. In this case, this information was generated by Wang and Yang in a very elegant and interesting crystallization study of Pol II (1). Even though the authors do a great job of visualizing their findings, it can be helpful to do this yourself, for example to create a different perspective view or even a short video that shows the protein from different angles. In addition, you might want to highlight certain amino acid residues with a specific color and thereby emphasize the importance of certain functional protein domains. All this can elegantly be done in PyMOL. For my purposes I created the following view of Pol II containing a DNA helix with a tetrahydrofuran (THF) lesion, which cannot be processed by a regular polymerase.


For protein structure visualization, the Protein Data Bank is the place to go. Just search for the protein structure you are interested in (hopefully it exists) and download the so-called PDB file, which contains all the 3D data necessary to visualize the protein. Here I used the PDB file 3K5M, which contains information on Pol II bound to a THF-lesion DNA.

I assume you have downloaded PyMOL by now and know how to load a PDB file into it. What I like about PyMOL is its command line, which allows you to rapidly change what you want to see and achieve with your protein. The downside is that you need to know the syntax of the commands and also need to memorize the important ones, because looking them up all the time is time-consuming. The PyMOL user guide, pages 17 to 37, is a great introduction to the most essential commands. Below is the command sequence I used to produce the picture above. It is only a foundation, but it can take you quite far.

Step 1: Know your protein domains. Pol II has five domains in total, and I gave a distinctive color to each one of them. The N-terminal domain stretches from residues 1-146 and 366-388, the Exo domain from 147-365, and so on… From the literature you must figure out yourself which domains your protein has and where they are located. The following command is very important and lets you select your domains (here the example for the N-terminal domain):

PyMOL> select nterminal, resi 1-146+366-388

This command names the indicated residues “nterminal”. Later you can use “nterminal” to address the entire domain instantly. Specify the residues of all your domains and give each a descriptive name. On the right side of the screen you will then see an overview of all your domains.

Step 2: Know how to hide. Proteins can be very confusing. Use the following commands to first hide everything and then only selectively display what you want to see in a style that you like.

PyMOL> hide all

PyMOL> show cartoon, nterminal

I personally like the alpha-helix and beta-sheet display style called “cartoon”. But please also play around with the following styles: ellipsoids, lines, ribbon, dashes, mesh, volume, sticks, and many more… If you get confused or make a mistake, just use “hide all” or “hide cartoon, nterminal” to start over. Now choose a fitting style for each of your domains and do not worry about the colors yet.

Step 3: Colors are nice. Now it is time to further organize your residues not only by display style, but also by color. This command is very easy and works with all major colors such as green, yellow, purple, blue, red, orange and so on…

PyMOL> color yellow, nterminal

So go ahead and color each of your domains in a clear, distinctive way.

Step 4: Size does matter. The “cartoon” view is very handy for seeing where residues are located without the confusion of all the side chains. However, a real protein is much bigger, and electrostatic forces and hydrophobic interactions determine which parts of a protein are actually accessible to ligands, or in the case of Pol II to DNA. This accessibility can be modeled in PyMOL by an algorithm that displays the surface available to water molecules.

PyMOL> show surface, nterminal

PyMOL> set transparency, 0.6

Use these commands for each of your domains and play around with the transparency (from 0 to 1) so that the surface depiction does not become overwhelming and the cartoon residue structure is still visible. The combination of two or more different styles at the same time such as “cartoon” and “surface” in this case is actually one of the strongest features of PyMOL!

Step 5: Make it nice. Most people’s aim is to create protein structure visualizations for presentation or publications. How to achieve a high-quality and clean file is therefore very important.

PyMOL> bg_color white

PyMOL> ray

The first command sets the background color to white, which is most convenient for most applications, and “ray” creates a sharper (read: nicer) depiction of your protein. Bear in mind that the effect of “ray” is lost as soon as you make further modifications to your view. So only use “ray” at the finish line. Or just use it as many times as you want.

Step 6: Hold on to good things. To save your work, go to the “File” menu at the top of one of the PyMOL windows. There are a couple of options for saving your work. Most important is “Save Session as…”, because it allows you to return to the current state of your project. But you are probably also interested in saving just the current view of your protein (“Save Image as…”). If you want another view angle, simply turn your protein as desired and save again. A PNG image will be created that can be used in papers or presentations.

Step 7: Let’s move it. You can also make a video out of individual PNG pictures that show your protein from different angles. For Pol II the result looks like this. Luckily you do not have to turn your protein 60 times, save everything and then put it together as a movie. PyMOL does it for you.

PyMOL> mset 1 x60

PyMOL> util.mrock 1,60,180

PyMOL> mplay

PyMOL> mstop

Use these commands to create and test-view a movie of 60 frames that rocks your protein structure through 180 degrees. If it gets boring after a while, use “mstop” to stop. To save, go to the “File” menu again and choose “Save Movie as…”. Every major media player should now be able to display your moving protein.

By now you probably still have a number of unanswered questions. But there is relief: people who know much more about PyMOL have created a very convenient FAQ page that answers most questions beginners have, plus some that are just good to know.

And now go ahead and use structural biology research for your own purposes with PyMOL!

(1) Wang F., Yang W., Structural Insight into Translesion Synthesis by DNA Pol II, Cell 139, 1279-1289, 2009.

This little article attempts to introduce the research that Professor Charlotte Hemelrijk and her group “Behavioural Ecology and Self-organization” at the University of Groningen perform on understanding the complexity of bird flocks and fish schools. In the context of the Honours College course “Leadership or Not in Animal Societies”, the question was addressed whether a single leader is necessary to organize the complex behaviour that one can observe in large, moving fish schools or bird flocks.

Is a leader required to guide the instant movement of thousands of birds or fish into one direction in order to find food or escape an enemy?

Or is some kind of intrinsic property that emerges from the school or flock sufficient to explain the observed behaviour?

In the following I will describe how the estimation of movement parameters, computer simulations, and the careful observation of fish schools in nature led to a robust model that explains why no leader is necessary to coordinate the movement of fish schools.

For fish it is very attractive to organize in schools, because spawning, finding food, access to mates, protection from predation and hydrodynamic efficiency all improve for the individual fish. In order to understand how complex schooling behaviour evolves, it was first necessary to describe the formation of a school in general. It has been hypothesized that collective movements, as they occur in a school, are characterized by directional and temporal coordination. This coordination only becomes possible if individuals mutually influence each other depending on their distance to other members of the school (Huth and Wissel, 1992). Fig. 1 shows which effects the distance between two fish has on their movement according to this hypothesis. These parameters were then used in a computer simulation to test whether the resulting model schooling behaviour resembled natural schooling behaviour. Interestingly, the parameters seemed to be sufficient to reproduce the natural behaviour (Fig. 1 (C)) that had been described earlier (Partridge, 1981).
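The zone logic behind Fig. 1 can be sketched in a few lines of code. The sketch below is a minimal, illustrative version of the Huth and Wissel zone model; the zone radii, the tuple representation and the priority of the zones are my own assumptions for the toy example, not values taken from the paper.

```python
import math

# Illustrative zone radii (body lengths); not the fitted values of the paper
R_REPULSION, R_ALIGNMENT, R_ATTRACTION = 1.0, 3.0, 7.0

def desired_heading(me, others):
    """Return the heading (radians) that fish `me` adopts given `others`.

    Fish are (x, y, heading) tuples. The innermost occupied zone
    dictates the reaction: avoid > align > approach.
    """
    x, y, heading = me
    repulse, align, attract = [], [], []
    for ox, oy, oh in others:
        d = math.hypot(ox - x, oy - y)
        if d < R_REPULSION:
            repulse.append(math.atan2(y - oy, x - ox))   # point away
        elif d < R_ALIGNMENT:
            align.append(oh)                             # copy heading
        elif d < R_ATTRACTION:
            attract.append(math.atan2(oy - y, ox - x))   # point toward
    for reactions in (repulse, align, attract):
        if reactions:
            # average the unit vectors of all reactions in this zone
            sx = sum(math.cos(a) for a in reactions)
            sy = sum(math.sin(a) for a in reactions)
            return math.atan2(sy, sx)
    return heading  # nobody in range: keep swimming straight

# One decision step for a tiny school of three fish
school = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0), (0.5, 0.2, 0.5)]
new_headings = [desired_heading(f, school[:i] + school[i + 1:])
                for i, f in enumerate(school)]
```

In a full simulation each fish would take such a decision every time step and then move a small distance along its new heading; the coordinated pattern of Fig. 1 (C) emerges from nothing but these local rules.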

Fig. 1: Hypothesized effects of the presence of a second fish on the movement of the first fish, depending on their distance to each other. (A) and (B) display how a first fish reacts when a second fish is present in four different proximity zones (based on Huth and Wissel, 1992). (C) shows how the modelled fish schooling behaviour (bottom) clearly resembles the behaviour observed in reality (top) (based on Partridge, 1981).

Based on the parameters attraction, alignment, and avoidance described above, further studies were able to significantly link the behaviour of individual fish to distinct school shapes. However, the researchers needed to introduce the factor “speed” into their model in order to be able to reproduce observations in nature (Kunz and Hemelrijk, 2003; Hemelrijk and Hildenbrandt, 2008). They found that, through coordination and collision avoidance, a transition to an oblong school shape (with respect to movement direction) occurred as a function of speed. In other words, a fish school reduces its width and increases its length at higher velocities, because this enables the individual fish to avoid collisions. Later, Hemelrijk and colleagues were able to show that the conclusions drawn from their model also hold true for fish schools in real-life experiments (Hemelrijk et al., 2010).

In order to narrow the gap to the experimental observations, the researchers also introduced a factor describing the effect that the number of neighbours has on the movement decisions of a fish. They hypothesized two mechanisms that could govern the movement of an individual fish when it is surrounded by more than one neighbouring fish. Fig. 2 schematically depicts both hypotheses and their respective outcomes in a computer model applied over a number of “decision cycles”. The first model assumes that the movement of an individual fish (3.) that has at least two neighbours (1. and 2.) in its field of view is largely the result of taking the average path between both fish. The second model was assumed to be more realistic, because the fish (3.) would have a priority direction that depends largely on factors such as the distance to its neighbours. A computer simulation of both models, however, produced a surprising outcome. After a number of cycles the priority model had led to a dispersed school pattern that is never observed in nature. The average direction model, on the contrary, resulted in an accurate reproduction of field observations. The researchers therefore assumed that the priority direction effect is probably averaged out in a large school, because there are many, and constantly changing, neighbours. The final result is an average directional movement.
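The two decision rules are easy to state in code. This is a toy formulation under my own assumptions (headings in radians, neighbours as (x, y, heading) tuples), not the actual model code of the group.

```python
import math

def average_direction(neighbours):
    """Average-direction rule: sum the unit heading vectors of all
    visible neighbours and follow the resultant direction."""
    sx = sum(math.cos(h) for (_, _, h) in neighbours)
    sy = sum(math.sin(h) for (_, _, h) in neighbours)
    return math.atan2(sy, sx)

def priority_direction(neighbours, me):
    """Priority rule: copy only the heading of the closest neighbour."""
    x, y = me
    nearest = min(neighbours, key=lambda n: math.hypot(n[0] - x, n[1] - y))
    return nearest[2]

# Fish 3. at the origin; neighbours 1. and 2. heading 0 and pi/2
neighbours = [(1.0, 0.0, 0.0), (0.0, 2.0, math.pi / 2)]
```

Averaged over many fish and many decision cycles, the small differences between these rules accumulate, which is why the priority rule disperses the simulated school while the average rule keeps it ordered.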

Fig. 2: Two models, and their simulation results, that take neighbouring fish into account during the movement decision process of an individual fish. (A) The average direction model assumes that the movement of fish 3. is the result of averaging the directions of its neighbours. The priority direction model assumes that fish 3. decides to follow its closest neighbour. (B) Simulations of both models resulted in a dispersed fish distribution in the case of the priority model (right) and a more realistic, ordered fish distribution for the average model (left).

The findings presented in Fig. 2 and the fundamental work presented in Fig. 1 therefore show that the interactions of individual fish can produce an emergent property, such as the coordinated behaviour of a large group, without requiring a leader. It is important to note that the individual perceptions of the fish within and on the edges of the school are vital for this coordination. This means that the final direction of the fish school is probably, to a large extent, based on the “decisions” that fish on the edges of the school make. These movement “decisions” might be based on knowledge and experience, but also on motivational factors such as hunger. Whether such factors can drive the behaviour of individual fish, and therefore the movement of the whole school, still remains to be elucidated.

The computational tools of theoretical biology therefore seem to be a good approach for describing complex behavioural patterns in animal groups. Other research projects of the Hemelrijk group used similar parameter-simulation approaches to describe the behaviour of bird flocks and their internal dynamics (Hildenbrandt et al., 2010). In bird flocks, too, no real leader is necessary; the individual movement decisions of birds in their neighbouring context seem to govern the movement of the flock as a whole. These studies can also help to improve our understanding of how large groups of humans act in situations of panic and fear, when rational decisions might be overruled by movement decisions based on the individual context.


Hemelrijk, C.K., and Hildenbrandt, H. (2008). Self-Organized Shape and Frontal Density of Fish Schools. Ethology 114, 245–254.

Hemelrijk, C.K., Hildenbrandt, H., Reinders, J., and Stamhuis, E.J. (2010). Emergence of Oblong School Shape: Models and Empirical Data of Fish. Ethology 116, 1099–1112.

Hildenbrandt, H., Carere, C., and Hemelrijk, C.K. (2010). Self-organized aerial displays of thousands of starlings: a model. Behavioral Ecology 21, 1349–1359.

Huth, A., and Wissel, C. (1992). The simulation of the movement of fish schools. Journal of Theoretical Biology 156, 365–385.

Kunz, H., and Hemelrijk, C.K. (2003). Artificial fish schools: collective effects of school size, body size, and body form. Artif. Life 9, 237–253.

Partridge, B.L. (1981). Internal dynamics and the interrelations of fish in schools. J. Comp. Physiol. 144, 313–325.

Yes, bacteria also seem to have an immune system. Bacteria are frequently attacked by phages, the viruses that infect bacteria, so they need protection too!

Here I want to briefly introduce the CRISPR/Cas system, a very interesting type of bacterial defense system that uses formerly viral DNA sequences to guide bacterial DNA endonucleases to cellular targets where viral DNA is present. It has long been hypothesized that CRISPR/Cas could be used for biotechnological non-invasive genome editing, and recently a number of breakthroughs concerning this application have been described. In the following I would especially like to discuss three Science papers that, in my opinion, have been groundbreaking in paving the way for real future applications of CRISPR/Cas and, on the other hand, have helped to elucidate the molecular basis of this system. I will concentrate on the type II CRISPR/Cas system (one of three types in total). In general, a bacterial immune response against a viral invader can be split up into three phases: adaptation, expression, and interference. Fig. 1 schematically shows how the type II CRISPR/Cas system is currently assumed to work during the expression and interference phases.


Fig. 1: Schematic depiction of the type II CRISPR/Cas system present in bacteria to destroy invading DNA originating from viruses. The CRISPR/Cas gene cluster contains previously obtained sequence information about foreign DNA (colored triangles) which are separated by repeats (black rectangles). Upon induction this information is transcribed into pre-crRNA together with the expression of the Cas9 protein and so-called tracrRNA which serves as a universal linker to connect crRNA with Cas9. In the following steps tracrRNA and pre-crRNA are cleaved to smaller sizes at least twice. Now the crRNA-Cas9-tracrRNA complex is able to bind foreign DNA at a homologous site termed “protospacer” which is followed by a second, but very short identifier called PAM (protospacer adjacent motif). Once stable binding has been achieved Cas9 seems to cleave the invading DNA and thereby induces double-stranded breaks that inhibit the expression of viral genes. Scheme created by myself, based on (1) Supplementary Fig. 1.

Even though Fig. 1 depicts some of the molecular details of a CRISPR/Cas mediated response, there is probably more to the system. Especially in the adaptation phase, which governs the incorporation of DNA fragments into the bacterial genome, important functional key features are still unidentified. Soon after the first CRISPR sequences were discovered in the late 1980s and their function as a defense system became evident, researchers began to hypothesize about the biotechnological usability of the system. Since then the expression and interference phases have been studied very extensively, and especially during the last couple of months some exciting insights have been gained with regard to actual applications. Emmanuelle Charpentier and coworkers described in a proof-of-principle study how a fused, custom-made crRNA-tracrRNA can be applied to target sequences of interest in a DNA plasmid, and in addition identified the two Cas9 protein domains that are responsible for the double-stranded cleavage of the target DNA (Fig. 2) (1). Partially based on Charpentier’s groundbreaking work, in a second and third paper published last month, researchers from the Massachusetts Institute of Technology and Harvard Medical School describe an exciting approach to silence entire gene loci in mouse and human cellular DNA. The key to successfully targeting specific sequences in eukaryotic cells seems to have been the co-delivery of an expression vector including both pre-crRNA sequences and the Cas9 gene (2). This approach also makes use of the chimeric crRNA-tracrRNA hybrid (Fig. 2) that mimics the naturally occurring crRNA:tracrRNA duplex described by Charpentier and colleagues (1). In a parallel study, targeting rates of 4 to 25% under different conditions are described. In addition, 40% of all human exons were identified by a bioinformatic approach as being potentially available for CRISPR/Cas silencing. Cloning these target sequences into a 200 base pair format for the first time allowed the creation of a library describing potential target sites in the human genome (3).
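To get a feeling for how such target libraries are built, here is a small, hypothetical scanner for candidate type II target sites. It assumes the S. pyogenes Cas9 PAM consensus NGG and a 20-nucleotide protospacer, scans one strand only, and does none of the off-target scoring a real pipeline would need.

```python
import re

def find_cas9_targets(dna, protospacer_len=20):
    """Return (position, protospacer, PAM) for every candidate site on
    the given strand, i.e. every stretch of `protospacer_len` bases
    immediately followed by an NGG PAM. Lookahead makes the search
    report overlapping sites as well."""
    dna = dna.upper()
    pattern = r'(?=([ACGT]{%d})([ACGT]GG))' % protospacer_len
    return [(m.start(), m.group(1), m.group(2))
            for m in re.finditer(pattern, dna)]

# A toy sequence with exactly one NGG-adjacent 20-mer, at position 0
hits = find_cas9_targets('A' * 20 + 'TGG')
```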

Fig. 2: Schematic depiction of the naturally occurring crRNA:tracrRNA duplex, which in conjunction with Cas9 is able to cleave viral DNA in a target-specific manner (top). By creating a linker loop that fuses the two functional RNA sequences it became possible to engineer a crRNA-tracrRNA chimera that, based on experimental evidence, has the same functionality as its natural counterpart and has proven very effective for genome targeting purposes in eukaryotic organisms (bottom). Based on (1) Fig. 5.

I consider the CRISPR/Cas system extremely interesting because of its seemingly simple nature, consisting of only a cleavage protein, a target, and a target identifier. Whether this is the whole story remains to be seen, but some of the most important functional elements now seem to have been identified on a molecular level. The construction and usability of an artificial linker connecting RNAs and endonuclease demonstrates how far our knowledge has come. Or in Richard Feynman’s famous words: “What I cannot create, I do not understand“. We might not understand it completely, but at least we can “create” an important part of it. I am confident that CRISPR/Cas will play a very important role in bacterial/industrial biotechnology and in genome editing in the future, because it dramatically decreases the challenges that accompany the silencing of eukaryotic genes in their native context. By further advancing the knowledge, and consequently the technology, it might even become possible to use the system for genome editing by achieving controlled, sticky-end-like double-stranded breaks. This would enable the ligation of desired sequences into eukaryotic genes.

It would not be the first time that a giant leap across the phylogenetic divide is made. The bacterial Taq polymerase, used in PCR every day around the globe, is just one example of how small the world can be at the molecular level. Darwin would have enjoyed it.


(1) Jinek M. et al., A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity, Science 2012 Aug 17;337(6096):816-21.

(2) Cong L., et al., Multiplex Genome Engineering Using CRISPR/Cas Systems, Science. 2013 Jan 3 [electronic publication ahead of print].

(3) Mali P., et al., RNA-Guided Human Genome Engineering via Cas9, Science. 2013 Jan 3 [electronic publication ahead of print].

Seeing is believing…

November 30, 2012

…and understanding!

Most cellular processes are ingeniously and non-intuitively orchestrated, and they involve a large number of components. Understanding all this can pose a challenge. Not surprisingly, research experience over the last decades has shown that it actually is a challenge! Crystal structures aside, the bulk of our molecular knowledge about cellular processes comes from indirect evidence such as blots, gels and other traditional techniques. Would it not be interesting to observe processes (1) in vivo, (2) at high resolution and magnification, and (3) in real time? Probably not many biologists will disagree with this. A major obstacle, however, is the nature and chemical structure of the cell surface/membrane. From an optical perspective this lipid bilayer can be viewed as a so-called “opaque surface”. This means that it is non-transparent and scatters light (Fig. 1). Looking through a sample therefore becomes impossible, because the incoming light is scattered all over the place and it is hard to attach any meaningful information to it.

Fig. 1: Plane waves that hit a rough surface are scattered instead of being reflected directly. The resulting random speckles lead to a distorted image of the object in the human eye.

In the past, techniques have been developed to extract information from opaque layers that at least let through a small amount of direct beams. Here, however, I want to present a new approach, recently published by a team of Dutch and Italian researchers who managed to retrieve images through a completely opaque surface.

Two layers of material were used for this study. The first layer is the opaque layer, which scatters the light and does not allow direct observation of a 50 μm sized object on the second layer (Fig. 2 (A)). In this case the object to be imaged was the Greek letter pi, made of fluorescent polymers. The two layers are 6 mm apart, a large distance considering the size of the object. As depicted in Fig. 2 (B), a 532 nm laser is directed at the first, opaque layer and serves as the light source. Due to scattering of the light that passes the first layer, and scattering of the fluorescence that travels back through it, it is impossible to identify which object is hiding behind the layer (Fig. 2 (C)). It is, however, possible to measure the overall amount of light originating from the fluorescent object. The researchers assumed that all the information needed to reconstruct the image is already contained in this recorded light. Due to scattering and speckling, this information is disorganized and cannot be read out by conventional means (such as our eyes and brain). The key element of the study was therefore to develop an algorithm and a technical procedure able to extract meaningful information from the chaotic light mix (Fig. 2 (D)). In the following I will explain this procedure in a bit more detail. Since I am NOT a physicist I omit some of the details, but I am sure I mention enough facts to understand the technique.


Fig. 2: Experimental design involving an opaque first layer and a second layer with a fluorescent object that could be retrieved by an algorithm involving scanning over different angles and autocorrelating the resulting information. For details see text (Source: Press release “Looking through an opaque material” by the University of Twente, the Netherlands).

Four variables are essential for recalculating the nature of the object behind the opaque layer. First, the object’s fluorescent response O, which roughly translates to the fluorescence intensity at a given point in space. Second, the speckle intensity S, which describes the amount of light speckle formation, also at a given point in space. The third important factor is the angle θ at which the laser shines through the first layer. Finally, the measured overall light intensity I is essential.

During the course of the study, the angle of the incident light was slightly varied. By iteratively changing the laser angle and measuring the resulting overall intensity I, it became possible to calculate correlations between all four factors. The interesting part is that these correlations are the building block for organizing the information hidden in the scattered fluorescent light, which had previously been thought to be of totally random nature.

The first step towards this goal was the discovery of the relation between the incident laser angle and the measured fluorescence intensity I. Interestingly, the speckle intensity S and the object’s response O remained largely unchanged for a given laser angle. This enabled the researchers to use nine different angle scans, which yielded enough information to autocorrelate S and O. Spatial information could now be extracted from the previously randomly distributed intensity, because the relationship between S, O, θ and I was known.
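The mathematical trick that makes this possible can be illustrated with a toy autocorrelation. Tilting the laser only shifts the speckle pattern (the so-called memory effect), and the autocorrelation of a signal does not change when the signal is shifted, so the autocorrelation of the hidden object survives the scrambling. The few lines below, entirely my own toy example, show this shift invariance in one dimension.

```python
def autocorrelation(signal):
    """Circular autocorrelation of a 1D signal (pure Python, O(n^2);
    real implementations use FFTs via the Wiener-Khinchin theorem)."""
    n = len(signal)
    return [sum(signal[i] * signal[(i + lag) % n] for i in range(n))
            for lag in range(n)]

# A toy "object" and a shifted copy of it
obj = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = obj[3:] + obj[:3]

# Both yield the same autocorrelation: shifting destroys position,
# but not the internal structure of the object
assert autocorrelation(obj) == autocorrelation(shifted)
```

Recovering the object from its autocorrelation is then a phase-retrieval problem, which the authors solve with an iterative algorithm.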

In the same paper the authors also demonstrate the use of this technique for imaging a more complex biological sample hidden behind an opaque layer. However, such a sample contains much more detail than the relatively simple letter pi, and the reconstruction becomes much more computationally intensive. To solve this issue the resolution had to be decreased by increasing the size of the speckle spots, thereby lowering the amount of incoming information. Despite these practical limitations the researchers have clearly demonstrated that it is possible to image noninvasively through an opaque layer, such as a cell membrane. I am sure that this discovery has great potential for molecular and cell biological in vivo studies. This potential is enhanced even more by the possibility of increasing the resolution by decreasing speckle spot sizes, and of introducing 3D imaging by measuring speckle patterns in an additional direction.

Article: Jacopo Bertolotti, Elbert G. van Putten, Christian Blum, Ad Lagendijk, Willem L. Vos, Allard P. Mosk, Non-invasive imaging through opaque scattering layers, Nature 491(7423), 232–234, 2012.