MadSci Network: Molecular Biology
Dear Clemark,

Your question is a good one, because DNA nomenclature can become a bit of a headache. Fortunately, we usually only need the basics. Here are some things that you should know before using DNA nomenclature.

1) A DNA sequence is composed of nucleotides.
2) A nucleotide is composed of a sugar (deoxyribose), a phosphate and a base.
3) The names of the bases are:
   'A' -> adenine
   'G' -> guanine
   'C' -> cytosine
   'T' -> thymine
4) Each DNA sequence has a polarity. This means that the sequence 'AGCT' is not the same as the sequence 'TCGA'. We read the sequence from left to right, which in the 'DNA world' means from the 5' end of the sequence to the 3' end.

By the way, you can have a look at the following link for some pictures of DNA structure. Look at figure 3 to see a DNA double helix (with the 5' and 3' ends shown).

http://www.iacr.bbsrc.ac.uk/notebook/courses/guide/dnast.htm

So the sequence 'ACGT' can be written 5'-ACGT-3' for more clarity, but the two are equivalent. In fact, you will see ACGT more often because it is quicker to write and the 5' end is always at the left of the sequence (by definition).

As you have noticed, some people use other characters to write a sequence. For example, the sequence pTpT is exactly the same as TT. The pTpT form explicitly shows that each nucleotide carries a phosphate. This is normally implicit, because we are talking about a DNA sequence and not about individual nucleotides: each nucleotide in a sequence has a phosphate attached at its 5' end (the first nucleotide is sometimes an exception).

Another way of writing TT is p(dT)2. This form shows that the 5' nucleotide has a phosphate (the other one has one as well, because this is a sequence), and the lowercase 'd' shows that the nucleotides are deoxynucleotides (DNA). As you can see, you can emphasize different aspects of a sequence by writing it in different ways.

OK, I think that a small summary would help.
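Before that summary, here is a minimal Python sketch of the two points above: a sequence read backwards is a different sequence, and pTpT, p(dT)2 and TT all describe the same thing. The helper names `to_plain` and `with_ends` are my own (hypothetical, not from any library), and the small parser only assumes DNA-only notations of the kind mentioned here: an optional 'p', an optional 'd', and a repeat count after a base in parentheses.

```python
import re

# Hypothetical token grammar: either a base in parentheses followed by a
# repeat count, e.g. "(dT)2", or a single base, optionally preceded by "d".
TOKEN = re.compile(r"\((d?)([ACGT])\)(\d+)|(d?)([ACGT])")


def to_plain(notation: str) -> str:
    """Reduce a DNA-only notation such as 'pTpT' or 'p(dT)2' to plain letters."""
    # The phosphates are implicit in a DNA sequence, so every 'p' is dropped.
    stripped = notation.replace("p", "")
    bases = []
    pos = 0
    while pos < len(stripped):
        match = TOKEN.match(stripped, pos)
        if match is None:
            raise ValueError(f"unrecognised notation at position {pos}: {stripped!r}")
        if match.group(2):
            # Parenthesised base with a repeat count, e.g. (dT)2 -> TT.
            bases.append(match.group(2) * int(match.group(3)))
        else:
            # Single base, with or without the 'd' prefix.
            bases.append(match.group(5))
        pos = match.end()
    return "".join(bases)


def with_ends(sequence: str) -> str:
    """Write the 5' and 3' ends explicitly; the 5' end is on the left."""
    return f"5'-{sequence}-3'"


# Polarity: reading the same letters backwards gives a different sequence.
print("AGCT" == "AGCT"[::-1])  # False, because the reverse is TCGA

# The three spellings of TT reduce to the same plain sequence.
print(to_plain("pTpT"), to_plain("p(dT)2"), to_plain("TT"))  # TT TT TT

print(with_ends("ACGT"))  # 5'-ACGT-3'
```

This is only an illustration of the notation. It deliberately rejects anything it does not recognise (an 'r' prefix or a U, for example) instead of guessing.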
Let's assume that every nucleotide has a phosphate attached at its 5' end, which is the usual case.

a) AACT -> (A)2CT -> dAdAdCdT
The above notations are all equivalent.

b) dArAdCU
This notation shows that the second A is a ribonucleotide (RNA) and that the last nucleotide is also a ribonucleotide, because U is the base that takes the place of T in RNA.

Obviously, if it is clear that you are writing a whole RNA sequence, you don't need to put an 'r' in front of each base. It would be a waste of space to write a 150-base sequence with an 'r' in front of every base! So, most of the time, the context is enough to determine the nature of the sequence (DNA or RNA), unless the same sequence contains both.

Finally, the bases also have their own nomenclature. For instance, the sequence AGNC means that the third base could be anything. Similarly, the sequence AGHC means that the third base could be an A, a C or a T, but not a G. There are plenty of codes like that, but I think that you don't really need them. The basic nomenclature is enough most of the time.

I hope that these explanations are clear enough and that you understand the notation better now.

Daniel
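P.S. For the curious, the ambiguity codes mentioned above (N and H) can be illustrated with a short Python sketch. The table holds the standard IUPAC one-letter codes for DNA; the function names `expand` and `matches` are my own, chosen only for this example.

```python
from itertools import product

# Standard IUPAC one-letter codes for DNA: each code stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(pattern: str) -> list[str]:
    """List every concrete sequence that an ambiguous pattern can stand for."""
    choices = [IUPAC[code] for code in pattern]
    return ["".join(combo) for combo in product(*choices)]


def matches(pattern: str, sequence: str) -> bool:
    """Check whether a concrete sequence is compatible with the pattern."""
    return len(pattern) == len(sequence) and all(
        base in IUPAC[code] for code, base in zip(pattern, sequence)
    )


print(expand("AGHC"))           # ['AGAC', 'AGCC', 'AGTC']
print(expand("AGNC"))           # ['AGAC', 'AGCC', 'AGGC', 'AGTC']
print(matches("AGHC", "AGGC"))  # False: H excludes G
```

Note that `expand` grows very quickly for patterns with many ambiguous positions, so `matches` is the better choice for anything longer than a few bases.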