Re: What are the different ways to write/abbreviate nucleotides?

Date: Sat Apr 7 08:28:00 2001
Posted By: Daniel Lafontaine, Post-doc/Fellow, Biochemistry, University of Dundee
Area of science: Molecular Biology
ID: 983856476.Mb

Message:


Dear Clemark,

I think that your question is pretty good because sometimes the DNA 
nomenclature can become a bit of a headache. But fortunately, we 
very often only need to use the basic stuff.

Here are some things that you should know before using the DNA 
nomenclature.

1) A DNA sequence is composed of nucleotides

2) A nucleotide is composed of a sugar (deoxyribose), a phosphate 
and a base.

3) The names of the bases are:
'A' -> adenine
'G' -> guanine
'C' -> cytosine
'T' -> thymine

4) Each DNA sequence has a polarity. It means that the sequence
'AGCT' is not the same as the sequence 'TCGA'. We read the
sequence from left to right and, in the 'DNA world', that means from
the 5' end of the sequence to the 3' end. By the way, you can have
a look at the following link to see some pictures of DNA structure.
Look at figure 3 to see a DNA double helix (with the 5' and 3' ends
shown).
 http://www.iacr.bbsrc.ac.uk/notebook/courses/guide/dnast.htm

So, the sequence 'ACGT' can be written 5'-ACGT-3' for more clarity,
but the two are equivalent. In fact, you will see ACGT more often
because it is quicker to write and the 5' end is always on the left of
the sequence (by convention).
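
To make the polarity point concrete, here is a small Python sketch. The
function names with_ends and read_backwards are made up for this
illustration; they only add the 5'/3' labels and show that reading the
same letters right to left gives a different sequence.

```python
def with_ends(seq: str) -> str:
    """Write a sequence with its 5' and 3' ends made explicit."""
    return f"5'-{seq}-3'"


def read_backwards(seq: str) -> str:
    """Return the same letters read right to left.

    This only reverses the order of the letters; it is a hypothetical
    helper used to show that the order matters.
    """
    return seq[::-1]


print(with_ends("ACGT"))                 # 5'-ACGT-3'
print(read_backwards("AGCT"))            # TCGA
print("AGCT" == read_backwards("AGCT"))  # False
```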

As you have noticed, some people use other characters to write a
sequence. For example, the sequence pTpT is exactly the same as
TT. The pTpT form explicitly shows that each nucleotide carries a
phosphate. This is normally implicit because we are talking about a
DNA sequence and not about individual nucleotides. So, each
nucleotide in a sequence has a phosphate attached at its 5' end (the
first one can sometimes be an exception). Another way of writing TT is
p(dT)2. This one means that the 5' nucleotide has a phosphate (the
other one as well, because this is a sequence), and the lowercase 'd'
shows that the nucleotides are deoxynucleotides (DNA).

As you see, you can emphasize different aspects of the sequence by 
writing it in different ways.

Ok, I think that I should do a small summary. Let's assume that all
nucleotides have a phosphate attached at their 5' end, which is the
usual case.

a) AACT -> (A)2CT -> dAdAdCdT
The above notations are all equivalent.
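
The equivalence in (a) can be checked with a few lines of Python. This
is only a sketch: expand_repeats and with_deoxy_prefix are names
invented here, and the pattern handles just the simple '(X)n' form
shown above, not every notation you may meet.

```python
import re


def expand_repeats(notation: str) -> str:
    """Expand '(X)n' groups, e.g. '(A)2CT' becomes 'AACT'."""
    return re.sub(
        r"\(([ACGTU]+)\)(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        notation,
    )


def with_deoxy_prefix(seq: str) -> str:
    """Put a 'd' in front of every base, e.g. 'AACT' becomes 'dAdAdCdT'."""
    return "".join("d" + base for base in seq)


plain = expand_repeats("(A)2CT")
print(plain)                     # AACT
print(with_deoxy_prefix(plain))  # dAdAdCdT
```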

b) dArAdCU
This sequence shows that the second A is a ribonucleotide (RNA)
and that the last nucleotide is also a ribonucleotide (because U is
the base that replaces T in RNA). Obviously, if it is clear that you
are writing a whole RNA sequence, you don't need to put an 'r' in
front of each base. It would be a waste of space to write a 150-base
sequence with an 'r' in front of each base! So, most of the time the
context is enough to determine the nature of the sequence (DNA or
RNA), unless the same sequence contains both at the same time.
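
Here is one way to read a mixed notation like the one in (b) with
Python. The function parse_mixed and its 'default' parameter are
assumptions made for this sketch: a base without a prefix is taken as
RNA if it is U, and otherwise as whatever the context (the default)
says.

```python
import re


def parse_mixed(notation: str, default: str = "DNA"):
    """Split a notation such as 'dArAdCU' into (base, kind) pairs."""
    tokens = re.findall(r"([dr]?)([ACGTU])", notation)
    # Reject anything the simple pattern above did not fully consume.
    if "".join(prefix + base for prefix, base in tokens) != notation:
        raise ValueError(f"cannot parse {notation!r}")
    result = []
    for prefix, base in tokens:
        if prefix == "d":
            kind = "DNA"
        elif prefix == "r" or base == "U":
            kind = "RNA"  # explicit 'r', or U, which is an RNA base
        else:
            kind = default  # no prefix: rely on the context
        result.append((base, kind))
    return result


print(parse_mixed("dArAdCU"))
# [('A', 'DNA'), ('A', 'RNA'), ('C', 'DNA'), ('U', 'RNA')]
```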

Finally, the bases also have their own nomenclature. For instance,
the sequence AGNC means that the third base could be anything (A, C,
G or T). Also, the sequence AGHC means that the third base could be
an A, C or T but not a G. There are plenty of codes like that, but I
think that you don't really need them. The basic nomenclature is
enough most of the time.
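
If you want to see how such codes work in practice, here is a short
Python sketch. Only N and H come from the explanation above; the other
letters in the table are the standard IUPAC ambiguity codes for DNA,
and the function name matches is made up for this example.

```python
# Each code maps to the set of bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches(pattern: str, seq: str) -> bool:
    """Return True if the plain sequence fits the ambiguous pattern."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, seq))


print(matches("AGHC", "AGTC"))  # True: H allows T
print(matches("AGHC", "AGGC"))  # False: H excludes G
print(matches("AGNC", "AGGC"))  # True: N allows any base
```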

I hope that you understand more with these explanations and that
they are clear enough.

Daniel
