Genes VII

20.6 The startpoint for RNA polymerase II

Key terms defined in this section
TATA box is a conserved A·T-rich septamer found about 25 bp before the startpoint of each eukaryotic RNA polymerase II transcription unit; may be involved in positioning the enzyme for correct initiation.

RNA polymerase II cannot initiate transcription itself, but is absolutely dependent on auxiliary transcription factors. The enzyme together with these factors constitutes the basal transcription apparatus that is needed to transcribe any promoter. Our starting point for considering promoter organization is therefore to define a "generic" promoter, the shortest sequence at which RNA polymerase II can initiate transcription, and to characterize the enzyme subunits and transcription factors that are needed to recognize it.

A generic promoter can in principle be expressed in any cell. The accessory proteins that are required for polymerase II to initiate at such a promoter define the general transcription factors involved in the mechanics of binding to DNA and initiating transcription. The general factors are described as TFIIX, where "X" is a letter that identifies the individual factor. A generic promoter functions at only a low efficiency; additional upstream factors are required for a proper level of function. The upstream and inducible factors are not described systematically, but have casual names reflecting their histories of identification.

We may expect any sequence components involved in the binding of RNA polymerase and general transcription factors to be conserved at most or all promoters. As with bacterial promoters, when promoters for RNA polymerase II are compared, homologies in the regions near the startpoint are restricted to rather short sequences. These elements correspond with the sequences implicated in promoter function by mutation.

At the startpoint, there is no extensive homology of sequence, but there is a tendency for the first base of mRNA to be A, flanked on either side by pyrimidines. (This description is also valid for the CAT start sequence of bacterial promoters.) This region is called the initiator (Inr), and may be described in the general form Py₂CAPy₅. The Inr is contained between positions −3 and +5. A promoter consisting only of the Inr has the simplest possible form recognizable by RNA polymerase II.
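The general form of the Inr can be read as a simple pattern: two pyrimidines, then C, then A, then five pyrimidines. The following is a minimal sketch of how such a pattern might be searched for in a sequence. The function name, the translation into a regular expression, and the example sequence are all illustrative assumptions and do not come from the text; a match shows only that a stretch fits the general form, not that it functions as an initiator.

```python
import re

# The general Inr form, two pyrimidines + C + A + five pyrimidines,
# written as a regular expression (pyrimidine = C or T).
# The lookahead allows overlapping candidates to be reported.
INR_PATTERN = re.compile(r"(?=([CT]{2}CA[CT]{5}))")


def find_inr_candidates(seq):
    """Return (index_of_A, matched_text) for each stretch that fits the pattern.

    index_of_A is the 0-based index of the A, the presumed first base
    of the mRNA. This function is hypothetical, for illustration only.
    """
    seq = seq.upper()
    hits = []
    for match in INR_PATTERN.finditer(seq):
        start = match.start(1)
        # The A is the fourth base of the match: Py, Py, C, A, ...
        hits.append((start + 3, match.group(1)))
    return hits


# An invented sequence containing one stretch that fits the pattern.
example = "GGGCGCCTCATTCTCGGAG"
print(find_inr_candidates(example))  # [(9, 'CTCATTCTC')]
```

Because the pattern is short and degenerate, matches of this kind may be expected by chance in any long sequence, so a scan like this can only nominate candidates.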

Most promoters have a sequence called the TATA box, usually located ~25 bp upstream of the startpoint. It constitutes the only upstream promoter element that has a relatively fixed location with respect to the startpoint. The 8 bp consensus sequence consists entirely of A·T base pairs (at two positions the orientation is variable), and in only a minority of actual cases is a G·C pair present. The TATA box tends to be surrounded by G·C-rich sequences, which could be a factor in its function. It is almost identical with the −10 sequence found in bacterial promoters; in fact, it could pass for one except for the difference in its location at −25 instead of −10.
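Because the TATA box has a relatively fixed location, a search for it can be restricted to a window upstream of a known startpoint. The sketch below assumes a commonly cited form of the consensus, TATA(A/T)A(A/T); this pattern, the window limits, the function name, and the demonstration sequence are all assumptions made for illustration and are not given in the text.

```python
import re

# Assumed consensus for illustration only: TATA(A/T)A(A/T).
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_near(seq, startpoint, lo=-35, hi=-20):
    """Return (relative_position, matched_text) for TATA-like matches.

    startpoint is the 0-based index of the +1 base. Positions upstream
    are negative, with -1 immediately before the startpoint. Only matches
    whose first base lies within [lo, hi] are reported.
    """
    seq = seq.upper()
    hits = []
    for match in TATA_PATTERN.finditer(seq):
        rel = match.start(1) - startpoint
        if lo <= rel <= hi:
            hits.append((rel, match.group(1)))
    return hits


# Invented sequence: a TATA-like element flanked by G/C bases,
# with the startpoint (an A) at 0-based index 40.
demo_seq = "GCGCGCGCGC" + "TATAAAA" + "GC" * 11 + "G" + "ACTGGC"
print(find_tata_near(demo_seq, 40))  # [(-30, 'TATAAAA')]
```

In this example the element begins at −30 and ends at −24, which is consistent with a location of roughly 25 bp upstream. The window limits are arbitrary and would need adjusting for real data.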

Single base substitutions in the TATA box act as strong down mutations. Some mutations reverse the orientation of an A·T pair, so base composition alone is not sufficient for its function. The TATA box thus comprises an element whose behavior is analogous to our concept of the bacterial promoter: a short, well-defined sequence just upstream of the startpoint, which is necessary for transcription. The minority of promoters that do not contain a TATA element are called TATA-less promoters.
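The point about base composition can be made concrete with a small calculation. Reversing the orientation of one pair (A to T, or T to A, on the strand as written) leaves the A+T content unchanged while altering the sequence. The sketch below uses an invented septamer and hypothetical helper names; it illustrates only the arithmetic and makes no prediction about the effect of any particular mutation.

```python
def at_fraction(seq):
    """Fraction of bases in seq that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def flip_pair(seq, i):
    """Reverse the orientation of the pair at 0-based index i.

    On the strand as written, A becomes T and T becomes A.
    Raises KeyError if the base at i is not A or T.
    """
    swap = {"A": "T", "T": "A"}
    return seq[:i] + swap[seq[i]] + seq[i + 1:]


wild_type = "TATAAAA"             # invented example element
mutant = flip_pair(wild_type, 2)  # third base, T -> A

print(mutant)                                         # TAAAAAA
print(at_fraction(wild_type) == at_fraction(mutant))  # True
print(wild_type == mutant)                            # False
```

The two sequences have identical composition, so any difference in their behavior must depend on the order of the bases.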
