Genes VII

20.10 Promoters for RNA polymerase II have short sequence elements

Key terms defined in this section
CAAT box is part of a conserved sequence located upstream of the startpoints of eukaryotic transcription units; it is recognized by a large group of transcription factors.

A promoter for RNA polymerase II consists of two types of region. The startpoint itself is identified by the Inr and/or by the TATA box close by. In conjunction with the general transcription factors, RNA polymerase II forms an initiation complex surrounding the startpoint, as we have just described. The efficiency and specificity with which a promoter is recognized, however, depend upon short sequences, farther upstream, which are recognized by upstream or inducible factors. Usually these sequences are ~100 bp upstream of the startpoint, but sometimes they are more distant. Binding of factors at these sites may influence the formation of the initiation complex at (probably) any one of several stages.

Figure 20.16 Saturation mutagenesis of the upstream region of the β-globin promoter identifies three short regions (centered at -30, -75, and -90) that are needed to initiate transcription. These correspond to the TATA, CAAT, and GC boxes.

An analysis of a typical promoter is summarized in Figure 20.16. Individual base substitutions were introduced at almost every position in the 100 bp upstream of the β-globin startpoint. The striking result is that most mutations do not affect the ability of the promoter to initiate transcription. Down mutations occur in three locations, corresponding to three short discrete elements. The two upstream elements have a greater effect on the level of transcription than the element closest to the startpoint. Up mutations occur in only one of the elements. We conclude that the three short sequences centered at -30, -75, and -90 constitute the promoter. Each of them corresponds to the consensus sequence for a common type of promoter element.
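The design of a saturation mutagenesis experiment can be sketched in a few lines of Python: every position in a region is replaced in turn by each of the three alternative bases. This is only an illustration of the enumeration step. The function name, the coordinate convention, and the 12 bp fragment are hypothetical, and the fragment is not the real β-globin sequence.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def single_base_substitutions(
    seq: str, first_position: int
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, original_base, new_base, mutant_sequence) for every
    possible single-base substitution in seq.

    first_position is the coordinate of seq[0] relative to the startpoint
    (for example -100 for a region starting 100 bp upstream).
    """
    seq = seq.upper()
    for i, original in enumerate(seq):
        for new in BASES:
            if new == original:
                continue  # not a mutation
            mutant = seq[:i] + new + seq[i + 1:]
            yield first_position + i, original, new, mutant


# Hypothetical 12 bp fragment used only for illustration.
example = "GGCCAATCTGCT"
mutants = list(single_base_substitutions(example, first_position=-12))

# Three alternative bases at each of 12 positions.
print(len(mutants))  # 36
```

In a real experiment each mutant promoter would then be assayed for transcription, and the positions at which activity falls would be mapped, as in Figure 20.16.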

The TATA box (centered at -30) is the least effective component of the promoter as measured by the reduction in transcription that is caused by mutations. But although initiation is not prevented when a TATA box is mutated, the startpoint varies from its usual precise location. This confirms the role of the TATA box as a crucial positioning component of the core promoter.

The sequence at -75 is the CAAT box. Named for its consensus sequence, it was one of the first common elements to be described. It is often located close to -80, but it can function at distances that vary considerably from the startpoint. It functions in either orientation. Susceptibility to mutations suggests that the CAAT box plays a strong role in determining the efficiency of the promoter. It does not appear to play a direct role in promoter specificity, but its inclusion increases promoter strength.

The GC box at -90 contains the sequence GGGCGG. Often multiple copies are present in the promoter, and they occur in either orientation. It too is a relatively common promoter component.
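Because an element such as the GC box can occur in either orientation, a search for it has to consider both the sequence GGGCGG and its reverse complement. The following is a minimal sketch of such a search using exact matching. The function names and the toy sequence are hypothetical, and real promoter analysis would normally allow for mismatches to the consensus.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_element(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (index, orientation) for each exact match of motif in seq.

    Orientation is '+' when the motif itself is found and '-' when its
    reverse complement is found. A palindromic motif is reported as '+'.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


GC_BOX = "GGGCGG"  # sequence given in the text

# Toy sequence (not a real promoter) with one copy in each orientation.
toy = "ATGGGCGGTTACCGCCCAT"
print(find_element(toy, GC_BOX))  # [(2, '+'), (11, '-')]
```

The indices are 0-based offsets within the toy sequence; converting them to coordinates relative to a startpoint would require knowing where the startpoint lies.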

Figure 20.17 Promoters contain different combinations of TATA boxes, CAAT boxes, GC boxes, and other elements.

Promoters are organized on a principle of "mix and match." A variety of elements can contribute to promoter function, but none is essential for all promoters. Some examples are summarized in Figure 20.17. Four types of element are found altogether in these promoters: TATA, GC boxes, CAAT boxes, and the octamer (an 8 bp element). The elements found in any individual promoter differ in number, location, and orientation. No element is common to all of the promoters. One of the puzzles of promoter organization is that the promoter conveys directional information (transcription proceeds only in the downstream direction), but the GC and CAAT boxes seem to be able to function in either orientation (although their sequences are asymmetrical).

Factors that are more or less ubiquitous are assumed to be available to any promoter that has a copy of the element that they recognize. This common availability distinguishes the upstream factors from the inducible factors that we discuss later. Elements in the upstream category include the CAAT box, GC box, and the octamer. All promoters probably require one or more of these elements in order to function efficiently.

The GC box is recognized by the factor SP1. This interaction illustrates the demands that can be placed on a single factor. The closest GC box usually is 40-70 bp upstream of the startpoint, but the context of the GC boxes is different in every promoter. So in the thymidine kinase promoter, GC boxes are adjacent to a CAAT box and a TATA box, but in the SV40 promoter, a tandemly repeated series of GC boxes is upstream of a TATA box. The subunit of SP1 is a monomer of 105 kD, which contacts one strand of DNA over a ~20 bp binding site that includes at least one 6 bp GC box. In the SV40 promoter, the multiple boxes between -70 and -110 all are bound, so that the whole region is protected by SP1. In the thymidine kinase promoter, however, SP1 presumably interacts with a factor bound at the CAAT box on one side, and with TFIID bound at the TATA box on the other side.

The sequences to which the factors bind as characterized by footprinting are typically longer than the consensus sequences identified by comparing promoters. They usually cover ~20 bp of DNA, whereas the consensus sequences are <10 bp. Given the sizes of the factors, and the length of DNA each covers, we expect that the various proteins will together cover the entire region upstream of the startpoint in which the elements reside.
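The relationship between element positions and the DNA covered by bound factors can be explored with a simple interval calculation. The sketch below assumes, purely for illustration, that each factor protects about 20 bp centered on its element, and it uses the three β-globin element centers (-30, -75, -90) as input. Real footprints need not be centered on the consensus, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def footprint(center: int, width: int = 20) -> Interval:
    """Approximate protected region: width bp centered on the element.

    Assumption for illustration only; real footprints may be offset.
    """
    half = width // 2
    return (center - half, center + half)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into continuous stretches."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extends the previous protected stretch.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Element centers for the beta-globin promoter, as given in the text.
centers: Dict[str, int] = {"TATA": -30, "CAAT": -75, "GC": -90}

footprints = [footprint(c) for c in centers.values()]
covered = merge_intervals(footprints)
print(covered)  # [(-100, -65), (-40, -20)]
```

Under these assumptions the CAAT and GC footprints overlap into one protected stretch. How much of the remaining DNA is occupied in a given promoter depends on the actual sizes and positions of the bound proteins, which this sketch does not model.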

The diversity of elements from which a functional promoter may be constructed, and the variation in their locations relative to the startpoint, argues that the factors have an ability to interact with one another by protein-protein interactions in multiple ways. There appear to be no constraints on the potential relationships between the elements. The modular nature of the promoter is illustrated by experiments in which equivalent regions of different promoters have been exchanged. Hybrid promoters, for example, between thymidine kinase and β-globin, work well. This suggests that the main purpose of the elements is to bring the factors they bind into the vicinity of the initiation complex, where protein-protein interactions determine the efficiency of the initiation reaction.

The basal elements and the more upstream elements have different types of functions. The basal elements (the TATA box and Inr) primarily determine the location of the startpoint, but can sponsor initiation only at a rather low level. They identify the location at which the general transcription factors assemble to form the basal complex. The sequence elements farther upstream, such as the GC or CAAT boxes, influence the frequency of initiation, most likely by acting directly on the general transcription factors to enhance the efficiency of assembly into an initiation complex (see later).

How can initiation be influenced by sites spread over a length of DNA that is greater than RNA polymerase could directly contact? Initiation involves a hierarchy of interactions, in which factors bound at upstream elements interact with general factors, which in turn interact directly with RNA polymerase (see later). This helps to explain the flexibility with which elements may be arranged, and the distance over which they can be dispersed, since it relieves us of the obligation to suppose that factors bound to all these elements must interact directly with RNA polymerase.

The most common use of promoter elements is for a particular consensus sequence to be recognized by a corresponding transcription factor (or by a member of a family of factors). However, some elements can be recognized by more than one factor. For example, the CAAT box can interact with factors of the CTF family, the factors CP1 and CP2, and the factors C/EBP and ACF. CAAT boxes in different promoters are recognized by different factors. The exact details of recognition are not so important as the fact that a variety of factors recognize CAAT boxes.

Another example of an element that is recognized by more than one factor is presented by the octamer sequence. A ubiquitous transcription factor, Oct-1, binds to the octamer to activate the histone H2B (and presumably also other) genes. Oct-1 is the only octamer-binding factor in nonlymphoid cells. But in lymphoid cells, a different factor, Oct-2, binds to the octamer to activate the immunoglobulin κ light chain gene. So Oct-2 is a tissue-specific activator, while Oct-1 is ubiquitous.

The use of the same octamer in the ubiquitously expressed H2B gene and the lymphoid-specific immunoglobulin genes poses a paradox. Why does the ubiquitous Oct-1 fail to activate the immunoglobulin genes in nonlymphoid tissues? The context must be important: Oct-2 rather than Oct-1 may be needed to interact with other proteins that bind at the promoter. These results mean that we cannot predict whether a gene will be activated by a particular factor simply on the basis of the presence of particular elements in its promoter.

There are also cases in which a particular protein can recognize more than one type of sequence. The best characterized example is the protein C/EBP, which binds to the CAAT box, but which also binds to another quite different sequence element.

A pertinent factor in considering transcription in vitro is that the template exists as an accessible DNA molecule. In vivo it is organized into nucleosomes, which suggests that its recognition by RNA polymerase is subject to different constraints. This may influence the geometry of the interactions of transcription factors with DNA, with one another, and with RNA polymerase. To investigate the formation of an active transcription complex in natural circumstances, we really need to use a template consisting of DNA assembled into chromatin rather than free DNA.

Repression of transcription in eukaryotes is generally accomplished at the level of influencing chromatin structure; regulator proteins that function like trans-acting bacterial repressors to block transcription are relatively rare, but some examples are known. One case is the global repressor Dr1/DRAP1, a heterodimer that binds to TBP to prevent it from interacting with other components of the basal apparatus. The importance of this interaction is suggested by the lethality of null mutations in the genes that code for the repressor in yeast.

In a more specific case, the CAAT sequence is a target for regulation. Two copies of this element are found in the promoter of a gene for histone H2B (see Figure 20.17) that is expressed only during spermatogenesis in a sea urchin. CAAT-binding factors can be extracted from testis tissue and also from embryonic tissues, but only the former can bind to the CAAT box. In the embryonic tissues, another protein, called the CAAT-displacement protein (CDP), binds to the CAAT boxes, preventing the transcription factor from recognizing them.

Figure 20.18 A transcription complex involves recognition of several elements in the sea urchin H2B promoter in testis. Binding of the CAAT displacement factor in embryo prevents the CAAT-binding factor from binding, so an active complex cannot form.

Figure 20.18 illustrates the consequences for gene expression. In testis, the promoter is bound by transcription factors at the TATA box, CAAT boxes, and octamer sequences. In embryonic tissue, the exclusion of the CAAT-binding factor from the promoter prevents a transcription complex from being assembled. The analogy with the effect of a bacterial repressor in preventing RNA polymerase from initiating at the promoter is obvious. These results also make the point that the function of a protein in binding to a known promoter element cannot be assumed: it may be an activator, a repressor, or even irrelevant to gene transcription.
