Genes VII

4.11 Mammalian satellites consist of hierarchical repeats

In the mammals, as typified by various rodents, the sequences comprising each satellite show appreciable divergence between tandem repeats. Common short sequences can be recognized by their preponderance among the oligonucleotide fragments released by chemical or enzymatic treatment. However, the predominant short sequence usually accounts for only a small minority of the copies. The other short sequences are related to the predominant sequence by a variety of substitutions, deletions, and insertions.

But a series of these variants of the short unit can constitute a longer repeating unit that is itself repeated in tandem with some variation. So mammalian satellite DNAs are constructed from a hierarchy of repeating units. These longer repeating units constitute the sequences that renature in reassociation analysis. They can also be recognized by digestion with restriction enzymes.

When any satellite DNA is digested with an enzyme that has a recognition site in its repeating unit, one fragment will be obtained for every repeating unit in which the site occurs. In fact, when the DNA of a eukaryotic genome is digested with a restriction enzyme, most of it gives a general smear, due to the random distribution of cleavage sites. But satellite DNA generates sharp bands, because a large number of fragments of identical or almost identical size are created by cleavage at restriction sites that lie a regular distance apart.
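The contrast between a smear and a sharp band can be illustrated with a toy digest. This is a minimal sketch under stated assumptions: the 20 bp repeating unit, the 4 bp recognition site, and the helper name `fragment_lengths` are all hypothetical, chosen only so that the site occurs once per unit; they are not taken from any real satellite or enzyme.

```python
import random
from collections import Counter

SITE = "CCGG"                  # hypothetical 4 bp recognition site
UNIT = "ACGTCCGGTTAGACATGAAT"  # hypothetical 20 bp repeating unit, one site per unit


def fragment_lengths(dna, site):
    """Cut dna at the start of every occurrence of site; return the piece lengths."""
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = dna.find(site, pos + 1)
    bounds = [0] + cuts + [len(dna)]
    # Drop a zero-length piece if the very first base starts a site.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# A "satellite" of 50 perfect tandem copies, and random sequence of the same length.
satellite = UNIT * 50
rng = random.Random(0)
bulk = "".join(rng.choice("ACGT") for _ in range(len(satellite)))

satellite_sizes = Counter(fragment_lengths(satellite, SITE))
bulk_sizes = Counter(fragment_lengths(bulk, SITE))

print("satellite:", satellite_sizes.most_common(3))
print("random sequence:", sorted(bulk_sizes.items()))
```

In the tandem array, every internal fragment has the length of the repeating unit (49 fragments of 20 bp here, plus the two end pieces), which is what appears as a sharp band. The random sequence yields fragments of irregular sizes, the equivalent of the smear.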

Determining the sequence of satellite DNA can be difficult. Using the discrete bands generated by restriction cleavage, we can attempt to obtain a sequence directly. However, if there is appreciable divergence between individual repeating units, different nucleotides will be present at the same position in different repeats, so the sequencing gels will be obscure. If the divergence is not too great (say, within ~2%), it may be possible to determine an average repeating sequence.

Individual segments of the satellite can be inserted into plasmids for cloning. A difficulty is that the satellite sequences tend to be excised from the chimeric plasmid by recombination in the bacterial host. However, when the cloning succeeds, it is possible to determine the sequence of the cloned segment unambiguously. While this gives the actual sequence of a repeating unit or units, we would need many such individual sequences to reconstruct the type of divergence typical of the satellite as a whole.

By either sequencing approach, the information we can gain is limited to the distance that can be analyzed on one set of sequence gels. The repetition of divergent tandem copies makes it impossible to reconstruct longer sequences by obtaining overlaps between individual restriction fragments.

The satellite DNA of the mouse M. musculus is cleaved by the enzyme EcoRII into a series of bands, including a predominant monomeric fragment of 234 bp. This sequence must be repeated with few variations throughout the 60–70% of the satellite that is cleaved into the monomeric band. We may analyze this sequence in terms of its successively smaller constituent repeating units.

Figure 4.17 The repeating unit of mouse satellite DNA contains two half-repeats, which are aligned to show the identities (in color).

Figure 4.17 depicts the sequence in terms of two half-repeats. By writing the 234 bp sequence so that the first 117 bp are aligned with the second 117 bp, we see that the two halves are quite well related. They differ at 22 positions, corresponding to 19% divergence. This means that the current 234 bp repeating unit must have been generated at some time in the past by duplicating a 117 bp repeating unit, after which differences accumulated between the duplicates.
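The comparison of half-repeats amounts to folding the unit in half and counting the positions at which the two halves differ. A minimal sketch, assuming a gap-free alignment; the 20 bp toy unit and the function name `half_repeat_divergence` are hypothetical and stand in for the 234 bp sequence, which is not reproduced here.

```python
def half_repeat_divergence(unit):
    """Split a repeating unit into two equal halves and count mismatched positions.

    Returns (mismatches, half_length). Assumes the halves align without gaps.
    """
    if len(unit) % 2:
        raise ValueError("unit length must be even")
    half = len(unit) // 2
    first, second = unit[:half], unit[half:]
    mismatches = sum(x != y for x, y in zip(first, second))
    return mismatches, half


# Hypothetical 20 bp unit made of two related 10 bp halves (not the mouse sequence).
toy_unit = "GAAAAACGTT" + "GAAAATCGTA"
mismatches, half_length = half_repeat_divergence(toy_unit)
print(f"{mismatches} of {half_length} positions differ "
      f"({100 * mismatches / half_length:.0f}% divergence)")
```

Applied to the real unit, the same count gives the 22 differences in 117 positions quoted above.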

Figure 4.18 The alignment of quarter-repeats identifies homologies between the first and second half of each half-repeat. Positions that are the same in all 4 quarter-repeats are shown in color; identities that extend only through 3 quarter-repeats are indicated by grey letters in the pink area.

Within the 117 bp unit, we can recognize two further subunits. Each of these is a quarter-repeat relative to the whole satellite. The four quarter-repeats are aligned in Figure 4.18. The upper two lines represent the first half-repeat of Figure 4.17; the lower two lines represent the second half-repeat. We see that the divergence between the four quarter-repeats has increased to 23 out of 58 positions, or 40%. The first three quarter-repeats are somewhat better related, and a large proportion of the divergence is due to changes in the fourth quarter-repeat.

Figure 4.19 The alignment of eighth-repeats shows that each quarter-repeat consists of an α and a β half. The consensus sequence gives the most common base at each position. The "ancestral" sequence shows a sequence very closely related to the consensus sequence, which could have been the predecessor to the α and β units. (The satellite sequence is continuous, so that for the purposes of deducing the consensus sequence, we can treat it as a circular permutation, as indicated by joining the last GAA triplet to the first 6 bp.)

Looking within the quarter-repeats, we find that each consists of two related subunits (one-eighth-repeats), shown as the α and β sequences in Figure 4.19. The α sequences all have an insertion of a C, and the β sequences all have an insertion of a trinucleotide, relative to a common consensus sequence. This suggests that the quarter-repeat originated by the duplication of a sequence like the consensus sequence, after which changes occurred to generate the components we now see as α and β. Further changes then took place between tandemly repeated αβ sequences to generate the individual quarter- and half-repeats that exist today. Among the one-eighth-repeats, the present divergence is 19/31 = 61%.
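The trend across the hierarchy is easiest to see when the three divergence figures quoted in the text are set side by side. The sketch below only redoes that arithmetic; the counts are those given above, and the variable names are ours.

```python
# (level, differing positions, positions compared), as quoted in the text.
levels = [
    ("half-repeats", 22, 117),
    ("quarter-repeats", 23, 58),
    ("eighth-repeats", 19, 31),
]

percent = {name: round(100 * diff / total) for name, diff, total in levels}

for name, diff, total in levels:
    print(f"{name:16s} {diff:3d}/{total:<3d} = {percent[name]}%")
```

The smaller the unit, the greater the divergence between its copies, as expected if the shorter units were duplicated earlier and have had longer to accumulate changes.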

Figure 4.20 The existence of an overall consensus sequence is shown by writing the satellite sequence in terms of a 9 bp repeat.

The consensus sequence is analyzed directly in Figure 4.20, which demonstrates that the current satellite sequence can be treated as derivatives of a 9 bp sequence. We can recognize three variants of this sequence in the satellite, as indicated at the bottom of Figure 4.20. If in one of the repeats we take the next most frequent base at two positions instead of the most frequent, we obtain three well-related 9 bp sequences.

G A A A A A C G T

G A A A A A T G A

G A A A A A A C T

The origin of the satellite could well lie in an amplification of one of these three nonamers. The overall consensus sequence of the present satellite is GAAAAAAGTCT, which is effectively an amalgam of the three 9 bp repeats.
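The relationship among the three nonamers can be checked by tallying the bases at each of the nine positions. This is a small illustrative sketch using only the three sequences listed above; marking a variable position with "." is our own convention.

```python
from collections import Counter

nonamers = ["GAAAAACGT", "GAAAAATGA", "GAAAAAACT"]

# One Counter per aligned position (column) of the three 9 bp sequences.
columns = [Counter(col) for col in zip(*nonamers)]

# Keep a base where all three agree; mark variable positions with ".".
shared = "".join(
    next(iter(counts)) if len(counts) == 1 else "." for counts in columns
)

print("shared core:", shared)
for position, counts in enumerate(columns, start=1):
    if len(counts) > 1:
        print(f"position {position}: {dict(counts)}")
```

The first six positions (GAAAAA) are identical in all three sequences; the differences are confined to the last three positions.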

The average sequence of the monomeric fragment of the mouse satellite DNA explains its properties. The longest repeating unit of 234 bp is identified by the restriction cleavage. The unit of reassociation between single strands of denatured satellite DNA is probably the 117 bp half-repeat, because the 234 bp fragments can anneal both in register and in half-register (in the latter case, the first half-repeat of one strand renatures with the second half-repeat of the other).

So far, we have treated the present satellite as though it consisted of identical copies of the 234 bp repeating unit. Although this unit accounts for the majority of the satellite, variants of it also are present. Some of them are scattered at random throughout the satellite; others are clustered.

The existence of variants is implied by our description of the starting material for the sequence analysis as the "monomeric" fragment. When the satellite is digested by an enzyme that has one cleavage site in the 234 bp sequence, it also generates dimers, trimers, and tetramers relative to the 234 bp length. They arise when a repeating unit has lost the enzyme cleavage site as the result of mutation.

Figure 4.21 Digestion of mouse satellite DNA with the restriction enzyme EcoRII identifies a series of repeating units (1, 2, 3) that are multimers of 234 bp and also a minor series (½, 1½, 2½) that includes half-repeats (see text later). The band at the far left is a fraction resistant to digestion.

The monomeric 234 bp unit is generated when two adjacent repeats each have the recognition site. A dimer occurs when one unit has lost the site, a trimer is generated when two adjacent units have lost the site, and so on. With some restriction enzymes, most of the satellite is cleaved into a member of this repeating series, as shown in the example of Figure 4.21. The declining number of dimers, trimers, etc. shows that there is a random distribution of the repeats in which the enzyme’s recognition site has been eliminated by mutation.
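The declining series of multimers can be illustrated with a simple model. Assume that each repeating unit independently retains the recognition site with some probability p; a fragment n units long then requires n - 1 consecutive units without the site followed by one with it, so the fraction of fragments that are n-mers is p(1 - p)^(n-1). The value p = 0.8 below is hypothetical and is not an estimate for the mouse satellite.

```python
def multimer_fractions(p_site, max_n):
    """Fraction of fragments that are n units long (n = 1..max_n),
    assuming each unit independently keeps its site with probability p_site."""
    return {n: p_site * (1 - p_site) ** (n - 1) for n in range(1, max_n + 1)}


UNIT_BP = 234          # length of the monomeric repeat
fractions = multimer_fractions(0.8, 4)   # hypothetical p = 0.8

for n, fraction in fractions.items():
    print(f"{n}-mer ({n * UNIT_BP} bp): {fraction:.4f} of fragments")
```

Under this assumption each successive multimer is rarer than the last by a constant factor (1 - p). Note that these are fractions of the number of fragments, not of the mass of DNA in each band.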

Other restriction enzymes show a different type of behavior with the satellite DNA. They continue to generate the same series of bands. But they cleave only a small proportion of the DNA, say 5–10%. This implies that a certain region of the satellite contains a concentration of the repeating units with this particular restriction site. Presumably the series of repeats in this domain all are derived from an ancestral variant that possessed this recognition site (although in the usual way, some members since have lost it by mutation).

A satellite DNA suffers unequal recombination. This has additional consequences when there is internal repetition in the repeating unit. Let us return to our cluster consisting of "ab" repeats. Suppose that the "a" and "b" components of the repeating unit are themselves sufficiently well related to pair. Then the two clusters can align in half-register, with the "a" sequence of one aligned with the "b" sequence of the other. How frequently this occurs will depend on the closeness of the relationship between the two halves of the repeating unit. In mouse satellite DNA, reassociation between the denatured satellite DNA strands in vitro commonly occurs in the half-register.

When a recombination event occurs out of register, it changes the length of the repeating units that are involved in the reaction.

In the upper recombinant cluster, an "ab" unit has been replaced by an "aab" unit. In the lower cluster, the "ab" unit has been replaced by a "b" unit.

This type of event explains a feature of the restriction digest of mouse satellite DNA. Figure 4.21 shows a fainter series of bands at lengths of ½, 1½, 2½, and 3½ repeating units, in addition to the stronger integral length repeats. Suppose that in the preceding example, "ab" represents the 234 bp repeat of mouse satellite DNA, generated by cleavage at a site in the "b" segment. The "a" and "b" segments correspond to the 117 bp half-repeats.

Then in the upper recombinant cluster, the "aab" unit generates a fragment of 1½ times the usual repeating length. And in the lower recombinant cluster, the "b" unit generates a fragment of half of the usual length. (The multiple fragments in the half-repeat series are generated in the same way as longer fragments in the integral series, when some repeating units have lost the restriction site by mutation.)
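The argument can be followed with a toy model in which each cluster is written as a string of "a" and "b" half-repeats. This is a sketch under simplifying assumptions: the crossover is placed at a boundary between half-repeats rather than within the paired segments, every "b" carries one cleavage site at the same internal position, and the function names are ours.

```python
HALF_BP = 117  # length of each "a" or "b" half-repeat


def crossover(upper, lower, i, j):
    """Exchange the segments to the right of position i in upper and j in lower."""
    return upper[:i] + lower[j:], lower[:j] + upper[i:]


def internal_fragments(cluster):
    """Lengths (bp) between successive cleavage sites, one site per "b"."""
    sites = [k for k, segment in enumerate(cluster) if segment == "b"]
    return [(q - p) * HALF_BP for p, q in zip(sites, sites[1:])]


parent = "ab" * 4

# Half-register pairing: break the upper cluster after an "a" (i = 3)
# and the lower cluster after a "b" (j = 4).
upper, lower = crossover(parent, parent, 3, 4)

print(upper, internal_fragments(upper))
print(lower, internal_fragments(lower))
```

The upper recombinant contains an "aab" unit and yields a 351 bp fragment (1½ × 234 bp); the lower recombinant contains a lone "b" unit and yields a 117 bp fragment (½ × 234 bp). All other fragments retain the usual 234 bp length.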

Turning the argument the other way around, the identification of the half-repeat series on the gel shows that the 234 bp repeating unit consists of two half-repeats well enough related to pair sometimes for recombination. Also visible in Figure 4.21 are some rather faint bands corresponding to ¼- and ¾-spacings. These will be generated in the same way as the ½-spacings, when recombination occurs between clusters aligned in a quarter-register. The decreased relationship between quarter-repeats compared with half-repeats explains the reduction in frequency of the ¼- and ¾-bands compared with the ½-bands.  
