32/64-Bit 80x86 Assembly Language Architecture

Several mechanisms exist to squeeze optimal throughput from the processor. One, discussed in Chapter 10, "Branching," is Intel's branch hint, which tells the processor to predict the flow of logic through a branch counter to the static prediction logic. Another is a hint about cache behavior, which gives the processor insight into how a particular piece of code accesses memory. Here is a brief review of some terms that have already been discussed:

For speed and efficiency, when memory is accessed for a read or write, the cache line containing that data (whose length depends on manufacturer and model) is copied from system memory into high-speed cache memory. The processor then performs its read/write operations on the cache memory. When a modified cache line is invalidated or evicted, it is written back to system memory. In a multiprocessor system this occurs frequently, because the processors do not share their internal caches. This second stage, writing the cache line back to system memory, is called a "write back."
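The two stages described above can be illustrated with a toy model. This is a minimal sketch and not how any real processor is built: it assumes a single 64-byte line, and the `ToyCache` type and `toy_*` functions are hypothetical names invented for this illustration.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64u   /* assumed line size for this toy model */

/* Hypothetical one-line cache sitting in front of "system memory". */
typedef struct {
    uint8_t  data[LINE_SIZE];
    uint32_t base;        /* address of the first byte held in the line */
    int      valid;       /* line currently holds a copy of memory */
    int      dirty;       /* line was modified since it was loaded */
    unsigned writebacks;  /* how many times the line was written back */
} ToyCache;

/* Invalidate the line; a modified line is written back first. */
static void toy_flush(ToyCache *c, uint8_t *mem)
{
    if (c->valid && c->dirty) {
        memcpy(mem + c->base, c->data, LINE_SIZE);
        c->writebacks++;
    }
    c->valid = 0;
    c->dirty = 0;
}

/* Make sure the line containing addr is loaded (stage one). */
static void toy_fill(ToyCache *c, uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~(LINE_SIZE - 1u);
    if (c->valid && c->base == base)
        return;                       /* hit: nothing to do */
    toy_flush(c, mem);                /* evict the old line */
    memcpy(c->data, mem + base, LINE_SIZE);
    c->base  = base;
    c->valid = 1;
}

static uint8_t toy_read(ToyCache *c, uint8_t *mem, uint32_t addr)
{
    toy_fill(c, mem, addr);
    return c->data[addr - c->base];
}

/* Writes land in the cache only; memory is updated on write back. */
static void toy_write(ToyCache *c, uint8_t *mem, uint32_t addr, uint8_t v)
{
    toy_fill(c, mem, addr);
    c->data[addr - c->base] = v;
    c->dirty = 1;
}
```

In this model a `toy_write` leaves system memory untouched until the line is flushed or displaced by an access to a different line, which is the write back the text refers to.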

Cache Sizes

Different processors have different cache sizes for data and for code. These are dependent upon processor model, manufacturer, etc., as shown below:

CPU       | L1 Cache (Data/Code) | L2 Cache
Celeron   | 16 KB / 16 KB        | 256 KB
Pentium 4 | 8 KB / 12K µops      | 512 KB
Athlon XP | 64 KB / 64 KB        | 256 KB
Duron     | 64 KB / 64 KB        | 64 KB
Pentium M | 32 KB / 32 KB        | 1024 KB
Xeon      |                      | 512 KB

Depending on your code and level of optimization, the size of the cache may matter. This book ignores it, however, as that topic is better suited to a book that specifically targets heavy-duty optimization. The cache line size, on the other hand, is of interest here, since it fits the lightweight optimization that has been touched on from time to time. It should be noted that AMD uses a minimum cache line size of 32 bytes.

Cache Line Sizes

The (code/data) cache line size determines how many instruction/data bytes can be preloaded.

Intel     | Cache Line Size (bytes)
PIII      | 32
Pentium M | 64
P4        | 64
Xeon      | 64

AMD       | Cache Line Size (bytes)
Athlon    | 64
Opteron   | 64
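To make the effect of the line size concrete, the following sketch counts how many cache lines a block of memory touches and rounds an address up to the next line boundary. The function names are hypothetical, chosen for this illustration, and the line size is passed in as a parameter rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by len bytes starting at addr. */
static size_t cache_lines_spanned(uintptr_t addr, size_t len, size_t line_size)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / line_size;              /* line of first byte */
    uintptr_t last  = (addr + len - 1) / line_size;  /* line of last byte  */
    return (size_t)(last - first + 1);
}

/* Round addr up to the next line boundary.
   line_size must be a power of two (for example 32 or 64). */
static uintptr_t align_up_to_line(uintptr_t addr, size_t line_size)
{
    return (addr + line_size - 1) & ~(uintptr_t)(line_size - 1);
}
```

For instance, an 8-byte value at offset 60 straddles two 64-byte lines, while the same value placed at offset 64 occupies only one. This is why alignment is one of the lightweight optimizations worth watching.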

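The cache line size can be obtained by executing the CPUID instruction with EAX set to 1. Bits 8 through 15 of EBX then return the CLFLUSH line size in units of 8 bytes, so the following calculation gives the actual cache line size in bytes:

    mov eax,1            ; CPUID function 1
    cpuid
    and ebx,00000FF00h   ; keep bits 8...15 (line size / 8)
    shr ebx,8-3          ; shift down 8, multiply by 8: ebx = size of cache line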
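The same calculation can be sketched in C. The helper `cache_line_bytes_from_ebx` is a hypothetical name that only mirrors the mask and shift above. The `query_cache_line_bytes` wrapper assumes a GCC or Clang compiler on an x86 target, where `<cpuid.h>` provides `__get_cpuid`, and returns 0 elsewhere.

```c
#include <stdint.h>

#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption about the toolchain */
#define HAVE_GNU_CPUID 1
#endif

/* Given the EBX value returned by CPUID with EAX = 1, return the cache
   line size in bytes. Bits 15..8 hold the size in 8-byte units. */
static uint32_t cache_line_bytes_from_ebx(uint32_t ebx)
{
    return (ebx & 0x0000FF00u) >> (8 - 3);
}

/* Query the running processor; returns 0 when CPUID is not available
   through this compiler or target. */
static uint32_t query_cache_line_bytes(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return cache_line_bytes_from_ebx(ebx);
#endif
    return 0;
}
```

An EBX value with 08h in bits 8 through 15 yields 64, matching the 64-byte entries in the tables above.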

PREFETCHx: Prefetch Data into Caches

Mnemonic | P | PII | K6 | 3D! | 3Mx+ | SSE | SSE2 | A64 | SSE3 | E64T
PREFETCH |   |     |    |     |      |     |      |     |      |
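As a rough illustration of a prefetch hint in use, the sketch below sums an array while asking for data a fixed distance ahead. It relies on the GCC/Clang builtin `__builtin_prefetch` rather than on the PREFETCHx mnemonics directly. The macro `PREFETCH_READ`, the distance `PREFETCH_AHEAD`, and the function name are assumptions made for this example, and the distance is an untuned guess.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* rw = 0 (read), locality = 3 (keep in all cache levels) */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)   /* no-op on other compilers */
#endif

#define PREFETCH_AHEAD 64   /* elements to look ahead; an untuned guess */

/* Sum n values, hinting that data further along will be needed soon. */
static uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            PREFETCH_READ(&data[i + PREFETCH_AHEAD]);
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint: the result is identical with or without it, and whether it helps depends on the processor and the access pattern.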
