The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information posted on the forums was preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.
Problems? See this thread at archive.org.

    Processor Cache.

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by Nicolas41390, Jan 1, 2007.

  1. Nicolas41390

    Nicolas41390 Notebook Consultant

    Reputations:
    2
    Messages:
    138
    Likes Received:
    0
    Trophy Points:
    30
    One thing that I don't know is what exactly processor cache is. The only thing I know is that with more cache the processor doesn't have to go out to the RAM as often, but that is about it. Any help???
     
  2. wearetheborg

    wearetheborg Notebook Virtuoso

    Reputations:
    1,282
    Messages:
    3,122
    Likes Received:
    0
    Trophy Points:
    105
    Access times are much smaller for data in the processor cache.
    There are multiple levels of cache.
    If the processor wants some data, it will first look in the L1 cache, then L2, then RAM, then the HDD.
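
    To see that speed gap in practice, here's a minimal C sketch (the array size, stride, and clock()-based timing are arbitrary illustrative choices, and absolute numbers will vary by machine) that reads the same array twice: once sequentially, which the cache loves, and once jumping 16KB at a time, which defeats it:

    Code:
    /* Same amount of work, very different speeds: sequential reads hit
     * the cache, 16KB-strided reads miss it almost every time. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)   /* 16M ints = 64MB, far bigger than any cache */
    #define STRIDE 4096            /* 16KB jumps between consecutive reads */

    static double walk(const int *a, int stride)
    {
        clock_t start = clock();
        volatile long sum = 0;     /* volatile: keep the compiler from skipping the loop */
        for (int offset = 0; offset < stride; offset++)
            for (int i = offset; i < N; i += stride)
                sum += a[i];
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        int *a = malloc(N * sizeof *a);
        if (!a) return 1;
        for (int i = 0; i < N; i++)
            a[i] = i;

        /* Both runs read all N elements; only the access order differs. */
        printf("sequential: %.2fs\n", walk(a, 1));
        printf("strided:    %.2fs\n", walk(a, STRIDE));
        free(a);
        return 0;
    }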

    One thing I don't know is: the C2Ds have twice the cache of the CDs, yet it barely makes a dent in performance.
    Why ?????
     
  3. ZaZ

    ZaZ Super Model Super Moderator

    Reputations:
    4,982
    Messages:
    34,001
    Likes Received:
    1,420
    Trophy Points:
    581
    Only the T7x00s have twice the L2 cache. CPU-intensive apps like audio or video encoding are where it will shine the most. For typical tasks like Internet or Office, the CPU is not the performance bottleneck.
     
  4. mujtaba

    mujtaba ZzzZzz Super Moderator

    Reputations:
    4,242
    Messages:
    3,088
    Likes Received:
    516
    Trophy Points:
    181
    More importantly, as the cache grows, its latency increases too, meaning a longer access time. That's one of the reasons the Athlon X2 performed better than the Core Duo.
     
  5. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    The cache is just a small, local storage.

    You can look at the storage hierarchy as a big pyramid.

    At the bottom, you have the harddrive, which is huuuuuuuuge, and sloooooooow.
    On top of that, because harddrives are way too slow to run the CPU off (taking 10 ms to fetch data? That'd allow us to execute 100 instructions per second), we have the RAM. It's a lot smaller (~2GB vs 500GB), but also roughly a million times faster. Very nice, but we can do better.
    So, because RAM is still too slow for some things, the CPU itself has a yet smaller but faster storage (the L2 cache, which may be 512KB-4MB). Compared to the RAM, it's tiny, but it's fast as hell (say, 100x faster). And again on top of that, because we like things to be *really* fast, we put the L1 cache, which may be another 5x faster, but can only store 16-64KB, which may sound like so little it's useless.

    The CPU cache is completely under the control of the CPU. Software doesn't say "Put this data into the CPU cache", but rather the CPU tries to "guess" which data would be best to put in the cache for faster access.

    And the way the CPU does this is simple. Recently used data is kept in cache. When the CPU requests a chunk of data, it first checks the L1 cache. If it's not there, it checks in L2. If it's not there, it checks RAM. Once it's been found, though, it copies the data into the cache. If the cache is full already, the *least recently* used data will be kicked out. (That is, the data that has gone unused for the longest time.)
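
    To make that concrete, here's a toy C simulation of the lookup-and-evict behaviour just described. It models a single cache level as a tiny fully associative store with LRU replacement; that's a simplification (a later post explains that real caches are set associative), and every size and access trace here is invented for illustration:

    Code:
    /* Toy model of "check the cache, else fetch and evict the least
     * recently used entry". Fully associative with LRU replacement. */
    #include <stdio.h>

    #define WAYS 4   /* tiny 4-entry cache, so evictions are easy to watch */

    static unsigned tags[WAYS];   /* which memory block each slot holds */
    static unsigned stamp[WAYS];  /* last-use time, used to find the LRU slot */
    static int      valid[WAYS];
    static unsigned now;

    static void access_block(unsigned block)
    {
        now++;
        for (int i = 0; i < WAYS; i++)
            if (valid[i] && tags[i] == block) {    /* hit: just refresh LRU info */
                stamp[i] = now;
                printf("block %u: hit\n", block);
                return;
            }
        /* miss: fill an empty slot if any, else evict the LRU entry */
        int victim = 0;
        for (int i = 0; i < WAYS; i++) {
            if (!valid[i]) { victim = i; break; }
            if (stamp[i] < stamp[victim]) victim = i;
        }
        printf("block %u: miss%s\n", block,
               valid[victim] ? " (evicts least recently used entry)" : "");
        tags[victim] = block;
        valid[victim] = 1;
        stamp[victim] = now;
    }

    int main(void)
    {
        /* Re-touching block 1 keeps it recent, so block 2 gets evicted instead. */
        unsigned trace[] = {1, 2, 3, 4, 1, 5, 2};
        for (int i = 0; i < (int)(sizeof trace / sizeof trace[0]); i++)
            access_block(trace[i]);
        return 0;
    }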

    Now, for this scheme to make sense, we have to accept that memory accesses follow two trends, called temporal and spatial locality.
    That is, when the CPU requests address A, it'll usually need address A+1 soon after (Spatial locality. That is, subsequent address requests tend to target roughly the same area of memory).
    And when the CPU requests address B, it'll usually need B itself again very soon. That's temporal locality: multiple accesses to the same address tend to be very close together in time.
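
    Spatial locality is easy to demonstrate: summing a matrix row by row touches neighbouring addresses (A, A+1, ...), while summing it column by column jumps a whole row-width on every step. A minimal C sketch (the matrix size is an arbitrary pick, and the size of the gap will vary by machine):

    Code:
    /* The two loops do identical work, but the row-major walk visits
     * neighbouring addresses and so runs much faster. */
    #include <stdio.h>
    #include <time.h>

    #define DIM 4096   /* 4096x4096 ints = 64MB, far bigger than any cache */

    int main(void)
    {
        static int m[DIM][DIM];   /* static: too big for the stack */
        long sum = 0;
        clock_t t;

        for (int r = 0; r < DIM; r++)
            for (int c = 0; c < DIM; c++)
                m[r][c] = r + c;

        t = clock();
        for (int r = 0; r < DIM; r++)      /* row-major: A, A+1, A+2, ... */
            for (int c = 0; c < DIM; c++)
                sum += m[r][c];
        printf("row-major:    %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        t = clock();
        for (int c = 0; c < DIM; c++)      /* column-major: jumps 16KB each step */
            for (int r = 0; r < DIM; r++)
                sum += m[r][c];
        printf("column-major: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        return (int)(sum & 1);   /* use sum so the loops aren't optimized away */
    }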

    What this means is that the L1 cache is not useless at all. It may be so tiny it makes a floppy disk look impressive, but because of temporal and spatial locality, we're able to predict pretty well what data we're going to need in the future, and that data tends to be only a few KB.
    So even if the L1 cache can only hold 32KB (C2D, I think), or 64KB (Athlon 64), if it holds *the right* 32KB, the CPU can use that almost without accessing the RAM at all for a while. And since RAM might be 500 times slower, that's a good thing.
    And "the right" 32KB simply means "the data that we've recently requested, and probably its neighbors". Which is exactly what is stored there anyway.

    And even if we don't find what we need in the L1 cache, we've got the bigger L2 cache to catch a lot of the remaining requests. And if that fails, we have the RAM, which is huge by comparison. And if we're really unlucky, we have to look in the OS pagefile on the harddrive. But that doesn't happen often.
    That's the purpose of the big storage hierarchy I outlined above. The lower layers are big, so they can keep everything we need, although that forces them to be slower as well. The upper layers are (much) smaller, but also amazingly fast. That's the best approximation we can get to the ideal storage (infinitely big and infinitely fast).

    So, does this work?
    You bet. :D
    On average, more than 90% of all memory accesses are caught by the L1 cache. Less than 10% have to go on to the L2 cache, and from there, even fewer have to go all the way to RAM.
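
    Those hit rates make it easy to put a number on the payoff. The usual back-of-envelope formula is average access time = hit time + miss rate x miss penalty, applied level by level; the cycle counts below are assumed round numbers for illustration, not measurements from any real CPU:

    Code:
    /* Back-of-envelope average memory access time (AMAT):
     *   AMAT = L1_time + L1_miss_rate * (L2_time + L2_miss_rate * RAM_time)
     * All latencies and miss rates here are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double l1_time = 3, l2_time = 14, ram_time = 200;  /* cycles, assumed */
        double l1_miss = 0.10, l2_miss = 0.25;             /* rates, assumed  */

        double amat = l1_time + l1_miss * (l2_time + l2_miss * ram_time);
        printf("average access: %.1f cycles (vs %.0f if everything went to RAM)\n",
               amat, ram_time);   /* prints 9.4 cycles with these numbers */
        return 0;
    }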

    And of course, if the cache grows even bigger, the hit ratio goes up further.
    One downside to a bigger cache though, is as mujtaba said, that it gets slower too. Bigger memory capacity means slower speeds, no matter what. That's why RAM is so big in comparison to cache. Because it's nowhere near as fast as cache is. And L2 cache is slower than L1 because it's a lot bigger.
    And obviously then, a big L2 cache is slower than a small L2 cache.
     
  6. wearetheborg

    wearetheborg Notebook Virtuoso

    Reputations:
    1,282
    Messages:
    3,122
    Likes Received:
    0
    Trophy Points:
    105
    Jalf, do you know what size of L1 leads to a miss rate of 20%?
    A miss rate of 10%?
    I want to know the point of diminishing returns.
     
  7. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    There isn't one single point of diminishing returns, but we're already well up there. (Well, look at it like this: if the first 32KB gives you a 90% hit rate, the next 32KB can never give you more than a 10% improvement, so even now, we're seeing *quickly* diminishing returns.)

    I looked it up though (Hennessy & Patterson: Computer Architecture: A Quantitative Approach).

    It looks like roughly 10% miss rate for a 4KB cache. 32KB gives a 5% miss rate and 64KB gives 4%.
    256KB gives a 1% miss rate.

    Keep in mind that these are only estimates. It depends on the code that is being executed, as well as the cache structure (associativity and block size in particular).
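
    For what it's worth, here's a sketch of how curves like that are measured: replay one and the same access trace through direct-mapped caches of increasing size and count the misses. The trace below is synthetic (a random walk inside a 64KB working set that occasionally jumps to a new region), so the absolute percentages mean nothing; the H&P figures above come from traces of real programs:

    Code:
    /* Miss rate vs. cache size over one synthetic access trace.
     * Direct-mapped cache; working-set size and trace shape are invented. */
    #include <stdio.h>
    #include <stdlib.h>

    #define LINE 64                 /* bytes per cache line */
    #define WSET (64 * 1024)        /* 64KB working set, an arbitrary pick */
    #define ACCESSES 1000000

    static double miss_rate(unsigned size_kb)
    {
        unsigned lines = size_kb * 1024 / LINE;
        unsigned *tag = calloc(lines, sizeof *tag);
        unsigned base = 0, misses = 0;
        srand(1);                   /* identical trace for every cache size */
        for (unsigned i = 0; i < ACCESSES; i++) {
            if (i % 10000 == 0)     /* every so often, move to a new region */
                base = (unsigned)(rand() % 1024) * WSET;
            unsigned block = (base + (unsigned)rand() % WSET) / LINE;
            if (tag[block % lines] != block + 1) {   /* direct-mapped lookup */
                misses++;
                tag[block % lines] = block + 1;      /* +1 so 0 means "empty" */
            }
        }
        free(tag);
        return (double)misses / ACCESSES;
    }

    int main(void)
    {
        unsigned sizes[] = {4, 32, 64, 256};
        for (int i = 0; i < 4; i++)   /* the miss rate flattens out once the
                                         cache holds the whole working set */
            printf("%3u KB cache: %4.1f%% misses\n",
                   sizes[i], 100.0 * miss_rate(sizes[i]));
        return 0;
    }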
     
  8. squeakygeek

    squeakygeek Notebook Consultant

    Reputations:
    10
    Messages:
    185
    Likes Received:
    0
    Trophy Points:
    30
    This scheme is called fully associative, and you won't find it in a real processor because it isn't practical. More likely you will find something called 2-way (or 4-way) set associative. This means that for a given memory location, there are 2 (or 4) places in the cache where it can be stored. The least recently used of those places is the one that gets overwritten.
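
    A small C sketch of the lookup just described, with invented sizes: the address picks exactly one set, only that set's two ways are searched, and on a miss the least recently used way within that set is overwritten:

    Code:
    /* 2-way set associative cache: each address maps to one set, and only
     * that set's two ways can hold it. All sizes here are invented. */
    #include <stdio.h>

    #define SETS 8     /* tiny cache: 8 sets x 2 ways of 64-byte lines */
    #define WAYS 2
    #define LINE 64

    static struct { unsigned tag, stamp; int valid; } cache[SETS][WAYS];
    static unsigned now;

    static void access_addr(unsigned addr)
    {
        unsigned block = addr / LINE;
        unsigned set = block % SETS;   /* index bits pick the set */
        unsigned tag = block / SETS;   /* remaining bits are the tag */
        now++;

        for (int w = 0; w < WAYS; w++)
            if (cache[set][w].valid && cache[set][w].tag == tag) {
                cache[set][w].stamp = now;           /* hit: mark as recent */
                printf("0x%04x -> set %u: hit\n", addr, set);
                return;
            }

        /* miss: fill an empty way, else overwrite the set's LRU way */
        int victim = 0;
        if (cache[set][0].valid &&
            (!cache[set][1].valid || cache[set][1].stamp < cache[set][0].stamp))
            victim = 1;
        printf("0x%04x -> set %u: miss%s\n", addr, set,
               cache[set][victim].valid ? " (evicts the set's LRU way)" : "");
        cache[set][victim].tag = tag;
        cache[set][victim].valid = 1;
        cache[set][victim].stamp = now;
    }

    int main(void)
    {
        /* Three blocks that all map to set 0: with only two ways,
         * the third one evicts whichever of the first two is older. */
        unsigned conflict = SETS * LINE;   /* 512 bytes apart -> same set */
        access_addr(0x000);                /* miss (cold) */
        access_addr(0x000 + conflict);     /* miss, fills the second way */
        access_addr(0x000);                /* hit, refreshes its LRU stamp */
        access_addr(0x000 + 2 * conflict); /* miss, evicts the 0x200 line */
        access_addr(0x000 + conflict);     /* miss again: it was evicted */
        return 0;
    }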
     
  9. SideSwipe

    SideSwipe Notebook Virtuoso

    Reputations:
    756
    Messages:
    2,578
    Likes Received:
    0
    Trophy Points:
    55
    There's L3 cache now too in the higher-performance CPUs.
     
  10. Vagabondllama

    Vagabondllama Notebook Consultant

    Reputations:
    30
    Messages:
    297
    Likes Received:
    0
    Trophy Points:
    30
    Interesting. Now I have a basic understanding of what CPU caches are. Woo. Care to explain what the front side bus does?
     
  11. Charles P. Jefferies

    Charles P. Jefferies Lead Moderator Super Moderator

    Reputations:
    22,339
    Messages:
    36,639
    Likes Received:
    5,091
    Trophy Points:
    931
    It's basically the information highway between the processor and the memory.
     
  12. Vagabondllama

    Vagabondllama Notebook Consultant

    Reputations:
    30
    Messages:
    297
    Likes Received:
    0
    Trophy Points:
    30
    Oh, Ok. Not nearly as complicated as I imagined.
     
  13. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    :)

    Yeah, the FSB is just the bus used to transfer data between the memory controller (on the motherboard) and the CPU. No big mysteries there.