The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information posted on the forums was preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.
Problems? See this thread at archive.org.

    Processor Cache.

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by Nicolas41390, Jan 1, 2007.

  1. Nicolas41390

    Nicolas41390 Notebook Consultant

    Reputations:
    2
    Messages:
    138
    Likes Received:
    0
    Trophy Points:
    30
    One thing that I don't know is what exactly processor cache is. The only thing I know is that with more cache the processor doesn't have to go out to the RAM as often, but that is about it. Any help???
     
  2. wearetheborg

    wearetheborg Notebook Virtuoso

    Reputations:
    1,282
    Messages:
    3,122
    Likes Received:
    0
    Trophy Points:
    105
    Access times are much smaller for data in the processor cache.
    There are multiple levels of cache.
    If the processor wants some data, it will first look in the L1 cache, then L2, then RAM, then the HDD.
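
    To see that speed gap in practice, here's a minimal C sketch (the array size, stride, and clock()-based timing are arbitrary illustrative choices, and absolute numbers will vary by machine) that reads the same array twice: once sequentially, which the cache loves, and once jumping 16KB at a time, which defeats it:

    Code:
    /* Same amount of work, very different speeds: sequential reads hit
     * the cache, 16KB-strided reads miss it almost every time. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)   /* 16M ints = 64MB, far bigger than any cache */
    #define STRIDE 4096            /* 16KB jumps between consecutive reads */

    static double walk(const int *a, int stride)
    {
        clock_t start = clock();
        volatile long sum = 0;     /* volatile: keep the compiler from skipping the loop */
        for (int offset = 0; offset < stride; offset++)
            for (int i = offset; i < N; i += stride)
                sum += a[i];
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        int *a = malloc(N * sizeof *a);
        if (!a) return 1;
        for (int i = 0; i < N; i++)
            a[i] = i;

        /* Both runs read all N elements; only the access order differs. */
        printf("sequential: %.2fs\n", walk(a, 1));
        printf("strided:    %.2fs\n", walk(a, STRIDE));
        free(a);
        return 0;
    }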

    One thing I don't know is: the C2Ds have twice the cache of the CDs, yet it barely makes a dent in performance.
    Why ?????
     
  3. ZaZ

    ZaZ Super Model Super Moderator

    Reputations:
    4,982
    Messages:
    34,001
    Likes Received:
    1,420
    Trophy Points:
    581
    Only the T7x00s have twice the L2 cache. CPU-intensive apps like audio or video encoding are where it will shine the most. For typical tasks like Internet or Office, the CPU is not the performance bottleneck.
     
  4. mujtaba

    mujtaba ZzzZzz Super Moderator

    Reputations:
    4,242
    Messages:
    3,088
    Likes Received:
    516
    Trophy Points:
    181
    More importantly, as the cache grows, its latency increases too, meaning a longer access time. That's one of the reasons the Athlon X2 performed better than the Core Duo.
     
  5. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    The cache is just a small, local storage.

    You can look at the storage hierarchy as a big pyramid.

    At the bottom, you have the harddrive, which is huuuuuuuuge, and sloooooooow.
    On top of that, because harddrives are way too slow to run the CPU off (taking 10 ms to fetch data? That'd allow us to execute 100 instructions per second), we have the RAM. It's a lot smaller (~2GB vs 500GB), but also roughly a million times faster. Very nice, but we can do better.
    So, because RAM is still too slow for some things, the CPU itself has a yet smaller but faster storage (the L2 cache, which may be 512KB-4MB). Compared to the RAM, it's tiny, but it's fast as hell (say, 100x faster). And again on top of that, because we like things to be *really* fast, we put the L1 cache, which may be another 5x faster, but can only store 16-64KB, which may sound like so little it's useless.

    The CPU cache is completely under the control of the CPU. Software doesn't say "Put this data into the CPU cache", but rather the CPU tries to "guess" which data would be best to put in the cache for faster access.

    And the way the CPU does this is simple. Recently used data is kept in cache. When the CPU requests a chunk of data, it first checks the L1 cache. If it's not there, it checks in L2. If it's not there, it checks RAM. Once it's been found, though, it copies the data into the cache. If the cache is full already, the *least recently* used data will be kicked out. (That is, the data that has gone unused for the longest time.)
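
    To make that concrete, here's a toy C simulation of the lookup-and-evict behaviour just described. It models a single cache level as a tiny fully associative store with LRU replacement; that's a simplification (a later post explains that real caches are set associative), and every size and access trace here is invented for illustration:

    Code:
    /* Toy model of "check the cache, else fetch and evict the least
     * recently used entry". Fully associative with LRU replacement. */
    #include <stdio.h>

    #define WAYS 4   /* tiny 4-entry cache, so evictions are easy to watch */

    static unsigned tags[WAYS];   /* which memory block each slot holds */
    static unsigned stamp[WAYS];  /* last-use time, used to find the LRU slot */
    static int      valid[WAYS];
    static unsigned now;

    static void access_block(unsigned block)
    {
        now++;
        for (int i = 0; i < WAYS; i++)
            if (valid[i] && tags[i] == block) {    /* hit: just refresh LRU info */
                stamp[i] = now;
                printf("block %u: hit\n", block);
                return;
            }
        /* miss: fill an empty slot if any, else evict the LRU entry */
        int victim = 0;
        for (int i = 0; i < WAYS; i++) {
            if (!valid[i]) { victim = i; break; }
            if (stamp[i] < stamp[victim]) victim = i;
        }
        printf("block %u: miss%s\n", block,
               valid[victim] ? " (evicts least recently used entry)" : "");
        tags[victim] = block;
        valid[victim] = 1;
        stamp[victim] = now;
    }

    int main(void)
    {
        /* Re-touching block 1 keeps it recent, so block 2 gets evicted instead. */
        unsigned trace[] = {1, 2, 3, 4, 1, 5, 2};
        for (int i = 0; i < (int)(sizeof trace / sizeof trace[0]); i++)
            access_block(trace[i]);
        return 0;
    }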

    Now, for this scheme to make sense, we have to accept that memory accesses follow two trends, called temporal and spatial locality.
    That is, when the CPU requests address A, it'll usually need address A+1 soon after (Spatial locality. That is, subsequent address requests tend to target roughly the same area of memory).
    And when the CPU requests address B, it'll usually need B itself again very soon. That's temporal locality: multiple accesses to the same address tend to be very close together in time.
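
    Spatial locality is easy to demonstrate: summing a matrix row by row touches neighbouring addresses (A, A+1, ...), while summing it column by column jumps a whole row-width on every step. A minimal C sketch (the matrix size is an arbitrary pick, and the size of the gap will vary by machine):

    Code:
    /* The two loops do identical work, but the row-major walk visits
     * neighbouring addresses and so runs much faster. */
    #include <stdio.h>
    #include <time.h>

    #define DIM 4096   /* 4096x4096 ints = 64MB, far bigger than any cache */

    int main(void)
    {
        static int m[DIM][DIM];   /* static: too big for the stack */
        long sum = 0;
        clock_t t;

        for (int r = 0; r < DIM; r++)
            for (int c = 0; c < DIM; c++)
                m[r][c] = r + c;

        t = clock();
        for (int r = 0; r < DIM; r++)      /* row-major: A, A+1, A+2, ... */
            for (int c = 0; c < DIM; c++)
                sum += m[r][c];
        printf("row-major:    %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        t = clock();
        for (int c = 0; c < DIM; c++)      /* column-major: jumps 16KB each step */
            for (int r = 0; r < DIM; r++)
                sum += m[r][c];
        printf("column-major: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        return (int)(sum & 1);   /* use sum so the loops aren't optimized away */
    }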

    What this means is that the L1 cache is not useless at all. It may be so tiny it makes a floppy disk look impressive, but because of temporal and spatial locality, we're able to predict pretty well what data we're going to need in the future, and that data tends to be only a few KB.
    So even if the L1 cache can only hold 32KB (C2D, I think), or 64KB (Athlon 64), if it holds *the right* 32KB, the CPU can use that almost without accessing the RAM at all for a while. And since RAM might be 500 times slower, that's a good thing.
    And "the right" 32KB simply means "the data that we've recently requested, and probably its neighbors". Which is exactly what is stored there anyway.

    And even if we don't find what we need in the L1 cache, we've got the bigger L2 cache to catch a lot of the remaining requests. And if that fails, we have the RAM, which is huge by comparison. And if we're really unlucky, we have to look in the OS pagefile on the harddrive. But that doesn't happen often.
    That's the purpose of the big storage hierarchy I outlined above. The lower layers are big, so they can keep everything we need, although that forces them to be slower as well. The upper layers are (much) smaller, but also amazingly fast. That's the best approximation we can get to the ideal storage (infinitely big and infinitely fast).

    So, does this work?
    You bet. :D
    On average, more than 90% of all memory accesses are caught by the L1 cache. Less than 10% have to go on to the L2 cache, and from there, even fewer have to go all the way to RAM.
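
    Those hit rates make it easy to put a number on the payoff. The usual back-of-envelope formula is average access time = hit time + miss rate x miss penalty, applied level by level; the cycle counts below are assumed round numbers for illustration, not measurements from any real CPU:

    Code:
    /* Back-of-envelope average memory access time (AMAT):
     *   AMAT = L1_time + L1_miss_rate * (L2_time + L2_miss_rate * RAM_time)
     * All latencies and miss rates here are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double l1_time = 3, l2_time = 14, ram_time = 200;  /* cycles, assumed */
        double l1_miss = 0.10, l2_miss = 0.25;             /* rates, assumed  */

        double amat = l1_time + l1_miss * (l2_time + l2_miss * ram_time);
        printf("average access: %.1f cycles (vs %.0f if everything went to RAM)\n",
               amat, ram_time);   /* prints 9.4 cycles with these numbers */
        return 0;
    }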

    And of course, if the cache grows even bigger, the hit ratio goes up further.
    One downside to a bigger cache though, is as mujtaba said, that it gets slower too. Bigger memory capacity means slower speeds, no matter what. That's why RAM is so big in comparison to cache. Because it's nowhere near as fast as cache is. And L2 cache is slower than L1 because it's a lot bigger.
    And obviously then, a big L2 cache is slower than a small L2 cache.
     
  6. wearetheborg

    wearetheborg Notebook Virtuoso

    Reputations:
    1,282
    Messages:
    3,122
    Likes Received:
    0
    Trophy Points:
    105
    Jalf, do you know what size of L1 leads to a miss rate of 20%?
    A miss rate of 10%?
    I want to know the point of diminishing returns.
     
  7. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    There isn't one single point of diminishing returns, but we're already well up there. (Well, look at it like this: if the first 32KB gives you a 90% hit rate, the next 32KB can never give you more than a 10% improvement, so even now, we're seeing *quickly* diminishing returns.)

    I looked it up though (Hennessy & Patterson: Computer Architecture: A Quantitative Approach).

    It looks like roughly 10% miss rate for a 4KB cache. 32KB gives a 5% miss rate and 64KB gives 4%.
    256KB gives a 1% miss rate.

    Keep in mind that these are only estimates. It depends on the code that is being executed, as well as the cache structure (associativity and block size in particular).
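
    For what it's worth, here's a sketch of how curves like that are measured: replay one and the same access trace through direct-mapped caches of increasing size and count the misses. The trace below is synthetic (a random walk inside a 64KB working set that occasionally jumps to a new region), so the absolute percentages mean nothing; the H&P figures above come from traces of real programs:

    Code:
    /* Miss rate vs. cache size over one synthetic access trace.
     * Direct-mapped cache; working-set size and trace shape are invented. */
    #include <stdio.h>
    #include <stdlib.h>

    #define LINE 64                 /* bytes per cache line */
    #define WSET (64 * 1024)        /* 64KB working set, an arbitrary pick */
    #define ACCESSES 1000000

    static double miss_rate(unsigned size_kb)
    {
        unsigned lines = size_kb * 1024 / LINE;
        unsigned *tag = calloc(lines, sizeof *tag);
        unsigned base = 0, misses = 0;
        srand(1);                   /* identical trace for every cache size */
        for (unsigned i = 0; i < ACCESSES; i++) {
            if (i % 10000 == 0)     /* every so often, move to a new region */
                base = (unsigned)(rand() % 1024) * WSET;
            unsigned block = (base + (unsigned)rand() % WSET) / LINE;
            if (tag[block % lines] != block + 1) {   /* direct-mapped lookup */
                misses++;
                tag[block % lines] = block + 1;      /* +1 so 0 means "empty" */
            }
        }
        free(tag);
        return (double)misses / ACCESSES;
    }

    int main(void)
    {
        unsigned sizes[] = {4, 32, 64, 256};
        for (int i = 0; i < 4; i++)   /* the miss rate flattens out once the
                                         cache holds the whole working set */
            printf("%3u KB cache: %4.1f%% misses\n",
                   sizes[i], 100.0 * miss_rate(sizes[i]));
        return 0;
    }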
     
  8. squeakygeek

    squeakygeek Notebook Consultant

    Reputations:
    10
    Messages:
    185
    Likes Received:
    0
    Trophy Points:
    30
    This scheme is called fully associative, and you won't find it in a real processor because it isn't practical. More likely you will find something called 2-way (or 4-way) set associative. This means that for a given memory location, there are 2 (or 4) places in the cache where it can be stored. The least recently used of those places is the one that gets overwritten.
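
    A small C sketch of the lookup just described, with invented sizes: the address picks exactly one set, only that set's two ways are searched, and on a miss the least recently used way within that set is overwritten:

    Code:
    /* 2-way set associative cache: each address maps to one set, and only
     * that set's two ways can hold it. All sizes here are invented. */
    #include <stdio.h>

    #define SETS 8     /* tiny cache: 8 sets x 2 ways of 64-byte lines */
    #define WAYS 2
    #define LINE 64

    static struct { unsigned tag, stamp; int valid; } cache[SETS][WAYS];
    static unsigned now;

    static void access_addr(unsigned addr)
    {
        unsigned block = addr / LINE;
        unsigned set = block % SETS;   /* index bits pick the set */
        unsigned tag = block / SETS;   /* remaining bits are the tag */
        now++;

        for (int w = 0; w < WAYS; w++)
            if (cache[set][w].valid && cache[set][w].tag == tag) {
                cache[set][w].stamp = now;           /* hit: mark as recent */
                printf("0x%04x -> set %u: hit\n", addr, set);
                return;
            }

        /* miss: fill an empty way, else overwrite the set's LRU way */
        int victim = 0;
        if (cache[set][0].valid &&
            (!cache[set][1].valid || cache[set][1].stamp < cache[set][0].stamp))
            victim = 1;
        printf("0x%04x -> set %u: miss%s\n", addr, set,
               cache[set][victim].valid ? " (evicts the set's LRU way)" : "");
        cache[set][victim].tag = tag;
        cache[set][victim].valid = 1;
        cache[set][victim].stamp = now;
    }

    int main(void)
    {
        /* Three blocks that all map to set 0: with only two ways,
         * the third one evicts whichever of the first two is older. */
        unsigned conflict = SETS * LINE;   /* 512 bytes apart -> same set */
        access_addr(0x000);                /* miss (cold) */
        access_addr(0x000 + conflict);     /* miss, fills the second way */
        access_addr(0x000);                /* hit, refreshes its LRU stamp */
        access_addr(0x000 + 2 * conflict); /* miss, evicts the 0x200 line */
        access_addr(0x000 + conflict);     /* miss again: it was evicted */
        return 0;
    }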
     
  9. SideSwipe

    SideSwipe Notebook Virtuoso

    Reputations:
    756
    Messages:
    2,578
    Likes Received:
    0
    Trophy Points:
    55
    There's L3 cache now too in the higher-performance CPUs.
     
  10. Vagabondllama

    Vagabondllama Notebook Consultant

    Reputations:
    30
    Messages:
    297
    Likes Received:
    0
    Trophy Points:
    30
    Interesting. Now I have a basic understanding of what CPU caches are. Woo. Care to explain what the front side bus does?
     
  11. Charles P. Jefferies

    Charles P. Jefferies Lead Moderator Super Moderator

    Reputations:
    22,339
    Messages:
    36,639
    Likes Received:
    5,091
    Trophy Points:
    931
    It's basically the information highway between the processor and the memory.
     
  12. Vagabondllama

    Vagabondllama Notebook Consultant

    Reputations:
    30
    Messages:
    297
    Likes Received:
    0
    Trophy Points:
    30
    Oh, Ok. Not nearly as complicated as I imagined.
     
  13. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    :)

    Yeah, the FSB is just the bus used to transfer data between the memory controller (on the motherboard) and the CPU. No big mysteries there.