The Notebook Review forums were hosted by TechTarget, who shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

    Core i7 logic

    Discussion in 'Hardware Components and Aftermarket Upgrades' started by fred2028, Dec 14, 2009.

  1. fred2028

    fred2028 Sexy member

    Reputations:
    196
    Messages:
    2,205
    Likes Received:
    1
    Trophy Points:
    56
    Just curious as to how the Core i7s handle their stuff. Take the 820QM for example. 4 physical + 4 virtual cores, so 8 threads total.

    When a dual-threaded app comes along, does it use 1 physical + 1 virtual core or 2 physical cores and 0 virtual cores?

    When turbo boost turns off certain cores, does it
    1. Turn off all cores except the one(s) that have/has a thread being worked on
    2. Redirect all thread(s) to a specific core and then turn off all other cores?
     
  2. funky monk

    funky monk Notebook Deity

    Reputations:
    233
    Messages:
    1,485
    Likes Received:
    1
    Trophy Points:
    55
    I think that if a multi threaded application came along, it would probably use two physical cores but put the clock speed down. That way it would use less power since power draw increases exponentially with clock speed.

    Again, for turboboost I would think that it would try to spread the load evenly between as many cores as possible to reduce power consumption. This way it also reduces heat, producing faster processors as they can be reliably clocked higher without worry of overheating.
     
  3. notyou

    notyou Notebook Deity

    Reputations:
    652
    Messages:
    1,562
    Likes Received:
    0
    Trophy Points:
    55
    I'm not sure how the i7's handle the thread pinning. I did however do a paper on the Nehalem architecture just a few weeks ago so I'll share what I've gotten from that.

    1) As far as I'm aware, Nehalem will power down all cores under a certain threshold.
    2) @Funky - I believe you have it backwards. Power draw increases more with voltage than with clock speed. Though they tend to go hand in hand since more voltage is often required for higher clock speeds. Note, take a look at undervolting and why it makes such a big difference.
    3) The thing about turbo boost is that it works best when only a single core is working. In the papers I studied, having two cores loaded meant that the clock speed gains from TB were smaller than a single core and thus, if you're able to just have one core fully loaded, the application will perform fastest that way (note that this is only for single-threaded applications).
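
    As a rough illustration of point 2: the usual first-order model for CMOS dynamic power is P ~ C * V^2 * f, so voltage enters squared while clock speed enters linearly. The sketch below only compares ratios under that model. The 10% figures are made-up illustration values, not measurements from an 820QM or any other chip, and `relative_dynamic_power` is a hypothetical helper name.

```python
def relative_dynamic_power(voltage_ratio: float, clock_ratio: float) -> float:
    """Relative dynamic power under the first-order model P ~ C * V^2 * f.

    Both arguments are ratios against a baseline (1.0 means unchanged).
    Leakage power and temperature effects are ignored in this sketch.
    """
    return voltage_ratio ** 2 * clock_ratio


# Assumed example: 10% higher clock at the same voltage.
clock_only = relative_dynamic_power(1.0, 1.1)
# Assumed example: 10% higher voltage at the same clock.
voltage_only = relative_dynamic_power(1.1, 1.0)

print(f"clock +10%:   {clock_only:.2f}x baseline power")
print(f"voltage +10%: {voltage_only:.2f}x baseline power")
```

    Under this model a 10% voltage bump costs about twice as much extra power as a 10% clock bump, which is consistent with undervolting making such a big difference.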
     
  4. Judicator

    Judicator Judged and found wanting.

    Reputations:
    1,098
    Messages:
    2,594
    Likes Received:
    19
    Trophy Points:
    56
    From a conversation we were having in another thread ( http://forum.notebookreview.com/showthread.php?t=440539), it apparently would depend on your OS. Windows 7 server version would go physical core + logical core, while non server versions of Windows 7 would prefer to go with physical cores.
     
  5. Generic User #2

    Generic User #2 Notebook Deity

    Reputations:
    179
    Messages:
    846
    Likes Received:
    0
    Trophy Points:
    30
    from what I've read, it is VERY dependent on what is actually being processed and how well the OS recognizes HT-processors.

    for something 'heavy' like a dual-threaded video encoder(bear with me ;) ), it would go straight for two physical cores. it would also use a HT-thread to run OS processes.

    however, if the system was in idle, it would run OS processes on a real-core and then move those processes onto a ht-thread when a more CPU-hungry application is run.

    basically, it will try to use HT-threads when the OS thinks that the HT-thread has enough power for the process, otherwise, it will go for a real thread.
     
  6. thinkpad knows best

    thinkpad knows best Notebook Deity

    Reputations:
    108
    Messages:
    1,140
    Likes Received:
    0
    Trophy Points:
    55
    That is a good idea, since virtual "cores" usually perform worse than physical cores.
     
  7. BrandonSi

    BrandonSi Notebook Savant

    Reputations:
    571
    Messages:
    1,444
    Likes Received:
    0
    Trophy Points:
    55
    Not exactly right, but I see where you're going.

    You have 4 physical cores, capable of handling two threads each. The difference is when you refer to "virtual cores".. Using that concept, you could have 4 virtual cores and 4 physical cores maxed out at 100% cpu utilization. That's not how it works.. What you actually have is 4 cores, handling 8 threads.. If the 4 cores are running at 100% cpu utilization, you have 8 threads, and each thread using 50% (or 60/40, 70/30, etc..) of the cpu time.

    Similar ideas, but very different implementation.

    Also, the application has the final say over how the threads are utilized, not the OS. That's why a single threaded app can make your Windows super-slow and almost unresponsive at 100% utilization. If it was up to the OS, that wouldn't happen, but the OS handles the requests of the app, not vice-versa.

    That's why you also read a lot about thread-optimization and "properly threaded" apps.

    Edit - I should add, that what I was referring to was default behavior. You can, of course, tell Windows to only use specific cpu's/cores for an application. You, as the user also have final control over the threading, as you can turn HT off in the bios.
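
    To make the "only use specific cpu's/cores" part concrete: Windows expresses processor affinity as a bitmask in which bit n stands for logical CPU n. The sketch below just builds such a mask. The helper name `affinity_mask` is hypothetical, and the layout in the example (logical CPUs 0, 2, 4, 6 being one thread on each of four physical cores) is an assumption that should be checked on the actual machine.

```python
def affinity_mask(cpus):
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


# Assumed layout: logical CPUs 0, 2, 4, 6 are one hardware thread per core.
one_thread_per_core = affinity_mask([0, 2, 4, 6])
print(hex(one_thread_per_core))  # 0x55
```

    A mask like this is what the Task Manager affinity dialog sets behind the scenes. It should also be usable in hex form with the cmd `start /affinity` switch, though that is worth verifying on your own Windows version.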
     
  8. Judicator

    Judicator Judged and found wanting.

    Reputations:
    1,098
    Messages:
    2,594
    Likes Received:
    19
    Trophy Points:
    56
    Well, the point is that to the OS, it "looks" like 8 cores. It wasn't until windows 7 that windows could even tell the difference between a physical and logical core. In terms of actual engineering, yes, you're right in that it's really only 4 physical cores that can handle up to 2 threads (relatively) simultaneously. We're not quite at the point where we can create processors out of thin air yet (although I'm sure _someone's_ trying!).

    And I would clarify that it's not so much that the application has the final say over how the threads are utilized, so much as the application defines what's in a thread, and how many threads are presented to the OS. That single threaded app that makes your Windows super-slow and almost unresponsive is an example of an application that presents the OS with only one thread stuffed full of everything that needs to be done, as opposed to a more intelligently programmed application (in this era of multi-core processors) that splits up that thread into multiple threads to share the load.
     
  9. BrandonSi

    BrandonSi Notebook Savant

    Reputations:
    571
    Messages:
    1,444
    Likes Received:
    0
    Trophy Points:
    55
    Nice catch. You're correct, it's not actually simultaneous execution. :)
     
  10. davepermen

    davepermen Notebook Nobel Laureate

    Reputations:
    2,972
    Messages:
    7,788
    Likes Received:
    0
    Trophy Points:
    205
    actually, it is simultaneous execution.. if enough free processing units are available on the core.
     
  11. BrandonSi

    BrandonSi Notebook Savant

    Reputations:
    571
    Messages:
    1,444
    Likes Received:
    0
    Trophy Points:
    55
    That's not how I understand it, do you have a source for that? It has been my understanding since the P4 times that HT was actually temporal multi-threading, meaning the two threads share the single core resources, and as such cannot execute instructions simultaneously. Granted, we're talking microseconds, but they must take turns, as there is only one CPU/core and associated resources, for the two threads.
     
  12. weinter

    weinter /dev/null

    Reputations:
    596
    Messages:
    2,798
    Likes Received:
    1
    Trophy Points:
    56
    I remember I read somewhere that for HT they actually have 2 sets of registers to hold the 2 states of the running threads and 1 execution core that switches constantly between them.
    Ahh Wikipedia agrees with me
    HT doesn't do simultaneous execution since there is only 1 execution core however it switches so fast and threads have wait states.
     
  13. BrandonSi

    BrandonSi Notebook Savant

    Reputations:
    571
    Messages:
    1,444
    Likes Received:
    0
    Trophy Points:
    55
    Thanks weinter! That's what I thought.. I think for most purposes saying "simultaneous" is OK, but technically it's not, since there's a single point of execution. Good find! :)
     
  14. Tinderbox (UK)

    Tinderbox (UK) BAKED BEAN KING

    Reputations:
    4,740
    Messages:
    8,513
    Likes Received:
    3,823
    Trophy Points:
    431
    Anybody know how to test the i7 turbo mode? The 720 is supposed to be able to have 1 core at 2.8GHz.

    EDIT : I was just watching the core frequency with cpu-z and saw core 0 go to just under 2.8ghz for a split second.
     
  15. Serg

    Serg Nowhere - Everywhere

    Reputations:
    1,980
    Messages:
    5,331
    Likes Received:
    1
    Trophy Points:
    206
    I will try and explain what I know on this.

    I have a 720QM and so far this is how it goes.

    When stressing ONE core the load will jump between cores...it is a rather strange behaviour, but it works so no complaints.

    The cores have somewhat priority, but the core 0.1 (my naming for the first virtual core) seems to have priority over the core 1.0 and 1.1.

    Just checking my task manager, the cores are:
    Core 0 is 4%
    Core 0.1 is 0%
    Core 1 is 0%
    Core 1.1 is 0%
    Core 2 is 8%
    Core 2.1 is 3%
    Core 3 is 0%
    Core 3.1 is 3%
     
  16. Generic User #2

    Generic User #2 Notebook Deity

    Reputations:
    179
    Messages:
    846
    Likes Received:
    0
    Trophy Points:
    30
    thats how NON-HT processors work.

    the execution units are constantly swapping data in and out of the pipeline and the caches. this is context switching between threads.

    with HT processors though, there is no need to swap out the caches, the 'new' info it needs is ALREADY in the second set of caches. meaning that memory latency is much shorter.


    theres alot more technical details than that, but i'm pretty sure thats an acceptable working model.
     
  17. notyou

    notyou Notebook Deity

    Reputations:
    652
    Messages:
    1,562
    Likes Received:
    0
    Trophy Points:
    55
    I believe you're thinking of some type of multi-threading, which is having multiple threads and constantly switching between them, but not executing multiple threads at once.

    Generic has it right. Simultaneous Multi-Threading means you can execute two threads at once on a single core. And what everyone else is thinking is that the execution core is a single unit, instead, it's actually a couple of functional units (add, subtract, multiply, divide, etc.) which allow it to execute the multiple threads in parallel.
     
  18. Serg

    Serg Nowhere - Everywhere

    Reputations:
    1,980
    Messages:
    5,331
    Likes Received:
    1
    Trophy Points:
    206
    Well, the theory is that you have a two lane road. The actual core can get both lanes of information, and that is why you see two cores. See it this way: each lane you have is a "core" (which in reality is a thread, but let's use the word core since marketing does). Each one of these "cores" has the ability to do one thing at a time, so in the case of the 720QM you get a total of eight "cores" working.

    The logic behind it is that you use as few physical cores as possible while doing as many things as possible. The physical core has priority when doing a single task. When launching a second task, if the first one has some overhead, it will have priority and lower the other 3+3 unless they are needed. When you launch a third thread, the next physical core will come into play and the virtual core accompanying it gets ready for usage, so the first 2 threads on core 0 and 0.1 get a speed cut since they have to share. And so on until you fill all eight cores.

    Now enough theory, in real life it is somewhat different: when stressing one core, that single core is not the only one active. All 8 are active, but the other seven are in a very low power consumption mode. Why? My guess is that one core cannot handle running at the full 2.8GHz for too long, since when you check the processor with the task manager or TMonitor or CPU-Z you will see how the TB jumps around, and the processes go from 0 to 1 to 2 to 3 and their respective virtual cores. I have yet to fully test mine, but on regular tasks the HT has priority over the physical when one thread is already running on the physical core.

    And that is why I see the loads jump all over the CPU...
     
  19. IntelUser

    IntelUser Notebook Deity

    Reputations:
    364
    Messages:
    1,642
    Likes Received:
    75
    Trophy Points:
    66
    Serg, the reason Windows jumps threads around is because way back in the single core days, it helped improve multi-threading performance.

    The Pentium 4 uses Simultaneous Multi-Threading. The only Intel processor that uses other type of multi-threading is in the dual core Itanium code name Montecito and Montvale. THAT is called TMT, or temporal multi-threading.
    (Actually the more used term for TMT is SoEMT)

    SMT is explicitly meant to increase utilization of otherwise idle execution units. Therefore two threads can process simultaneously.

    There is an app from intel that allows you to observe Turbo Mode. If you have Windows 7/Vista, download the gadget called Turbo Boost Monitor.

    http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18353&lang=eng
     
  20. Judicator

    Judicator Judged and found wanting.

    Reputations:
    1,098
    Messages:
    2,594
    Likes Received:
    19
    Trophy Points:
    56
    Well, you're both right, depending on from which position you're looking at it. From the OS side, as far as it's concerned, yes, both threads are processing simultaneously. When you get down to the actual processing core itself, it can't actually process both threads at the same time. It'll process one, but whenever it stalls or is idle in one thread, it'll process the other. The difference largely lies in where the thread "swapping", shall we say, is implemented. In a non hyperthreaded/SMT enabled processor this is handled outside the CPU, on the OS level. In an i7, it's handled down at the hardware level, which is why to the OS, it looks like 2 simultaneously processing cores. Basically, as IntelUser said, it's a way to improve the efficiency of the execution units to keep them "occupied" more often. Essentially, it's an efficiency solution that shows up best in heavily multi-threaded/multi-tasking situations, where you have several threads running essentially simultaneously. When you get down to the real nitty gritty, however, it's just a more efficient way to share resources, which is why you only get somewhere around a 10-15% advantage (Well, Intel claims 30%) from hyperthreading in most heavily multi-threaded situations (a CPU with hyperthreading enabled (in theory doubling the threads it can process) will operate approximately 30% faster than the same CPU running the same tasks with hyperthreading turned off).
     
  21. Explosivpotato

    Explosivpotato Notebook Consultant

    Reputations:
    15
    Messages:
    296
    Likes Received:
    0
    Trophy Points:
    30
    This is, for the most part, how I always understood HT. Jumping from one thread to another while the first is idle certainly improves multitasking.

    I also understood (as some previous posters already stated), that it allowed simultaneous processing of 2 dissimilar threads on a single core, but only when one thread leaves an execution portion of the core idle that another thread could be using.
     
  22. davepermen

    davepermen Notebook Nobel Laureate

    Reputations:
    2,972
    Messages:
    7,788
    Likes Received:
    0
    Trophy Points:
    205
    if one hyperthread is doing floatingpoint computations, it will schedule its tasks to the floatingpoint unit. if the other hyperthread is doing integer computations, it tasks that to the integer units.

    the result will be two computations happen in parallel, for two independent threads.

    as far as i know, it does NOT interleave if the workloads are completely independent, and don't require some parts of the cpu that only exist once.

    that's how i learned hyperthreading: two (or more) schedulers scheduling jobs onto the single core, that has still the same amount of computational units. so it can't do twice the integer, or sse, or floatingpoint work (which a dualcore could). BUT it can do two different things at the same time, putting the cpu to better usage.

    of course, the main feature of hyperthreading is to hide memory latencies. whenever one thread waits for some data, the other can use the cpu at its fullest.
     
  23. Serg

    Serg Nowhere - Everywhere

    Reputations:
    1,980
    Messages:
    5,331
    Likes Received:
    1
    Trophy Points:
    206
    Well i7 architecture is a share-the-hardware-between-two-cores thing if you want to see it that way.

    This means the hardware is one, but the core can do two things.

    Launch thread one. Core 0 processes thread one. Launch thread two. Core 1 could process it, but why bother, Core 0 can handle more. Thread 2 goes to core 0, and while thread one is not being processed and is in idle waiting for response from the peripherals or software, the core 0 can get to work on thread 2. This is interpreted as another core by the computer. But it is only one core. So thread 1 has a response, and now thread 2 needs one. So core 0 swaps between thread 1 and thread 2. Thread 1 is being processed while thread 2 is on stand by for response. And so on.

    When you launch a thread 3, it cannot go into the stand by mode on core 0, since core 0 is already working on two threads. So thread 3 goes to the next core. Process repeats for thread 4.

    If you launch a thread 5 and thread 1 is done, thread 5 will go to occupy thread 1 place, and work there, not to bother the core 2 and 3 since they are not needed.
     
  24. BrandonSi

    BrandonSi Notebook Savant

    Reputations:
    571
    Messages:
    1,444
    Likes Received:
    0
    Trophy Points:
    55
    Wow, this got lively! Good discussion everyone! I just want to put a couple of things out there (as I understand them).

    1 - Windows is an SMT aware OS. That is why with no HT enabled, it can process parallel instructions (since we have multiple physical cores).

    2 - Intel takes advantage of Windows being SMT aware for HT, and advertises HT as additional cores, this is why we see 8 CPU's for Quad-core HT. The OS treats these 'virtual' cores as actual cores, it is on the hardware end where the actual 'division of labor' happens.

    3 - HT as it applies to a single core, is TMT. A single core cannot process two instructions at once (unless you're using specific cases, like Judicator mentioned, with FPU / integer operations). The processor will switch between threads to execute instructions, but most of the time, it's not actually simultaneous.

    4 - If we have two threads (thread 1 and thread 2) assigned to core 0, and core 0 is busy executing thread 1, thread 2 (assuming the application is optimally coded) can be executed by another available (non-busy) core. In that sense, HT can be parallel. However, as it might apply to the P4 (or any single core solution), this isn't possible, as there's only one core.

    I think that sums up the main points everyone was trying to make.. agree? Did I miss something, or incorrectly word anything?
     
  25. Judicator

    Judicator Judged and found wanting.

    Reputations:
    1,098
    Messages:
    2,594
    Likes Received:
    19
    Trophy Points:
    56
    3) I wasn't the one that mentioned specific cases (it was more notyou and davepermen), but at that point I think a lot depends on exactly how many functional processing units are in each "processor core", and what the thread that's attempting to run currently actually needs out of those resources. That, of course, is highly dependent on the actual architecture of said core as well as the thread itself.

    4) I think that decision is up to the OS, not the processor. The OS assigns threads to cores, and the cores then run said threads. This was a big part of why Windows 7 was supposed to be so important for i7 with its SMT parking; it'll deliberately try to assign tasks to separate physical cores before using hyperthreading to put 2 threads onto one core. Vista and XP, from what I can tell, can't tell the difference between a physical and logical core, and thus are just as likely to put the 2 most taxing threads on a single physical core, and thus slow things down overall. Note that Windows 7 server apparently goes the other way, and thanks to Core parking, will schedule things on as few (physical) cores as possible, to save power.
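
    The two placement preferences in point 4 can be sketched as a toy model. This is not Windows scheduler code and makes no claim about how any real scheduler is implemented; `place_threads` and the policy names "spread" and "pack" are hypothetical, and each core is simply assumed to expose two hardware thread slots.

```python
def place_threads(n_threads, n_cores=4, policy="spread"):
    """Return the (core, sibling) slots used for n_threads runnable threads.

    Toy model: every core has two hardware thread slots (sibling 0 and 1).
    "spread" uses sibling 0 on every core before touching any sibling 1.
    "pack" fills both siblings of a core before moving to the next core.
    """
    if n_threads > 2 * n_cores:
        raise ValueError("more threads than hardware thread slots")
    if policy == "spread":
        slots = [(core, sib) for sib in range(2) for core in range(n_cores)]
    elif policy == "pack":
        slots = [(core, sib) for core in range(n_cores) for sib in range(2)]
    else:
        raise ValueError("unknown policy")
    return slots[:n_threads]


# Two busy threads: "spread" lands on two physical cores,
# "pack" puts both on the two siblings of core 0.
print(place_threads(2, policy="spread"))
print(place_threads(2, policy="pack"))
```

    With "spread", each of the two heavy threads gets a physical core to itself, which favours performance. With "pack", the other three cores stay idle and can be parked, which favours power saving.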
     
  26. BrandonSi

    BrandonSi Notebook Savant

    Reputations:
    571
    Messages:
    1,444
    Likes Received:
    0
    Trophy Points:
    55
    Interesting, thanks Judicator. Apologies if I mixed up who said what.

    Just to clarify, "Windows 7 Server" = Server 2008, correct?
     
  27. f4ding

    f4ding Laptop Owner

    Reputations:
    261
    Messages:
    2,085
    Likes Received:
    0
    Trophy Points:
    55
    davepermen got it right. Each physical core can handle two threads AT ONCE. Each physical core has smaller units, like floating point unit (FPU), integer unit (IU), and so on. If thread 1 requires FPU only and not IU, so that core 1 IU is idle, then core 1 can handle another thread 2 IF thread 2 only requires IU but not FPU. That's the whole concept of SMT, and the reason why some programs will see advantage (usually scientific program) while some don't in using SMT-capable CPU.

    I saw a paper somewhere by Intel or a graduate student that analyzed the new SMT in the i7 and how it is now more efficient than the old P4-HT. And in fact, Intel is not the first one to do SMT. IBM and SUN are already talking about 8 logical cores per CPU (or is it 80?). Although theirs are different architectures altogether.
     
  28. f4ding

    f4ding Laptop Owner

    Reputations:
    261
    Messages:
    2,085
    Likes Received:
    0
    Trophy Points:
    55
    No, the server version of Windows 7 is Windows Server 2008 R2, which I am using. :p
     
  29. f4ding

    f4ding Laptop Owner

    Reputations:
    261
    Messages:
    2,085
    Likes Received:
    0
    Trophy Points:
    55
    The decision is not up to the OS. The logical cores are transparent to the OS. The OS only needs to realize that the CPU is SMT capable, but it does not decide which thread is assigned to which CPU. The OS scheduler might need some modification so that it will give priority to physical cores over logical cores.
     
  30. davepermen

    davepermen Notebook Nobel Laureate

    Reputations:
    2,972
    Messages:
    7,788
    Likes Received:
    0
    Trophy Points:
    205
    no. the os knows which physical and virtual core it assigns the thread to, and prioritizes according to workload.
     
  31. Judicator

    Judicator Judged and found wanting.

    Reputations:
    1,098
    Messages:
    2,594
    Likes Received:
    19
    Trophy Points:
    56
    That seems contradictory? It sounds like you're saying the OS scheduler is what assigns threads to cores, since you're saying that it needs to be modified to give priority to physical cores over logical cores, but then isn't the OS scheduler part of the OS? And if the logical cores are transparent to the OS and thus presumably the OS scheduler, then how could the OS scheduler give priority to physical cores over logical cores it couldn't even see?
     
  32. f4ding

    f4ding Laptop Owner

    Reputations:
    261
    Messages:
    2,085
    Likes Received:
    0
    Trophy Points:
    55
    The old OS scheduler could not do this. It simply assigned threads to all cores, physical or logical. This affects performance. That's why you read news about Intel working with Microsoft to help improve SMT-capable CPU performance. The new OS scheduler realizes and can differentiate between physical and logical cores (or at least it should, that's the plan or the logical progression). But in the end, which cores (logical or physical) each thread is going to does not matter to the OS. The OS only passes the thread to the CPU as far as the OS is concerned.
     
  33. Serg

    Serg Nowhere - Everywhere

    Reputations:
    1,980
    Messages:
    5,331
    Likes Received:
    1
    Trophy Points:
    206
    AFAIK the OS assigns the threads to the cores.

    A little off topic: I ran a little test using CATIA V5 R19 on my 720QM. The 4 physical cores got a small load shared between them, and the 4 virtual cores saw little to no action, or a very small amount compared to their physical brothers.

    Same when testing Microsoft ISE. I have yet to install CS4 on my laptop and test it.
     
  34. Judicator

    Judicator Judged and found wanting.

    Reputations:
    1,098
    Messages:
    2,594
    Likes Received:
    19
    Trophy Points:
    56
    That's the whole thing. If the OS doesn't care which core the thread is going to, there's no reason to modify the OS scheduler (part of the OS!) to differentiate between physical and logical cores. The fact that the OS scheduler was modified to differentiate between physical and logical cores implies to me that that is what assigns threads to cores (otherwise there'd be no reason for it to know the difference), and thus the decision of which core gets which thread falls back into the province of the OS. Not to mention, of course, the option of setting core affinity, which also reinforces the idea that thread/core selection is the province of the OS.