The Notebook Review forums were hosted by TechTarget, which shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.

    XPS 13 9360 - Crashes under load

    Discussion in 'Dell XPS and Studio XPS' started by Fish-Face, Nov 16, 2016.

  1. Fish-Face

    Fish-Face Newbie

    Reputations:
    0
    Messages:
    9
    Likes Received:
    1
    Trophy Points:
    6
    Hi all, I recently (about a week ago) bought a New 2016 XPS 13. The specs are: i7, 16GB, 512GB SSD, QHD. I am dual booting Windows and Fedora Linux. Aside from the coil whine issue, which is pretty irritating (worse under Windows), the day before yesterday it started crashing under load. The first time was while playing a video (720p) in Linux, then while trying to install some software in Linux. Yesterday I reproduced the problem in Linux by playing a video again, then rebooted into Windows and got a crash while trying to install League of Legends.

    When booting Linux back up, the system log says a Machine Check Exception - i.e. a hardware error - was detected. The only useful-looking line says, "Generic CACHE Level-2 Generic Error" which I believe indicates the CPU detected that the L2 Cache has become corrupted. Under Windows I got BSODs both times - the code however was not MACHINE_CHECK_EXCEPTION but, the first time, IRQL_NOT_LESS_OR_EQUAL and the second time DRIVER_IRQL_NOT_LESS_OR_EQUAL.
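
    For anyone wanting to repeat the log check described above, here is a minimal Python sketch that filters kernel log text for lines that look like machine check reports. The keyword list and the function name find_mce_lines are assumptions made for illustration, not an official list of kernel message formats, so treat any match as a prompt to read the full log rather than as a diagnosis.

```python
from typing import Iterable, Iterator, Sequence

# Assumed keywords (illustrative only): adjust to whatever your own log shows.
KEYWORDS = ("machine check", "mce:", "hardware error")


def find_mce_lines(lines: Iterable[str],
                   keywords: Sequence[str] = KEYWORDS) -> Iterator[str]:
    """Yield log lines that contain any keyword, compared case-insensitively."""
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            yield line.rstrip("\n")
```

    One way to use it is to pass an open log file, or sys.stdin when piping in the output of something like `journalctl -k`, as the `lines` argument and print whatever comes back.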

    I am running in AHCI mode rather than RAID (this was necessary to dual boot Linux. To get Windows working I had to reboot it into safe mode to get it to enable the AHCI driver.) Apparently on older XPS models this caused BSODs, but with a different (more specific) error code - so I doubt it's that. It's worth noting that before Monday I was able to install Linux, a bunch of other software and watch a different 720p video without anything going wrong - I am fairly sure in this time I did not install anything which could affect this (for example, graphics drivers or OS updates.) This could be good luck, could be me forgetting something, or could mean something's broken internally.

    I'm getting somewhat high temperatures while doing this. In Linux the system log sometimes has other machine check exceptions about the processor being throttled due to high heat. I see the cores get 85 degrees using the Linux temperature sensor tools, with the CPU overclocking to 2.9GHz, but when I checked temps until a crash, it crashed while the temperature had dipped slightly, and was sitting around 78. In Windows temperatures got a bit higher (or maybe the software is just different - one never knows I guess) measuring up to 90 degrees in RealTemp. These seem a bit hot for having been at load for, I guess, 15 minutes, but I don't know that for sure. At any rate, such a temperature alone sure doesn't explain a crash.
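
    To log temperatures up to the moment of a crash on Linux, a rough sketch along these lines might help. It assumes the kernel exposes thermal zones under /sys/class/thermal with a `temp` file in millidegrees Celsius; which zone corresponds to the CPU package varies by machine, and the helper names are made up for this example.

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs reading such as '85000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict:
    """Return {zone label: temperature in Celsius} for every readable zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            temp = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            # Skip zones that cannot be read or return unexpected content.
            continue
        readings[f"{zone.name} ({zone_type})"] = temp
    return readings


if __name__ == "__main__":
    for label, temp in read_thermal_zones().items():
        print(f"{label}: {temp:.1f} C")
```

    Running this in a loop with a timestamp and appending to a file would leave a record of the last reading before a crash, which is more reliable than watching a sensor tool by eye.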

    If anyone has any advice that'd be nice. Although I don't think the standard stuff like "update drivers" and so on is going to be of any use since the same problem (it seems) occurs in two different OSs. I'm already on the currently-latest 1.0.7 BIOS. If anyone can say if they have the same problem or can perform similar tests (install League of Legends from scratch in Windows, play video in Linux, or just any kind of load testing really - I will try in prime95 later) that'd be useful. Also if you could check temperatures under load to see if mine are unusual that'd be handy. I'm expecting I'll have to send it back though. If I do, does anyone know whether, if they replace the motherboard (or whole unit), they swap out the SSD? Setting up Fedora has been quite some effort so it would be nice to know if I'll have to do a full backup to avoid it again. I just hope if it comes to that that they don't make the coil whine worse!
     
  2. John Ratsey

    John Ratsey Moderately inquisitive Super Moderator

    Reputations:
    7,197
    Messages:
    28,841
    Likes Received:
    2,170
    Trophy Points:
    581
    Do Dell's own diagnostics and the logs in the BIOS reveal anything?

    If not, leave Memtest86 running overnight to give the RAM a thorough testing (much more thorough than the diagnostics). A couple of years ago I had some erratic BSODs and Memtest86 revealed a bad RAM module.

    John
     
  3. pressing

    pressing Notebook Deity

    Reputations:
    404
    Messages:
    1,985
    Likes Received:
    1,034
    Trophy Points:
    181
    Maybe take a read through the 9350 threads a bit as that is a (very) similar machine to your 9360.

    The 9550 (similar but with 15" screen) threads show that the BIOS and some drivers took Dell more than six months to get generally sorted. As far as I am concerned, it was a disaster out of the gate. Even today, some with 4k screens are running older BIOS & drivers based on which "features" they need vs. "problems" they can live with. That said, once Dell got the BIOS and drivers generally sorted, I think it runs quite well.

    The 9550 really starts to throttle around 80°C so there is a marginally useful datapoint. There were issues with lousy thermal paste application from the factory.

    User Eason has a nice guide on undervolting - check that out on his signature. Note that Intel XTU has some bugs that CAUSE power throttling on some new Dell machines. Perhaps ThrottleStop is an option but not sure it works with Kaby Lake.

    FYI - I dropped thermals on the similar 9550 some 17°C with ThrottleStop, CPU-GPU repaste, and replacement of VRAM thermal pads. Those are typical results for other users...

    ThrottleStop can enable Intel SpeedStep which might help with thermals as well. Dell hasn't bothered to enable that nice feature. I have a thread here that discusses some of the basics of SpeedStep.
     
  4. GoNz0

    GoNz0 Notebook Virtuoso

    Reputations:
    259
    Messages:
    3,947
    Likes Received:
    1,378
    Trophy Points:
    231
    It could happen if the CPU is overheating. Repaste before doing anything drastic like letting a Dell butcher loose on it.
     
  5. kent1146

    kent1146 Notebook Prophet

    Reputations:
    2,354
    Messages:
    4,449
    Likes Received:
    476
    Trophy Points:
    151
    Honestly, I say to just go through Dell Support. This is clearly a hardware error of some kind; either overheating caused by a bad thermal paste job (likely), bad memory (somewhat likely), a motherboard defect (somewhat likely), or a CPU hardware defect (very unlikely). This advice especially applies if you paid for next-day onsite repair. It's just not worth your time fiddling around with diagnosing & repairing the problem, when you already paid someone to do it for you.

    When you call Dell Tech Support, you'll need to jump through the hoops. You won't get much luck bypassing all of that by saying, "Look, I know what I'm doing; this is a hardware defect. Just put me through to Tier 83 tech support, and give me a new motherboard." My best advice is to just play dumb, and play along. They might have you run through some hardware diagnostics, sure. But when they get to the part when they tell you to do a data-destructive System Restore, just tell them that was the first thing you tried (since you went online to read help articles); and it didn't help.

    If you have next-day onsite service, then the technician can tell you whether s/he was asked to swap the SSD. If no, then no problem. If yes, then you might be able to convince them by saying that you're pretty good with computers, and have done storage hardware diagnostics, and know it's not the SSD.

    If you're mailing your machine into a Dell service center, then definitely capture a drive image of your SSD. It's highly likely that the service center will do a System Restore to run diagnostics against a known-good software configuration (the factory image), as the only reliable way it can validate that the problem was fixed.
     
  6. Fish-Face

    Hi thanks for the quick replies. When I get home I will check Dell's diagnostics (do you mean the thing you get here?), BIOS logs (I didn't realise there were any...), memtest86 and ThrottleStop. I did look for some 9350 threads and didn't find anything that looked that relevant. Mostly stuff to do with AHCI mode or stuff fixed by BIOS updates.

    I didn't get the on-site repair so I'll investigate repasting. It would require a little investment since I don't actually have any thermal paste or torx screwdrivers(!) However, can I get a bit more info on why you suggest that bad cooling would cause crashes? My impression is that the throttling was good enough nowadays that CPUs basically never crashed or damaged themselves due to overheating. (Anecdote: a couple of years ago my desktop was routinely throttling itself while gaming and all that happened was my framerate would tank - I'd carry on playing for an hour with no other effects!) Maybe this is overly optimistic but it would be useful to know more - in particular, people discussing other XPS laptops *seem* to be talking about improving performance rather than fixing crashes.

    The advice for dealing with Tech Support is very much appreciated. To save space I intend to take a file-level backup rather than a full image (and because I know how to do it reliably - I feel dd'ing the drive or using cloning software is a bit fragile especially when fiddling with drive modes, UEFI/GPT (technologies I don't know), multiple OSes and so on.)
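
    As a rough illustration of the file-level approach mentioned above (as opposed to imaging the whole drive), the following Python sketch packs a directory tree into a compressed tar archive and then counts the files it stored as a basic sanity check. The function name backup_directory and the paths in the usage note are hypothetical, and a real system backup would need more care than this (for example, extended attributes may not be preserved).

```python
import tarfile
from pathlib import Path


def backup_directory(source: str, archive: str) -> int:
    """Write `source` into a gzip-compressed tar file; return the file count."""
    src = Path(source)
    # Store paths relative to the directory name, not the absolute path.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)
    # Reopen the archive and count regular files to confirm it is readable.
    with tarfile.open(archive, "r:gz") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())
```

    For example, calling backup_directory("/home/someuser", "/mnt/external/home.tar.gz") would archive a home directory to an external drive. The archive should live outside the directory being backed up so that it does not try to include itself.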

    I'll check in later - thanks again.
     
  7. kent1146

    Just as an FYI... repasting your CPU would most likely give Dell valid grounds for denying a motherboard replacement request under warranty. In the US, a vendor can deny a warranty request if they can show that actions / modifications performed by the consumer caused the damage being claimed. And it's not a far stretch to claim that a customer screwing around with the CPU and messages about random low-level hardware failures and L2 cache errors are related.

    Now, you're not likely to get caught. A lot of things need to go wrong for that to happen. But if you're sending a laptop in to a service center for a job that will require them to remove a heatsink and deal with re-applying thermal paste at some point in the repair process; it wouldn't be too hard for a keen-eyed repair tech to notice that the existing thermal paste is a different color than what they normally see; report it up to their supervisor; and then who knows what happens from there.
     
  8. pressing

  9. GoNz0

    Dell encourage CPU repasting; other customers have mentioned in the XPS 15 threads being asked to do this prior to a service call with suspected heat issues. They also put the service manuals up for you to download and disassemble. They also have no problem sending me out motherboards, keyboards (full strip down), SSDs, palm rests (also a total strip down).

    Makes a refreshing change as most other companies would call foul and revoke the warranty. This is also why Dell have no warranty void stickers.
     
  10. Fish-Face

    Thanks for the comments on repasting.

    OK so: I checked the BIOS Logs (nowt), ran memtest for one test (nowt), and ran the Dell diagnostics. This found a problem with the memory - seems weird given that it tests for a fraction of the time of even one memtest, but maybe they know something I don't? What do you think?

    I got a new BSOD error code today: KMODE_EXCEPTION_NOT_HANDLED in NTFS.sys. This was again under load, although after quite a while at load and not at its hottest. I will re-run memtest overnight to see what it turns up. Dell will send me a new memory module, I guess for me to install here, under warranty, which sounds like it might just be a no-brainer. I'd just need to get the right screwdrivers and be away. I could apply a sensible quantity of thermal paste at the same time in case that's the issue.

    Thoughts?

    EDIT: Well I assumed since the page was offering to send out memory that the memory wasn't soldered on... everything I read online though suggests it is, so uh... wat? :S

    EDIT2: OK memtest86+ ran overnight for 8 hours without any errors. However before that I re-ran Dell Diagnostics to check the exact error and confirm they are running a memtest-like test. Prime95 also fails the large FFT test, but the small FFT test at least passes two tests. (The large one fails the first test.) I'm going to see what happens if I undervolt in Intel XTU and retry but I'm not too hopeful. This also isn't a solution because XTU doesn't run in Linux as far as I know.
     
    Last edited: Nov 17, 2016
  11. Fish-Face

    Update: undervolting by 50mV (actually now I'm not sure of the precise units... it was ten times the minimum step Intel XTU permitted, anyway) reduced temperature quite a bit, but the system still crashed while running Prime95 in RAM-stress mode. I wasn't watching at the time so I don't know how long it took but it wasn't that long. In any case I thought the next step would be to contact Dell support. I think the website must have been wrong in saying they'd send me a new RAM module, but I'd like to see what they do offer me. If they just send out a new mainboard then that might still be a pretty good solution. The only thing is that replacing that will be more effort and complication than swapping out some RAM sticks.

    I still have the option of repasting without getting anything replaced, but I am still under the impression that if the CPU is being kept under its maximum temperature, the system should be perfectly stable. Perhaps the RAM itself is overheating, but then that's a separate issue not solved by repasting the CPU.
     
  12. GoNz0

    Just call Dell now.
     
  13. Fish-Face

    Sorry, I didn't make it clear, I already contacted them, but am waiting for a reply. (I can't really call them while at work so it's via email for now.)
     
  14. pressing

    Fish- please see my response to you on the previous page - Intel XTU has some defects that cause newer XPS systems to throttle. Thus, your XTU testing does not provide much useful information unfortunately. You can try ThrottleStop or some other program if you like...

     
  15. Fish-Face

    Well if voltage, temperature and clockspeed are all reduced, yet the system still crashes, that is telling us something, right?
     
  16. GoNz0

    You would hope so

     
  17. pressing

    That your system has some problems, unfortunately...
     
  18. Fish-Face

    Exactly ;)

    Dell didn't reply in the promised 1 working day. I'll ring them on Monday if they still haven't replied by email. Supposing they decide that the RAM is to blame, as seems reasonable, are they likely to send out a new motherboard for me to install, or will they get me to send it to them? If the former, is there some risk to me if I screw up the installation? The process doesn't seem too risky but there's always a slight danger of the screwdriver slipping and whacking something sensitive or whatever, which obviously increases the more involved the task.
     
  19. GoNz0

    You will know what will happen next when you speak to them.

     
  20. Fish-Face

    Yes, obviously. I was trying to get an idea of what to expect.
     
  21. GoNz0

    Without repeating myself, it is different depending on the country and the level of service you paid for. With NBD onsite, expect a technician the day after you phone; otherwise you will probably send it in. BUT you may talk them round to sending parts, so you will find out when you phone, simple as that.
     
  22. Fish-Face

    Motherboard replaced today (and repasted) - instability seems to be fixed, but the temperature seems to be the same. I'm less worried about the temperature, of course.

    EDIT: The coil whine on the new motherboard is godawful on battery in Linux...
     
    Last edited: Nov 22, 2016
  23. Ben564654564

    Ben564654564 Newbie

    Reputations:
    0
    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    5
    Got the same issue. Memory is faulty on my 9360. Prime95 found errors and so did the Dell diagnostic software.
    But Dell is unable to repair the unit because there are no spare parts... This situation has lasted for weeks...
     
  24. GoNz0

    Be firm, ask for a manager, ask the manager for the resolutions team.
    Demand the unit is replaced with a new one.
     
  25. Ben564654564

    Already asked and talked to a manager. They refuse to replace/refund. But I'm dealing with the French platform which is not really performing well. I'm now telling them that I'm going to sue them if they do not take action.
    Worst experience!!! I will NEVER buy a Dell again.
     
  26. GoNz0

    Yeah you really need the resolutions team for this one m8. Hope you paid with a credit card; I would be starting a chargeback about now.
    Are you within your first 14 days? They have to refund in that timeframe.
     
    Last edited: Feb 9, 2017