New computer, but experiencing nvlddmkm.sys VIDEO_DXGKRNL_FATAL_ERROR (code 141)?

Discussion in 'Sager and Clevo' started by Amnvex, Sep 3, 2019.

  1. bennyg

    bennyg Notebook Virtuoso

    Reputations:
    1,511
    Messages:
    2,319
    Likes Received:
    2,292
    Trophy Points:
    181
    A negative core offset is more like an overvolt than an underclock, but what it actually does to the card depends on the load and the power limit. It can either force a lower clock at the same voltage (when power-limited, as under FurMark) or hold the same clock at a higher voltage. If the card is unstable because of transient voltage drops, this *may* provide extra stability, but the GPU VRM could also be faulty, in which case the offset will have no effect either way.

    Conversely, a positive core offset mostly acts like an undervolt: it lets the card boost to a higher MHz at the same voltage, hold the same clock at a lower voltage, or a bit of both. That should induce crashing more often by eating into the stability tolerance zone.
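
    A rough way to picture the two offset cases is to treat the boost curve as a list of voltage/frequency points and shift every point by the offset. The sketch below is only an illustration: the curve values and the helper names (`apply_offset`, `clock_at_voltage`, `voltage_for_clock`) are made up, not taken from any real card or tool.

```python
from typing import List, Tuple

Curve = List[Tuple[int, int]]  # (millivolts, MHz) pairs, lowest voltage first

# Hypothetical boost curve; these numbers are invented for illustration only.
BASE_CURVE: Curve = [(700, 1300), (750, 1400), (800, 1500), (850, 1600), (900, 1700)]


def apply_offset(curve: Curve, offset_mhz: int) -> Curve:
    """Shift every point of the curve up or down by the core offset."""
    return [(mv, mhz + offset_mhz) for mv, mhz in curve]


def clock_at_voltage(curve: Curve, mv: int) -> int:
    """Highest clock the curve allows at or below the given voltage."""
    eligible = [mhz for point_mv, mhz in curve if point_mv <= mv]
    if not eligible:
        raise ValueError("voltage is below the lowest curve point")
    return max(eligible)


def voltage_for_clock(curve: Curve, mhz: int) -> int:
    """Lowest voltage whose curve point reaches at least the requested clock."""
    eligible = [point_mv for point_mv, point_mhz in curve if point_mhz >= mhz]
    if not eligible:
        raise ValueError("clock is above the highest curve point")
    return min(eligible)


negative = apply_offset(BASE_CURVE, -100)
positive = apply_offset(BASE_CURVE, +100)

# Same voltage, negative offset: the clock drops.
print(clock_at_voltage(BASE_CURVE, 800), clock_at_voltage(negative, 800))
# Same clock: a negative offset needs more voltage, a positive offset needs less.
print(voltage_for_clock(negative, 1500), voltage_for_clock(BASE_CURVE, 1500), voltage_for_clock(positive, 1500))
```

    In this toy model, holding 1500 MHz costs an extra voltage step with the negative offset and one step less with the positive offset, which is the overvolt/undervolt reading described above.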

    Locking the card to a specific voltage/frequency point (Ctrl+L in the Afterburner curve editor window) may be helpful for testing stability. It doesn't override power limits, though, and mobile cards spend almost all their time power-limited, so the core will still drop clocks by moving left along the boost curve.

    But any problem that comes on only after multiple hours, and can't be specifically induced, is a giant pain in the backside to troubleshoot. Hopefully you have warranty and a service line to call to help you with it as it does sound like a hardware issue.

    As for clocks under FurMark, I'm not seeing anything other than normal behaviour: the core runs the fastest clock possible under the power limit. FurMark is a "heavier" and less variable load than anything else the core will ever run, so the card is constantly power-limited and stabilises at a lower overall clock (and the lower voltage tied to that clock) than it would during a game load.
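
    The power-limit behaviour can be sketched the same way. The model below assumes dynamic power scales roughly with load x voltage squared x frequency, which is a common approximation and not how any particular driver computes it. The curve points, the scaling constant, the 90 W limit and the load factors are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, int]  # (volts, MHz)

# Hypothetical boost curve and constants; none of these are measured values.
CURVE: List[Point] = [(0.70, 1300), (0.75, 1400), (0.80, 1500), (0.85, 1600), (0.90, 1700)]
POWER_SCALE = 0.08      # made-up constant mapping V^2 * MHz to watts
POWER_LIMIT_W = 90.0    # made-up power limit


def estimated_power_w(volts: float, mhz: int, load_factor: float) -> float:
    """Approximate power draw: load x V^2 x f, scaled by an arbitrary constant."""
    return load_factor * volts ** 2 * mhz * POWER_SCALE


def highest_point_under_limit(curve: List[Point], load_factor: float, limit_w: float) -> Point:
    """Pick the fastest curve point whose estimated power fits under the limit."""
    fitting = [p for p in curve if estimated_power_w(p[0], p[1], load_factor) <= limit_w]
    if not fitting:
        # Nothing fits: fall back to the slowest point on the curve.
        return min(curve, key=lambda p: p[1])
    return max(fitting, key=lambda p: p[1])


furmark_point = highest_point_under_limit(CURVE, load_factor=1.0, limit_w=POWER_LIMIT_W)
game_point = highest_point_under_limit(CURVE, load_factor=0.8, limit_w=POWER_LIMIT_W)
print("heavy load settles at", furmark_point)
print("lighter load settles at", game_point)
```

    With these invented numbers the heavier load settles further left on the curve (lower clock and lower voltage) than the lighter one, even though the limit and the curve are identical.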
     
    Last edited: Sep 3, 2019
  2. Amnvex

    Amnvex Notebook Enthusiast

    Reputations:
    0
    Messages:
    28
    Likes Received:
    2
    Trophy Points:
    6
    Interesting. So is a negative core offset a good thing or a bad thing? Because it sounds like a bad thing and that I shouldn't mess with it. If this is the case, I think I better just turn off the scaling that is offered in the BIOS. I guess I'm also not the only one. I read on other forums that RTX owners have the same issue and they've RMA'd their cards twice and the issue persists... I get this issue, too:
    https://us.forums.blizzard.com/en/overwatch/t/render-device-lost-fix-for-rtx/263106/472

    Maybe it really is a BIOS problem. I will need to explore it. The game I played, btw, is Vampyr. I play it on medium settings at 1440x900 resolution. I even tried playing Victor Vran, as I said, and that is played in windowed mode (I think 1280x720 out of the 1920x1080 desktop resolution the screen is capable of). There's no way a game like that, on med-high settings and released 4-5 years ago, should be causing the GPU to overwork itself and be "lost"... and it used to happen immediately when I opened the map in the game, as I mentioned. There was a time when I thought the Nvidia Experience app was the problem. After uninstalling it, the map crash stopped. I reinstalled drivers only. Now the problem only happens after playing the game for a longer time (2+ hours, generally). The map no longer crashes on load. Baffling. I'd have said that Nvidia Experience was responsible for it, but I've since reinstalled the Experience software and there are still no issues with the map crashing. How can it be this capricious?

    Edit 1:
    I've disabled performance scaling in BIOS. Let's see what happens... I'm willing to try anything at this point, lol, but will test later (going to work now).

    Lastly:
    After rebooting, I get this when trying to access the NVidia Control Center:
    upload_2019-9-3_20-22-59.png

    But then when I right clicked again, to test if I'd get the same issue, the Nvidia Control Center started with no errors! So confusing.

    Edit 2:
    With FurMark, these are the results after ~3 mins (with GPU Scaling OFF in BIOS). The clocks seem more "stable" (no OC offset is used here).
    upload_2019-9-3_20-29-51.png

    Edit 3:
    Apparently this also happened, but I had not noticed it (I think it happened when I tried to close it, so it decided to crash instead... just guessing):
    ntdll as I understand is a kernel-level driver? The mystery deepens...
     
    Last edited: Sep 3, 2019
  3. joluke

    joluke Notebook Deity

    Reputations:
    175
    Messages:
    914
    Likes Received:
    420
    Trophy Points:
    76
    What version of BIOS and EC do you have?

    Reboot your laptop and press F2 intermittently to enter the BIOS; you will see that info on the first screen that pops up.
     
  4. Amnvex

    Amnvex Notebook Enthusiast

    Thanks for the reply.

    I hope the numbers are the same since I short-cutted this by going into MSINFO32 to get this info: BIOS is INSYDE CORP. 1.07.03P dated 1/15/2019 and EC version is 7.04.
     
    Last edited: Sep 4, 2019
  5. joluke

    joluke Notebook Deity

    Well your BIOS is a bit old!

    The latest one for your model is BIOS version 1.07.09.

    And the latest EC is 1.07.08.
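
    For anyone checking this by script rather than by eye, here is a minimal Python sketch that compares dotted version strings like the BIOS ones above. The helper names (`parse_version`, `is_older`) are hypothetical, and it assumes both strings use the same numbering scheme. That holds for the BIOS strings here, but MSINFO32 reported the EC as 7.04 while Clevo lists 1.07.08, so the EC values should not be compared this way without first confirming how the two formats map to each other.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Pull the first dotted numeric run out of a string, ignoring any vendor prefix or letter suffix."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no dotted version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_older(installed: str, latest: str) -> bool:
    """True when the installed version sorts before the latest one, field by field."""
    return parse_version(installed) < parse_version(latest)


# BIOS string as reported in this thread versus the latest listed version.
print(is_older("INSYDE CORP. 1.07.03P", "1.07.09"))
```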

    Can you ask the store that sold you the laptop to send you an update of both EC and BIOS? Worth a shot :)

    (I got the info from Clevo e-channel directly.)

    There's a mirror here for Clevo BIOS and EC:

    https://repo.palkeo.com/clevo-mirror/P9xxEx/

    But it hasn't been updated with the latest BIOS/EC from Clevo, and Clevo's FTP for downloading BIOS/EC has been down today (like always lol).

    Edit:

    I was able to grab the latest EC, 1.07.08:

    https://mega.nz/#F!TBZhGYJA!tuzyRICl5OSs1oPoulkywg <- This is a link to a folder. If I can grab the latest BIOS from Clevo's FTP I will post it there too. For now it only has the EC for your model.
     
    Last edited: Sep 4, 2019
    Amnvex likes this.
  6. Amnvex

    Amnvex Notebook Enthusiast

    I did ask them, actually. The manufacturer, a combination of Pro-Star and Sager, has gone through a restructuring and has not sent them anything, even though they have waited 3 weeks for an answer. The reason, they said, is that they have to email the company in China, then the company in China has to email the U.S. office, and then the U.S. office has to email my seller. But that hasn't happened, and I doubt it will at this point. I had given up on trying, but I have asked them once again to re-request the BIOS and EC with instructions.

    And I have no idea how to flash the BIOS properly on a laptop like this, let alone an EC, something I don't recall ever having to mess with on a desktop computer from ~2005. I've flashed a BIOS before, but it was on a Dell desktop many years ago, and I'm afraid something may break. Windows 10 1903 already doesn't like this computer, judging by the BSODs it has given me post-update. 1809, or whatever the version before this was, didn't give me this many issues.

    ALSO, I think I should say that no BIOS updates are shown for my laptop model: https://www.clevo.com.tw/en/e-services/download/ftpOut.asp?Lmodel=P9xxEx&ltype=1&submit=+GO+

    Idk why. I guess they think it doesn't need an update.
     
    Last edited: Sep 4, 2019
  7. Meaker@Sager

    Meaker@Sager Company Representative

    Reputations:
    8,113
    Messages:
    51,320
    Likes Received:
    14,640
    Trophy Points:
    931
    Might be worth taking an image backup of your drive and re-installing with the base driver set and seeing if the machine is doing the same thing.
     
  8. Amnvex

    Amnvex Notebook Enthusiast

    I can tell you that the answer is certainly NO (i.e., it wouldn't be doing the same thing as it is doing now). If I install the OS without updates to either Windows or drivers, everything works fine. But who wants to run an un-updated version 1809? I don't know what the problem is. When I ran Nvidia Inspector the first time, I had no issues. Once Windows started updating, OCing the GPU started producing errors to the effect of "illegal address modification" or something. In other words, the shortcuts that I could make from Nvidia Inspector ended up giving memory errors after Windows updated. I'm not sure when (after which update, that is) it started happening.

    I can also say I started getting ACPI errors after the updates. These ACPI errors made the Clevo Command Center completely useless, and the Event Viewer would show the EC returning values when none were requested. None of the CCC profiles for the fans would work. The hotkeys for the keyboard would also not work. Nothing to do with the CCC worked in general; the system was fighting it. That was the most annoying part, because the automatic fan control wouldn't work at all, so the laptop would just end up hanging at the shutdown stage (with the screen blank) and stay that way for the whole night, essentially overheating itself. It was HOT. Real hot. I'm guessing TDP limitations kicked in and prevented the CPU from doing anything when it was stuck in the shutdown sequence.

    It wasn't until I uninstalled everything, reinstalled Windows, and skipped the driver updates from Clevo's FTP site that all the errors resolved themselves (generally, except now there are GPU problems that seem to somehow be a result of Windows itself). That was also when I requested a BIOS update, because why else would there be ACPI errors? I'd get hkmoufltr BSODs, ntoskrnl.exe BSODs, a driver verifier BSOD loop (twice, and all on Microsoft drivers), storage data corruption BSODs, etc. It was a nightmare... I was ready to throw the laptop out the window because I originally thought it was all hardware related. How can so many things go wrong in the first month of a brand new laptop's life? Impossible. Right?

    Anyway, I thank you all for your help thus far. I've been interested in testing out many things and you've given me ideas on what I can try (helps with the brainstorming). You're much more knowledgeable on this stuff than I am. I really do appreciate it! At some point, I think, a solution will be found. Just a matter of when, I guess.

    Edit 1: I've emailed support and they're reluctant to help me get the BIOS. So far they've said that they want me to reinstall Windows again, without keeping files or anything else. They said this may fix it if it's Windows-caused. -__- And they said not to update anything except Windows.
    Best advice ever... /sarcasm

    Edit 2: Erased everything as suggested by you and the support people. Time to re-setup the laptop. Will need a couple days to test out.
     
    Last edited: Sep 4, 2019
  9. Amnvex

    Amnvex Notebook Enthusiast

    Ok, update:

    It seems I fixed it. I used to get intermittent sensor reports and blanked-out values, as seen previously in my posts, but now everything seems to be stable and reporting as it is supposed to. I can say that this is definitely a Windows problem! What worked for me was this: go into safe mode, uninstall the Nvidia drivers, reinstall them, then install the Intel Management Engine components. This is the order it *must* be done in or it won't work. I know this because there was a sequence in a game that I tried to get through, but it'd give me TDR errors and crash. That sequence is no longer a problem and there is no crash anymore! All because of this...

    I also followed these steps: removed Windows and reinstalled it without allowing auto-updates to anything (most importantly the GPU, because it appears that Windows is dropping in corrupted Nvidia files), changed the TDR values in the registry (just in case), and installed GPU drivers straight from Nvidia's site (the latest, even newer than what GeForce Experience offers) without installing any extra software (like GFE or audio drivers). Then I let Windows do updates, but not any that involve hardware drivers (e.g., Realtek audio). I had to roll back the Realtek driver and not let it update through Device Manager (my mistake), because it was a messed-up driver from Windows itself! (Just proves that Windows has issues with updating hardware stuff and should NOT be used for that!) BTW, this is where I got the info on how to fix it: https://www.nvidia.com/en-us/geforc...isplay-driver-nvlddmkm-stopped/?commentPage=2
    upload_2019-9-7_18-32-42.png
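
    The post does not say which TDR values were changed or what they were set to. As a hedged illustration only, the Python sketch below builds the text of a `.reg` file for the two timeout values most often discussed, `TdrDelay` and `TdrDdiDelay`, under the `GraphicsDrivers` key. The numbers 8 and 10 (seconds) are arbitrary placeholders, not the values used in this thread, and `build_tdr_reg` is a hypothetical helper name. The script only produces text; it does not touch the registry.

```python
# Registry key that holds the TDR (Timeout Detection and Recovery) settings.
GRAPHICS_DRIVERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def build_tdr_reg(tdr_delay_s: int, tdr_ddi_delay_s: int) -> str:
    """Return .reg file text that sets TdrDelay and TdrDdiDelay as DWORD values (seconds)."""
    for name, value in (("TdrDelay", tdr_delay_s), ("TdrDdiDelay", tdr_ddi_delay_s)):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} must fit in an unsigned 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        f'"TdrDelay"=dword:{tdr_delay_s:08x}',
        f'"TdrDdiDelay"=dword:{tdr_ddi_delay_s:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Placeholder values chosen for illustration; review before importing anything like this.
reg_text = build_tdr_reg(tdr_delay_s=8, tdr_ddi_delay_s=10)
print(reg_text)
```

    Saving the output to a file and reviewing it before importing keeps the change visible and easy to reverse by deleting the two values again.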

    Nothing intermittent: solid reporting. No patchy info anymore and no flickering of temp reporting or any of the other boxes!
    upload_2019-9-7_18-22-16.png

    Typical values during gaming (this is Vampyr):
    So, I guess that's it.
     
    Last edited: Sep 14, 2019 at 3:51 AM