Help with complicated BSOD

Discussion in 'Desktop Hardware' started by Drew92983, Jul 17, 2019.

  1. Drew92983

    Drew92983 Notebook Enthusiast

    Reputations:
    1
    Messages:
    17
    Likes Received:
    0
    Trophy Points:
    5
    Posted to Tom's Hardware as well, just trying to pool as much experience as possible.

    Hey Guys. Been having a really difficult time narrowing down a problem with my rig as I can’t seem to find any consistency and hoping someone might have some ideas. This is a pretty long post, so please bear with me.

    Here is my build as it currently stands:

    Motherboard: Gigabyte Z370 Aorus Gaming 7 Bios F7

    Cpu: Intel 8700k not overclocked

    Ram: 16gb Corsair Vengeance DDR4 3000mhz (2x8gb) currently @ jedec default 2133mhz

    Gpu: 8gb Nvidia 1070 Founders Edition not overclocked outputting to 3 of 4 displays

    2gb Nvidia Gtx 760 Secondary used as Physx and outputting to 1 of 4 displays

    Sound: Sound Blaster Recon3D PCIe

    Storage: 1- Samsung 970 EVO Nvme SSD

    5 Mechanical Hdd’s used as storage and mechanical redundancy


    I rebuilt and overhauled the system a year ago and upgraded to the Gaming 7 board, Intel 8700k, and at the time 16gb (2x8gb) Corsair Vengeance @ 2666mhz set to XMP and a 500gb Samsung SSD connected via Sata. The CPU was never overclocked, the RAM was only OC’d per the XMP, and the GTX 1070 was overclocked +125 core/+400 mem. The system ran fine for 10 months with no issues or problems.

    In Feb I decided to upgrade my storage to a Samsung 970 EVO Nvme. This would free a Sata slot for another HDD to give me some mechanical backup to my data. I also applied the new BIOS F13 to make sure I had the best compatibility for the Nvme.

    During the fresh install of Windows I got a BSOD “Page Fault in Non Paged Area, reason Win32kbase.sys” during a restart while removing some of the bloatware for my printer (HP Deskjet 1660) after installing the driver. I figured it was no big deal since the system had never BSOD’d before, so I moved on, ran SFC and CHKDSK just to be safe, and imaged the install when done.

    Over the next 4 months the system would randomly (1-3 of every 20 starts) BSOD during Windows sign-in with the same Page Fault in Non Paged Area error pointing to Win32kbase.sys (80%) or Win32kfull.sys (20%). The dump file was never created; it would just sit at 0% even though my dump settings were correct. If the system made it past sign-in without crashing, it would operate normally with no issues. I could browse the web in Chrome, watch movies, and do hours of heavy gaming with 100% load on my 1070 and high CPU/memory usage and never get a BSOD. I tried removing the GPU overclock in MSI Afterburner and ran SFC, CHKDSK, and DISM several times (all returned no errors or corruption), but kept getting the BSOD.
    At this point I figured I had some sort of corruption in the Windows install even after passing SFC and CHKDSK, so I decided to do a fresh install, and this is where the problems mounted. Anything in “()” below is what I was thinking at the time, and the BSOD is always Page Fault in Non Paged Area unless noted otherwise.

    During the first reinstall got the BSOD after a restart doing a windows update for .net and my sound card. (Maybe my sound card is bad?)

    Reinstalled windows again to see if it would duplicate and got a BSOD after restart installing the first driver which was the chipset (Ok, maybe not my sound card?).

    Reinstalling windows again and got a BSOD formatting the SSD partitions in windows setup. (Is there something for the Nvme I am forgetting?).

    Took the tower apart and cleaned all contacts. Made sure I had good connections and moved the memory to the other DIMM slots. Restarted and installed Windows. Made it through installing all the drivers. After reading that this BSOD usually means a problem with memory, I ran Windows Memory Diagnostic, and it came back with no errors. I didn’t have a lot of free time to extensively test the RAM, so I picked up a new kit at Best Buy. And with the only change to the system being the Nvme drive, I figured I must be missing something there as well. Did some reading and found out there is a driver provided by Samsung for the drive.

    Installed the new RAM, set the XMP and reinstalled windows again. After the chipset, installed the Samsung Nvme driver. Got a BSOD two restarts later after installing the intel RST driver.

    Did some more reading and extracted the driver to install during Windows setup, loaded setup, formatted the drive, did a clean using DISKPART, then installed the Nvme driver. Continued and installed Windows. Got a BSOD several restarts later. At this point I began to think there might be something wrong with the Nvme (even though SMART checked good), the MB slot, or possibly even my power supply (as it was 13 years old). At least this time it created the dump file, which pointed to ntoskrnl 0x50.

    Purchased a new power supply (850 watt EVGA G3) and a new 970 EVO. Installed both, but this time used the 3rd M.2 slot. I had been using the 2nd, as the 1st one knocks out two of my Sata ports. Also restored all BIOS settings to default, left the RAM at non-XMP 2133mhz, and disconnected EVERYTHING except both graphics cards, sound card, Nvme, keyboard and mouse.

    Reinstalled windows and got BSOD “memory management” on setup finalization of “getting devices ready”. (Ok, Maybe it really is my sound card as it’s the only attached device?)

    Was forced to reinstall Windows since the installer crashed. This time I made it all the way through the drivers and Windows updates, but got a BSOD after the restart from installing the MB apps: RGB Fusion (to control the LEDs) and SIV (to set up fan profiles).

    Reinstalled Windows without the MB apps but got a BSOD on the restart after installing the printer driver, this time before uninstalling any of the unnecessary printer bloatware. (Can’t be the printer software, as other crashes happened before the printer was installed or connected?)
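
    The reinstall attempts above can be tallied to check whether any one driver or step keeps showing up. This is only a rough sketch: the labels are paraphrased from the descriptions above (they are not log output), and the "restart" versus "setup" split is an assumption about how to group them.

```python
from collections import Counter

# Crash events from the reinstall attempts described above.
# Each entry: (what had just been installed or was running, phase of the BSOD).
# Labels are paraphrased from the post, not taken from any log or dump.
crashes = [
    (".NET / sound card update", "restart"),
    ("chipset driver", "restart"),
    ("formatting SSD partitions", "setup"),
    ("Intel RST driver", "restart"),
    ("nothing specific (several restarts later)", "restart"),
    ("'getting devices ready'", "setup"),
    ("motherboard apps (RGB Fusion, SIV)", "restart"),
    ("printer driver", "restart"),
]

# Count crashes per phase and per trigger.
by_phase = Counter(phase for _, phase in crashes)
by_trigger = Counter(trigger for trigger, _ in crashes)

# Any trigger that appears more than once would be a suspect.
repeated = [trigger for trigger, count in by_trigger.items() if count > 1]

print("crashes by phase:", dict(by_phase))
print("triggers seen more than once:", repeated)
```

    No trigger repeats across these attempts, which fits the suspicion that the cause sits below the driver level (memory, voltage, or firmware) rather than in any single piece of software.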

    At this point I was trying to reverse any other changes made from when it was stable. So I reverted back to the BIOS prior to my Nvme which is F7.

    While doing more reading on the BSOD crashes, especially 0x50 in ntoskrnl as it relates to overclocking, I decided to look at the memory-related voltages in BIOS. The VCCIO voltage at the default “auto” setting (non-XMP, 2133mhz) was 0.946v, and the System Agent was 1.05v. I had read that the two voltages should be 0.05v apart, so I bumped the VCCIO voltage to 1.0v. Since then I have restarted at least 15 times, run prime95 (custom test using 15 of 16gb RAM), and run Windows Memory Diagnostic, and all came back clean.
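
    The voltage arithmetic above, written out as a quick check. The readings are the ones quoted in the post; the 0.05v target gap is the rule of thumb mentioned there, not a vendor specification, and the variable names are just for illustration.

```python
# Values quoted in the post. TARGET_GAP is the rule of thumb the poster
# read about, not an Intel or Gigabyte specification.
VCCSA = 1.05          # System Agent voltage on "auto"
VCCIO_AUTO = 0.946    # VCCIO on "auto" at non-XMP 2133mhz
VCCIO_MANUAL = 1.0    # VCCIO after the manual bump
TARGET_GAP = 0.05

def gap(vccsa, vccio):
    """Return the System Agent minus VCCIO difference, rounded to millivolts."""
    return round(vccsa - vccio, 3)

gap_auto = gap(VCCSA, VCCIO_AUTO)
gap_manual = gap(VCCSA, VCCIO_MANUAL)

print(f"gap on auto:   {gap_auto:.3f} V")
print(f"gap after fix: {gap_manual:.3f} V (target {TARGET_GAP:.2f} V)")
```

    On auto the gap was roughly twice the rule-of-thumb value, and the manual 1.0v setting brings it to the 0.05v target.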


    So at this point I am not sure if it’s the BIOS being reverted, me bumping the VCCIO voltage up to 1.0v, or just a lull between BSODs. I have sat at the computer for 15 mins continually restarting without getting a crash. The only constant in my issues is that a BSOD will only happen while signing into Windows. Windows always boots to sign-in with no issues, and once past the sign-in screen it works just fine. Does anyone have any advice as to what to do next if I do get another BSOD? If I don’t get another one I would like to get the memory back to XMP @ 3000mhz, but if the instability was from voltage, what should I be looking for after setting the XMP or setting up the timings manually?
    Thanks for reading such a long post, and thank you to anyone who offers to help.
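
    One way to put a number on the "lull between BSODs" question is to ask how likely 15 clean restarts would be if nothing had actually been fixed. The sketch below uses the crash rate reported earlier (1 to 3 of every 20 starts) and assumes each restart is independent, which is a simplification; the function names are only illustrative.

```python
import math

def p_no_crash(p_crash, restarts):
    """Chance of seeing zero crashes in `restarts` independent restarts."""
    return (1.0 - p_crash) ** restarts

def restarts_needed(p_crash, confidence=0.95):
    """Clean restarts needed before a lull this long is rarer than 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_crash))

# Reported crash rate: 1 to 3 BSODs per 20 starts.
for crashes_per_20 in (1, 3):
    p = crashes_per_20 / 20
    print(
        f"{crashes_per_20}/20 crash rate: "
        f"P(15 clean restarts by luck) = {p_no_crash(p, 15):.2f}, "
        f"clean restarts for 95% confidence = {restarts_needed(p)}"
    )
```

    Under these assumptions, 15 clean restarts would happen by luck about 46% of the time at a 1-in-20 crash rate and about 9% of the time at 3-in-20, so 15 restarts alone does not settle it. Something like 19 to 59 clean restarts would be more convincing.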
     
  2. rlk

    rlk Notebook Evangelist

    Reputations:
    92
    Messages:
    451
    Likes Received:
    223
    Trophy Points:
    56
    The first thing to do is to download and run memtest86 (https://www.memtest86.com/download.htm -- grab the free version). You'll need to burn it onto a USB drive and boot it, since it runs as its own OS. Run a full pass or two, and see if it shows any errors. If not, then you can try running at the XMP profile; if you see errors there, then obviously the RAM won't tolerate those conditions, and if you want, you can try to find settings that will work.

    If all of that comes back clean, you can then reboot Windows and run some kind of stress tests. I'm not familiar enough with Windows to know what you might do for that. I think prime95 targets the CPU rather than memory or I/O.
     
  3. Mr. Fox

    Mr. Fox Undefiled BGA-Hating Elitist

    Reputations:
    27,574
    Messages:
    34,564
    Likes Received:
    54,214
    Trophy Points:
    931
    Have you tested without the 2GB Nvidia GTX 760 secondary GPU installed to see if anything changes? If not, try that... especially if you are using Windoze OS X. NVIDIA may be doing their notorious GPU genocide crap with Kepler. They stopped caring about Kepler performance and stability when they released Maxwell, and we're way past them caring after Pascal and Turing.

    Even though you are not overclocking the memory, the errors you are experiencing sound like a memory problem to me. It could also be that the voltage for the CPU, VCCIO and/or memory needs to be increased. As a general rule, undervolting anything other than the CPU core is not advised. In most cases, leaving the VCCIO, SA and other voltages on "auto" is best unless you are pushing a hefty overclock, in which case you will need to look at increasing them. Some DDR4 RAM sticks run more stable, stock or with XMP profiles, using 1.350V versus the 1.200V default.

    Try what @rlk suggested with memtest86 and see if it errors out. If it does, try setting the RAM voltage to 1.350V and re-run the tests to see if it still errors out. You could have a bad stick of RAM, in which case no changes in settings are going to fix it.
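
    The memtest86 steps suggested in this thread can be written down as a small decision helper. This is only a sketch of the advice given in the two replies above, with a hypothetical function name and made-up wording for each outcome; it does not run memtest86 or read its results.

```python
def next_step(errors_at_stock, errors_at_1_35v=None, errors_at_xmp=None):
    """Suggest the next action from memtest86 outcomes.

    Each argument is True (errors seen), False (clean pass), or None (not run yet).
    """
    if errors_at_stock:
        # Errors at default settings: try more DRAM voltage first.
        if errors_at_1_35v is None:
            return "set RAM voltage to 1.350V and re-run memtest86"
        if errors_at_1_35v:
            return "suspect a bad stick of RAM; settings will not fix it"
        return "keep 1.350V and watch for further BSODs"
    # Clean at stock: move on to the XMP profile.
    if errors_at_xmp is None:
        return "enable XMP and re-run memtest86"
    if errors_at_xmp:
        return "RAM will not tolerate XMP as-is; tune settings or stay at stock"
    return "boot Windows and run stress tests"

# Example: clean at stock, XMP not yet tested.
print(next_step(errors_at_stock=False))
```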
     
    Last edited: Jul 28, 2019
  4. StormJumper

    StormJumper Notebook Virtuoso

    Reputations:
    537
    Messages:
    3,282
    Likes Received:
    419
    Trophy Points:
    151
    What you really ought to do is a fresh install of the Windows OS with just the motherboard drivers, and let that run to see if any problems happen. Then you will know where to start looking before you install other hardware and software. If you're not doing this, any help on here will do nothing.
     