Broken GTX 980M

Discussion in 'Hardware Components and Aftermarket Upgrades' started by Darker01, Nov 11, 2017.

  1. Darker01

    Darker01 Notebook Enthusiast

    Reputations:
    5
    Messages:
    17
    Likes Received:
    3
    Trophy Points:
    6
    Hello everyone.

    I made this thread with the hope that I would learn more about what caused my GTX 980M to fail after ~2 years of moderate/heavy gaming and academic use. Pictures can be found here.
    As far as I can tell, the front and back of the card looked normal. R47 and R22 had some small deformations on the surface that were a bit hard to see. I believe R22 is some sort of VRM/inductor, but I have no clue what R47 is. I noticed that the power supply's tip was discolored after the failure, but I'm not certain why it happened or how it was related to the failure.

    EDIT 1: R47 and R22 in the pictures had a resistance of about 1.5 Ohms as measured with the red Centech digital multimeter - same as the adjacent clean-looking R22.

    EDIT 2: Added picture of backplate with thermal pad to the album.

    EDIT 3: @Khenglish suggested measuring the resistance across the power input pins. If it reads >2 kOhm, then something other than the power FET broke. I measured 7 Ohms. Picture here.
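A minimal sketch of that rule of thumb, assuming the 2 kOhm cutoff from the suggestion above. The function name and the wording of the low-resistance verdict are illustrative assumptions; the suggestion itself only says what a reading above 2 kOhm rules out.

```python
# Rule of thumb from EDIT 3: resistance measured across the card's power input pins.
# The 2 kOhm cutoff comes from the suggestion; everything else here is illustrative.
THRESHOLD_OHMS = 2_000.0


def classify_input_resistance(ohms: float) -> str:
    """Return a rough verdict for a resistance reading across the power input pins."""
    if ohms < 0:
        raise ValueError("resistance cannot be negative")
    if ohms > THRESHOLD_OHMS:
        # Per the suggestion: something other than the power FET broke.
        return "above cutoff: look beyond the power FET"
    # Assumption: a reading far below the cutoff hints at a short on the input rail.
    return "below cutoff: suspect a short on the power input"


print(classify_input_resistance(7.0))
```

With the 7 Ohm reading this lands in the "below cutoff" branch, which is consistent with a short somewhere on the input side but does not say which part is at fault.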

    EDIT 4: Measured the resistances of the black capacitors between the R22 coils (core) and the 2 adjacent to the one on the right (memory). Got 9.5 Ohms for the row of 6 and 24.0 Ohms for the row of 2. Remeasured the resistance across the power input pins and got 9.5 Ohms. Possible correlation between the resistance of the core capacitors and the short.
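The comparison in EDIT 4 can be sketched as a quick check of which capacitor row reads about the same as the power input pins. The labels, the helper name, and the 10% tolerance are assumptions chosen for illustration. A matching reading is only a hint about where the short sits, not proof.

```python
# Readings from EDIT 4, in ohms. Labels are informal descriptions, not official net names.
readings = {
    "core capacitors (row of 6)": 9.5,
    "memory capacitors (row of 2)": 24.0,
}
input_pins_ohms = 9.5  # re-measured across the power input pins


def rows_matching_input(readings, input_ohms, tolerance=0.10):
    """Return labels whose reading is within a fractional tolerance of the input reading."""
    return [
        label
        for label, ohms in readings.items()
        if abs(ohms - input_ohms) <= tolerance * input_ohms
    ]


print(rows_matching_input(readings, input_pins_ohms))
```

Only the core row matches the input reading here, which is the correlation noted above.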

    EDIT 5: Row of 6 capacitor shorted to the power input pad (right one). Requested this from Texas Instruments (free with .edu email extensions) along with hot air rework station + flux + solder.

    TL;DR: My GTX 980M failed despite my best effort to keep my P750ZM running cool. I'm not sure if the thing shorted, overheated, had an on-board temperature sensor failure, failed as a result of something else failing, or simply just reached the end of its life. No component was OCed.

    Background: the 8 GB GTX 980M card with copper backplate (removed to take pictures) came with a used Clevo P750ZM I bought around October 2015. The laptop was originally purchased from RJTech, and the previous owner used it mainly for software development. Other than coil whine at high FPS, the machine was fine for the most part with VSync enabled. I checked the GPU temperature with HWiNFO64 and MSI Afterburner every so often, and I never saw it go above 80°C. The laptop itself was always cooled by a home-made cooling pad with four 120 mm ~1800 RPM case fans installed, and I normally had the internal fans on max speed whenever I played something demanding. Vents and fans were cleaned every 3-5 months, and the GPU vent wasn't blocked when the card failed. In short, I think I took care of the laptop decently given how much it cost me when my salary was $0. I got a few IRQL_not_less_or_equal BSODs related to the touchpad driver here and there, but that was about it for unusual behavior. I didn't know about ThrottleStop before the failure, so the CPU was running at stock voltage, in case anyone thinks a power supply issue was involved. I bought the laptop hoping it would last 5 years or more, so I avoided OCing any component.

    On average, I gamed 1-2 hours each day for the first ~1.5 years and 4+ hours a day for a couple of months leading up to the failure. The laptop itself was kept on for about 6-12 hours daily. For games like DOOM, BF1, GTA V, and Witcher 3, I lowered the texture and lighting settings so that the GPU temperature stayed decent at 60 FPS while the graphics still looked better than classic RuneScape. Extraneous settings like bloom, blur, and AA were turned off entirely. Even with those precautions taken to control the thermal behavior, the failure occurred while I was walking around looting things in Witcher 3, ~2-3 hours into the session.

    The Failure:
    The screen turned black. MSI Afterburner overlay was active at the time, but I didn't check it before the crash. The laptop turned off mid-game without any warning sign such as freezing or distorted audio. Gameplay was smooth for the most part, and there wasn't any cue (e.g. freezing, audio distortion, micro-stuttering, MSI overlay readout) to suggest that the CPU or the GPU was running too hot. I attempted to turn on the laptop, and the power supply (Chicony 230W) started clicking while its indicator light flickered. Both the battery and the power indicator LEDs remained amber while the laptop was plugged in. Pressing the power button resulted in the power indicator briefly turning green and then back to amber (along with the clicking in the power supply). The fans did not turn on. When I held down the power button long enough, the power supply's indicator light turned off completely and the clicking stopped. Re-plugging the power supply turned the indicator light on again, and the same scenario repeated when the power button was pressed while the laptop was plugged in. I noticed that the area around the right speaker (which is directly above the exhaust for the GPU fan), the power supply tip, and the power supply itself were all really hot to the touch. I measured the temperature with an infrared thermometer and got readings of around 45-50 degrees Celsius for those regions ~5-10 minutes after the failure. That was when I noticed the discoloration on the tip.

    I then removed the bottom panel and checked for anything unusual. Everything visible looked fine (i.e. no exploded components or charred regions). The power adapter on the motherboard showed no sign of shorting or melting. There wasn't any "burnt-plastic" or solder odor. The RAM sticks all looked fine. The battery wasn't hot. I tried holding down the power button with the battery removed for over half a minute before plugging the power supply in, both with and without the battery, but the clicking persisted. An NVRAM reset didn't work. The failure occurred at midnight on a Saturday, so I decided to let the laptop sit unplugged without the battery and check it again early on Sunday morning. The problem persisted, so I sent RJTech an RMA request, which they promptly granted on Monday.

    The Aftermath: I had suspicions that the graphics card might be the cause of the failure to POST, but I decided that it was best to send the laptop to RJTech for them to evaluate the extent of the damage. I figured that I wouldn't be able to do much even if I removed the heatsink to check for damage, and I was too busy with work to buy an MXM card and do the diagnostics myself. Upon receiving the laptop, technical support noticed some unidentified liquid on the VRAM chips (which I believed to be thermal pad oil) and sent me 2 pictures with the affected chips circled. I asked them to check if the motherboard was still functional with a new GTX 980M installed, and after some stress testing they confirmed that the other components had survived. I confirmed with later testing that the power supply was still functional (enough to sustain the CPU under heavy load at least), although I never tried to push it into the 200W+ range. While I'm grateful that RJTech accommodated my request for additional testing with a functional card, I decided to get the broken laptop back. I was uncertain about the reliability of my P750ZM at the time, so both getting a new card to restore the P750ZM to pre-failure performance and getting the broken card refurbished by Clevo were out of the question for me.

    Now: I found a surprisingly cheap Sager NP9870-S originally from Xotic-PC up for sale on Craigslist of all places. The thing has 980M in SLI, so at least I am comforted by the fact that if 1 card fails, there's always another one inside. It is also nice to know that in the event of both cards failing at once, I can always build a PE4C eGPU setup like what @bloodhawk did with his P870DM here. Call me paranoid, but I already got a PE4C v4.1 and a power supply just in case. I also upgraded my cooling pad with 4 of these. As for the P750ZM, I grabbed a GTX 765M and brought it back to life with that. I installed ThrottleStop on both machines and spent a while lowering the voltages as far as possible.

    I still want to know what exactly went wrong with the failed GTX 980M so that I can hopefully prevent future failures. I've been looking around to see if anyone else has posted something similar about an MXM graphics card causing the power supply to short itself while leaving other components unharmed. There's no schematic for the card floating around, so I hope people with intimate knowledge of the board will be able to help. I'll gladly provide close-up pictures to the best of my ability.
     
    Last edited: Nov 15, 2017
  2. Arrrrbol

    Arrrrbol Notebook Consultant

    Reputations:
    75
    Messages:
    163
    Likes Received:
    115
    Trophy Points:
    56
    Is that the original power supply that came with the laptop?
     
  3. Darker01

    Darker01 Notebook Enthusiast

    Reputations:
    5
    Messages:
    17
    Likes Received:
    3
    Trophy Points:
    6
    Yes. Never bothered to check how much power was drawn from the wall. I do now.
     
  4. Arrrrbol

    Arrrrbol Notebook Consultant

    Reputations:
    75
    Messages:
    163
    Likes Received:
    115
    Trophy Points:
    56
    Hard to tell what caused the problem, but hopefully it's just the GPU failing and nothing else. That liquid you can see on the VRAM is probably just oil from the thermal pads though.
     
  5. Darker01

    Darker01 Notebook Enthusiast

    Reputations:
    5
    Messages:
    17
    Likes Received:
    3
    Trophy Points:
    6
    Other components were fine for the most part. I ran a couple of wPrime benchmarks on the 4790K while undervolting it without encountering anything unexpected.
    I believed the pads were from Fujipoly, given how they looked.
     
  6. Dr. AMK

    Dr. AMK Notebook Evangelist

    Reputations:
    477
    Messages:
    501
    Likes Received:
    959
    Trophy Points:
    106
  7. Mobius 1

    Mobius 1 what is quality control?

    Reputations:
    2,723
    Messages:
    7,795
    Likes Received:
    4,945
    Trophy Points:
    431
  8. Danishblunt

    Danishblunt Notebook Deity

    Reputations:
    81
    Messages:
    1,037
    Likes Received:
    375
    Trophy Points:
    101
    I think you already have a very good idea what went wrong, and the card might still be alive. As you already noticed, the inductors are fine; there is only some damage on the surface, which really doesn't mean anything. The thing that "killed" it was the VRAM shorting. Your power supply acted like a classic power supply that refuses to power on a shorted system. If you clean the card with isopropyl alcohol and replace some VRAM chips (you can buy them on eBay for around 3 USD each), then your card is back on track. If you're really, really lucky, an isopropyl alcohol bath alone might even "fix" the card.

    I think you realize yourself that this was very likely caused by the cheap thermal pads you were using, so you might want to consider buying high-quality ones in the future (Thermal Grizzly Minus Pad, for instance).
     
  9. Darker01

    Darker01 Notebook Enthusiast

    Reputations:
    5
    Messages:
    17
    Likes Received:
    3
    Trophy Points:
    6
    I thought it was the oily substance initially too, but then I dismissed that idea for the past couple of months, thinking that the silicone oil shouldn't be conductive. After reading your comment and doing some more searching on the forums, I think I was wrong to assume that the silicone oil wasn't contaminated. The OP of this thread here had an oily substance on the VRAM of their GTX 580M along with a blown cap. Other people pointed out that the oil might be contaminated, and the OP later stated that they lived in a highly humid area. In my case, I think the pads themselves might have broken down significantly after 2 years, and stuff leached into the oil. Although I don't have those pads anymore, I recall that they were particularly spongy and yielded easily when stretched. Lesson learned. I hope there's a sale on Fujipoly thermal pads on Amazon this upcoming Cyber Monday.

    I'll look around in my local area (SoCal) to see if there's any computer repair shop offering an ultrasonic cleaning service. Given how the card looked, I think you may be correct that the majority of the card itself is still intact.
     
  10. Danishblunt

    Danishblunt Notebook Deity

    Reputations:
    81
    Messages:
    1,037
    Likes Received:
    375
    Trophy Points:
    101
    I fixed a GTX 980M not too long ago. It was shorted on 1 VRAM chip; I reballed it and tried it out, and it worked again. So yes, you might be lucky.

    Every substance is more or less conductive. For instance, distilled water has very poor conductivity while saltwater is far more conductive. Given that the oil soaked up dirt and other substances, it's not that unlikely to cause issues, so it doesn't really matter whether or not the oil itself is conductive.
     