Temperature issues on Gigabyte GTX 1080 G1


#1

Hey guys! It’s me once again!

I’ve been seeing a weird anomaly on one of my NVIDIA GTX cards: a very sudden temperature rise from 34 °C to 70 °C in less than 15 seconds when I start the miner. It eventually leads to a miner failure after reaching 75 °C within a few minutes.

I detached the heat sink, with its coolers, and the back plate from the board in question. I found the original thermal paste a bit dry, so I cleaned everything (board and heat sink) with propyl alcohol and applied new thermal paste to both the core and the area of the heat sink that makes contact with the core. I used Arctic Silver 5, which has proved to me many times to be one of the best ones around.

So I don’t know if it was because I was running out of paste (I may not have applied enough of it) or what, but the issue still persists, and now I’m testing the video card at a power limit of 80%. That prevents it from going beyond 68 °C and failing…

All of my other video cards are running successfully at a 100% power limit with no problems at all, and one of them is working on the riser the failing one was on, so it’s not a riser or PSU cable problem (which I considered at first).

Last but not least, some of the old thermal paste is stuck to the core and propyl alcohol won’t get it off. It’s a very thin layer, but I can’t manage to remove it. Still, it covers a very small area of the core, and I just can’t believe that’s the reason the temperature jumps from 34 °C to beyond its regular working temperature in such a short time.

Thanks a lot for any suggestions or ideas you may have guys!! :slight_smile:


#2

Have you tried underclocking it a little bit to see how it does? Do you have a temp limit set? What are you using to tweak your power usage and clock speeds?

I usually run my GPUs at 70-80% power.

Someone a little more experienced can probably speak better to this than I can.


#3

Hey!
Yeah, actually I’m currently running the one that has the temperature issue at 80%, but the others are at 100% and at exactly the same temperature, so there’s definitely something wrong with this one.


#4

Consider returning it or exchanging it if it’s still under warranty. You can also try running a GPU benchmark stress test.


#5

75C shouldn’t cause a thermal shutdown, and I wouldn’t worry about how quickly it rises. At most you might see some thermal throttling, not a shutdown. Nvidia by default doesn’t start to throttle until 92C. I think something is wrong with that card.

Again, do not worry about how quickly the core temp rises. This is completely normal, especially when mining, as you are going from 0 to 80-100% load in a split second. It’s not like you have a large thermal mass that heats up over time. You are pushing 200W of power (probably more at 100% power) through a very small die in a very short period.
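To put rough numbers on that, here is a minimal sketch of a single-node thermal model: heat goes in at a constant power, leaves through the cooler in proportion to the temperature difference with the air, and the thermal mass sets how fast the temperature moves. The function name `simulate_temps` and every value in it (0.2 °C/W thermal resistance, 50 J/°C heat capacity) are assumptions picked for illustration, not measurements from a GTX 1080.

```python
# Lumped (single-node) thermal model of a GPU die plus heat sink.
# All numbers below are illustrative assumptions, not measured values.

def simulate_temps(power_w, r_th, c_th, t_ambient=34.0, dt=0.1, seconds=60.0):
    """Integrate dT/dt = (P - (T - T_amb) / R) / C with forward Euler.

    power_w : heat dissipated by the GPU in watts
    r_th    : thermal resistance from die to air in °C/W (assumed)
    c_th    : effective heat capacity in J/°C (assumed)
    Returns a list of (time_s, temperature_c) samples.
    """
    temp = t_ambient
    samples = [(0.0, temp)]
    steps = int(round(seconds / dt))
    for i in range(1, steps + 1):
        heat_out = (temp - t_ambient) / r_th  # watts leaving through the cooler
        temp += (power_w - heat_out) / c_th * dt
        samples.append((i * dt, temp))
    return samples


if __name__ == "__main__":
    # Hypothetical values: 200 W load, 0.2 °C/W cooler, 50 J/°C thermal mass.
    trace = simulate_temps(power_w=200.0, r_th=0.2, c_th=50.0)
    for t, temp in trace[::100]:  # print one sample every 10 s
        print(f"{t:5.1f} s  {temp:5.1f} °C")
```

With these assumed values the time constant is R × C = 10 seconds, so most of the rise happens in the first 15 to 20 seconds. The steady-state temperature depends only on power and thermal resistance, which is why a card that settles hotter than its siblings at the same power points to worse heat transfer (or a bad reading), not to the speed of the rise.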

Actually, it’s 94C before it thermal throttles.


#6

Hey Nekko! =)

What concerns me is that this wasn’t the case before; it started happening some time ago. Now I’ve underclocked it to 80% like @heavilyarmedclown suggested and everything seems to be right. But if I set it back to 100%, its temps will definitely surpass those of the other two video cards, which run at the same temps as it does now but at a higher power output…

Long story short… this is nothing that a thermal cleanse, new thermal paste, or a riser change will fix… I guess I will have to use it at 80% from now on.


#7

Check your FAN SPEED!

After a Windows update I noticed that my fan speeds had been reset and my GPUs were quickly approaching 70C. I had to manually set them back to their higher speed, and the problem was fixed. So check your fan speed for that card.
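If you want to compare cards side by side instead of eyeballing a GUI, something like the sketch below could work. It assumes `nvidia-smi` (installed with the NVIDIA driver) is on your PATH; the helper names `parse_gpu_stats` and `read_gpu_stats` are made up for this example, and values are kept as strings because some cards report `[N/A]` for certain fields.

```python
import subprocess

# Fields requested from nvidia-smi's --query-gpu option.
QUERY_FIELDS = "index,name,temperature.gpu,fan.speed,power.draw,power.limit"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    keys = ["index", "name", "temp_c", "fan_pct", "power_w", "limit_w"]
    rows = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(keys):
            continue  # skip lines that do not match the expected layout
        rows.append(dict(zip(keys, values)))
    return rows


def read_gpu_stats():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)


if __name__ == "__main__":
    for gpu in read_gpu_stats():
        print(gpu)
```

Run it while all the cards are mining: if one card shows a clearly lower fan percentage or a higher temperature at the same power draw as the others, that narrows things down.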


#8

Hey MoneyMan!

I did check them!
Anyway, I have all the fans configured in MSI Afterburner, and all three GTX 1080s share the same config… Still, I think the three cards have different air output for some reason, but that might be something at the electronic level that I don’t understand, as the coolers on all the video cards are clean, with nothing slowing them down (such as dirt on their rotating axis, etc.).


#9

I fully understand. There is something wrong with that card. It could just be that the temp sensor went bad and is reading incorrectly, causing the card to act differently.


#10

I was just wondering that… I’d put it back to work at 100% like the other cards and manually read its temps… the problem is I have a crappy probe sensor on a multimeter, and what I need is a laser one to read them accurately.
Or touch their back plates and see which one burns my fingers the worst, although that’d be going full caveman mode :rofl:


#11

You could compare that to the other cards, but as for actually knowing the temp, those laser temp sensors are crap; they are just as bad as IR cameras. They are only useful for comparing two items, not for actually getting a temperature. The reason is that they use IR, which is affected by the emissivity of the material. For example, if you have a piece of bare metal and a piece of plastic at exactly the same temperature, the metal will typically read cooler than the plastic even though they are the same temp.
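As a rough illustration of the emissivity effect, here is a sketch using a simplified grey-body model: the sensor sees the surface’s own radiation plus reflected room radiation, and the gun converts that back to a temperature using whatever emissivity it is set to. The function name `indicated_temp_c`, the 0.95 default (a common fixed setting on cheap guns), and the 0.1 used for bare metal are assumptions for illustration; real instruments only look at a band of wavelengths, so treat the numbers as ballpark.

```python
def indicated_temp_c(true_c, emissivity_true, emissivity_set=0.95, ambient_c=25.0):
    """Temperature an IR thermometer would display under a simplified
    grey-body model (total radiation, reflected ambient included)."""
    t_true = true_c + 273.15
    t_amb = ambient_c + 273.15
    # Radiation reaching the sensor, in units of the Stefan-Boltzmann
    # constant (it cancels out in the inversion below).
    received = emissivity_true * t_true**4 + (1.0 - emissivity_true) * t_amb**4
    # The instrument inverts the same formula using its configured emissivity.
    t_ind4 = (received - (1.0 - emissivity_set) * t_amb**4) / emissivity_set
    return t_ind4**0.25 - 273.15


if __name__ == "__main__":
    # Same true temperature, two assumed surfaces.
    print("plastic-like (0.95):", round(indicated_temp_c(70.0, 0.95), 1))
    print("bare metal (0.10): ", round(indicated_temp_c(70.0, 0.10), 1))
```

When the surface emissivity matches the gun’s setting, the reading equals the true temperature; for the low-emissivity surface the reading lands much closer to room temperature, which is why readings are only comparable between like surfaces.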


#12

Thanks a lot for that info!! I really thought they worked alright… Good to know :slight_smile:


#13

They do, to an extent, when comparing like for like. So if you measure the exact same spot on each of the cards with the same settings, you should see similar numbers. What follows is a bit overkill, but hopefully it helps to show why you may get different results even if the cards are the same.

The picture below was taken with an IR camera, which uses the same tech as the laser temp guns: they measure IR, not actual heat. I use an IR camera in the data center to see airflow patterns, not to get actual temps. There is a temp crosshair, but that temp is actually incorrect, as it is not calibrated for the emissivity.

Each of these cabinets draws roughly the same amount of air at the same power level, but because of the environment in the room you will get different amounts of airflow across the different servers. The same goes for your mining rig on a much smaller scale. As a side note, we put what is essentially a blanking panel under the cabinets to stop the hot air from wrapping underneath them.

image

I am not trying to dissuade you from using one of the guns, but I want you to understand their shortcomings. On a GPU, being even a quarter of an inch off with the laser gun will yield drastically different results.


#14

Thanks again for yet another great explanation!

I understand what you mean. In order to read the exact temperature I’d have to hit the naked core of the video card, and even then any breeze around it might well change the reading. It could be useful as a “reference” but not as a direct value.

The most accurate sensor is the one in the video card, then, so if it’s not working properly I can “attempt” to read the temps externally, but they won’t be exactly the same.

By the way, how I wish I could work in the hardware department to learn things like the ones you know… the servers are not located in the delivery center where I work, though.


#15

LOL, I spend most of my time telling the software and engineering teams that we only have so much power and cooling available. Just find something you enjoy; every job has its green and barren patches. Most of what I have learned is through my hobbies, which just happen to spill over into work, and that is what sets me apart from those who just do it as a job.


#16

Same here!

Building computers and dealing with hardware is what I love in the IT world (I also enjoy taking on hardware tasks outside IT), along with alternative cooling and optimization of resources. But the job I got is on the software side.

I work as a programmer, which is a pending assignment for me: all my life I have dealt with and learnt hardware, and programming is the other half I oblige myself to learn in order to understand computers as a whole. But I just love hardware :smile:

Those are wise words you just said there. I wish I could accomplish that soon and either move to the hardware department or just get another job where I can apply all I know and, above everything, keep on learning.