143 Comments

  • dullard - Tuesday, June 9, 2020 - link

    Are you saying that thermal fatigue has been solved? That will be the first that I have heard of it.

    Processors that sit unused or sit idle (old CPUs in old motherboards) at/near room temperature are not the problem. Neither are processors that work 24/7 (number crunching) at a stable elevated temperature. Both use cases have very little thermal fatigue. But processors that turn on/off/on/off with extended periodic use expand and contract repeatedly, eventually leading to thermal fatigue and stress cracks. The more power used, the greater the thermal stress in the transition from off to on. Do you have data that shows that thermal fatigue is nothing and that all CPU damage is electromigration?
  • hehatemeXX - Tuesday, June 9, 2020 - link

    The CPU isn't turning off/on here. And no, you'd have to turn off/on tens of thousands of times to even see stress cracks.
  • dullard - Tuesday, June 9, 2020 - link

    Of course this issue isn't turning the CPU on/off. But many use cases do. In fact, I'd say the majority of use cases at home or in business are short bursts of activity which are going from idle to turbo and back in short amounts of time. The questions are (1) how often does the burst occur (not often for the majority of use cases). And (2) how much more likely will a stress crack form due to the additional power. That last question is not answered here.
  • Fataliity - Tuesday, June 9, 2020 - link

    A normal cpu goes from idle to load over and over and over every single day. The only difference is, it's now getting 30 Watts extra because the mobo maker changed a setting. That 30W isn't going to rapidly change the way a processor heats up, just maybe make the max temperature slightly higher.

    Use your brain.
  • dullard - Wednesday, June 10, 2020 - link

    My brain and past studies on CPUs show that the more wattage the more stress damage can occur. This is because the more power the faster the temperature changes are. That is why I am asking if this is no longer the case. And if so, why.

    Temperature isn't the problem in stress fatigue. The problem (in the past) was how fast that temperature was reached.
  • PeterCollier - Thursday, June 11, 2020 - link

    30 Joules/s does not even begin to affect dT/dt.

    Use your brain.
  • Stuka87 - Thursday, June 11, 2020 - link

    If what you're saying is real, we would have seen this happening ages ago back when processors actually used a ton of power. I ran a Phenom II that pulled 250W when overclocked for years and it was fine going back and forth between idle and full load. Modern processors use significantly less power than processors from 10 years ago. I have never heard of a processor failing because of heat induced stress cracks.
  • El_Rizzo - Friday, June 26, 2020 - link

    Larger geometries, tho. Plus a fin geometry is actually pretty prone to fracture cracking.
    Still not going to be a problem, given that fatigue, being a purely mechanical problem in nature, is governed by peak thermals only (both positive and negative).
  • Fataliity - Tuesday, June 9, 2020 - link

    And this has nothing to do with stress fatigue. This is talking about a CPU boosting slightly higher than its "original" specifications may have intended. From say 142W to 170 or 190W. The algorithm also only boosts higher if you are under a certain temperature threshold.
  • eastcoast_pete - Tuesday, June 9, 2020 - link

    The thing that'll go first with many repeated hot/cold cycles is the thermal paste between the heat shield and the heat sink. If that cracks, the CPU will misbehave, not due to direct damage, but to insufficient heat removal.
  • naris - Wednesday, June 10, 2020 - link

    Paste can't crack
  • marc1000 - Wednesday, June 10, 2020 - link

    yes, thermal paste can crack, and it does. after some years, it will dry out and become less and less efficient as the contact surface gets worse due to the paste shrinking or cracking.

    my computer is a bit old, and had the same paste on the cooler for well over 7 years. recently I decided to re-seat the CPU cooler and after removing it, I noticed the paste had "cracked".

    with new paste my max temps went down 5º C. even if some decrease is due to re-tensioning or some of the cleaning, 5º is a considerable reduction just from re-applying thermal paste.
  • Spunjji - Wednesday, June 10, 2020 - link

    Slightly higher max power draw is going to have - at most - a negligible effect on thermal stress. This article really isn't about that, so not sure why you opened with that straw man.
  • GNUminex_l_cowsay - Wednesday, June 10, 2020 - link

    Do you have an example of thermal cycling fatigue being a problem with the chip itself? All of the examples of thermal cycling fatigue I'm aware of caused damage to solder joints not the chip itself.
  • jjjag - Saturday, June 20, 2020 - link

    Wow you have NO IDEA what you are talking about. Then again neither does the writer of the article since he is not an engineer nor a silicon/transistor expert. EM is a function of CURRENT and temperature, it's right there in the Wiki article that he stole from. Those are a function of frequency and voltage. EM is also cumulative and non-reparable. So the longer you run a cpu at overclocked frequencies the quicker it will be to fail due to EM. Plain and simple.
  • dfstar - Tuesday, November 17, 2020 - link

    This is the one correct comment here as far as I can tell:-)
    EM is not the primary failure mechanism unless you count infant mortality, where a wire is partially damaged due to a defect. The silicon failure mechanisms that an average user should worry about are due to oxide layer breakdown, and these are generally assumed to be accelerated by voltage and temperature but not current. I wish the author had spent a little more time researching and explaining semiconductor reliability, since it is both useful and interesting.
  • Achaios - Tuesday, June 9, 2020 - link

    It's unforgettable for me, but the first post on Anandtech I read was your review on the Core 2 QX 9650 Extreme CPU which you accidentally fried on testing OC due to excessive VTT (Termination Voltage) over 1.35V.

    The second Anandtech post I read was Anand's review on WoW Original. :-)
  • ThereSheGoes - Tuesday, June 9, 2020 - link

    If you actually read through the article and cut through the assumptions, there is no quantifiable proof in this article that says this doesn't impact longevity in an overly-adverse manner. AMD doesn't spec the processors to do this, full stop, for a reason. Also, lying to the SMU effectively circumvents these unspecified 'protections' against electromigration. In effect, this article is opinion and conjecture wrapped in some technical bits to make it appear authoritative.
  • Spazturtle - Tuesday, June 9, 2020 - link

    "there is no quantifiable proof in this article that says this doesn't impact longevity in an overly-adverse manner. "

    It is impossible to prove a negative, the burden of proof is on the person making the positive claim, and I have yet to see evidence that this does harm CPUs.

    "Also, lying to the SMU effectively circumvents these unspecified 'protections' against electromigration."

    How do you know this? If you don't know what protections these CPUs have then how can you say that lying to the SMU defeats them?
  • ThereSheGoes - Tuesday, June 9, 2020 - link

    Strawman argument - won't work on me :P The statement that this will not have a meaningful impact on processor longevity is wholly unbacked by any proof, period. This is opinion wrapped in technical descriptions that are meant to impress those who don't know better. Good background, sure, but none of it supports in any meaningful way that this isn't impacting longevity significantly. Surely AMD has a reason for assigning specifications, no?
  • Atari2600 - Tuesday, June 9, 2020 - link

    Again, as stated by Spazturtle... you cannot prove a negative.
  • ThereSheGoes - Tuesday, June 9, 2020 - link

    Again, this article shouldn't claim this will not have a significant impact when the only information available is that motherboard vendors are running the chips out of AMD's pre-defined specifications.
  • Atari2600 - Tuesday, June 9, 2020 - link

    There are known physical properties of silicon processes across a long period of time.

    Therefore you can say "X is less than Y", i.e. the chances of silicon migration breaking your 7nm CPU is a lot lower than a 45nm CPU due to A, B & C.

    You can't prove that it won't break it. But you can definitely say chances are reduced and you can postulate it shouldn't.
  • Fataliity - Tuesday, June 9, 2020 - link

    There are millions of these chips out there. If it was such a risk, there would be a bunch of people complaining about ruined processors don't ya think?

    Apparently no one can think today. Everyone just wants to yell about something they didn't know or care about, don't understand, and haven't even taken the time to see if it could be an issue. Go do some research if you think this is such a big issue.
  • willis936 - Tuesday, June 9, 2020 - link

    Yeah. The average user is going to complain about the error rate increasing from 10^-21 to 10^-18, right?

    Just because people don’t complain about it doesn’t mean it doesn’t accelerate aging and push the useable life down to a matter of years.
  • CiccioB - Wednesday, June 10, 2020 - link

    There are millions of these chips out there, but they have not been there for 2, 3 or 4 years to show whether this is a problem or not. In 3 years, when supposedly we'll have a cemetery of dead Ryzens, what will you say?
    Oh, I was wrong, sorry.. just buy the new ones which are better?
  • willis936 - Wednesday, June 10, 2020 - link

    @CiccioB Have you considered shutting up until you can form an actual argument?
  • Spunjji - Wednesday, June 10, 2020 - link

    I really enjoyed the friendly fire from willis936 onto CiccioB here as both try to spray FUD around.

    "accelerate ageing" is an extremely relative term. We don't have the data to declare this *definitely* isn't a problem, but the data we do have suggests it is *extremely unlikely to be one*. If you want to argue otherwise, you're going to need more data that supports your claims, instead of just running around flapping your arms yelling "think of the chips".

    I'm tired of people selectively demanding 100% certainty for anything they've decided they want to disagree with.
  • Thanny - Friday, June 12, 2020 - link

    You mean you didn't understand any of the technical details presented in the article, and therefore couldn't follow the obvious deduction that there's no real risk of damage.
  • close - Tuesday, June 9, 2020 - link

    "you cannot prove a negative" well you can sometimes but I get your point. In this case perhaps the tone should be rather "we are reasonably confident this will not be a problem".

    Also Ian's statement that "We are still very happily able to test old CPUs in old motherboards" is pretty meaningless in the context of the article, not to say misleading. The fact that a 65nm CPU that has seen a week's worth of testing (and maybe even a bit of torture) over the past 15 years still works says absolutely nothing about the reliability of a CPU that could be used every day for years and is built on a 2 year old 7nm process.

    The long proof presented in the article is not related to the conclusion. And this kind of "it must be safe because we didn't yet prove it's unsafe" sounds like the argument you'd use to give a vape pen to a baby.

    In all honesty I also think that in any reasonable scenario the CPUs should be safe for a long time. But that's only because I'm fairly confident AMD (and TSMC for that matter) must have investigated this aspect since this would have serious consequences for them going ahead. Certainly not because I can still feed 3.5V VCore to my old K5, so one should feel confident doing it on their Ryzen CPU too.
  • eek2121 - Tuesday, June 9, 2020 - link

    There are folks that have had significant degradation from using too high of vcore in an overclock. One person had his 3900X degrade from a 1.35V vcore after just 4 days. Once he returned it to stock it would not boost properly anymore.

    I suspect there is quite a bit more to this story than we realize.
  • lioncat55 - Tuesday, June 9, 2020 - link

    1.35V vcore is crazy high for long term use.
  • Stanis - Wednesday, June 10, 2020 - link

    Nowadays probably, it's considered safe for 24/7 OC for Haswell. I have to say, I pushed my i7-4770K to [email protected] for a few benchmarks, but I'm running it @stock (MCE enabled tho) with some SC2 heavy sessions [email protected]. And that's with 2400mhz mem. Still, it's enough for 1080p60 gaming and Photoshop retouch.
  • close - Tuesday, June 9, 2020 - link

    @eek2121: That's why I said "reasonable scenario", mostly referring to someone buying the CPU and using it "as is", not for OC. Not saying OC is intrinsically unreasonable but it certainly can be.

    My point is exactly that based on the arguments like "we have old CPUs that we use once in a blue moon and they still work" the conclusion certainly can't be "go for it, all good". Again, I have no data either way, just the "reasonable" confidence that the CPU should be good for years unless you (or your MB) go overboard with OC.

    This being said the fact that a CPU comes with 3 year warranty is no consolation if it dies in the fourth. I still regularly use machines with CPUs over a decade old. I have an i7 6700 going on 5 years and I don't expect I would want to throw a Ryzen 3900 away after less than that. The expected lifetime of a product is far longer than its warranty.
  • willis936 - Tuesday, June 9, 2020 - link

    On your last sentence: only if you’re new to OC. Every system I’ve ever owned has burned through OC headroom on the order of months to years. Just because the people who professionally use a chip for a week don’t talk about it or even know it exists, doesn’t mean it isn’t a well known issue. It’s even more of an issue now with tiny OOB OC headroom and OOB boosting.
  • close - Wednesday, June 10, 2020 - link

    @willis936: on AT, with its comment system straight from the late '90s, replying with "your" without mentioning a name is pretty confusing.

    The point is whether your CPU can die due to the boost MoBos give it now. Losing OC (above boost) ability while retaining the OOB functionality and clocks is a perfectly acceptable outcome for most. Anyway any OC that damages your CPU in months (including here inability to boost frequencies anymore) or even kills it is definitely NOT reasonable. And the kick these motherboards give to the CPU is "likely" not that kind of OC.

    This is why OC usually voids warranty. No matter what CPU you use there will be a point beyond which you do irreversible damage regardless of experience. Keep in mind that OC is not an exact science. 15-20 years ago I used to burn countless CPUs, GPUs, RAM, or MoBos via modding and extreme OC and no 2 units behaved identically. And the ones that didn't die were definitely "never the same again".

    But again, the point is if current OOB boosting will kill your CPU, not if pushing the OC envelope will lower the boost ceiling.
  • Spunjji - Wednesday, June 10, 2020 - link

    @willis936 Surely it's even less of an issue with Ryzen, given that there's basically no point in overclocking? Don't OC, don't worry about the non-existent headroom that requires silly voltages to access - just sit back and relax.
  • schujj07 - Tuesday, June 9, 2020 - link

    Even if this is an opinion, Ian's opinion carries far more weight than your looking for "proof." Ian has his PhD in Electrical Engineering, which means his opinion is based on years of study and work. He knows more about how this will affect users than probably everyone on this forum combined. In the end I will reiterate what has already been said, "you cannot prove a negative."
  • CiccioB - Tuesday, June 9, 2020 - link

    Despite the "ipse dixit", he has not brought real proof, nor a theory, that electromigration is not a problem on today's devices built with modern tiny PP.
    He just referred to these hypothetical "built in mechanisms to deal with it", which he never described, nor did he link a reference.

    He just said it is not a problem. But we have proof that it recently was on Intel chipsets.
  • ThereSheGoes - Tuesday, June 9, 2020 - link

    iirc, he has a degree in chemistry, not electrical engineering.
  • close - Wednesday, June 10, 2020 - link

    @schujj07: I have my PhD in Electrical Engineering and worked in the field. I have been doing extreme OC since the late '90s (not for some years though). The only reason I don't do it for a living was that getting free trips to Taiwan didn't pay my bills.

    I discussed an argument, not a person, as in any civilized discussion. The argument that old, seldom used CPUs don't exhibit an issue that mainly affects new manufacturing processes and regular/high use is a bad argument and you don't need a PhD to see this.

    And *again*, I'm not asking anyone to prove a negative. But the conclusion *cannot* be that the opposite must be true. You cannot prove I am not a supernatural all-knowing being, so I must be one?

    And to be pedantic, you *can* prove a negative, it's just that it's not a generally useful tool outside specialized fields, like math. Look at impossibility theorems. I also have a PhD in mathematics so we can discuss this at length if you want.

    To sum up: the evidence as presented in the article doesn't support either conclusion too strongly. Some of the evidence doesn't support anything related to the topic to be honest. The only reason to assume one conclusion is more plausible than the other is *trust*, mainly in the engineers designing the CPU. I agree with Ian's conclusion but I do not agree with his level of confidence in it (too high), or his arguments that got him to the conclusion.
  • lmcd - Tuesday, June 9, 2020 - link

    Ian's facts clearly didn't work on you so a strawman argument was a good second attempt.
  • close - Wednesday, June 10, 2020 - link

    @lmcd: Stop kissing ass. I literally referenced the arguments I found inappropriate, and what my problem was with the conclusion. Yours in the meanwhile is devoid of any useful information or commentary, nothing on topic. And a voice with nothing constructive to say is just noise.

    Ian is a grown up, if he sees fit to defend his argumentation or conclusion let him. You're clearly overwhelmed by the discussion and can't add anything to it.
  • gagegfg - Tuesday, June 9, 2020 - link

    @ThereSheGoes: Conversely, there is no proof, just opinions like yours, that this amperage and voltage deviation kills a modern CPU.
  • CiccioB - Wednesday, June 10, 2020 - link

    The fact is not that "it kills". It is that it has a probability of wearing it out faster than expected.
    And you'll know only in 2 or 4 years. Not tomorrow.
  • Spunjji - Wednesday, June 10, 2020 - link

    At which point nobody will care about your random comments on a forum. Soooo... what's the point of you making this unprovable claim?
  • CiccioB - Wednesday, June 10, 2020 - link

    You have problem to understand, I see.
    You cannot say that this motherboard OC is not going to have any effect in long term usage.
    So my "random" comments that make AMD fanboy like you angry have more clue than your pathetic defense.
  • Spunjji - Monday, June 15, 2020 - link

    I do love it when somebody clearly not understanding what is going on accuses someone else of not understanding what is going on.

    *You have no basis for your claim*. Not the claim about the CPU, nor the claim about me being a fanboy. That's it. End of discussion.
  • ironargonaut - Wednesday, June 10, 2020 - link

    What NEGATIVE! The claim is your CPU will be just fine at least to the end of warranty without issue. Where is the negative? AMD guarantees this if you are within their spec. Research how they do this. This can also be done in this case to see if the claim is true or false w/o waiting. This article has offered no proof this has been done for the out of spec conditions. Data talks, all others walk.
  • PeterCollier - Thursday, June 11, 2020 - link

    It's certainly possible to prove a negative. The 8th grade logic class memes aren't interesting.
  • xenol - Tuesday, June 9, 2020 - link

    Do you have quantifiable proof that says otherwise? And do you have proof that lying to the SMU circumvents protections that you claim are "unspecified"?

    Feel free to bring your own write up to the table for scrutiny.
  • CiccioB - Tuesday, June 9, 2020 - link

    Not having proof to sustain that the fact is a problem does not automatically make the opposite true as well.
    Here there are not proof nor enough technical elements to sustain that raising the voltage for a long time will not shorten the CPU life, especially when we know AMD took some work on latest BIOS to limit the voltage for a supposedly electromigration problem that may be the Achille's heel of TSMC latest PP.
  • brantron - Tuesday, June 9, 2020 - link

    The article is about a boost feature that was tested and found to raise peak voltage by 0.01v.

    If that is "raising the voltage for a long time" and "not proof nor enough technical elements" in your book, then you're on the wrong site.
  • CiccioB - Tuesday, June 9, 2020 - link

    It also raises the base voltage. And makes the chip run hotter.
    Those are the two terms that increase electromigration. So stating that this is not a problem without any technical proof, but saying that old overclocked CPUs used once a year still work, may be somewhat misleading. It also said that the Si atoms are the ones removed by electromigration, while in reality it is the atoms of the wire, be it aluminium or copper, making the process an avalanche one.

    I'm not saying it is a problem. Just that I can't see in this article anything that disproves the question on longevity, and two lines that state:
    electromigration = voltage x e^temperature
    with this trick that increases both is a more valid argument in support that there is some problem rather than not.
  • Spunjji - Wednesday, June 10, 2020 - link

    "we know AMD took some work on latest BIOS to limit the voltage for a supposedly electromigration problem that may be the Achille's heel of TSMC latest PP"

    Uhhhh no. We know they changed some voltages, and at the time it was *rumoured* to be due to electromigration. AMD subsequently refuted that rumour. The last part of that sentence is pure conjecture. We don't have to empirically refute conjecture, because nothing empirical supports it.
  • Lord of the Bored - Wednesday, June 10, 2020 - link

    They aren't raising the voltage. They're (indirectly) raising the current.
  • Spunjji - Wednesday, June 10, 2020 - link

    In effect, your comment is opinion and conjecture wrapped in some technical bits to make it appear authoritative.

    FTFY
  • GeoffreyA - Tuesday, June 9, 2020 - link

    Nice article. I don't know if I'm just stupid or what, but on my system (Tomahawk + 2200G), Hwinfo doesn't show any "Power Reporting Deviation" metric. Is there a way to enable it, or does that work only on Ryzen 3000 upwards? I'm using 6.27.4185. Thank you.
  • Qasar - Tuesday, June 9, 2020 - link

    i was wondering the same thing. Power Reporting Deviation doesnt seem to show up on hwinfo when i run it as well.
  • K_Space - Wednesday, June 10, 2020 - link

    @Qasar if you clicked on CPU, you won't find it there. Instead go to Sensors and scroll down. You'll find Power Deviation there.
  • Qasar - Wednesday, June 10, 2020 - link

    yep.. found where it should be via a screen shot some one posted, but on mine, that metric doesnt seem to be there
  • Qasar - Wednesday, June 10, 2020 - link

    just checked again. seems it should be around where it shows the infinity fabric clock, memory controller clock and 3 entries for thermal throttling, and Power Reporting Deviation is not there.
  • GeoffreyA - Thursday, June 11, 2020 - link

    Same story here. Just tried the newer build, 4190, but still no luck. Only IF, limits, and thermal throttling. Perhaps a newer version will fix it eventually.
  • magila - Tuesday, June 9, 2020 - link

    Electromigration is very much an issue with Zen 2 CPUs. If you search overclocking forums you can find lots of reports from people who have degraded their CPUs even with only moderate overclocking. TSMC's N7 process is much less robust than Intel's 14 nm when it comes to high current draw.
  • andrewaggb - Tuesday, June 9, 2020 - link

    This is something I think the article fails to address adequately. New processes don't have the same characteristics, tolerances, aging expectations, etc as older ones and so the past isn't necessarily a good indicator of what will happen on the N7 process. I agree it's unlikely to damage your processor or meaningfully reduce its lifespan but I don't know that for sure and I'm unqualified to make such an assertion.
  • AnarchoPrimitiv - Wednesday, June 10, 2020 - link

    You have a ridiculous sense of entitlement, why should this article, obviously intended for the non-overclocking user to anyone with common sense, do needless effort checking into the self-inflicted overclocking damage on CPUs with respect to their manufacturing process? The main point of this article is about whether this automatic motherboard feature does damage, so why are you criticizing it for not doing intensive research on a tangent topic that does nothing for the intended readership? Because you're lazy and want someone to do some research for you?
  • AnarchoPrimitiv - Wednesday, June 10, 2020 - link

    The term "overclock" could encompass infinite different voltages, thermals, etc and therefore cannot be realistically addressed as it's too wide a variable, that's why it voids warranties. This article is not addressing people who overclock, because overclocking is inherently accepting an acknowledged risk that you're doing damage.....

    .... You're completely ridiculous for having the expectation that this article, directed at the non-overclocking user, should somehow research the self-inflicted damage caused by overclockers for the purpose of what?
  • CiccioB - Wednesday, June 10, 2020 - link

    You may have missed that the motherboard trick overclocks your processor even if you do not want it to do that and you are not aware of it.
    So, yes, the term overclocking applies here, because if my CPU that is stated to consume 142W for a brief period of time when necessary starts sucking 190W constantly it may be a problem. Also for users not prone to overclocking.
  • brantron - Thursday, June 11, 2020 - link

    You may have missed that it raised power 25 watts...in Cinebench, not "constantly."

    You act as if this is some sort of kill switch for everything modern CPUs do to control power and heat, when it's actually what makes this gimmick possible.
  • CiccioB - Thursday, June 11, 2020 - link

    You may have missed that this test is just a sample of many other cases. So using the numbers of this single motherboard test to argue that a concealed OC that causes "just" 25W more power consumption is harmless is quite stupid.
  • brantron - Thursday, June 11, 2020 - link

    I'm sure we'll be hearing all about Ryzen CPUs auto overclocking beyond 5 GHz and spewing flames soon enough. It's Y2K all over again!
  • CiccioB - Friday, June 12, 2020 - link

    This sarcasm is useless.
    It is more probable than that in 3 or 4 years you'll have lots of dead Ryzens which were installed on some motherboards.
    But as the problem is delayed in time, who cares, right?
  • Thanny - Friday, June 12, 2020 - link

    It doesn't overclock the processor. It clocks it higher within spec, because it believes there's more power budget left than there is.

    The only consequence is more heat output than intended, which becomes irrelevant if you have a good cooler, but possibly not irrelevant if you're using the stock cooler.

    Put another way, this motherboard cheating increases the TDP. It does not overclock.
  • magila - Wednesday, June 10, 2020 - link

    I was addressing the part of the article that makes it sound like electromigration is a non-issue with modern fabrication processes. In particular, "Electromigration has not been an issue for most consumer semiconductor products for a substantial time." is only really true if you ignore non-Intel processes.
  • Spunjji - Wednesday, June 10, 2020 - link

    Ryzen CPUs already operate at the upper edge of their curves. There's really no such thing as "moderate overclocking" with one of these processors, and it's an almost entirely pointless activity.

    There's only one range of CPUs available on N7, so nobody has enough information to conclusively say that any issues they have are down to N7 itself and not down to the design of the CPU. Weird how people feel so certain about something they cannot possibly know for sure.
  • casteve - Tuesday, June 9, 2020 - link

    Black's Equation (https://en.wikipedia.org/wiki/Black%27s_equation) should still apply and you are reducing the CPU's MTTF with the increase in operating temperature and current density. The big question is always what did the mfgr design to for MTTF and at what operating conditions? 10 year? 20yr? While increasing the current (and current density) is a linear decrease in MTTF, increasing the sustained temp 25C might decrease the MTTF an order of magnitude.
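
As a rough illustration of the Black's equation argument above, the Python sketch below compares MTTF before and after a change in current density and temperature. It uses the usual form MTTF = A / J^n * exp(Ea / (k*T)), so the prefactor A cancels in the ratio. The function name, the 0.9 eV activation energy, and the example temperatures are assumptions picked for illustration, not values from AMD, TSMC, or the article. The exponent n is commonly quoted between 1 and 2; n = 1 matches the "linear" case described in the comment.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_mttf(current_ratio, temp_old_c, temp_new_c,
                  n=1.0, activation_energy_ev=0.9):
    """Return MTTF_new / MTTF_old under Black's equation.

    current_ratio        -- J_new / J_old (current density ratio)
    temp_old_c/new_c     -- conductor temperatures in degrees Celsius
    n                    -- current-density exponent (assumed; often 1 to 2)
    activation_energy_ev -- Ea in eV (assumed, material dependent)
    """
    t_old_k = temp_old_c + 273.15
    t_new_k = temp_new_c + 273.15
    # Current term: MTTF scales as J^-n.
    current_term = current_ratio ** (-n)
    # Thermal term: ratio of exp(Ea / (k*T)) at the two temperatures.
    thermal_term = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_new_k - 1.0 / t_old_k)
    )
    return current_term * thermal_term


if __name__ == "__main__":
    # Hypothetical scenario: 20% more current at unchanged temperature.
    print(relative_mttf(1.2, 65.0, 65.0))
    # Hypothetical scenario: same current, sustained temperature up by 25 C.
    print(relative_mttf(1.0, 65.0, 90.0))
```

Under these assumed values, a 25 C rise in sustained temperature cuts the estimated MTTF far more than a 20% rise in current does, which is broadly in line with the "order of magnitude" remark. A different activation energy or exponent would shift the numbers considerably.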
  • porina - Tuesday, June 9, 2020 - link

    3700X on Asus Prime X370-Pro bios 5220. Prime95 8x128k FFT stabilised around 91%. Cinebench R20 varied between 92-94%
    3600 on Asrock B450 Gaming-ITX/ac bios P3.70. Prime95 6x128k FFT 88%. Cinebench R20 started around 91% and crept up to 94% during the run. Repeatable on 2 runs.
  • AnarchoPrimitiv - Wednesday, June 10, 2020 - link

    Wow, 2 runs.... That's such a big data set you obviously blew the lid off this subject... Are you a researcher or something or an investigative journalist because your five minutes of work really adds to the discussion....
  • Awful - Wednesday, June 10, 2020 - link

    The article specifically asked for people to post their results...and two runs is enough to show how the motherboard is behaving; there isn't a need for more data. What a bizarrely hostile comment...
  • CiccioB - Wednesday, June 10, 2020 - link

    It is clear you are here to deny the problem on its root, not to discuss if it may create some harm.
  • Spunjji - Monday, June 15, 2020 - link

    It's clear CiccioB came here to spread FUD, because there is no factual basis to assume it "may create some harm".
  • eggnogg - Tuesday, June 9, 2020 - link

    "The peak voltage, which matters a lot for electromigration, only moves from 1.41 volts to 1.42 volts"

    Does this mean it's better that I run llc for short voltage peaks with low current instead of leaving it at intel guidance with a higher voltage offset at high current draw to gain OC stability?
  • plopke - Tuesday, June 9, 2020 - link

    So 1700 stock , no tweaking in bios on a gigabyte board under 100% load says for me
    95.X%
  • eek2121 - Tuesday, June 9, 2020 - link

    95% is typically fine.
  • plopke - Tuesday, June 9, 2020 - link

    lowest i have been able to go was with a few runs of OCCT 93.7%
    cinebench R20 went to 96.3%
  • eek2121 - Tuesday, June 9, 2020 - link

    The Gigabyte x570 Aorus Elite gets 80% under Cinebench. I want to try Prime95 later.
  • Slash3 - Tuesday, June 9, 2020 - link

    3950X on an ASRock X570 Taichi, with processor running at defaults in BIOS (v2.73). HWInfo64 reports a power deviation metric of 100.3% under multi-thread Cinebench R20 test. All good.
  • watzupken - Wednesday, June 10, 2020 - link

    On the contrary, my ASRock X470 Taichi with Ryzen 9 3900X running AIDA64 shows a power deviation value of 72%. I have been wondering why my CPU is running hot despite using a 240mm AIO. I subsequently switched to a Scythe Fuma 2 and the temps are still the same, around 87 to 90 degrees at load.
  • AnarchoPrimitiv - Wednesday, June 10, 2020 - link

    You don't have adequate cooling then, regardless of other issues
  • CiccioB - Wednesday, June 10, 2020 - link

    It doesn't have adequate cooling because the processor is being overclocked without him knowing it, not regardless of other issues. The issue is there; it is only that you want to deny it at all costs.
  • liquid_c - Thursday, June 11, 2020 - link

    I don’t know who you are, CiccioB but besides the fact that you seem to have issues using the english language, most of your comments (of which there are many, at least on this article) seem to be nonsensical bullshit. Pardon my french.
  • CiccioB - Thursday, June 11, 2020 - link

    Sorry, my native language is not English. And the comments cannot be corrected, even once you see you have made a mistake.

    And I don't really think I am writing nonsensical bullshit.
    Tell me where my reasoning fails, as I can see a clear chain of cause and effect that you, in turn, may be missing.
  • silverblue - Wednesday, June 10, 2020 - link

    The reviewers' BIOS for the X570 Taichi showed a 32% power reporting deviation which would read as about 60W package power in HWINFO when, in actual fact, the load was more like 180W when measured from EPS 12V. BIOS updates have brought this closer to reality (see Gamers Nexus' video on this - https://www.youtube.com/watch?v=10b8CS7wQcM ). Nothing to do with cooling at all.
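
The arithmetic behind those figures can be sketched in a few lines. This is only an illustration, assuming the deviation metric is simply reported power divided by actual power, expressed as a percentage; the function names are hypothetical and are not part of HWiNFO.

```python
def power_reporting_deviation(reported_w: float, actual_w: float) -> float:
    """Reported power as a percentage of actual power (100% = accurate)."""
    return reported_w / actual_w * 100.0


def implied_actual_power(reported_w: float, deviation_pct: float) -> float:
    """Actual power implied by a reported figure and a deviation percentage."""
    return reported_w / (deviation_pct / 100.0)


if __name__ == "__main__":
    # Figures quoted in the comment: ~60 W reported, ~180 W measured at EPS 12V.
    print(f"deviation: {power_reporting_deviation(60.0, 180.0):.1f}%")
    # A 32% deviation with 60 W reported implies roughly this much real draw.
    print(f"implied actual: {implied_actual_power(60.0, 32.0):.1f} W")
```

Under that assumption, a 32% reading and a 60W report land in the same region as the roughly 180W measured at the EPS 12V connector.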
  • Galcobar - Wednesday, June 10, 2020 - link

    Gamers Nexus tested the X570 Taichi with the original reviewers' sample BIOS, then again with the most recent BIOS, and discovered the sample BIOS fudging the power draw while the current BIOS reported accurately.

    It's reasonable to expect that the initial BIOS update adding Ryzen 3000 support to X470 boards could follow a similar pattern, if you're not running the most current version.
  • Khenglish - Tuesday, June 9, 2020 - link

    Ian why did you replace the Cu atom in the Linear77 link with an Si atom? In general it's almost impossible to dislodge Si and dopant atoms from their lattice. In fabrication more heat is used to repair a damaged lattice, not cause damage. Electrons are far too light to dislodge Si from a lattice other than from a high power electron beam. Typically when you want to damage an Si lattice you need full atoms for more weight. Metals on the other hand are much easier to get to migrate since you don't really have a lattice to keep atoms stable, and are substantially mobile at much lower temperatures (700C - 1100C used for Si work, with only up to around 400C for metal work).

    In general electromigration has been less of an issue over time as processes further abandon the use of aluminum. Al is much more susceptible to electromigration than Cu. For many years after Cu was introduced, you still had the bottom 1-2 layers of the metal stack as Al because it takes very little Cu migration into Si to poison it, and Cu diffuses easily into Si. That remaining aluminum is why processors were still susceptible to electromigration damage. With it now mostly gone (replaced by more resistive, but less mobile Cobalt), you see higher voltages up to 1.5V returning for standard use on Ryzen CPUs, a voltage that would cause too high a current density and would slowly kill older processors like 32nm SB.
  • State of Affairs - Tuesday, June 9, 2020 - link

    >Ian why did you replace the Cu atom in the Linear77 link with an Si atom?

    So I am not the only one who noticed that. One difference between Cu and Si is that the latter is covalently bonded within its diamond-structured lattice. Those covalent bonds are strong and it takes very high energy electrons to initiate displacement.
  • CiccioB - Wednesday, June 10, 2020 - link

    The real difference is that removing wire atoms, be they Al or Cu, creates an avalanche effect: the more you remove, the narrower the wire becomes and the higher its resistance, which raises the temperature, which in turn accelerates the removal of other atoms from the wire.
  • Khenglish - Wednesday, June 10, 2020 - link

    Yes absolutely. Electromigration happens when there is a high enough current density and temperature to begin pushing metal atoms. As some atoms get pushed away forming a void, current density increases more for even more rapid electromigration.

    I've been reading up more on electromigration and it's definitely a complicated area. Here's a summary of key points:

    1. Smaller processes continually result in higher current density, as a reduced design will have half the wire length, but less than half of the cross-sectional area as width and depth both reduce.

    2. FinFETs made the problem worse, as a FinFET occupying the same area as a planar FET will have more width (3d fin) and can push more current, but interconnects are the same size.

    3. There's much more than just the abandonment of aluminum to improve electromigration in recent years. A Cobalt sheath performs much better as a surface to wet Copper to than the traditional TaN sheath, reducing electromigration. Another traditional issue was the top of a copper interconnect would be abutted against an insulator, which it would poorly wet to and cause additional migration at this boundary, even more than a TaN sheath. Doping the top of the copper with Manganese improved the bond, and today fully wrapping the copper wire in Cobalt improved the issue even more. An additional improvement was to zig-zag interconnects at regular intervals so that the 90 degree elbow would serve as a backstop against migration, either preventing it from occurring in the first place, or provide increasing back pressure to prevent more from occurring after it had begun.
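
Point 1 above can be made concrete with a rough scaling sketch. It assumes an idealized shrink in which wire width and depth both scale by the same linear factor; real processes do not scale this uniformly, and the numbers below are illustrative only.

```python
def current_density_ratio(linear_scale: float, current_scale: float = 1.0) -> float:
    """Relative change in current density after a shrink.

    linear_scale:  factor applied to both wire width and depth (0.5 = halved).
    current_scale: factor applied to the current the wire must carry.
    """
    if linear_scale <= 0:
        raise ValueError("linear_scale must be positive")
    area_scale = linear_scale ** 2  # cross-section = width * depth
    return current_scale / area_scale


if __name__ == "__main__":
    # Same current through a wire with halved width and depth.
    print(current_density_ratio(0.5))
    # Even if the current also halves, density still rises.
    print(current_density_ratio(0.5, current_scale=0.5))
```

Because the cross-section shrinks with the square of the linear factor, current has to fall faster than the dimensions do for density to stay flat.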
  • Spunjji - Monday, June 15, 2020 - link

    This was a really helpful addition to the article. Thank you!
  • WaltC - Tuesday, June 9, 2020 - link

    Great article, Ian. Thank you for taking the time to put these facts in front of people. I run my 3900X at PPT of 330, 240, 120 (on the other values)--When I run the HWinfo power deviation--it shows a max of 350%, current of value 180%--at idle, doing *nothing* apart from running HWInfo. When I run the little bench in CPU-Z on top of running HWInfo, max and current readings drop to 75%, each...;) As you can see, I'm not stock, so the stat has little meaning for me at all. It seems nonsensical, actually.

    Last time I experienced electromigration was on a 130nm Intel CPU years ago. People would overvolt them overclocking and eventually render them useless--made keychains out of the dead ones, IIRC. You are absolutely correct as to why there is almost no chance of that happening today! CPUs--especially Ryzen--are much too advanced in design, as you mentioned. Intel and AMD both use FINFET, and other things.

    Glad to see the timeliness of your remarks. Tom's H really should have done a lot more legwork before publicizing such half-baked theories. Instead, Tom's Hardware published the half-baked theory and promised to do the legwork later--to see if it was true...;) Sad.
  • CiccioB - Tuesday, June 9, 2020 - link

    You are absolutely correct as to why there is almost no chance of that happening today!

    Of course he's not.
    If you take the same current absorption of a 32nm CPU and a 7nm CPU you would see that the current density that passes through the connection lines is much higher. Current density is a parameter that increases electromigration.
    If you take a 300mm^2 die and one of today's 80mm^2 dies and calculate the energy density in them, you will see that the latter suffers much more than the old one. High temperature (and hot spots) increases electromigration.
    If you look at the voltages used in 32nm PP and in current ones, you will see they are almost the same. But the inner tracks are now much thinner. Voltage is more than a linear factor that increases electromigration.

    So comparing an old CPU to actual ones and say the problem is no more is actually the opposite of what physics is suggesting.
    Intel encountered electromigration problem in their first 22nm chipset, just as an example.
  • Fataliity - Tuesday, June 9, 2020 - link

    Old chips were planar. Todays chips are finfets. Old chips used aluminum todays use cobalt. Also today's 80mm2 chip has much much more wire inside of it, the power delivery is much more advanced and spread out. And the HP cells used for the high frequency parts of the chip are overbuilt.

    Do some research and shut up.
  • CiccioB - Wednesday, June 10, 2020 - link

    Finfet and planar transistor type have nothing to do with electromigration or its mitigation.
    An 80mm^2 die must have more wiring inside because the power absorption is the same as the old 300mm^2 chips, and if it didn't have so many lines and so much metal inside, it would melt in a few seconds.
    But those traces are much narrower than old chips. And dimension for electromigration counts. Really. Do some research.
  • WaltC - Wednesday, June 10, 2020 - link

    FINFET means the transistors are able to shut themselves down individually--the technology exists to keep energy from coming off of the CPU in waves of excess thermal energy caused by current leakage--its purpose is to stop current leakage, etc. It works very well. As Ian mentions, smaller nm processes today require less energy than older designs--coupled with FINfet and *other things* it's a whole different ballgame in that regard. And no, a stock motherboard isn't going to create electromigration in normal use. That would be nuts...;)
  • PeterCollier - Thursday, June 11, 2020 - link

    The biggest factor in electromigration is the use of resonant clock meshes. The resonance phenomenon accelerates failure similar to the Tacoma narrows bridge.
  • Fataliity - Tuesday, June 9, 2020 - link

    And that's not even mentioning bamboo structures, the Blech effect, EDA tools designed around electromigration, doping, etc. There are so many things done to prevent this stuff. Chips are literally designed and verified to avoid this. Sure, it can slip through the cracks in rare cases. But most chips are engineered to not suffer this fate.

    The science is wayyyyyyyyyy more advanced than 22nm was.
  • jim bone - Tuesday, June 9, 2020 - link

    The chips are designed for what they're designed for; often a 10 year Gaussian mean life. If you increase the current density you will reduce the statistical life of the chip below what it was designed for, and not necessarily in a predictable way; you may fall off an aging cliff. Engineers will *always* waive some EM violations - no tapeout is clean in this regard.

    I guess I agree on average the headline claim is true most of the time; until it isn't.

    FWIW I'm an IC designer at a major semiconductor company, working in the most advanced nodes available right now. In the past I've worked for AMD designing parts of their chips. I look at EM and aging results several times a year as part of standard tapeout sign-off.
  • CiccioB - Wednesday, June 10, 2020 - link

    The science is what is put into a PP.
    Tell me what makes 7nm PP less prone to electromigration than 22nm.

    Moreover, I have just a simple question: if the chip could sustain that increased stress for its intended lifetime, why has the producer, in this case AMD, not chosen to sell it with those increased specs?
    More performance at zero cost... but that was not the case.
    My great scientist, tell me why AMD chose to lose money.
  • Lord of the Bored - Wednesday, June 10, 2020 - link

    CiccioB, I am glad you asked. I have a reasonable answer: marketing.
    AMD has had a reputation for making hot, inefficient chips of late. They'd very much like to be rid of that reputation, so they're setting a lower wattage than is strictly necessary.

    And then they're treating that as a hard limit, so they can score points against Intel with their "this is our power limit, except we'll probably draw twice this and not tell you" policy.
  • CiccioB - Thursday, June 11, 2020 - link

    So it was for this one percentage point of efficiency that they created a "marketing issue", which was covered by every technology site, by releasing a BIOS that could not make the CPU run at the advertised speeds because it tried to limit the voltage to the minimum possible?

    My feeling is that this sudden concern with voltage values, trying to keep them as low as possible, even lower than needed and so preventing correct boosting, is quite suspect.
    Historically they have never been short on voltage in any of their chips, be they CPUs or GPUs.
  • Lord of the Bored - Thursday, June 11, 2020 - link

    I'm not sure offhand what you're referring to.

    This isn't about a BIOS AMD released, failure to reach advertised speeds, or anything involving voltages.
    It is about motherboard manufacturers programming their BIOSes to lie about how much amperage is being used.

    Amps are not volts. They are, in fact, almost completely unrelated to volts.
  • CiccioB - Friday, June 12, 2020 - link

    I'm referring to this: https://community.amd.com/thread/246897
    or this: https://www.pcworld.com/article/3437401/amd-issues...
    or this: https://www.tomshardware.com/news/amd-ryzen-3000-b...

    Do you know that AMD released a BIOS version that did not allow Ryzen 3000 to reach the advertised boost frequency because they set the max voltage too low?
    Now you can do 1+1.
    Maybe
  • WaltC - Wednesday, June 10, 2020 - link

    Uh, he and I are both talking about normal use of the CPUs in normal motherboards. Sorry--but that's not going to create any electromigration at all.
  • Khenglish - Wednesday, June 10, 2020 - link

    Northwoods running 1.7V+ used to die very suddenly, with no prior performance degradation. This was probably more an issue of a metal contact suddenly popping off in one piece than electromigration. .13um was Intel's first copper process, so they probably had issues with getting copper to stick properly.
  • shaolin95 - Tuesday, June 9, 2020 - link

    Here to see all the excuses that AMD fanboys are going to come up with. If it were Intel, they would be rioting.
  • Deicidium369 - Wednesday, June 10, 2020 - link

    Kiddies will do what kiddies do.
  • AnarchoPrimitiv - Wednesday, June 10, 2020 - link

    What are you even talking about? How does that have anything to do with this article? At whom are you making these accusations?
  • CiccioB - Wednesday, June 10, 2020 - link

    To those spamming the thread about the fact that the issue doesn't even exist in origin.
  • Lord of the Bored - Wednesday, June 10, 2020 - link

    It IS Intel, by design.
    Also, this isn't anything AMD is doing, it is something AMD is actively trying to stop. AMD fanboys should be shouting about motherboard manufacturers betraying AMD's design and specs.
  • 29a - Tuesday, June 9, 2020 - link

    Asrock B450M Pro4 reports 103%.
  • eastcoast_pete - Tuesday, June 9, 2020 - link

    One aspect I would like some more information and discussion on is the effect of temperature on EM. In other words, how do the more current cooling setups such as liquid AIO or even high quality air coolers affect not just the temperature, but, in turn, the chance of EM taking place? My own thinking has always been that there's no such thing as too efficient heat removal for semiconductors, as they like it cold.
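
One common way to reason about this question is Black's equation, an empirical model in which median time to failure scales as J^-n * exp(Ea / kT), with J the current density and T the absolute temperature of the interconnect. The sketch below compares lifetimes at two temperatures with current density held constant. The activation energy of 0.9 eV is an illustrative assumption, not a figure from the article or from any vendor, and the temperatures are those of the metal itself rather than a reported die sensor value.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_lifetime_ratio(t_cool_c: float, t_hot_c: float,
                      activation_energy_ev: float = 0.9) -> float:
    """Ratio of predicted EM lifetime at t_cool_c to lifetime at t_hot_c.

    Uses only the exponential (temperature) term of Black's equation,
    i.e. the current density is assumed identical in both cases.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # How much longer the model predicts a wire lasts at 60 C than at 90 C.
    print(f"{em_lifetime_ratio(60.0, 90.0):.1f}x")
```

Within this model, lower temperature always helps, which is consistent with the intuition that there is no such thing as too much heat removal; how much it helps depends heavily on the assumed activation energy.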
  • mode_13h - Wednesday, June 10, 2020 - link

    > dimensionless value (0 to 255) designed to represent 0 = 0 amps, and 255 = peak amps that the VRMs can handle

    I would call that the "normalized VRM load" or perhaps "normalized VRM current".
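
A minimal sketch of how such a normalized value maps back to amps, assuming the CPU simply scales the raw reading by a full-scale current that the board declares. The function name and the 200 A and 150 A figures are hypothetical, chosen only to show how a mis-declared full-scale value skews the result.

```python
def raw_to_amps(raw: int, full_scale_amps: float) -> float:
    """Convert a 0-255 normalized VRM reading to amps."""
    if not 0 <= raw <= 255:
        raise ValueError("raw reading must be in 0..255")
    return raw / 255 * full_scale_amps


if __name__ == "__main__":
    raw = 128  # roughly half of full scale
    true_full_scale = 200.0      # hypothetical real VRM limit, in amps
    declared_full_scale = 150.0  # hypothetical understated value given to the CPU
    print(f"actual current:   {raw_to_amps(raw, true_full_scale):.1f} A")
    print(f"CPU believes:     {raw_to_amps(raw, declared_full_scale):.1f} A")
```

The same raw reading yields a lower current whenever the declared full-scale value is lower than the real one, which is the kind of under-reporting the article describes.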
  • Aephe - Wednesday, June 10, 2020 - link

    Asus Strix x570i with 3950x - showing 98% - 99%
  • Hideo - Wednesday, June 10, 2020 - link

    Hi, my ASROCK Steel Legend B450M with 3300X, no OC, with CineBench R20 MT shows 86%, so some tampering is evident.
    - BTW, is it possible that this is just a workaround for MB manufacturers to improve performance by falsely reporting this info, as AMD didn't leave them the option to do it in another, more "by the book" way?
    - Secondly, is it possible, without the MB manufacturers' cooperation, to check the difference in some meaningful and quantifiable parameter, e.g. power draw or temperature change?
    - Even in the best case scenario, I would prefer to have a BIOS switch to turn it off if I don't want it. For example, I want to generate less heat as my PC space is small and gets hot during the summer. I would definitely like to have the option to turn this thing off, even if it doesn't influence the longevity of the MB or CPU.
  • Galcobar - Wednesday, June 10, 2020 - link

    To get an accurate power draw figure, you'd have to physically measure it. Say, a clamp on the EPS12V cable; you'd still have to deal with losses in the motherboard's VRM but that should be a few percentage points. Software is unreliable because it reads the same sensor info as the CPU, which is subject to (deliberate) miscalibration by the motherboard BIOS.

    The motherboard manufacturers who did this did it to gain a performance edge, primarily in initial reviews, hoping to capture sales. The problem, like Intel and MCE, is more that it wasn't disclosed to buyers or reviewers. Unlike Intel, AMD doesn't allow motherboards to fiddle with power draw and still claim the CPU is operating within specifications covered by its warranty.
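
The clamp measurement described above reduces to a small calculation. This is a sketch only: the 12 V rail voltage is nominal, and the default VRM efficiency of 0.95 is an assumed placeholder for "a few percentage points" of loss, not a measured value for any board.

```python
def estimate_cpu_power(clamp_current_a: float,
                       rail_voltage_v: float = 12.0,
                       vrm_efficiency: float = 0.95) -> float:
    """Estimate CPU power from current measured on the EPS12V cable.

    Power entering the VRM is current * voltage; the CPU receives that
    amount minus the VRM's conversion losses.
    """
    if not 0 < vrm_efficiency <= 1:
        raise ValueError("vrm_efficiency must be in (0, 1]")
    return clamp_current_a * rail_voltage_v * vrm_efficiency


if __name__ == "__main__":
    # Hypothetical clamp reading of 15 A on the EPS12V cable.
    print(f"input power: {estimate_cpu_power(15.0, vrm_efficiency=1.0):.1f} W")
    print(f"CPU power:   {estimate_cpu_power(15.0):.1f} W")
```

Comparing this estimate with the package power shown in software gives an independent check on whether the board is under-reporting.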
  • quadibloc - Wednesday, June 10, 2020 - link

    I'll admit in these days of rapidly improving electronics, it might seem silly, but I would feel much more comfortable if microprocessors were designed to continue running not for 10 years, but for 1,000 years. After all, the eventual end of Moore's Law is becoming apparent, so people may stop wanting to upgrade. I've taken action, I've put my 3900X on Eco Mode.
  • edzieba - Wednesday, June 10, 2020 - link

    For an article that boils down to "electromigration mitigation is great now" there is very little on what these mitigations actually are. Voltage has remained pretty much flat at a hair over 1V for the last decade after hitting the gate oxide limit, but layer dimensions (and layer thicknesses, especially metals!) have been shrinking since then even if transistor gate size has not. By all accounts, that should mean electromigration is a much greater problem than back on 32nm.
  • CiccioB - Wednesday, June 10, 2020 - link

    This.

    This post explains clearly why electromigration is more of a problem today than it was yesterday.

    Many here are AMD fanboys who have come just to defend their brand. They cannot argue on a technical basis. Read this and answer this post if you have anything to disprove the general warning about this kind of trick (which AMD disapproves of for a reason, even though it is apparently against its interest in seeing its CPUs run as fast as possible).
  • Korguz - Wednesday, June 10, 2020 - link

    kind of like what you do with intel CiccioB ?
  • CiccioB - Thursday, June 11, 2020 - link

    Huh?
  • Dug - Wednesday, June 10, 2020 - link

    Can't seem to find 'Power Reporting Deviation' in any of the most recent HWiNFO64 builds.

    Wonder why they took it out? Could it have been reporting wrong information?
  • silverblue - Wednesday, June 10, 2020 - link

    There's a beta link on their site for v6.27-4190. I have v6.27-4185 which introduced the power reporting deviation, however since then they have identified that it didn't play ball with Zen and Zen+, and as such have released a new build.
  • silverblue - Wednesday, June 10, 2020 - link

    (which fixes PRD for Zen and Zen+; sorry, I should've stated that at the time)
  • K_Space - Wednesday, June 10, 2020 - link

    @Dug actually it's simply because you clicked on CPU. Instead go to Sensors and scroll down. You'll find Power Deviation there.
  • silverblue - Wednesday, June 10, 2020 - link

    Ryzen 5 3600 at stock using original Wraith Spire cooler, Gigabyte GA-AB350-Gaming 3 with F50a BIOS, 2x8GB Patriot Viper Elite DDR4 3000MHz (CL16), using 1usmus Ryzen Universal power plan - Cinebench R20 MT score of 3403, but more importantly, Power Reporting Deviation of 132% to 135% throughout the run. I can only get it to 118% using CPU-Z (which is outside the scope of this test), so perhaps it's a result of very conservative calibration - after all, on this board, the VRMs don't have a heatsink.
  • abufrejoval - Wednesday, June 10, 2020 - link

    I don't mind at all being able to overclock a CPU, or other parts of a PC.

    But I do prefer to be given a choice.

    Most of the time the noise and heat a machine generates will matter to me: Ideally I'd want the ability to tell a machine: "Don't use more than x Watts, make the most of it."

    Perhaps I'd also want to say something like "Use up to 6 cores, but go all-in on Watts", because I know that a game or piece of software won't scale further anyway.

    I'd love to be able to do this at run-time and I'd really want to be sure, that these limits are not overstepped. And of course, this should work the same on Windows, Linux, BSD or Qubes.

    These power limits shouldn't be limited to just the CPU either. Demanding that they accommodate USB devices etc. might go a little far, but GPU and memory should be included in the calculations or measurements.

    I keep rebuilding machines and I will reuse components that are still viable. I have Gold+ rated power supplies that I try to operate at around 80% rated performance for peak usage, but that requires for the computer components to stick to their ratings or settings.

    Of course, I measure to make sure, because few things are as nasty as faults induced from borderline power, but I prefer to set limits instead.

    I recently tried to put a 65 max Watt appliance together, using existing mini-ITX cases with Pico-ATX PS and external 12V power bricks, but equipped with ECC RAM and ideally 64GB of it. I wanted it to use 8 or more cores, go easy on clocks when loaded, but sprint to 4GHz on single threads, as long as it would never overwhelm the power supply.

    It turned out almost impossible, because Intel doesn't stick anywhere near to 35 Watts when you want them to ... unless you buy a notebook instead.

    I want run-time cTDP for all the major components (CPU, RAM, GPU for starters), within the limits that they already technically support but do not expose to user control: Is that so much to ask?
  • Oxford Guy - Wednesday, June 10, 2020 - link

    What are these special measures that manufacturers put into place to reduce electromigration?

    Where is the data? Let's see some charts.
  • eastcoast_pete - Wednesday, June 10, 2020 - link

    This whole EM topic has once again become more the subject of what is apparently a religious war, and drawn attention away from a key point regarding Ryzen CPUs, also mentioned by Ian and explained in a good write-up by the Stilt (linked to in Ian's piece). In a nutshell, AMD CPUs rely on the MB to tell them how much power they are using, and then adjust accordingly. Unfortunately, at least two major MB makers have used that to boost performance by fudging the values sent to the CPU. Now, AMD CPUs are vulnerable to that because they outsource that function, but AMD doesn't condone the fudging. The solution to that is straightforward: Anandtech and other reviewers, please call the cheaters out, and AMD, please do likewise when it comes to certifying vendors. Also, if EM is a major issue for Ryzen chips, why hasn't there been a class action lawsuit here in the US? How many 1st and 2nd generation Ryzens have dropped dead? Lastly, is there software that can reliably discriminate between EM and other causes of performance drops and instability? I'd really like the answer to the last one!
  • Haltursson - Thursday, June 11, 2020 - link

    I had a K6-2 350MHz that I ran at 430-ish at 2.9V at the time with no issues during use. I replaced it after two years with a K6-III 450, I think, and the whole chip turned to dust as I removed it from the socket....
  • Big Nish - Thursday, June 11, 2020 - link

    CPU: Ryzen 5 3600
    Motherboard: ASUS X570 Crosshair VIII Impact
    BIOS: 1302
    Cinebench Version: R20.060
    Cinebench Score (Multithread): 3669

    HWiNFO Readings during run:
    All core boost: 4050MHz
    CPU Die temp: 81C (Cooler - Fractal Design Celsius+ S28 Dynamic set to Auto)
    CPU Package Power: 90.164W
    CPU PPT: 88W
    Power Reporting Deviation: 91%

    This seems to indicate that ASUS has been a little loose with its lookup table.
  • Oxford Guy - Thursday, June 11, 2020 - link

    The Stilt said he wouldn't recommend putting more than 1.475 volts into Piledriver so his advice was more conservative than a lot of advice flying around overclocking forums.

    But this article implies that chips aren't as vulnerable as 32nm SOI Piledriver was. I'd love to see the data on all of this.
  • wow&wow - Thursday, June 11, 2020 - link

    Is that kind of OEM activity funded by "Intel Inside"?
  • papapapapapapapababy - Friday, June 12, 2020 - link

    https://www.hwinfo.com/forum/threads/explaining-th...
