Original Link: https://www.anandtech.com/show/21484/intel-raptor-lake-instability-faults-voltage-microcode-fix-in-august

In what started last year as a handful of reports about instability with Intel's Raptor Lake desktop chips has, over the last several months, grown into a much larger saga. Facing their biggest client chip instability impediment in decades, Intel has been under increasing pressure to figure out the root cause of the issue and fix it, as claims of damaged chips have stacked up and rumors have swirled amidst the silence from Intel. But, at long last, it looks like Intel's latest saga is about to reach its end, as today the company has announced that they've found the cause of the issue, and will be rolling out a microcode fix next month to resolve it.

Officially, Intel has been working to identify the cause of desktop Raptor Lake’s instability issues since at least February of this year, if not sooner. In the interim they have discovered a couple of correlating factors – telling motherboard vendors to stop using ridiculous power settings for their out-of-the-box configurations, and finding a voltage-related bug in Enhanced Thermal Velocity Boost (eTVB) – but neither factor was the smoking gun that set all of this into motion. All of which had left Intel to continue searching for the root cause in private, and lots of awkward silence to fill the gaps in the public.

But it looks like Intel’s search has finally come to an end – even if Intel isn’t putting the smoking gun on public display quite yet. According to a fresh update posted to the company’s community website, Intel has determined the root cause at last, and has a fix in the works.

Per the company’s announcement, Intel has tracked down the cause of the instability issue to “elevated operating voltages”, that at its heart, stems from a flawed algorithm in Intel’s microcode that requested the wrong voltage. Consequently, Intel will be able to resolve the issue through a new microcode update, which pending validation, is expected to be released in the middle of August.

Based on extensive analysis of Intel Core 13th/14th Gen desktop processors returned to us due to instability issues, we have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor.

Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation.

Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance.
-Intel Community Post

And while there’s nothing good for Intel about Raptor Lake’s instability issues or the need to fix them, that the problem can be ascribed to (or at least fixed by) microcode is about the best possible outcome the company could hope for. Across the full spectrum of potential causes, microcode is the easiest to fix at scale – microcode updates are already distributed through OS updates, and all chips of a given stepping (millions in all) run the same microcode. Even a motherboard BIOS-related issue would be much harder to fix given the vast number of different boards out there, never mind a true hardware flaw that would require Intel to replace even more chips than they already have.

Still, we’d also be remiss if we didn’t note that microcode is regularly used to paper over issues further down in the processor, as we’ve most famously seen with the Meltdown/Spectre fixes several years ago. So while Intel is publicly attributing the issue to microcode bugs, there are several more layers to the onion that is modern CPUs that could be playing a part. In that respect, a microcode fix grants the least amount of insight into the bug and the performance implications about its fix, since microcode can be used to mitigate so many different issues.

But for now, Intel’s focus is on communicating that they have fix and establishing a timeline for distributing it. The matter has certainly caused them a lot of consternation over the last year, and it will continue to do so for at least another month.

In the meantime, we’ve reached out to our Intel contacts to see if the company will be publishing additional details about the voltage bug and its fix. “Elevated operating voltages” is not a very satisfying answer on its own, and given the unprecedented nature of the issue, we’re hoping that Intel will be able to share additional details as to what’s going on, and how Intel will be preventing it in the future.

Intel Also Confirms a Via Oxidation Manufacturing Issue Affected Early Raptor Lake Chips

Tangential to this news, Intel has also made a couple of other statements regarding chip instability to the press and public over the last 48 hours that also warrant some attention.

First and foremost, leading up to Intel’s official root cause analysis of the desktop Raptor Lake instability issues, one possibility that couldn’t be written off at the time was that the root cause of the issue was a hardware flaw of some kind. And while the answer to that turned out to be “no,” there is a rather important “but” in there, as well.

As it turns out, Intel did have an early manufacturing flaw in the enhanced version of the Intel 7 process node used to build Raptor Lake. According to a post made by Intel to Reddit this afternoon, a “via Oxidation manufacturing issue” was addressed in 2023. However, despite the suspicious timing, according to Intel this is separate from the microcode issue driving instability issues with Raptor Lake desktop processors up to today.

Short answer: We can confirm there was a via Oxidation manufacturing issue (addressed back in 2023) but it is not related to the instability issue.

Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.

For the Instability issue, we are delivering a microcode patch which addresses exposure to elevated voltages which is a key element of the Instability issue. We are currently validating the microcode patch to ensure the instability issues for 13th/14th Gen are addressed.
-Intel Reddit Post

Ultimately, Intel says that they caught the issue early-on, and that only a small number of Raptor Lake were affected by the via oxidation manufacturing flaw. Which is hardly going to come as a comfort to Raptor Lake owners who are already worried about the instability issue, but if nothing else, it’s helpful that the issue is being publicly documented. Typically, these sorts of early teething issues go unmentioned, as even in the best of scenarios, some chips inevitably fail prematurely.

Unfortunately, Intel’s revelation here doesn’t offer any further details on what the issue is, or how it manifests itself beyond further instability. Though at the end of the day, as with the microcode voltage issue, the fix for any affected chips will be to RMA them with Intel to get a replacement.

Laptops Not Affected by Raptor Lake Microcode Issue

Finally, ahead of the previous two statements, Intel also released a statement to Digital Trends and a few other tech websites over the weekend, in response to accusations that Intel’s 13th generation Core mobile CPUs were also impacted by what we now know to be the microcode flaw. In the statement, Intel refuted those claims, stating that laptop chips were not suffering from the same instability issue.

Intel is aware of a small number of instability reports on Intel Core 13th/14th Gen mobile processors. Based on our in-depth analysis of the reported Intel Core 13th/14th Gen desktop processor instability issues, Intel has determined that mobile products are not exposed to the same issue. The symptoms being reported on 13th/14th Gen mobile systems – including system hangs and crashes – are common symptoms stemming from a broad range of potential software and hardware issues. As always, if users are experiencing issues with their Intel-powered laptops we encourage them to reach out to the system manufacturer for further assistance.
-Intel Rep to Digital Trends

Instead, Intel attributed any laptop instability issues to typical hardware and software issues – essentially claiming that they weren’t experiencing elevated instability issues. Whether this statement accounts for the via oxidation manufacturing issue is unclear (in large part because not all 13th Gen Core Mobile parts are Raptor Lake), but this is consistent with Intel’s statements from earlier this year, which have always explicitly cited the instability issues as desktop issues.

Log in

Don't have an account? Sign up now