Original Link: https://www.anandtech.com/show/8542/cortexm7-launches-embedded-iot-and-wearables



Introduction

Last week, I had the distinct pleasure of visiting ARM’s Austin, Texas campus for a meeting with Vice President of CPU Product Marketing Nandan Nayampally. The topic of discussion: ARM’s next Cortex-M processor, codenamed Pelican, is officially launching today as the Cortex-M7. Thankfully, unlike the A series CPU cores mobile device enthusiasts have grown to love, M and R series cores are typically announced at the same time as retail availability from ARM’s customers. ARM has therefore been working with lead semiconductor partners for some time, and their fully integrated products should have similar launch announcements soon.

If you are not familiar with the M and R series processors from ARM, I don’t blame you. These microcontroller processors don’t receive quite the coverage that A series application processors like the A7, A9, A15, A53 and A57 do. Considering my own heritage learning about PC technology, this is understandable. I always wanted to learn about the latest processors from AMD and Intel as these most directly related to my productivity and entertainment. However, if the smartphone wave was any indication, the next decade of productivity and entertainment might come from processors we don’t expect or even know about.

Fundamentally, the M series processors are considered microcontrollers and not application processors, mainly because they lack a memory management unit (MMU). An MMU’s primary role is to sit between the processor and memory, intercepting all memory references and performing translation between virtual addresses and physical addresses. General purpose operating systems such as Linux (Android), Windows, OS X, and iOS require an MMU to function. That means M series processors, like all microcontrollers (MCUs), will never be tasked with running general purpose operating systems.
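To make the translation step concrete, here is a minimal sketch of what an MMU does on every memory reference. The 4 KiB page size and the contents of `page_table` are assumptions chosen for illustration, and real hardware walks multi-level tables with permission bits rather than a flat dictionary.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, a common choice

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 5, 1: 9, 2: 1}


def translate(virtual_address: int) -> int:
    """Translate a virtual address to a physical one, as an MMU would."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        # A real MMU raises a fault here and the operating system handles it
        raise LookupError(f"page fault at virtual address {virtual_address:#x}")
    # The offset within the page is unchanged; only the page number is remapped
    return page_table[page] * PAGE_SIZE + offset


# Virtual page 1 is mapped to physical frame 9 in the table above
physical = translate(0x1234)
print(hex(physical))
```

A microcontroller without an MMU skips this step entirely: the address the code uses is the address that goes out on the bus.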

MCU proliferation is actually already happening today. Nandan mentioned that a device containing a single R or A series processor might also contain dozens of Cortex-M processors performing a variety of tasks that make your life better. Nandan couldn’t mention any specific implementations, as he deferred to ARM’s customers for that, but everything from smart thermostats to simple smart watches like the Pebble to IC power management units is implemented with low power microcontrollers.

Even features we love on our high powered smartphones such as Microsoft’s Lumia Glance Screen and SensorCore, Motorola’s Moto Voice and Moto Display, and the iPhone’s Motion Coprocessor cannot be directly attributed to bigger, faster application processors but rather come from MCUs, and arguably they have just as big an impact on our lives as a lower Sunspider score.

Looking at ARM’s track record in the past five years, one can easily correlate their success with the smartphone craze started with the original iPhone containing an ARM11 processor. From there, the 3GS upgraded to a Cortex-A8, 4S used the Cortex-A9, and Google attacked the same market with Android supporting the ARM11 and subsequent ARM cores as well. While ARM’s meteoric rise might have started with the ARM11, it was years of prior work that provided a solid foundation for partners and customers. In fact there were 11 cores released prior to the ARM11 (fancy that), including six revisions to their instruction set.

ARM has been investing in the Cortex-M line since 2004, when they first released the Cortex-M3. For a decade, ARM has been combating 8-bit and 16-bit microcontrollers with their 32-bit M series. With the release of the M7, ARM feels they have a very complete microcontroller lineup and have effectively “crossed the chasm” into the mainstream. The competition continues today, but with eight billion (yes, with a b) processors shipped to date containing at least one Cortex-M core, the M series is higher volume than all other ARM cores combined. In the first half of this year alone, ARM’s partners have shipped 1.7 billion Cortex-M units, where a unit is defined as a chip containing at least one Cortex-M processor.

Responding to market demand for powerful voice, sensor, display, and control offloading, ARM brings a higher performance, feature rich core to the Cortex-M family with the M7.



The Cortex-M7 CPU

The primary focus of the Cortex-M7 is improved performance. ARM’s goal was to elevate the M series performance to a level previously unseen, while maintaining the M series' signature small die size and tiny power consumption. There are at least two reasons ARM focused on performance for the M7 processor. First, they want to further drive a wedge between traditional 8- and 16-bit microcontrollers and provide ARM a further differentiated market position; second, the M7 will help support the IoT (Internet of Things) and wearable device markets. Focusing on enhanced DSP capabilities, the M7 is more suited to audio and visual sensor hub processing than any previous M series design.

Digging into the details, the Cortex-M7 features a six-stage, in-order, dual-issue superscalar pipeline with single- and double-precision floating point units, instruction and data caches, branch prediction, SIMD support, and tightly coupled memory. Here's the high level view of the pipeline:

The presence of instruction and data caches, branch prediction, and tightly coupled memory differentiates the M7 from previous M series processors. Microcontrollers often forgo caches and sometimes even operate with flash as the only memory interface. By providing high performance instruction and data caches, the M7 approaches more typical high performance processor design.

Tightly coupled memory (TCM) is a technology ARM’s partners can use to extend the effective caching of a single M7 processor and has only been seen in previous A and R series designs. In use, it can have the performance of a cache but, unlike cache, its contents are directly controlled by the developer. That is, TCM is part of the physical memory map of the microcontroller. Developers can place critical code and data inside TCM, where it can be accessed deterministically and with high performance in routines such as interrupt service routines. The M7 supports up to 16 MB of tightly coupled memory.
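The difference between "fast on average" and "deterministic" can be shown with a toy model. The sketch below is only an illustration: the cycle counts are made-up values rather than ARM figures, the addresses are hypothetical, and the direct-mapped cache is a simplification that does not describe the M7's actual cache organization.

```python
# All cycle counts are made-up illustration values, not ARM figures
TCM_CYCLES = 1
CACHE_HIT_CYCLES = 1
CACHE_MISS_CYCLES = 10


class DirectMappedCache:
    """A tiny direct-mapped cache that only tracks which line is resident."""

    def __init__(self, num_lines: int, line_bytes: int = 32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines

    def access(self, address: int) -> int:
        """Return the access cost in cycles and update the cache state."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines
        tag = line_number // self.num_lines
        if self.tags[index] == tag:
            return CACHE_HIT_CYCLES
        self.tags[index] = tag  # evict whatever was in this line
        return CACHE_MISS_CYCLES


def tcm_access(address: int) -> int:
    """TCM is plain memory at fixed addresses, so the cost never varies."""
    return TCM_CYCLES


cache = DirectMappedCache(num_lines=4)
handler_data = 0x100         # hypothetical address used by an interrupt handler
other_data = 0x100 + 4 * 32  # maps to the same cache line and evicts it

pattern = (handler_data, handler_data, other_data, handler_data)
cached_costs = [cache.access(a) for a in pattern]
tcm_costs = [tcm_access(a) for a in pattern]
print(cached_costs, tcm_costs)
```

In this model the cost of touching the handler's data through the cache depends on what ran before it, while the TCM cost is the same on every access, which is the property a hard real-time interrupt handler cares about.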

Adding branch prediction allows ARM to target dedicated DSP devices with its Cortex-M7 microcontroller. DSP code often implements filters over analog data streams for applications such as audio input keyword detection, audio output equalization, and frequency domain amplitude peak searching. When running on an always-on microcontroller these tasks are almost always looped. Without a branch predictor, the processor must wait each iteration for a loop condition to resolve, even though it produces the same outcome 99.9% of the time. Branch predictors cost extra die space but when DSP is your target, they are an obvious design benefit.
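To see why loop branches are such an easy win, consider a rough simulation of a textbook two-bit saturating counter predictor. This is a generic scheme used purely for illustration; ARM has not described the M7's predictor in these terms, and the 1000-iteration loop is an assumed workload.

```python
def count_mispredictions(outcomes, counter=0):
    """Run a two-bit saturating counter over a sequence of branch outcomes.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    """
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move toward the observed outcome, saturating at 0 and 3
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return mispredictions


# A 1000-iteration loop: the backward branch is taken 999 times, then falls through
loop_branch = [True] * 999 + [False]
misses = count_mispredictions(loop_branch)
accuracy = 1 - misses / len(loop_branch)
print(misses, accuracy)
```

Starting from a cold counter, the predictor is wrong only while warming up and once more at loop exit, so nearly every iteration proceeds without waiting on the condition.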

Summarizing the M series cores can be done both from an instruction features standpoint and also a die size and performance standpoint. Unfortunately ARM, which provides HDL (hardware description language) code that can be synthesized to physical chips, is not willing to provide die size numbers until its partners make their Cortex-M7 announcements, since the processor does not become physical until a partner gets involved. Until a partner releases data, we can simply assume the M7 is somewhat larger than its predecessors.

ARM Cortex-M Instruction Sets
                       | M0             | M0+            | M3      | M4                        | M7
Thumb                  | Most           | Most           | Entire  | Entire                    | Entire
Thumb-2                | Subset         | Subset         | Entire  | Entire                    | Entire
Hardware multiply      | 1 or 32 cycles | 1 or 32 cycles | 1 cycle | 1 cycle                   | 1 cycle
Hardware divide        | No             | No             | Yes     | Yes                       | Yes
Saturated math         | No             | No             | Yes     | Yes                       | Yes
DSP Extensions         | No             | No             | No      | Yes                       | Yes, enhanced
Floating-point         | No             | No             | No      | Optional single precision | Yes
Tightly coupled memory | No             | No             | No      | No                        | Yes
Architecture           | ARMv6-M        | ARMv6-M        | ARMv7-M | ARMv7-M                   | ARMv7-M
Cache Architecture     | Von Neumann    | Von Neumann    | Harvard | Harvard                   | Harvard

 

ARM Cortex-M Area, Power, Performance
                                  | M0   | M0+   | M3   | M4   | M7
90nm LP dynamic power (µW/MHz)    | 16   | 9.8   | 32   | 33   | n/a
90nm LP area (mm²)                | 0.04 | 0.035 | 0.12 | 0.17 | n/a
40nm G dynamic power (µW/MHz)     | 4    | 3     | 7    | 8    | n/a
40nm G area (mm²)                 | 0.01 | 0.009 | 0.03 | 0.04 | n/a
Dhrystone (official) DMIPS/MHz    | 0.84 | 0.94  | 1.25 | 1.25 | 2.14
Dhrystone (max options) DMIPS/MHz | 1.21 | 1.31  | 1.89 | 1.95 | 3.23
CoreMark/MHz                      | 2.33 | 2.42  | 3.32 | 3.40 | 5.04

ARM did state that the M7's performance per mW is roughly in line with previous M series cores. Given the per-MHz performance gains in the table above, we could therefore estimate roughly 50% to 75% more power consumption per MHz than the M3 and M4. Area is anyone's guess at the moment.
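The arithmetic behind that estimate can be sketched directly from the tables above. The benchmark scores and the 40nm G power figure for the M4 are taken from those tables; the assumption that power per MHz scales one-for-one with performance per MHz is mine, following ARM's "similar performance per mW" statement, and the resulting numbers are estimates rather than ARM data.

```python
# Per-MHz scores for the M4 and M7, taken from the performance table
scores = {
    "Dhrystone (official)": (1.25, 2.14),
    "Dhrystone (max options)": (1.95, 3.23),
    "CoreMark": (3.40, 5.04),
}

M4_POWER_40NM_G = 8  # uW/MHz, from the table


def performance_gain(m4: float, m7: float) -> float:
    """Fractional per-MHz performance increase of the M7 over the M4."""
    return m7 / m4 - 1


gains = {name: performance_gain(m4, m7) for name, (m4, m7) in scores.items()}

# Assumption: performance per mW is constant, so power per MHz tracks performance per MHz
estimated_power = {name: M4_POWER_40NM_G * (1 + gain) for name, gain in gains.items()}

for name in scores:
    print(f"{name}: +{gains[name]:.0%} performance, ~{estimated_power[name]:.1f} uW/MHz")
```

Depending on which benchmark is used, the gain over the M4 works out to roughly 48% to 71%, which is where the rough 50% to 75% power range comes from.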



Hybrid Systems

While the Cortex-M series aims to be the MCU for many application markets including IoT and wearables, ARM does not expect M series processors to always be used alone and expects many devices to combine A series application processors with the M series. When I mentioned the word coprocessor referring to the M series, Nandan quickly pointed out that in this market, the A series might actually be considered the coprocessor. Considering the MCU is the always-on device and the A series CPU wakes only sparingly, I can see his point of view. The following diagram from ARM lays out this perspective well.

MediaTek used a much simpler table to describe the sub markets of IoT and wearables that, as I noted at the time, insinuated there was no overlap between MCUs and APs. I tend to agree more with Intel’s Edison platform and ARM’s slide here that there are large market segments that will indeed be combining these differentiated processors.

When designing hybrid systems like this for IoT and wearables, it is very important to synthesize the AP with power optimization goals. The process of synthesizing HDL to an ASIC is essentially an optimization problem, much like all engineering. Targeting one aspect of performance, such as power consumption, means you’re willing to sacrifice something else. The prevailing trend so far has been to reuse smartphone processors in wearables. Companies practicing this approach are not optimizing their wearables' power use but are instead optimizing time to market and internal expenses.

To emphasize what this means, when the Cortex-A15 launched, ARM stated it was optimized for 1.2 GHz operation. When the first smartphone featuring an A15 hit the market, it actually ran at much higher voltages to achieve higher frequencies, resulting in relatively high power consumption. Reusing this chip inside an IoT or wearable device not only means choosing a performance focused CPU instead of a power optimized one like the A7, but also means using silicon that was synthesized to push the CPU even further away from power efficiency. This is why many wearables today featuring rich operating systems have struggled with battery life. Apple has traditionally been conservative with smartphone SoC power consumption and it will be interesting to see how their new wearable is designed.

For wearable devices, ARM recommends reducing A series frequency and area by over half, which has a direct effect on power consumption. ARM states that wise choices of CPU cores and caches, synthesis goals, and software optimizations that offload certain tasks to an MCU can reduce power consumption by as much as 85%. This will be something we will keep an eye on when we review future wearables.



Final Words

In the past two days we have seen two announcements targeting IoT and wearable devices from major semiconductor companies. These announcements may seem premature as there hasn’t been a game-changing device release like the original iPhone was, but the companies involved are simply doing what they have done before and investing in the future. Much of ARM’s work before the ARM11 processor was behind closed doors or simply unnoticed by consumers at large, but it was there. Watching the suppliers evolve to provide suitable products, documentation, and tooling for something new is exciting.

The Cortex-M7 processor pushes the performance of ARM’s dedicated MCU line to new levels, helping ARM further consolidate MCUs and DSPs into a single ARM ISA compatible 32-bit CPU. The increased performance and features also allow device makers to rely more on the always-on MCU and power up the AP much less often, improving overall power consumption and even enabling devices that were previously impractical.
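A back-of-the-envelope duty cycle model shows why waking the AP less often matters so much. Every number below is hypothetical and chosen only to illustrate the arithmetic; none of it comes from ARM or from a measured device, and the function name is my own.

```python
def average_power_mw(ap_active_mw, ap_sleep_mw, ap_duty_cycle, mcu_mw=0.0):
    """Average system power when the AP is awake for ap_duty_cycle of the time."""
    ap_average = ap_duty_cycle * ap_active_mw + (1 - ap_duty_cycle) * ap_sleep_mw
    return ap_average + mcu_mw


# Hypothetical numbers chosen only to illustrate the arithmetic
AP_ACTIVE_MW = 200.0
AP_SLEEP_MW = 1.0
MCU_MW = 2.0

# The AP polls sensors itself and stays awake 20% of the time
ap_only = average_power_mw(AP_ACTIVE_MW, AP_SLEEP_MW, ap_duty_cycle=0.20)

# An always-on MCU takes over polling, so the AP wakes only 2% of the time
hybrid = average_power_mw(AP_ACTIVE_MW, AP_SLEEP_MW, ap_duty_cycle=0.02, mcu_mw=MCU_MW)

saving = 1 - hybrid / ap_only
print(f"AP only: {ap_only:.2f} mW, hybrid: {hybrid:.2f} mW, saving: {saving:.0%}")
```

Even though the hybrid design adds a second processor that never sleeps, the average drops sharply in this model because the AP's active power dominates everything else.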

ARM also invested in improving the interrupt latency of the M7 and provides qualification kits for safety-critical standards like ISO 26262 (automotive) and IEC 61508. Anyone with experience here knows just how expensive creating these kits can be.

For major MCU SoC vendors like ST, Atmel, NXP, Freescale, TI, and others, the prospect of integrating a CPU core off-the-shelf from ARM means less R&D investment in processor design and documentation with a potentially greater payoff by tapping into the ARM ecosystem. Additionally, rather than hand coding their own development IDEs and compilers, they can serve their users with existing open source alternatives. To consumers, consolidation could bring similarities to the Windows + Intel days when software developers gained immense efficiency.

Many IoT and wearable devices today accomplish their roles using high power A series application processors. This is by design, as the companies providing them simply reused their smartphone processors to get product to market as fast as possible. Designing a heterogeneous system using MCUs and application processors is more complex but the benefits are undeniable. As IoT and wearable devices mature, we will keep a close eye on the underlying technology of future devices and hope to report some positive changes over the next year.
