Some time ago, in a building on Apple’s campus, a group of engineers got together. Isolated from others in the company, they took the guts of old MacBook Air laptops and connected them to their own prototype boards with the goal of building the first machines that would run macOS on Apple’s custom-designed, ARM-based silicon.
To hear Apple’s Craig Federighi tell the story, it sounds like a callback to Steve Wozniak in a Silicon Valley garage many years ago. And this week, Apple finally took the big step these engineers were preparing for: the company launched the first Macs to run on Apple Silicon, beginning the Mac product line’s move away from Intel CPUs, which have been the industry standard for desktop and laptop computers for decades.
In a conversation shortly after the M1 announcement with Apple’s senior vice president of software engineering Craig Federighi, senior vice president of worldwide marketing Greg Joswiak, and senior vice president of hardware technologies Johny Srouji, we learned – unsurprisingly – that Apple has been planning this change for many years.
Ars spoke at length with these executives about the architecture of the first Apple Silicon chip for Macs (the Apple M1). While we had to get in a few questions about the edge cases of software support, there was one big question on our minds: What are the reasons behind Apple’s radical shift?
Why? Why now?
We started with that big idea: “Why? And why now?” We got a very Apple response from Federighi:
The Mac is the soul of Apple. I mean, the Mac is what brought so many of us into computing. And the Mac is what brought so many of us to Apple. And the Mac remains the tool we all use to do our jobs, to do everything we do here at Apple. And so to have the opportunity … to apply everything we’ve learned to the systems that are at the core of how we live our lives is obviously a long-term ambition and a kind of dream come true.
“We want to create the best products that we can,” Srouji added. “We really needed our own custom silicon to deliver the best Macs we could offer.”
Apple began using Intel x86 CPUs in 2006 after it became apparent that PowerPC (the previous architecture for Mac processors) was reaching the end of the road. For the first several years, those Intel chips were a huge boon for the Mac: they enabled interoperability with Windows and other platforms, making the Mac a more flexible computer. They allowed Apple to focus more on increasingly popular laptops in addition to desktop computers. They also made the Mac more popular overall, in parallel with the massive success of the iPod and, soon after, the iPhone.
And for a long time, Intel’s performance was first-rate. But in recent years, Intel’s CPU roadmap has been less reliable, both in terms of performance gains and consistency. Mac users have noticed. But the three men we spoke with insisted that this was not the driving force behind the change.
“This is about what we can do, right?” Joswiak said. “It’s not about what anyone else can or can’t do.”
“Every company has an agenda,” he continued. “The software company wishes the hardware companies would do this. The hardware companies wish the OS company would do this, but they have competing agendas. And that’s not the case here. We have one agenda.”
When the decision was finally made, the circle of people who knew about it was very small at first. “But those people who knew have been walking around smiling since the moment we said we were on this path,” Federighi recalled.
Srouji described Apple as being in a special position to make the move successfully: “As you know, we don’t design chips as merchants, vendors, or generic solutions, which gives us the ability to integrate really tightly with the software and the system and the product – exactly what we need.”
What Apple needed was a chip that took the lessons learned from years of refining mobile systems-on-a-chip for the iPhone, iPad, and other products, then added all sorts of additional functionality to meet the expanded needs of a laptop or desktop computer.
“During pre-silicon, when we were even designing the architecture or defining the features,” Srouji recalled, “Craig and I sat in the same room and said, ‘OK, here’s what we want to design. These are the things that matter.’”
When Apple first announced its plans to launch the first Apple Silicon Mac this year, onlookers speculated that the iPad Pro’s A12X or A12Z chips were the blueprint and that the new Mac chip would be something like an A14X – a beefed-up variant of the chips that shipped in the iPhone 12 this year.
Not exactly so, Federighi said:
The M1 is essentially a superset, if you want to think of it relative to the A14. Because when we set out to build a chip for the Mac, there were many differences from what we otherwise would have had in a corresponding, say, A14X.
We did a lot of analysis of Mac application workloads, the kinds of graphics/GPU capabilities required to run a typical Mac workload, the kinds of texture formats required, support for different kinds of GPU compute and things that were available on the Mac … just even the number of cores, the ability to drive Mac-sized displays, support for virtualization and Thunderbolt.
There are many, many capabilities we engineered into the M1 that were requirements for the Mac, but those are all superset capabilities relative to what an app compiled for the iPhone would expect.
Srouji expanded on the point:
The foundation of many of the IPs that we have built, which became the basis for building the M1 on top of them … started over a decade ago. As you may know, we started with our own CPU, then graphics, ISP, and Neural Engine.
So we built these amazing technologies over a decade, and then a few years ago, we said, “Now is the time to use what we call the scalable architecture.” Because we had the foundation of these great IPs, and the architecture is scalable with UMA.
Then we said, “Now is the time to build a custom chip for the Mac,” which is the M1. It’s not like some iPhone chip on steroids. It’s a whole different custom chip, but we do use the foundation of many of these great IPs.
Unified memory architecture
UMA stands for “Unified Memory Architecture”. When potential users look at the M1 benchmarks and wonder how a relatively low-power, mobile-derived chip could be capable of this kind of performance, Apple points to UMA as a key component of this success.
Federighi claimed that “modern computational or graphics rendering pipelines” have evolved, becoming a “hybrid” of GPU compute, GPU rendering, image signal processing, and more.
UMA essentially means that all of the components – the central processor (CPU), graphics processor (GPU), neural processor (NPU), image signal processor (ISP), and so on – share one pool of very fast memory placed very close to all of them. This contrasts with the common desktop model of, say, dedicating one pool of memory to the CPU and another to the GPU on the other side of the board.
When users run demanding, multifaceted applications, traditional pipelines can end up losing a lot of time and efficiency moving or copying data around so that all of those different processors can access it. Federighi suggested that Apple’s success with the M1 is due in part to rejecting this inefficient model at both the hardware and software level:
Not only did we get the great advantage of just the raw performance of our GPU, but just as important was the fact that with the unified memory architecture, we weren’t constantly moving data back and forth and changing formats, which slowed it down. And we got a huge increase in performance.
And so I think workloads in the past, where it was like, come up with the triangles you want to draw, ship them off to the discrete GPU and let it do its thing and never look back – that’s not what a modern computer rendering pipeline looks like today. These things are moving back and forth between many different execution units to accomplish these effects.
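To make the copying cost Federighi describes more concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not a measurement of the M1 or of any real hardware: the stage names, the mapping of execution units to memory pools, and the 4K frame size are all hypothetical, and the model only counts how many bytes would have to cross between memory pools as one buffer is handed from unit to unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str   # hypothetical stage name, for illustration only
    unit: str   # execution unit that runs the stage, e.g. "CPU" or "GPU"


def bytes_copied(stages, buffer_bytes, memory_pool_of):
    """Count bytes moved between memory pools as one buffer passes through stages.

    A copy is charged whenever two consecutive stages run on units whose
    memory pools differ. This is a deliberately crude accounting assumption.
    """
    copied = 0
    for prev, nxt in zip(stages, stages[1:]):
        if memory_pool_of[prev.unit] != memory_pool_of[nxt.unit]:
            copied += buffer_bytes
    return copied


# Hypothetical pipeline that bounces between execution units.
PIPELINE = [
    Stage("decode", "CPU"),
    Stage("filter", "GPU"),
    Stage("detect", "NPU"),
    Stage("composite", "GPU"),
    Stage("encode", "CPU"),
]

# Assumed layouts: a discrete GPU with its own memory vs. one shared pool.
DISCRETE = {"CPU": "system RAM", "NPU": "system RAM", "GPU": "VRAM"}
UNIFIED = {"CPU": "shared", "NPU": "shared", "GPU": "shared"}

FRAME_BYTES = 3840 * 2160 * 4  # one 4K frame at 4 bytes per pixel

print(bytes_copied(PIPELINE, FRAME_BYTES, DISCRETE))  # 132710400
print(bytes_copied(PIPELINE, FRAME_BYTES, UNIFIED))   # 0
```

In this model, every hand-off between the CPU or NPU and the discrete GPU costs a full-frame copy, while the unified layout costs none. Real systems are more nuanced (drivers can overlap or avoid some transfers), so the numbers only illustrate the shape of the argument.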
This is not the only optimization. For a few years now, Apple’s Metal graphics API has employed “tile-based deferred rendering,” which the M1’s GPU is designed to take full advantage of. Federighi explained:
Where old-school GPUs would basically operate on the entire frame at once, we operate on tiles that we can move into extremely fast on-chip memory, and then perform a huge sequence of operations with all the different execution units on that tile. It’s incredibly bandwidth-efficient in a way that these discrete GPUs are not. And then you just combine that with the massive width of our pipeline to RAM and the other efficiencies of the chip, and it’s a better architecture.
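The bandwidth argument can be sketched with a toy Python model. This is an illustration under stated assumptions rather than a description of how Metal or the M1’s GPU is implemented: the three per-pixel operations are hypothetical, a small list stands in for fast on-chip tile memory, and “traffic” simply counts pixel reads and writes against the large frame buffer. Real tile-based deferred rendering also defers shading until visibility is resolved, which this sketch does not model.

```python
def render_whole_frame(frame, ops):
    """Apply each operation as a separate full-frame pass over main memory."""
    traffic = 0
    for op in ops:
        frame = [[op(v) for v in row] for row in frame]
        traffic += 2 * len(frame) * len(frame[0])  # read + write every pixel
    return frame, traffic


def render_tiled(frame, ops, tile):
    """Load each tile once, run every operation on it locally, store it once."""
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    traffic = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Copy the tile into a small local buffer (stand-in for tile memory).
            local = [row[tx:tx + tile] for row in frame[ty:ty + tile]]
            traffic += sum(len(r) for r in local)   # one read per pixel
            for op in ops:                          # all passes stay local
                local = [[op(v) for v in r] for r in local]
            for dy, r in enumerate(local):
                out[ty + dy][tx:tx + len(r)] = r
            traffic += sum(len(r) for r in local)   # one write per pixel
    return out, traffic


# Hypothetical per-pixel operations, chosen only so results are easy to check.
OPS = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
FRAME = [[y * 8 + x for x in range(8)] for y in range(8)]

whole, whole_traffic = render_whole_frame(FRAME, OPS)
tiled, tiled_traffic = render_tiled(FRAME, OPS, tile=4)

print(whole == tiled)   # True
print(whole_traffic)    # 384
print(tiled_traffic)    # 128
```

Both approaches produce the same pixels, but in this model the tiled version touches the large frame buffer only twice per pixel no matter how many operations run, while the full-frame version pays that cost on every pass.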