I was fairly surprised by Apple’s announcement that they are moving from PowerPC processors to Intel chips, and I am still fairly sceptical about the whole affair. Having watched the keynote, I see things in a slightly more positive light (mostly because Steve Jobs is an excellent salesman).
Having written assembly for x86 and read a lot of PowerPC documentation, I feel this is very much a move to an inferior architecture with a superior implementation. Suddenly people have to pass arguments on the stack again, and (according to the Universal Binary guide) they need to check for MMX, SSE, SSE2 and SSE3 and optimise accordingly, instead of simply having either a nice orthogonal AltiVec unit or no vectorisation at all. Welcome back to a stack-based FPU with 8 registers (as few as the integer core has), and goodbye to Open Firmware. I at least hope that Apple will guarantee a certain baseline in the Intel CPUs they sell (e.g. x86-64 + SSE3), to avoid even more ugly #ifdefs and code paths than are already necessary to make a single code base build on big- and little-endian machines with different ABIs and capabilities.
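At least the endianness half of that #ifdef problem can often be sidestepped by never reinterpreting raw bytes as wider integers. The following is a minimal sketch of that idea, not anything taken from Apple's guide; the helper names `read_u32_be` and `host_is_little_endian` are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: decode a 32-bit big-endian value from a byte buffer.
   It behaves the same on big-endian PowerPC and little-endian x86 because
   it assembles the value arithmetically instead of casting the pointer. */
static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Hypothetical helper: report the host byte order at run time.
   Returns 1 on a little-endian machine, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    /* Inspecting an object through an unsigned char pointer is allowed in C. */
    return *(const unsigned char *)&probe == 1;
}
```

Code written this way needs no per-architecture branch for file and network formats; the run-time probe is only there for the cases where the host byte order really matters.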
As far as I am aware, the PowerPC stores the return address in a register and is thus harder to exploit via buffer overflows. x86 processors store their return addresses on the stack, which makes them more vulnerable to this type of attack. Recently, Microsoft has made that a bit harder by storing sentry cookies on the stack and checking them, as of SP2 for Windows XP and SP1 for Windows Server 2003, but that is something of a work-around that costs performance as well as stack space.
Apple seem to offer the tools to make this transition less grating, but it is work with no immediately obvious pay-off in sight. Certainly they are going into this with much more information than any of us have, so we’ll have to wait and see how things play out. I am well aware that the CPU does not make a Mac, and I will hardly leave Mac OS X behind for any of the alternatives just because Intel now gets a share of my money instead of Freescale / IBM.
I agree with you for the most part (especially the bits about MMX, SSE, etc. and the utterly miserable FPU). I’ve never worked with PPC, so I’m not familiar with how it handles stack operations and function call returns. But I do have a few points to make:
1. PowerPC does indeed use a register (LR, the Link Register) for storing function return addresses. But the called function will most likely store the LR to the stack immediately, so that it is able to make function calls of its own. There’s only one LR; without using the stack, nested function calls wouldn’t work. Say you have three functions; for the sake of originality, we’ll call them A, B and C. Function A calls function B; the address for returning to A is stored in the Link Register. Function B then calls function C. Barring some major advances in the application of quantum mechanics, C’s return address cannot be stored in the same register without overwriting the existing one. Function B must therefore save A’s return address before calling C. It could save it at an absolute location in a data segment, but that makes the code non-reentrant (function B might be interrupted and called again from another thread, after all) and makes recursion impossible. Re-entrant and/or recursive code (most code, these days) needs to store LR to the stack before calling other routines and restore it from the stack before finally returning to its caller.
2. Stack protection on x86 processors has two major components: the NX bit and guard pages. Memory allocated for data storage (including the stack) has the NX bit set by default on CPUs that support it. With the NX bit set, even if someone manages to inject code into the stack and overwrite the return address to call it, the CPU will refuse to execute it. Guard pages are implemented by ensuring that one page of virtual address space on either side of the space allocated for the stack is not mapped to physical storage. Thus, an overrun that reaches past either end of the stack causes a protection fault. (Unlike the NX bit, which is only supported by relatively recent CPUs, this mechanism works on all VM-capable CPUs from the 386 up.) Neither of these mechanisms requires additional instructions or checks in the code (aside from the initial allocation of memory in the OS kernel), because they are implemented in the CPU/MMU hardware and have little or no performance impact.
3. The remaining change made to stack handling in XP SP2 (inserting guard values next to the return addresses on the stack and verifying them before returning from the function) does require extra code and does impose a performance penalty. However, these modifications aren’t a result of changes to the OS code, but rather to the compiler used to compile the OS. VC has a new option that enables insertion of these checks during compilation. Microsoft enabled this protection in XP SP2 by simply turning on the compiler option and recompiling everything. Third-party code isn’t affected unless it too is recompiled with the new compiler option enabled. Recent versions of GCC and other compilers also have this option, and it would not surprise me in the least if newer versions of OS X contained code using such protection. (In fact, I’d be rather disturbed if they did not.)
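The A, B, C argument from point 1 can be played through with a toy model. This is only a sketch in C, not real PowerPC code: the single variable `link_register`, the small software stack, and the "addresses" 100 and 200 are all invented for illustration.

```c
#include <stddef.h>

/* Toy model: one "link register" plus a small software stack. */
enum { STACK_SLOTS = 16 };

static int    link_register;        /* the single LR: holds one return address */
static int    stack[STACK_SLOTS];
static size_t stack_top;

/* Models a "branch and link": every call overwrites LR. */
static void branch_and_link(int return_address) { link_register = return_address; }

static void push_lr(void) { stack[stack_top++] = link_register; }
static void pop_lr(void)  { link_register = stack[--stack_top]; }

/* C is a leaf function: it calls nothing, so LR may stay in the register.
   It reports the address it would return to. */
static int function_c(void) { return link_register; }

/* B calls C. With save_lr == 0, the call to C clobbers B's return address. */
static int function_b(int save_lr)
{
    if (save_lr) push_lr();     /* prologue: spill A's return address */
    branch_and_link(200);       /* 200: made-up address inside B, after the call */
    (void)function_c();
    if (save_lr) pop_lr();      /* epilogue: restore A's return address */
    return link_register;       /* the address B is about to return to */
}

/* A calls B; 100 is a made-up address inside A. */
static int call_b_from_a(int save_lr)
{
    stack_top = 0;
    branch_and_link(100);
    return function_b(save_lr);
}
```

With the spill in place, `call_b_from_a(1)` yields 100 and B returns into A as intended. Without it, `call_b_from_a(0)` yields 200, so B would "return" into itself. The return address therefore does end up on the stack for every non-leaf function.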
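The compiler-inserted check from point 3 (enabled with `/GS` in Visual C++ and `-fstack-protector` in GCC) can be approximated by hand. The sketch below simulates it with an ordinary byte array. The frame layout, the fixed cookie value and the function name `call_with_input` are assumptions for illustration, not what either compiler actually emits; real cookies are chosen at random when the process starts.

```c
#include <stddef.h>
#include <string.h>

/* Toy frame layout (an assumption, not a real ABI):
   [ 16-byte buffer | 8-byte cookie | 8-byte return address ] */
enum {
    BUF_SIZE    = 16,
    COOKIE_SIZE = 8,
    RET_SIZE    = 8,
    FRAME_SIZE  = BUF_SIZE + COOKIE_SIZE + RET_SIZE
};

/* Fixed value for the simulation only; a real cookie is random per process. */
static const unsigned char COOKIE[COOKIE_SIZE] =
    { 0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x0D, 0x0A, 0xFF };

/* Simulates a function that copies `len` input bytes into its local buffer
   without checking against BUF_SIZE. Returns 1 if the cookie is still intact
   at "return" time, 0 if an overwrite was detected. */
static int call_with_input(const unsigned char *input, size_t len)
{
    unsigned char frame[FRAME_SIZE];

    memset(frame, 0, sizeof frame);
    memcpy(frame + BUF_SIZE, COOKIE, COOKIE_SIZE);   /* prologue: place cookie */

    if (len > FRAME_SIZE)
        len = FRAME_SIZE;          /* keep the simulation itself within bounds */
    memcpy(frame, input, len);     /* the "bug": limit is the frame, not the buffer */

    /* Epilogue: verify the cookie before the return address would be used. */
    return memcmp(frame + BUF_SIZE, COOKIE, COOKIE_SIZE) == 0;
}
```

An input of up to 16 bytes leaves the cookie untouched. Anything longer has to overwrite the cookie before it can reach the return address, so the check fails before the corrupted address is ever used. The extra store in the prologue and the compare in the epilogue are the performance and stack-space cost mentioned above.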