Saturday, February 04, 2006

operating system blues

sometime in the past few months, apple singapore's uber-tech-geek-salesman engaged the company in a warp-speed demo of "tiger" (10.4), the latest release of apple's mac os x. "panther" was the previous release, 10.3.9 being the last of that particular species.

at the time the demo was held, the latest tiger was 10.4.3 -- but we were doing a controlled test on a few units of our render farm, and a cautionary email from our software vendor about potentially fatal interactions between 10.4.3 and our main 3D software gave us pause. so we elected to go one release down, to 10.4.2 -- and additionally, to use the client version so that we would not be saddled with the likely processor overhead of all the services that are installed with the server version.

in the fullness of time, i was able to run our standard benchmark on these few machines that had been inoculated with the latest strain of the operating system.

the numbers came in... ...and the numbers were bad.

i don't know, but it seems to me that whenever apple upgrades their operating system, there'll always be something broken among the stuff that was supposedly fixed, and some of those breakages can be show-stoppers.

in our case, we run three successive versions of our main 3D software -- this has to do with version-to-version incompatibilities: lower version won't load files from higher version, higher version sometimes breaks features of lower version... ...but all three versions run well under os x "panther", under the control of our preferred render management software.

by way of a little background, the render management software is built around the concept of manager/slave dependencies. add to this mix the various operating systems and their insular user interaction/permission models. suffice to say that there is a user that is common to all three platforms we employ, and that user profile is the one that the render management system uses to pass jobs on to the rendering guts of our main 3D program. as you can probably guess, this kind of arrangement takes some time to set up and optimize (and indeed, may not even be perfectly optimized at this very moment). and all that effort has paid off: the farm runs without any major glitches (other than the xserve fileservers' mysterious dropping of mount points on random machines). on 10.3.9.

guess what. under tiger, of the three versions (all of which were already ".1" bugfix releases), the first two would not render unless the dedicated profile was logged into the xserve. now that is a bummer -- it stands to reason that with an active user session, the operating system would be allocating resources to the log-on, taking away compute cycles etc. that would otherwise be used by the render process. and that, in effect, is what we observed. with the active user session, benchmark performance was down across the board (that is, render times went up), the worst losses being double-digit percentages. with no user logged in, the latest version managed a single-digit percentage loss.
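for anyone wanting to do the same sums, here is a minimal sketch of how a "percentage loss" like the ones above can be worked out from benchmark render times. the function name and every figure in it are made up purely for illustration -- they are not our actual benchmark numbers -- and it assumes the loss is measured as the increase in render time relative to the panther baseline.

```python
def percent_slowdown(baseline_seconds, test_seconds):
    """Percent increase in render time relative to the baseline.

    Positive means the test run was slower than the baseline.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline render time must be positive")
    return (test_seconds - baseline_seconds) / baseline_seconds * 100.0


# hypothetical render times in seconds, for illustration only
baseline_panther = 600.0
tiger_runs = {
    "user logged in": 690.0,
    "no user logged in": 642.0,
}

for label, seconds in tiger_runs.items():
    loss = percent_slowdown(baseline_panther, seconds)
    print(f"{label}: {loss:.1f}% slower than baseline")
```

with these invented figures, the logged-in case lands in double digits and the logged-out case in single digits, which is the shape of the result described above, not the real data.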

web research led to a page on the development mailing list of our render management software. there, the main programmer opined that operating system resources that had been employed by the programmers of our main 3D software had been rewritten in the 10.3-to-10.4 transition, to the effect of running afoul of the unix permissions underpinning os x. apparently this is fixed in the latest of the three ".1" releases, as it ran regardless of logon state on the xserve.

however, we still valued the previous two versions, for reasons stated above.

okay. regroup, reconsider. what if we tried the server version instead, and the latest build to boot? and so we set up an xserve with tiger server 10.4.4.

um. single-digit losses, across the board.

talk about the lesser of two evils. trade off "relative" ease of administration against effectively reducing the performance of the entire xserve farm by double digits -- if we gave a hoot about backward compatibility (and we do, very much).

...what to do? i let the matter rest for a while, wondering how to draft an email summarizing all the numbers i'd gotten out of the tests.

and then recently, the uber-salesman began sending rather insistent emails about getting results from the tests so he could "optimize the configuration".

all right. shot a figure-filled email to him, right between the eyes. well... ...i didn't include any definite conclusions, so the matter remains open-ended.

and now, 10.4.5 has been released to the wild.

wonder if that'll do the farm any good.