Monday, October 31, 2005

Invalid system disk

I was updating Solaris on my W2100z the other night. After applying the latest recommended patches, I rebooted and went off to get a drink.

On my return I was rather startled to find the old message "Invalid system disk. Press any key to continue." on the screen. Oh bother - what's gone wrong?

It would be a bit of a pain, because I do quite a bit of development on this machine and I really don't want to lose it. The usual suspect in these cases would be a floppy disk, but the W2100z doesn't have one, so I was worried for a second that the main disk had been trashed.

But then I worked out what had happened. I had taken the recommended patch bundle home on a zip disk, and left it connected. So the invalid disk it was complaining about was in fact the one I had left in the USB zip drive. (So clearly the W2100z is capable of booting off a USB device. Scary.)

Panic over...

Thursday, October 20, 2005

Putting Google to work

I don't know how many others use Google as a problem solving tool, but I use it all the time.

I had a problem yesterday, jumpstarting Solaris 10 onto a workstation. Couldn't find the jumpstart directory. So plan A is to drop the error into Google and see what we get.

Bingo! The right answer comes straight back. Yup, simple netmask mismatch.
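For anyone who hasn't been bitten by this, here is a minimal sketch of what a netmask mismatch looks like, using Python's standard ipaddress module. The addresses, the netmasks and the helper names network_of and sees_as_local are all made up for illustration; they are not the values from my network, and this is not how jumpstart itself does the check.

```python
import ipaddress


def network_of(address, netmask):
    """Return the network a host believes it is on, given its address and netmask."""
    return ipaddress.ip_interface(f"{address}/{netmask}").network


def sees_as_local(own_address, own_netmask, other_address):
    """True if a host with this address and netmask treats the other host as on-link."""
    return ipaddress.ip_address(other_address) in network_of(own_address, own_netmask)


if __name__ == "__main__":
    # Hypothetical values: the client thinks it is on a /24, the server on a /25.
    client, client_mask = "192.168.1.200", "255.255.255.0"
    server, server_mask = "192.168.1.10", "255.255.255.128"

    print("client's view of its network:", network_of(client, client_mask))
    print("server's view of its network:", network_of(server, server_mask))
    print("client sees server as local: ", sees_as_local(client, client_mask, server))
    print("server sees client as local: ", sees_as_local(server, server_mask, client))
```

With these made-up numbers the client thinks the server is a neighbour, but the server doesn't return the compliment, which is the kind of disagreement that leaves an install client unable to find things.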

It's not just that, though. I was intrigued to see that Sun use Google for making Solaris development decisions.

Heck, what would we do without Google?

Sunday, October 16, 2005

system() doesn't constitute an API

I like APIs. I like being able to call a function, give it some data, and tell it what to do with it, and have it do it without fuss or bother.

And I like interfaces that are stable and reliable.

However, on a typical unix-like system, certain common operations have no real API. For some operations, the normal way to do things is to run an external program.

Usually the system() call of the title is involved, but it could be popen() or some other variant. I'm using system() here as shorthand terminology for executing another program to do the work.

The two standard examples are mail and printing. You throw the data you want something done with at sendmail or lp.

This is atrocious. This isn't an API - this is merely convention that often works. Frankly, it worries me that there aren't standard programming APIs for such common tasks as sending mail and printing a file.

While mail appears to be standardized sufficiently to be largely hidden by applications, printing is another matter. The variety (and general dysfunctionality) of print dialog boxes should be a clue here. Worst, of course, are those that simply have a text field that you type the print command into.

Apart from the inefficiency of launching external programs, the lack of genuine APIs limits the interactions a program can have, in particular the feedback it can get from the application it launched.
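To make the feedback problem concrete, here is a rough sketch in Python of the hand-it-to-another-program approach. The function name hand_off is my own invention, and the child process is a stand-in that merely pretends to be lp (a real sendmail or lp will behave differently). What the sketch shows is the shape of the interaction: you push bytes in, and all you get back is an exit status and whatever text the program chose to write.

```python
import subprocess
import sys


def hand_off(command, data):
    """Pipe data to an external command and return (exit status, stderr text).

    That pair is everything the caller learns: no job id, no structured
    error, no way to ask follow-up questions.
    """
    result = subprocess.run(command, input=data, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    # Stand-in for lp (an assumption for this demo): read the input, then fail.
    fake_lp = [
        sys.executable,
        "-c",
        "import sys; sys.stdin.read(); "
        "print('printer offline', file=sys.stderr); sys.exit(1)",
    ]
    status, message = hand_off(fake_lp, "some text to print\n")
    print("exit status:", status)
    print("message:", message)
```

The caller is left parsing a free-text message to work out what went wrong, which is exactly the sort of thing a real API would hand back as a proper error.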

It's the 21st century. We deserve to have some decent 21st century APIs.

Friday, October 14, 2005

When did that get added?

I haven't used format to partition a disk by hand under Solaris for ages. Normally I set everything up at install time using jumpstart, and then prtvtoc and fmthard if I need to copy a partition table (if a disk fails and needs to be replaced, for example). So I happened to be working on a machine today, and was presented with:


partition> 4
Part  Tag         Flag  Cylinders  Size  Blocks
  4   unassigned  wm    0          0     (0/0/0)  0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 14521
Enter partition size[0b, 0c, 14521e, 0.00mb, 0.00gb]:


Say what? What's this e thing? Well, it's what you expect - the end cylinder. Aha! My, this makes putting that last partition in so much easier. No more trying to calculate in your head how many cylinders are left...

(Looking at a couple of other machines, I think this got added in Solaris 9.)
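For the record, this is the sum I used to do in my head, sketched in Python. The total of 24620 cylinders is a made-up figure for illustration (the real number depends on the disk), the function name is hypothetical, and the sketch assumes cylinders are numbered from 0 and that the last partition runs to the end of the disk.

```python
def cylinders_left(total_cylinders, start_cylinder):
    """Number of cylinders from start_cylinder through the last one on the disk.

    Assumes cylinders are numbered 0 .. total_cylinders - 1.
    """
    if not 0 <= start_cylinder < total_cylinders:
        raise ValueError("start cylinder is not on the disk")
    return total_cylinders - start_cylinder


if __name__ == "__main__":
    total = 24620   # hypothetical disk size in cylinders
    start = 14521   # starting cylinder from the session above
    print("size to enter, in cylinders:", cylinders_left(total, start))
    print("last cylinder on the disk:", total - 1)
```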

Why don't people upgrade?

Gary mentions a topic I've always found interesting: why do people persist in running old versions of Solaris?

I know of many reasons, and some of them are even valid. Most of these are in the general area of supported configurations. If a vendor (or even Sun) won't support a new version of Solaris, then you're pretty well out of luck. It's not just official support, either - sometimes the product plain won't work.

(This last point makes me wonder. Given the very strong binary compatibility guarantees in Solaris, what are some of these vendors doing to make their code stop working? Presumably they think they're being clever, but they must be putting in a lot of effort to break things.)

There are also a couple of head-in-the-sand excuses I hear:
  • nobody else has upgraded so there must be a problem

or
  • we'll wait for someone else to try it and find all the bugs

both of which are bogus. Let's nail the first one for starters. Actually, plenty of people do upgrade, and don't have problems (if they read the instructions, that is). Do you want to be the one left behind? This herd instinct does seem very strong, though.

The "waiting for someone else" mentality is wrong. Firstly, Solaris doesn't go round randomly breaking functionality - compatibility is very strong and the level of bugs is very low. Secondly, the Solaris development model fixes bugs in the new version first and then fixes the older versions later (if at all), so upgrading to a new version is the best way to reduce the number of bugs that you might be exposed to. Thirdly, it has already been tested very, very hard indeed (to destruction and beyond). I know, I did it. Yes, we found bugs. But a whole lot of people inside and outside of Sun have beaten on Solaris for a long time, and most of the bugs are gone. Finally, because it's been well tested, the remaining bugs tend to be in strange distant areas that nobody has covered yet. In other words, if you are going to be affected by a bug, the chances are pretty good that it will be specific to you, your environment, and the way you use the product. These bugs aren't going to be picked up by other people, so it's daft to wait for other people to do the testing: their results don't mean anything for you.

There's also the argument "it's not broken, so there's no point fixing it". Sure, it is reasonable not to go out and heedlessly upgrade every time a new version comes out. And once something works, leaving it alone is generally a good policy. (But I think this really ought to be expressed as "don't fiddle" rather than "never touch".) But regular maintenance is an essential part of the process, and upgrades and replacements need to be planned for. The problem with just leaving things alone is that they start to rot, and if you leave them too long you have a disaster on your hands. For example, if you don't apply any patches for 4 years, and then hit a problem, you have to apply 4 years of patches - a massive change, and a massive risk. Or if you insist on running old hardware for too long, you suddenly discover that when you need to replace it you have to change the OS and application by 3 or 4 major versions, and it becomes a nightmare. Essentially, planned steady change is a lot better than hanging on too long and hoping you can survive the inevitable wreck. Better many small steps than a desperate leap over a gaping chasm.

I haven't even considered all the cool new features.

So I have this message to all those still hesitating about upgrading to Solaris 10:

Come on in - the water's fine!