Thursday, July 28, 2011

A bigger hammer than svcadm

One advantage of the service management facility (SMF) in Solaris is that you can treat services as single units: you can use svcadm to control all the processes associated with a service in one command. This makes dealing with problems a lot easier than groping through ps output trying to find and kill the right processes.

Sometimes, though, that's not enough. If your system is under real stress (think swapping itself to death with thousands of rogue processes), you may find that svcadm disable or svcadm restart simply doesn't take effect.

So, what to do? It's actually quite easy, once you know what you're looking for.

The first step is to get details of the service using svcs with the -v switch. For example:

# svcs -v server:rogue
STATE NSTATE STIME CTID FMRI
online - 10:06:05 7226124svc:/network/server:rogue
(You'll notice a presentation bug here: the CTID value runs straight into the FMRI with no space between them.) The important thing is the number in the CTID column, which is the contract ID. You can then use pkill with the -c switch to send a signal to every process in that process contract, which defines the boundaries of the service. So:
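If you want to script the lookup rather than read it off the -v output, svcs can print just the contract ID: -H drops the header line and -o selects the columns to show, which also avoids the run-together columns above. This is a minimal sketch assuming the Solaris svcs command; the function name service_ctid is hypothetical, not part of SMF.

```shell
#!/bin/sh
# Hypothetical helper: print the contract ID of one SMF service instance.
# Assumes the Solaris svcs command with its -H and -o options.
service_ctid() {
    # -H: no header line; -o ctid: show only the CTID column.
    # tr strips any padding spaces around the number.
    svcs -H -o ctid "$1" | tr -d ' '
}
```

For example, service_ctid server:rogue would print the same number shown in the CTID column, ready to hand to pkill -c.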

pkill -9 -c 7226124
and they will all go. And then, of course, SMF will neatly restart your service automatically for you.

(Why use -9 rather than a friendlier signal? In this case, it was because I just wanted the processes to die. Asking them nicely involves swapping them back in, which would take forever.)
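Putting the two steps together, a small wrapper might look like the sketch below. It assumes the Solaris versions of svcs, pgrep and pkill (on Solaris, -c selects processes by contract ID; Linux pgrep/pkill use -c for something else). The function name kill_service is hypothetical, and treating any non-numeric CTID as "no contract" is my own assumption about what svcs prints for an instance without one.

```shell
#!/bin/sh
# Hypothetical wrapper: SIGKILL every process in an SMF service's contract.
# Assumes Solaris svcs, pgrep and pkill, where -c matches a contract ID.
kill_service() {
    fmri=$1
    # Look up the contract ID directly instead of parsing svcs -v output.
    ctid=$(svcs -H -o ctid "$fmri" | tr -d ' ')
    # Refuse to continue unless we got a plain number back.
    case $ctid in
        ''|*[!0-9]*)
            echo "no contract ID found for $fmri" >&2
            return 1 ;;
    esac
    # List the PIDs about to be killed, then kill the whole contract.
    echo "killing contract $ctid of $fmri:"
    pgrep -c "$ctid"
    pkill -9 -c "$ctid"
}
```

Running kill_service server:rogue would do the same as the manual steps above; if the service is still enabled, SMF should then notice the empty contract and restart it.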
