Monday, August 29, 2016

The Tribblix filesystem layout

On Solarish systems, the filesystem(5) manpage gives a good description of where in the directory tree you might find the various files associated with a piece of software.

The version in illumos is largely broken, in that many of the directories it references make no sense at all for illumos itself, and are often wrong for the various illumos distributions. In particular, some of the directories are very specific to the old Solaris Java Desktop System, or JDS, and relate to GNOME.

Now, how does Tribblix handle all this?

For anything inherited from illumos-gate, I simply put files wherever illumos-gate put them.

For anything I build and ship, I normally build with a --prefix of /usr, and for most packages that's the only thing I set. This means that, for most packages, --sysconfdir is /usr/etc and --localstatedir is /usr/var. I do not redirect --sysconfdir to /etc by default. To be honest, I think that's the right choice in most cases, as the files that would otherwise go into /etc often aren't meaningfully editable anyway.

In those cases where the application does expect user-editable configuration, I will set --sysconfdir to /etc. This covers things like BIND, samba, cups, openssh, and the like.
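A minimal sketch of how those two rules resolve, assuming autoconf's standard defaults (sysconfdir is ${prefix}/etc, localstatedir is ${prefix}/var). The package set and the helper names `configure_args` and `effective_dirs` are hypothetical, for illustration only; they are not part of the Tribblix build tooling.

```python
# Hypothetical set of packages that expect user-editable config in /etc.
ETC_CONFIG_PACKAGES = {"bind", "samba", "cups", "openssh"}


def configure_args(package, prefix="/usr"):
    """Return configure flags: --prefix always, --sysconfdir=/etc only when needed."""
    args = [f"--prefix={prefix}"]
    if package in ETC_CONFIG_PACKAGES:
        args.append("--sysconfdir=/etc")
    return args


def effective_dirs(args):
    """Work out where sysconfdir and localstatedir land, mimicking autoconf's defaults."""
    opts = dict(arg[2:].split("=", 1) for arg in args)
    prefix = opts["prefix"]
    return {
        "sysconfdir": opts.get("sysconfdir", prefix + "/etc"),
        "localstatedir": opts.get("localstatedir", prefix + "/var"),
    }


print(effective_dirs(configure_args("gzip")))   # {'sysconfdir': '/usr/etc', 'localstatedir': '/usr/var'}
print(effective_dirs(configure_args("samba")))  # {'sysconfdir': '/etc', 'localstatedir': '/usr/var'}
```

Only --sysconfdir is overridden, so even the /etc-configured packages keep their state under /usr/var.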

Laying things out like this helps with sparse-root zones. I'm loopback-mounting /usr read-only, and that neatly catches everything (and ensures the parts of a package are consistent).

On the subject of zones, in a sparse-root zone /lib is inherited, which causes a problem. The SMF manifests and method scripts are now stored under /lib, and some are only relevant to the global zone. To handle this, I make a fixed copy of /lib for sparse-root zones to use, that doesn't have any errant SMF services present.

In order to be able to add my own services to zones, I make sure the manifests live under /var, which is unique to a zone.

I also handle /opt specially. According to filesystem(5), this is the "Root of a subtree for add-on application packages." The idea has always been that third parties pick a directory there and have that as their own dedicated prefix. (As an aside, I've always found the use of /etc/opt/foo and /var/opt/foo to be incredibly confusing, as it basically splatters the files associated with a given application all over the filesystem, making it very hard to keep track of things. Which is one of the reasons I just specify the prefix and put everything under the one root if I can.)

And what I do with /opt is mandate that it's never inherited by zones: anything installed in /opt in the global zone won't automatically show up in a zone. If you want it in a zone, you need to make sure it gets added there.
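To make the zone rules above concrete, here is a small sketch that classifies a path the way a sparse-root zone sees it under this scheme: /usr shared read-only, /lib shared from a fixed copy, /var and /opt private. The table and the `zone_visibility` helper are hypothetical illustrations; directories the post doesn't discuss are deliberately reported as unspecified rather than guessed.

```python
from pathlib import PurePosixPath

# Hypothetical model of the sparse-root zone rules; only directories
# discussed in the post are listed.
ZONE_RULES = {
    "/usr": "shared, read-only loopback mount from the global zone",
    "/lib": "shared, from a fixed copy without global-zone-only SMF services",
    "/var": "private to the zone",
    "/opt": "private to the zone, never inherited",
}


def zone_visibility(path):
    """Classify how a sparse-root zone sees a given absolute path."""
    p = PurePosixPath(path)
    for root, rule in ZONE_RULES.items():
        top = PurePosixPath(root)
        # Match the directory itself or anything beneath it (so /usrlocal is not /usr).
        if p == top or top in p.parents:
            return rule
    return "unspecified"


print(zone_visibility("/opt/tribblix/foobar/etc"))
print(zone_visibility("/var/svc/manifest/site"))
```

Comparing path components rather than string prefixes avoids accidentally matching sibling names like /usrlocal.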

For my own applications designed for zones, particularly services, I put them under /opt/tribblix, so that an application foobar lives in /opt/tribblix/foobar, its configuration lives under /opt/tribblix/foobar/etc, and so on. Again, it's easier to see everything clearly if there's only one place to look. This layout makes it easy to run services in sparse-root zones, as the OS in /usr is read-only and the application never needs to touch it.
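A sketch of that single-root layout expressed as configure flags. The post names /opt/tribblix/foobar and its etc directory; the bin and var subdirectories here are assumptions that follow the same pattern, and `service_layout` / `configure_flags` are hypothetical helper names.

```python
from pathlib import PurePosixPath

TRIBBLIX_OPT = PurePosixPath("/opt/tribblix")


def service_layout(name):
    """Return the directories for a zone-friendly service under /opt/tribblix.

    bindir and localstatedir are assumed; only the prefix and etc come from the post.
    """
    root = TRIBBLIX_OPT / name
    return {
        "prefix": str(root),
        "bindir": str(root / "bin"),
        "sysconfdir": str(root / "etc"),
        "localstatedir": str(root / "var"),
    }


def configure_flags(name):
    """Turn the layout into autoconf-style --key=value flags."""
    return [f"--{key}={value}" for key, value in service_layout(name).items()]


print(configure_flags("foobar"))
```

Every path sits under one root, so nothing the service installs or writes lands in the read-only /usr.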

Modulo dependencies, anyway. That's a problem I haven't really solved, as some applications depend on packages that live in /usr, so I need some way to ensure that the right packages are installed in the global zone (or the zone template).

Solaris also had the notion of subsystems. For example, CDE (the dt subsystem) lived under /usr/dt, /var/dt, /etc/dt and the like. Again, I don't follow that. (The one exception is that I install CDE under /usr/dt, because that's where it's always lived.) Most things are either generic, and so live directly in /usr, or are services that live under /opt/tribblix for zone support.

The exception to this is the set of packages that live under /usr/versions in Tribblix. The main idea here is for things that might come in more than one version: for example, python 2 vs python 3, or the various versions of Node.js or Java. Here the convention is that the application lives in a versioned directory under /usr/versions, allowing multiple versions of an application to coexist. (One or two things end up under /usr/versions even though there's no meaningful need to ever support multiple versions, when I need to put something in its own directory hierarchy rather than directly in /usr, just to avoid having to create another standard location. Sort of like subsystems, but more tightly managed.) I'll generally put convenience links in the default path, although sometimes that involves picking a default version.
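A sketch of picking the target for a convenience link, assuming a hypothetical /usr/versions/<app>-<version> naming convention (the post doesn't give the exact directory names). It defaults to the newest version by numeric comparison, so 10.0 sorts above 6.4, but lets you pin a default explicitly.

```python
import re


def version_key(dirname):
    """Turn the trailing version of a directory name into a comparable tuple."""
    match = re.search(r"(\d+(?:\.\d+)*)$", dirname)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(1).split("."))


def convenience_link(app, installed, default=None):
    """Map /usr/bin/<app> to one versioned copy: newest unless a default is given."""
    candidates = [d for d in installed if d.startswith(app + "-")]
    if not candidates:
        raise ValueError(f"no versions of {app} installed")
    if default is not None and default not in candidates:
        raise ValueError(f"{default} is not installed")
    chosen = default if default is not None else max(candidates, key=version_key)
    return f"/usr/bin/{app}", f"/usr/versions/{chosen}/bin/{app}"


# Hypothetical directory names under /usr/versions.
installed = ["python-2.7", "python-3.5", "node-4.5", "node-6.4"]
print(convenience_link("node", installed))
print(convenience_link("python", installed, default="python-2.7"))
```

Comparing integer tuples rather than strings is the main design point; a plain string sort would rank "node-10.0" below "node-6.4".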

This all mirrors how I used to install software on Solaris 10 with zones many years ago. It's designed with zones in mind, and has been pretty successful.

Saturday, August 27, 2016

Updating desktop caches and a tale of woe

I recently updated some of the MATE components for Tribblix. On testing, various bits of MATE didn't work. Worse, various bits of Xfce didn't work.

The first issue, which was fairly easy to solve, was that MATE was looking for its menus under /etc/xdg/menus, whereas it had installed them under /usr/etc/xdg/menus. I had to set XDG_CONFIG_DIRS=/usr/etc/xdg in the MATE session startup scripts, and the menus reappeared.
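Why that variable fixes it: per the XDG Base Directory specification, XDG_CONFIG_DIRS is a colon-separated list that defaults to /etc/xdg when unset or empty. Here is a minimal sketch of the lookup, with hypothetical helper names. It ignores XDG_CONFIG_HOME, which the spec searches first.

```python
import os


def config_dirs(environ):
    """Return the system config dirs, defaulting to /etc/xdg when unset or empty."""
    dirs = [d for d in environ.get("XDG_CONFIG_DIRS", "").split(":") if d]
    return dirs or ["/etc/xdg"]


def find_menu(filename, environ=os.environ):
    """Return the first <dir>/menus/<filename> that exists, or None."""
    for base in config_dirs(environ):
        candidate = os.path.join(base, "menus", filename)
        if os.path.isfile(candidate):
            return candidate
    return None


print(config_dirs({}))                                   # ['/etc/xdg']
print(config_dirs({"XDG_CONFIG_DIRS": "/usr/etc/xdg"}))  # ['/usr/etc/xdg']
```

Setting the variable replaces the default rather than adding to it, so /etc/xdg is no longer searched unless it's listed too (for example /usr/etc/xdg:/etc/xdg).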

Slightly trickier was that all the mime associations had stopped working. For everything: MATE, Xfce, and any other desktop that uses the shared desktop mime infrastructure.

There are various desktop caches that have to be kept up to date. Each cache is handled by an SMF service, so if you know a cache needs to be updated, then you just get a package to kick the service whenever it's installed or uninstalled. I had inherited these from OpenIndiana which got them from OpenSolaris, so they have gnome in the package name even though they're really more generic.

Each of these SMF services has a method script that follows the same pattern. First check to see if anything needs doing, then update the cache if necessary.

This logic is, in some cases, plain broken. There's a python script that does the check, and in most cases the check is much more expensive than actually updating the cache. For a couple of the services, in particular the one that updates the icon cache (the most common case), I had already simply ignored the check and blindly done the update.

Another problem with this check is that if you add an old package or untar some old files, there's the possibility that the "new" files you've just added get ignored because they have older timestamps than the cache. This shouldn't be a problem because the search looks for ctime, but some things reset that as well.
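For illustration, here is a minimal sketch of the kind of freshness check these method scripts perform: rebuild if the cache is missing or anything under the source directory has a ctime newer than the cache's mtime. This is a hypothetical reimplementation, not the actual python script. It also checks directories, because a directory's ctime changes when entries are added or removed.

```python
import os


def needs_update(source_dir, cache_file):
    """Return True if cache_file is missing or anything under source_dir
    has a ctime newer than the cache's mtime."""
    try:
        cache_mtime = os.stat(cache_file).st_mtime
    except FileNotFoundError:
        return True
    for dirpath, dirnames, filenames in os.walk(source_dir):
        for name in dirnames + filenames:
            # lstat so a dangling symlink doesn't raise.
            if os.lstat(os.path.join(dirpath, name)).st_ctime > cache_mtime:
                return True
    return False
```

Comparing against ctime rather than mtime is what protects against old files unpacked with their original timestamps. The walk itself is also why the check can cost more than simply rebuilding a cheap cache.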

This time, it was the mime caches that broke. That only happened now because there's normally just one package, shared-mime-info, providing the mime types, and nothing else updates them. Until MATE comes along.

I had a bit of a dig, and this python script turns out to have python2.4 in its shebang. Oops! I've never shipped python older than 2.7, so this never worked. The method scripts redirect errors to /dev/null, so they're never seen. The fact that this stuff worked at all was a complete accident: the cache file was created the first time and never updated since.
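A quick audit for this class of bug: read the first line of each script and report shebangs whose interpreter isn't executable. This is a hypothetical helper, not something from the desktop-cache package, and the python2.4 path in the example is illustrative.

```python
import os
from typing import Optional


def parse_shebang(first_line: bytes) -> Optional[str]:
    """Return the interpreter path from a '#!' line, or None if there isn't one."""
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].split()
    return parts[0].decode("utf-8", "replace") if parts else None


def broken_interpreter(script_path: str) -> Optional[str]:
    """Return the interpreter if the script's shebang points at something not executable."""
    with open(script_path, "rb") as f:
        interpreter = parse_shebang(f.readline())
    if interpreter is not None and not os.access(interpreter, os.X_OK):
        return interpreter
    return None


print(parse_shebang(b"#!/usr/bin/python2.4\n"))  # /usr/bin/python2.4
```

One limitation: scripts using `#!/usr/bin/env python2.4` pass this check, because /usr/bin/env itself exists. Catching those would need a PATH lookup of the first argument.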

There's a desktop mime cache, and that's pretty quick to rebuild, so that was easy: just do the update without checking, and it's at least twice as quick.

The main mime cache is the one where the update itself is very expensive - it's almost 3s, which would be an eternity during every boot if it's done unconditionally. So for that I had to fix the python shebang and keep the logic intact.
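A sketch of the resulting policy: always rebuild the cheap desktop mime cache, and rebuild the expensive main mime cache only when a freshness check says so. `update-desktop-database` and `update-mime-database` are the standard desktop-file-utils and shared-mime-info tools. The /usr/share paths assume a /usr prefix, and the injectable `run` and `main_cache_stale` callables are hypothetical, added so the decision logic can be exercised without touching the system.

```python
import subprocess

# Paths assume shared-mime-info and desktop files installed with a /usr prefix.
DESKTOP_CACHE_CMD = ["update-desktop-database", "/usr/share/applications"]
MIME_CACHE_CMD = ["update-mime-database", "/usr/share/mime"]


def _run(cmd):
    subprocess.run(cmd, check=True)


def refresh_mime_caches(main_cache_stale, run=_run):
    """Rebuild caches and return the commands that were run.

    main_cache_stale: callable returning True if the main mime cache is out of date.
    """
    executed = []
    # Cheap: checking costs more than rebuilding, so always rebuild.
    run(DESKTOP_CACHE_CMD)
    executed.append(DESKTOP_CACHE_CMD)
    # Expensive (around 3s): only rebuild when the check says it's needed.
    if main_cache_stale():
        run(MIME_CACHE_CMD)
        executed.append(MIME_CACHE_CMD)
    return executed


# Dry run with a no-op runner.
print(refresh_mime_caches(lambda: False, run=lambda cmd: None))
```

The trade-off is per cache: skip the check wherever it costs more than the rebuild, and keep it where an unconditional rebuild would slow every boot.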

As an aside, the package that delivers the SMF scripts and manifests hadn't been updated in ages, and I hadn't documented how it was built. Updating is easy: just take the existing package, modify it, and repackage it. But I put together a desktop-cache repo on GitHub to hold it so I don't have to go dumpster-diving next time.

While I was at it I discovered that the package dependencies weren't quite right (the SMF methods clearly depend on the packages supplying the binaries that update the caches), and the way I had put these into the Tribblix overlays wasn't quite right, so I sorted that out too.

All in all, a lot of effort to sort out something that shouldn't have been broken in the first place. Hence the tale of woe.

Oh, and there's a third bug I haven't yet tracked down. Sometimes the MATE file manager, caja, goes into a loop trying to open a png file under ${prefix}. Yes, the actual open() call includes a literal ${prefix}, so something hasn't been substituted in the code correctly.
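One way to start hunting for it is to scan the installed files, including binaries, for the literal bytes `${prefix}`. A common cause is an autoconf installation-directory variable such as datadir, whose value `${datarootdir}` is only expanded by make or the shell. If that value gets substituted straight into source or a generated file, it stays literal. That's a hypothesis, not a diagnosis of caja. The helper names are hypothetical, and the sketch reads each file whole, which is fine for a one-off audit.

```python
import os

NEEDLE = b"${prefix}"


def contains_unexpanded(data: bytes) -> bool:
    """True if the data contains a literal, unsubstituted ${prefix}."""
    return NEEDLE in data


def find_unexpanded_prefix(root):
    """Return paths of files under root that contain a literal ${prefix}."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if contains_unexpanded(f.read()):
                        hits.append(path)
            except OSError:
                # Unreadable files, sockets, and so on are skipped.
                continue
    return hits
```

Pointing it at the package's installed tree (or at the build directory) narrows down which binary or data file carries the unexpanded string.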