Thursday, September 03, 2015

Tribblix Graphical Login

Up to this point, login to Tribblix has been very traditional. The system boots to a console login, you enter your username and password, and then start your graphical desktop in whatever manner you choose.

That's reasonable for old-timers such as myself, but we can do better. The question is how to do that.

OpenSolaris, and thus OpenIndiana, have used gdm, from GNOME. I don't have GNOME, and don't wish to be forever locked in dependency hell, so that's not really an option for me.

There's always xdm, but it's still very primitive. I might be retro, but I'm also in favour of style and bling.

I had a good long look at LightDM, and managed to get that ported and running a while back. (And part of that work helped get it into XStreamOS.) However, LightDM is a moving target, it's evolving off in other directions, and it's quite a complicated beast. As a result, while I did manage to get it to work, I was never happy enough to enable it.

I've gone back to SLiM, which used to be hosted at BerliOS. The current source appears to be here. It has the advantage of being very simple, with minimal dependencies.

I made a few modification and customizations, and have it working pretty well. As upstream doesn't seem terribly active, and some of my changes are pretty specific, I decided to fork the code, my repo is here.

Apart from the basic business of making it compile correctly, I've put in a working configuration file, and added an SMF manifest.

SLiM doesn't have a very good mechanism for selecting what environment you get when you log in. By default it will execute your .xinitrc (and fail horribly if you don't have one). There is a mechanism where it can look in /usr/share/xsessions for .desktop files, and you can use F1 to switch between them, but there's no way currently to filter that list, or tell it what order to show then in, or have a default. So I switched that bit off.

I already have a mechanism in Tribblix to select the desktop environment, called tribblix-session. This allows you to use the setxsession and setvncsession commands to define which session you want to run, either in regular X (via the .xinitrc file) or using VNC. So my SLiM login calls a script that hooks into and cooperates with that, and then falls back on some sensible defaults - Xfce, MATE, WindowMaker, or - if all else fails - twm.

It's been working pretty well so far. It can also do automatic login for a given user, and there are magic logins for special purposes (console, halt, and reboot, with the root password).

Now what I need is a personalized theme.

Wednesday, September 02, 2015

The 32-bit dilemma

Should illumos, and the distributions based on it - such as Tribblix - continue to support 32-bit hardware?

(Note that this is about the kernel and 32-bit hardware, I'm not aware of any good cause to start dropping 32-bit applications and libraries.)

There are many good reasons to go 64-bit. Here are a few:

  • Support for 32-bit SPARC hardware never existed (Sun dropped it with Solaris 10, before OpenSolaris)
  • Most new hardware is 64-bit, new 32-bit systems are very rare
  • Even "old" x86 hardware is now 64-bit
  • DragonFly BSD went 64-bit
  • Solaris 11 dropped 32-bit
  • SmartOS is 64-bit only
  • Applications - or runtimes such as GO and Java 8 - are starting to only exist in 64-bit versions
  • We're maintaining, building, and shipping code that effectively nobody uses, and is therefore effectively untested
  • The time_t problem (traditional 32-bit time runs out in 2038)

So, I know I'm retro and all, but it's getting very hard to justify keeping 32-bit support.

Going to a model where we just support 64-bit hardware has other advantages:

  • It makes SPARC and x86 equivalent
  • We can make userland 64-bit by default
  • ...which makes solving the time_t problem easier
  • ...and any remaining issues with large files and 64-bit inode numbers go away
  • We can avoid isaexec
  • At least on x86, 64-bit applications perform better
  • Eliminating 32-bit drivers and kernel makes packages and install images smaller

Are there any arguments for keeping 32-bit hardware support?

  • It's a possible differentiator - a feature we have that others don't. On the other hand, if the potential additional user base is zero then it makes no difference
  • There is still some existence of 32-bit hardware (mainly in embedded contexts)
Generally, though, the main argument against ripping out 32-bit hardware support is that it's going to be a certain amount of work, and the illumos project doesn't have that much in the way of spare resources, so the status quo persists.

My own plan for Tribblix was that once I had got to releasing version 1 then version 2 would drop 32-bit hardware support. (I don't need illumos to drop it, I can remove the support as I postprocess the illumos build and create packages.) As time goes on, I'm starting to wonder whether to drop 32-bit earlier.

Saturday, August 29, 2015

Tribblix meets MATE

One of the things I've been working on in Tribblix is to ensure that there's a good choice of desktop options. This varies from traditional window managers (all the way back to the old awm), to modern lightweight desktop environments.

The primary desktop environment (because it's the one I use myself most of the time) is Xfce, but I've had Enlightenment available as well. Recently, I've added MATE as an additional option.

OK, here's the obligatory screenshot:


While it's not quite as retro as some of the other desktop options, MATE has a similar philosophy to Tribblix - maintaining a traditional environment in a modern context. As a continuation of GNOME 2, I find it to have a familiar look and feel, but I also find it to be much less demanding both at build and run time. In addition, it's quite happy with older hardware or with VNC.

Building MATE on Tribblix was very simple. The dependencies it has are fairly straightforward - there aren't that many, and most of them you would tend to have anyway as part of a modern system.

To give a few hints, I needed to add dconf, a modern intltool, itstool, iso-codes, libcanberra, zenity, and libxklavier. Having downloaded the source tarballs, I built packages in this order (this isn't necessarily the strict dependency order, it was simply the most convenient):
  • mate-desktop
  • mate-icon-theme
  • eom (the image viewer)
  • caja (the file manager)
  • atril (the document viewer, disable libsecret)
  • engramap (the archive manager)
  • pluma (the text editor)
  • mate-menus
  • mateweather (is pretty massive)
  • mate-panel
  • mate-session
  • marco (the window manager)
  • mate-backgrounds
  • mate-themes (from 1.8)
  • libmatekbd
  • mate-settings-daemon
  • mate-control-center
The code is pretty clean, I needed a couple of fixes but overall very little needed to be changed for illumos.

The other thing I added was the murrine gtk2 theme engine. I had been getting odd warnings from applications for a while mentioning murrine, but MATE was competent enough to give me a meaningful warning.

I've been pretty impressed with MATE, it's a worthy addition to the available desktop environments, with a good philosophy and a clean implementation.

Monday, August 10, 2015

Whither open source?

According to the Free Software Definition, there are 4 essential freedoms:

  • The freedom to run the program as you wish, for any purpose (freedom 0).
  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  • The freedom to redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

Access to the source code and an open-source license are necessary preconditions for software freedom, but not sufficient.

And, unfortunately, we are living in an era where it is becoming ever more difficult to exercise the freedoms listed above.

Consider freedom 0. In the past, essentially all free software ran perfectly well on essentially every hardware platform and operating system. At the present time, much of what claims to be open-source software is horrendously platform-specific - sometimes by ignorance (I don't expect every developer to be able to test on all platforms), but there's a disturbing trend of deliberately excluding non-preferred platforms.

There is increasing use of new languages and runtimes, which are often very restricted in terms of platform support. If you look at some of the languages like Node.JS, Go, and Rust, you'll see that they explicitly target the common hardware architectures (x86 and ARM), deliberately and consciously excluding other platforms. Add to that the trend for self-referential bootstrapping (where you need X to build X) and you can see other platforms frozen out entirely.

So, much of freedom 0 has been emasculated. What of freedom 1?

Yes, I might be able to look at the source code. (Although, in many cases, it is opaque and undocumented.) And I might be able to crack open an editor and type in a modification. But actually being able to use that modification is a whole different ball game.

Actually building software from source often enters you into a world of pain and frustration. Fighting your way through Dependency Hell, struggling with arcane and opaque build systems, becoming frustrated with the vagaries of the autotools (remember how the configure script works - it makes a bunch of random and unsubstantiated guesses about the state of your system, the ignores half the results, and often needs explicitly overriding, making a mockery of the "auto" part), only to discover that "works on my system" is almost a religion.

Current trends like Docker make this problem worse. Rather than having to pay lip-service to portability by having to deal with the vagaries of multiple distributions, authors can now restrict the target environment even more narrowly - "works in my docker image" is the new normal. (I've had some developers come out and say this explicitly.)

The conclusion: open-source software is becoming increasingly narrow and proprietary, denying users the freedoms they deserve.

Sunday, August 02, 2015

Blank Zones

I've been playing around with various zone configurations on Tribblix. This is going beyond the normal sparse-root, whole-root, partial-root, and various other installation types, into thinking about other ways you can actually use zones to run software.

One possibility is what I'm tentatively calling a Blank zone. That is, a zone that has nothing running. Or, more precisely, just has an init process but not the normal array of miscellaneous processes that get started up by SMF in a normal boot.

You might be tempted to use 'zoneadm ready' rather than 'zoneadm boot'. This doesn't work, as you can't get into the zone:

zlogin: login allowed only to running zones (test1 is 'ready').
So you do actually need to boot the zone.

Why not simply disable the SMF services you don't need? This is fine if you still want SMF and most of the services, but SMF itself is quite a beast, and the minimal set of service dependencies is both large and extremely complex. In practice, you end up running most things just to keep the SMF dependencies happy.

Now, SMF is started by init using the following line (I've trimmed the redirections) from /etc/inittab

smf::sysinit:/lib/svc/bin/svc.startd

OK, so all we have to do is delete this entry, and we just get init. Right? Wrong! It's not quite that simple. If you try this then you get a boot failure:

INIT: Absent svc.startd entry or bad contract template.  Not starting svc.startd.
Requesting maintenance mode

In practice, this isn't fatal - the zone is still running, but apart from wondering why it's behaving like this it would be nice to have the zone boot without errors.

Looking at the source for init, it soon becomes clear what's happening. The init process is now intimately aware of SMF, so essentially it knows that its only job is to get startd running, and startd will do all the work. However, it's clear from the code that it's only looking for the smf id in the first field. So my solution here is to replace startd with an infinite sleep.

smf::sysinit:/usr/bin/sleep Inf

(As an aside, this led to illumos bug 6019, as the manpage for sleep(1) isn't correct. Using 'sleep infinite' as the manpage suggests led to other failures.)

Then, the zone boots up, and the process tree looks like this:

# ptree -z test1
10210 zsched
  10338 /sbin/init
    10343 /usr/bin/sleep Inf

To get into the zone, you just need to use zlogin. Without anything running, there aren't the normal daemons (like sshd) available for you to connect to. It's somewhat disconcerting to type 'netstat -a' and get nothing back.

For permanent services, you could run them from inittab (in the traditional way), or have an external system that creates the zones and uses zlogin to start the application. Of course, this means that you're responsible for any required system configuration and for getting any prerequisite services running.

In particular, this sort of trick works better with shared-IP zones, in which the network is configured from the global zone. With an exclusive-IP zone, all the networking would need to be set up inside the zone, and there's nothing running to do that for you.

Another thought I had was to use a replacement init. The downside to this is that the name of the init process is baked into the brand definition, so I would have to create a duplicate of each brand to run it like this. Just tweaking the inittab inside a zone is far more flexible.

It would be nice to have more flexibility. At the present time, I either have just init, or the whole of SMF. There's a whole range of potentially useful configurations between these extremes.

The other thing is to come up with a better name. Blank zone. Null zone. Something else?

Saturday, August 01, 2015

The lunacy of -Werror

First, a little history for those of you young enough not to have lived through perl. In the perl man page, there's a comment in the BUGS section that says:

The -w switch is not mandatory.

(The -w switch enables warnings about grotty code.) Unfortunately, many developers misunderstood this. They wrote their perl script, and then added the -w switch as though it was a magic bullet that fixed all the errors in your code, without even bothering to think about looking at the output it generated or - heaven forbid - actually fixing the problems. The result was that, with a CGI script, your apache error log was full of output that nobody ever read.

The correct approach, of course, is to develop with the -w switch, fix all the warnings it reports as part of development, and then turn it off. (Genuine errors will still be reported anyway, and you won't have to sift through garbage to find them, or worry about your service going down because the disk filled up.)

Move on a decade or two, and I'm starting to see a disturbing number of software packages being shipped that have -Werror in the default compilation flags. This almost always results in the build failing.

If you think about this for a moment, it should be obvious that enabling -Werror by default is a really dumb idea. There are two basic reasons:

  1. Warnings are horribly context sensitive. It's difficult enough to remove all the warnings given a single fully constrained environment. As soon as you start to vary the compiler version, the platform you're building on, or the versions of the (potentially many) prerequisites you're building against, getting accidental warnings is almost inevitable. (And you can't test against all possibilities, because some of those variations might not even exist at the point of software release.)
  2. The warnings are only meaningful to the original developer. The person who has downloaded the code and is trying to build it has no reason to be burdened by all the warnings, let alone be inconvenienced by unnecessary build failures.
To be clear, I'm not saying - at all - that the original developer shouldn't be using -Werror and fixing all the warnings (and you might want to enable it for your CI builds to be sure you catch regressions), but distributing code with it enabled is simply being rude to your users.

(Having a build target that generates a warning report that you can send back to the developer would be useful, though.)

Friday, July 24, 2015

boot2docker on Tribblix

Containers are the new hype, and Docker is the Poster Child. OK, I've been running containerized workloads on Solaris with zones for over a decade, so some of the ideas behind all this are good; I'm not so sure about the implementation.

The fact that there's a lot of buzz is unmistakeable, though. So being familiar with the technology can't be a bad idea.

I'm running Tribblix, so running Docker natively is just a little tricky. (Although if you actually wanted to do that, then Triton from Joyent is how to do it.)

But there's boot2docker, which allows you to run Docker on a machine - by spinning up a copy of VirtualBox for you and getting that to actually do the work. The next thought is obvious - if you can make that work on MacOS X or Windows, why not on any other OS that also supports VirtualBox?

So, off we go. First port of call is to get VirtualBox installed on Tribblix. It's an SVR4 package, so should be easy enough. Ah, but, it has special-case handling for various Solaris releases that cause it to derail quite badly on illumos.

Turns out that Jim Klimov has a patchset to fix this. It doesn't handle Tribblix (yet), but you can take the same idea - and the same instructions - to fix it here. Unpack the SUNWvbox package from datastream to filesystem format, edit the file SUNWvbox/root/opt/VirtualBox/vboxconfig.sh, replacing the lines

             # S11 without 'pkg'?? Something's wrong... bail.
             errorprint "Solaris $HOST_OS_MAJORVERSION detected without executable $BIN_PKG !? I are confused."
             exit 1

with

         # S11 without 'pkg'?? Likely an illumos variant
         HOST_OS_MINORVERSION="152"

and follow Jim's instructions for updating the pkgmap, then just pkgadd from the filesystem image.

Next, the boot2docker cli. I'm assuming you have go installed already - on Tribblix, "zap install go" will do the trick. Then, in a convenient new directory,

env GOPATH=`pwd` go get github.com/boot2docker/boot2docker-cli

That won't quite work as is. There are a couple of patches. The first is to the file src/github.com/boot2docker/boot2docker-cli/virtualbox/hostonlynet.go. Look for the CreateHostonlyNet() function, and replace

    out, err := vbmOut("hostonlyif", "create")
    if err != nil {
        return nil, err
    }


with

    out, err := vbmOut("hostonlyif", "create")
    if err != nil {
               // default to vboxnet0
        return &HostonlyNet{Name: "vboxnet0"}, nil
    }


The point here is that , on a Solaris platform, you always get a hostonly network - that's what vboxnet0 is - so you don't need to create one, and in fact the create option doesn't even exist so it errors out.

The second little patch is that the arguments to SSH don't quite match the SunSSH that comes with illumos, so we need to remove one of the arguments. In the file src/github.com/boot2docker/boot2docker-cli/util.go, look for DefaultSSHArgs and delete the line containing IdentitiesOnly=yes (which is the option SunSSH doesn't recognize).

Then you need to rebuild the project.

env GOPATH=`pwd` go clean github.com/boot2docker/boot2docker-cli
env GOPATH=`pwd` go build github.com/boot2docker/boot2docker-cli

Then you should be able to play around. First, download the base VM image it'll run:

./boot2docker-cli download

Configure VirtualBox

./boot2docker-cli init

Start the VM

./boot2docker-cli up

Log into it

./boot2docker-cli ssh

Once in the VM you can run docker commands (I'm doing it this way at the moment, rather than running a docker client on the host). For example

docker run hello-world

Or,

docker run -d -P --name web nginx
 
Shut the VM down

./boot2docker-cli down

While this is interesting, and reasonably functional, certainly to the level of being useful for testing, a sign of the churn in the current container world is that the boot2docker cli is deprecated in favour of Docker Machine, but building that looks to be rather more involved.