Wednesday, June 22, 2016

Getting to grips with Docker

A while ago, I described how we took an existing application build script and managed to run it inside Docker.

Having played with this inside Docker a little more, it's probably worth scribbling down a few notes I happened to stumble across on the way.

I'm looking at having 2 basic images: as a foundation, Ubuntu with all the packages we want added; then an image that inherits FROM that with our application stack built and installed (but not configured). The idea behind this layering is simply to separate the underlying OS, which is fairly standard, from the unique stuff that is all ours.

Then, you create an instance image from the application image, simply by running a configuration script that you COPY in. Once you've got a configured application instance, you create a volume container from it, and then run the application image using the volume(s) from the instance image. You keep that volume container around, just as a home for your data, essentially forever. And you can run multiple application instances from the same base image, you just need to configure and create a volume container for each instance.

That's a brief overview of the workflow, now some tweaks and pitfalls.

We're using Ubuntu, so the first step is to run apt-get with our list of packages. This originally created a 965MB image. It's not going to be small, we need both java and a full development stack to create our application.

However, some of the stuff installed we'll never need. Using the --no-install-recommends flag to apt-get saved us about 150M. The recommends list is stuff that might be useful, but not essential. But remember - our Docker container is only ever going to run a fixed set of applications, so we'll never need any of the optional stuff. The only thing to be careful of here is if you accidentally depend on something in the recommends list without realizing you're only getting it indirectly.

We can do slightly better in terms of saving space. We use postgresql, but get it to store the database files in our own locations, so we can remove /var/lib/postgresql/9.X and what's underneath it, saving almost another 40M.

One thing to be aware of is that the list of packages in the official Ubuntu Docker image isn't quite the same as you would get from a regular Ubuntu install. There are one or two packages we didn't bother adding because they were there in a regular install that we need to add with Docker. Things like sudo and wget are on this list, so I needed to add those to the apt-get list.

Another thing to be aware of is that because you're building images afresh each time, you aren't guaranteed that new users will always get the same uid and gid. If you change the list of packages (even by just adding --no-install-recommends), this might change which users exist, and that affects the uid assigned to later users. I got burnt when a later base build ended up giving the postgres user a different uid, so it didn't own its database files on the persistent volume any more. I think the long term fix here is to create the users you need by hand before installing any packages, forcing the uid and gid to known values.

In order to keep image sizes small, you'll often see "rm -rf /var/lib/apt/lists/*" in a Dockerfile. In general, deleting temporary files is a good idea. This includes any files created by your own software deployment stage. Cleaning that up properly saved me another 200M or so in the final image. (Remember to clean up /tmp, that's part of the image too.)

It isn't strictly related to Docker, but I hit an ongoing problem - in some environments I ended up blocking on /dev/random. Search around and you'll find a lot of problems reported, especially related to java and SecureRandom (or, in our case, jruby). Running Docker on my Mac was fine, running it on a server in the cloud gave me 15-minute startup times. The solution here is to add or to your java startup (or JAVA_OPTIONS).

(And, by the way, this illustrates that while Docker can guarantee that your app is the same in all environments, it doesn't magically protect you from differences in the underlying environment that can have a massive impact on your application.)

My application listens on ports 8080 and 8443, which I map on the host to the common ports, with

docker run -p 80:8080 -p 443:8443 ...

This works fine for me in testing, when I'm only running one copy and simply point a browser at the host. Networking gets a whole lot more complicated with multiple containers, although I think something like a load-balancer in front might work.

I've been using the Docker for Mac beta for some of this - while at times it's been beta in terms of stability, generally I can say it's a very impressive piece of work.

Sunday, June 19, 2016

Data Destruction and illumos

When disposing of  a computer, you would like to be sure that it has no data on its storage that could be accessed by the direct recipient (or any future recipient). It would be somewhat embarrassing for personal photos to be retrieved; it would be far worse if financial or business data were to be left accessible.

The keywords you're looking for here are data remanence and disk sanitization.

There are three methods to remove data from a disk. Total physical destruction, degaussing, and overwriting the data. The effectiveness of these methods is up for debate; as is the feasibility of a sufficiently determined and well-funded attacker being able to retrieve data.

Here I'm just going to discuss overwriting the disk. For a lot of casual and home purposes that'll be enough, and is a lot better than not bothering at all, or simply reformatting the drive (or reinstalling on OS on it) which will leave a lot of disk sectors untouched and amenable to simply being read off in software.

The standard here seems to be DBAN. However, it's not seen much activity in a while, and was sold to a company that offers a commercial product that's claimed to be much better.

Basically, all DBAN is doing is scribbling over every sector on a drive. That's not hard.

In Solarish systems, format/analyze/purge does essentially the same thing. It's the documented method for wiping hard drives on Solarish style systems.

However, it's a little fiddly to use and requires a modest level of expertise to get that far. You can't purge the disk you're booted from, the solution proposed there is to boot from installation media, drop to a shell, and run format from there. That has a couple of problems - it's still very manual, and the install (or live) media are rather large and can take an age to boot.

So I started to think, how hard could it be to create a minimalist illumos boot media that just contains the format command, and a simple script around it to make it easy to run?

I've already done most of the work, as part of the minimal viable illumos project. It was pretty easy to create a new variant.

The idea is to erase disk drives, so the intended target is physical hardware rather than a hypervisor. So I added a number of common storage drivers to the image. (As an aside, I really have no idea as to what storage HBAs are actually in common use, so which drivers to put in this list or on the Tribblix install iso is largely guesswork.)

There should be no need for networking. You really don't want a mechanism for any external access to the system while the disks are being wiped, so networking is simply not there.

And I added a simple wrapper script that enumerates disk drives and runs the appropriate format commands. If you want to see how this works, just look at the wrapper script. All this is in the mvi repo, see the files with "wipe" in their names.

And there's the (14M in size) iso image I created also available.

(Why is such a small image good, you might ask? Apart from simply being sure that it's only capable of doing the one function that it's advertised for, if you're trying to wipe a remote system mounting the image over the network, then the smaller the better.)

I tested this in VirtualBox, which exposed a few quirks. For one, the defect list switching you'll see in the docs doesn't work there (I have no idea if it's going to work on any real hardware). The other is that the disk image I was using was a file on a compressed zfs file system. The purge process writes a repeating pattern, which is very compressible, so the 1G disk image I was testing only takes up 16M of disk space.

While I don't think it's really a proper alternative to DBAN, I think it's useful as a real-world example of how to use mvi.

Thursday, June 16, 2016

Connecting to legacy Sun ILOM with modern clients

The bane of many a system administrator's existence is the remote management capability on their servers. In Sun's case, I'm talking about the ILOM.

(Of course, Sun have had RSC and ALOM and eLOM and maybe some other abominations over time.)

Now, for many purposes, you can just ssh to the ILOM and you're done. On Sun boxes anyway, where you often have serial console redirection and the OS using the serial console.

However, if you want to manage the system fully, you need a proper client. There are a couple of common cases. First, if you need the VGA console (either for a broken OS, or to interact with the BIOS), or if you want to do storage redirection (in other words, you want to remotely present a bootable image).

That's where the fun starts, and you get in a tangled relationship with Java. Often, it ends up being a tale of woe.

And that's on the best of days. With legacy hardware - such as the X4150 - it gets a whole lot more interesting.

Now, while the X4150 is legacy and well past end of life now, it turns out that there was an updated firmware release in 2015. (For POODLE, I think.) If you can, apply this, as it should fix some of the UI compatibility issues with newer browsers. (Not all, I suspect, but if you've tried using a current browser and only got half the GUI then you know what I'm talking about.)

However, that doesn't necessarily mean that the Java application is going to work. There are actually a couple of issues here.

The first is that the application is a signed jar, and the certificate used to sign it has expired. Worse, due to Java's rather chequered security history, current versions have draconian checks in place which you'll run into. To fix, go to the Java Control Panel, down to "Perform signed code verification checks" and change it to "Do not check". Generally, disabling security like this is a bad idea, but in this case it's necessary.

Next, if you start up the application, click through the remaining security dialogs, and try to connect to the console, you'll get a cipher suite mismatch failure. The ILOM is pretty old, and uses SSLv3 which is disabled by default in current Java. You'll need to edit the file (in ${JAVA_HOME}/jre/lib/security/[*]) and comment out two lines - the ones with jdk.certpath.disabledAlgorithms and jdk.tls.disabledAlgorithms, then run the application again.

With luck, that will at least enable you to get to the console.

If you want storage redirection, then you're in for more fun. For starters, you need to be running Solaris, Linux, or Windows. If you're on a Mac, it's not going to work. You'll need to get yourself another machine, or run a VM with something else installed.

And the other thing is that you need to be running a 32-bit Java Virtual Machine. If you're running Solaris, this rules out Java 8 - you'll have to go back to Java 7. On other platforms, you'll have to make sure you have a 32-bit JVM, which might not be the default and you might have to manually install it.

Oh, and if you're on Linux or Solaris and running OpenJDK (rather than the Oracle builds) then you'll need IcedTea to get the javaws integration. At least with IcedTea you can ignore the Java Control Panel stuff.

[*: On my Mac I discovered that I had 2 different installations of Java. The one that you get if you type "java" isn't the same one used for browser integration and javaws launching. Running /usr/libexec/java_home gave me the wrong one; I ended up looking at the ps output when running the Control Panel to find out the location of the one I really needed.]

Monday, June 06, 2016

What's present in libc on illumos?

Over time, operating systems such as illumos gain new functionality. For examples, new functions are added to libc. But how do you know what's there, when was it added, and whether a newer version contains something that's missing?

So perhaps the first place to start is with elfdump. If you run

elfdump -v /lib/

then you'll get a bunch of lines like so:

     index  version                     dependency
       [1]                                        [ BASE ]
       [2]  ILLUMOS_0.13                ILLUMOS_0.12        
       [3]  ILLUMOS_0.12                ILLUMOS_0.11        
       [4]  ILLUMOS_0.11                ILLUMOS_0.10        

      [51]  SUNW_0.8                    SUNW_0.7            
      [52]  SUNW_0.7                    SYSVABI_1.3         
      [53]  SYSVABI_1.3                                     
      [54]  SUNWprivate_1.1                                 

So, these lines correspond to the various released versions of libc. Every time you add a new function to libc, that means a new version of the library. And each version depends on the one before.

These versions are listed in a mapfile, at usr/src/lib/libc/port/mapfile-vers in the illumos source. So, what you can see here is that ILLUMOS_0.13 (which is the version shipped with Tribblix 0m16) is where eventfd got added. If you want strerror_l (which you do if you're building vlc), then you need ILLUMOS_0.14; if you want pthread_attr_get_np (which you need for QT5) then you need ILLUMOS_0.21. (Unfortunately, Tribblix 0m17 only picks up ILLUMOS_0.19.) Looking back, you can see everything exposed by libc and what version of Solaris or illumos it was added in.

Another trick is to run elfdump against a binary. For example:

elfdump -v /usr/gnu/bin/tar

will tell us which versions of which libraries are required. Part of this is:

Version Needed Section:  .SUNW_version
     index  file                        version
       [2]                 SUNW_0.7            
       [3]                   ILLUMOS_0.8         
       [4]                              ILLUMOS_0.1          [ INFO ]
       [5]                              SUNW_1.23            [ INFO ]
       [6]                              SUNW_1.22.6          [ INFO ]

This is quite informative. It tells you that  gtar calls the newlocale stuff from ILLUMOS_0.8, and a bunch of older stuff. But the point here is that it is calling illumos additions to libc, so this binary won't work on Solaris 10 (and probably not on Solaris 11 either). If you're building binaries for distribution across distros, you can use this information to confirm that you haven't accidentally pulled in functions that might not be available everywhere.

Wednesday, May 18, 2016

Installing Tribblix into an existing pool

The normal installation method for Tribblix is the script.

This creates a ZFS pool for installation into, creates file systems, copies the OS, adds packages, makes a few customizations, installs the bootloader, and not much else. It's designed to be simple.

(There's an alternative, to install to a UFS file system. Not much used, but it's kept just to ensure no insidiuous dependencies creep into the regular installer, and is useful for people with older underpowered systems.)

However, if you've already got an illumos distro installed, then you might already have a ZFS pool, and it might also have some useful data you would rather not wipe everything out. Is there not a way to create a brand new boot environment in the existing pool and install Tribblix to that, preserving all your data?

(Remember, also, that  ZFS encourages the separation of OS and data. So you should be able to replace the OS without disturbing the data.)

As of Tribblix Milestone 17, this will work. Booting from the ISO and logging in, you'll find a script called in root's home directory. You can use that instead of, like so:

./ -B rpool kitchen-sink

You have to give it the name of the existing bootable pool, usually rpool. It will do a couple of sanity checks to be sure this pool is suitable, but will then create a new BE there and install to that.

Arguments after the pool name are overlays, specifying what software to install, just like the regular install.

It will update grub for you, so that you have a grub on the pool that is compatible with the version of illumos you've just added. With -B, it will update the MBR as well.

It copies some files, the minimum that define the system's identity, from the existing bootable system into the new BE. This basically copies across user accounts (group, passwd, shadow files) and the system's ssh keys, but nothing else.

Any existing zfs file systems are untouched, and will be present in the new system. You'll have to import any additional zfs pools, though.

When you boot up after this, the grub menu will just contain the new BE you just created. However, any old boot environments are still present, so you can still see them, and manipulate them, using beadm. In particular, you can mount up and old BE (in case there are important files you need to get back), and activate an old BE so you can boot into the old system if so desired.

Amongst other things, you can use this as a recovery tool, when your existing system has a functioning root pool but won't boot.

I've also used this to "upgrade" older Tribblix systems. While there is an upgrade mechanism, this really only works (a) for very recent releases, and (b) to update one release at a time. With this new mechanism, I can simply stick a new copy of Milestone 17 on an old box, and enjoy the new version while having all my data intact.

Tuesday, May 17, 2016

Updating Tribblix for SPARC

Having just released an updated version of Tribblix for x86, a little commentary on the status of the SPARC version is probably in order.

After a rather extended delay, there is an updated version of Tribblix for SPARC available for download.

This is Milestone 16, so it's at the same level as the prior x86 release. More precisely, it's built from exactly the same illumos source as the x86 Milestone 16 release was. Yes, this means that it's a bit dated, but it's consistent, and there have been a number of breaking changes for SPARC builds introduced (and fixed) in the meantime.

I did want to get a release out of the door, before bumping the version to 0m17 ready for the next x86 build. Part of this was simply to ensure that I could actually still build a release - it gets tested on x86 all the time, but as it happens I had never built a SPARC release on my current infrastructure.

In terms of additional packages, the selection is still rather sparse. I haven't had the time to build up the full list. This isn't helped by the fact that my SPARC kit is rather slow, so building anything for SPARC simply takes longer. Much longer. (Not to mention the fact that they're noisy and power-hungry.)

It's not just time that's the problem. I've had some difficulty building certain packages. I'm not talking the likes of Go and Node, which I can simply ignore as they're not ported to SPARC at all, but some reasonably common packages would fail with obscure (and unexpected) build errors. If there's a problem with one component, that blocks anything dependent on it too.

Other than expanding the breadth of available packages, the key next steps are (i) to make Tribblix on SPARC self-hosting, like the x86 version has been for a very long time, and (ii) to try and keep the SPARC release more closely aligned so it doesn't drift away from the x86 release and require additional effort to bring back in sync.

Signing Packages in Tribblix

On any computer system you want to know exactly what software is installed and running.

Tribblix uses SVR4 packaging, so you can easily see what's installed. In addition, there are mechanisms - pkgchk - to compare what's on the disk with what the packaging system thinks should be there. But that's just a consistency check, it doesn't verify that the package installed is actually the one you wanted.

Tribblix has had simple integrity checking for a while. The catalog for a package repository includes both the expected size and the md5 checksum of a package. This is largely aimed at dealing with download errors - network drops, application errors, or errant intrusion detection systems mangling the data. In practice, because the downloaded packages are actually zip files, which have inbuilt consistency checking and the catalog at the end of the file, and because SVR4 packaging has its own consistency checks on package contents, the chances of a faulty download getting installed are remote, the checking is so that the layer above can make smart decisions in the case of failure.

But you want to be sure that, not only has the package you downloaded made it across the network intact, but that the source package is legitimate. So the packages are signed using gnupg, and will be verified upon download in upcoming releases. Initially this is just a warning check while the mechanisms get sorted out.

The actual signing and verification part is the easy bit, it's all the framework around it that takes the time to write and test.

One possibility would have been to sign the package catalogs, and use that to prove that the checksum is correct. That's not enough, for a couple of reasons. First, the catalog only includes current package versions, so there would be no way to verify prior versions. Second, there's no reason somebody (or me) couldn't take a subset of packages and create a new repo using them; the modified catalog couldn't be verified. In either case, you need to be able to verify individual packages. (But the package catalog should also be signed, of course.)

It turns out there's not much of a performance hit. Downloads are a little slower, because there's an extra request to get the detached signature, but it's a tiny change overall.

With this in place, you can be sure that whatever you install on Tribblix is legitimate. But all you're doing is verifying the packages at download time. This leaves open the problem of being able to go to a system and ask whether the installed files are legitimate. Yes, there's pkgchk, but there's no validated source of information for it to use as a reference - the contents file is updated with every packaging operation, so it clearly can't be signed by me each time.

This is likely to require the additional creation of a signed manifest for each package. This partially exists already, as the pkgmap fragments for each package are saved (in the global zone, anyway), and those could be signed (as they don't change) and used as the input to pkgchk. However, the checksums in the pkgmap and contents files aren't particularly strong (to put it mildly), so that file will need to be replaced by something with much stronger checksums.

Initial support for signed packages is available starting with the Tribblix Milestone 17 release. At this point, it will check the package signatures, but not act on them, enforcement will probably come in the next release when I can be reasonably sure that everything is actually working correctly.

Saturday, April 02, 2016

Minimal illumos networking

Playing around with Minimal Viable Illumos, it becomes clear how complex a system illumos is. Normally, if you're running a full system, everything is taken care of for you. In particular, SMF starts all the right services for you. Looking through the method scripts, there's a lot of magic happening under the hood.

If you start with a bare install, booted to a shell, with no startup scripts having run, you can then bring up networking automatically.

The simplest approach would be to use ifconfig to set up a network interface. For example

ifconfig e1000g0 plumb up

This won't work. It will plumb the interface, but not assign the address. To get the address assigned, you need to have ipmgmtd running.

env SMF_FMRI=svc/net/ip:d /lib/inet/ipmgmtd

Note that SMF_FMRI is set. This is because ipmgmtd expects to be run under SMF, and checks for SMF_FMRI being set as the determinant. It doesn't matter what the value is, it just needs to exist.

OK, so if you know your network device and your address, that's pretty much it.

If you use dladm to look for interfaces, then you'll see nothing. In order for dladm show-phys to enumerate the available interfaces, you need to start up dlmgmtd and give it a poke

env SMF_FMRI=svc/net/dl:d /sbin/dlmgmtd
dladm init-phys

Note that SMF_FMRI is set here, just like for ipmgmtd.

At this point, dladm show-phys will give you back the list of interfaces as normal.

You don't actually need to run dladm init-phys. If you know you have a given interface, plumbing it will poke enough of the machinery to make it show up in the list.

If you don't know your address, then you might think of using dhcp to assign it. This turns out to require a bit more work.

The first thing you need to do is bring up the loopback.

ifconfig lo0 plumb up
ifconfig lo0 inet6 plumb ::1 up

This is required because dhcpagent wants to bind to localhost:4999, so you need the loopback set up for it to be able to do that. (You don't necessarily need the IPv6 version, that's for completeness.)


ifconfig e1000g0 plumb
ifconfig e1000g0 auto-dhcp primary

Ought to work. Sometimes it does, sometimes it doesn't.

I sometimes found that I needed to

dladm init-secobj

in order to quieten errors that occasionally appear, and sometimes need to run

ifconfig -adD4 auto-revarp netmask + broadcast + up

although I can't even imagine what that would do to help, I presume that it kicks enough of the machinery to make sure everything is properly initialized.

Within the context of mvi, setting a static IP address means you need to craete a new image for each IP address, which isn't actually too bad (creating mvi images is really quick), but isn't sustainable in the large. DHCP would give it an address, but then you need to track what address a given instance has been allocated. Is there a way of doing static IP by poking the values from outside?

A tantalizing hint is present at the end of /lib/svc/method/net-physical, where it talks about boot properties being passed in. The implementation there appears to be aimed at Xen, I'll have to investigate if it's possible to script variables into the boot menu.