29 March 2008

Debian packages

One of the great things about GNU/Linux is that you can install almost any program with two or three commands. Here is how it works on Ubuntu:

  • apt-cache search burn cd (search for applications to burn CDs)
  • apt-cache show nautilus-cd-burner burn k3b (get details on a few applications that look promising)
  • apt-get install k3b (install the one you like most)

That's it: No need to browse malware-ridden download sites. A typical distribution comes with almost 30000 packaged applications. Let's see a few details of how it works.

Ubuntu uses the Debian package system. Packages are described by meta-data (located in /var/lib/apt/lists on Ubuntu). A package is identified by a name and a version:

Package: bash
Version: 3.2-0ubuntu11

(Versions are totally ordered and the comparison algorithm is a pain.)

A package can be installed if and only if it is available and certain other packages are installed.

Pre-Depends: libc6 (>= 2.6-1), libncurses5 (>= 5.6)

The notation above is CNF in disguise: Any of the mentioned versions would do. In fact, one can use a pipe (|) to explicitly say `or'. (A dumb feature of this DSL is that both < and <= mean `at most'. To say `less than' you have to use <<.)

A package is non-broken only if it is installed and certain other packages are non-broken.

Depends: base-files (>= 2.1.12), debianutils (>= 2.15)

A package is non-broken only if certain other packages are not installed.

Conflicts: bash-completion

(A missing version means `any version'.) If the last two conditions are both met then the package is non-broken.

Finally, virtual packages are not explicitly listed and only have a name. In fact, it's easier to think of it as having a special version, say ::VIRTUAL. They are provided by (normal) packages.

Provides: c-compiler

And, by the way, there is an exception to the semantics I presented. If a package x declares that it conflicts with package y (given only by name), which is a package that x also provides or is xitself, then it really means that x conflicts with all the other packages that provide y. Sneaky.

A distribution is a set of available packages. If there is no sequence of package installation that results in package x being non-broken then package x does not fit in the distribution. Such a situation is a bug (probably in the meta-data) and we want to detect it. How? By making things simpler. The stuff above makes my head spin.

Exercise 1. Are there packages that don't fit in the following distribution?

Package: a
Version: 1
Conflicts: a

Package: b
Version: 1
Depends: a (= 1)
Provides: a

Exercise 2. Same question.

Package: a
Version: 1
Conflicts: a

Package: b
Version: 1
Depends: a
Provides: a

Exercise 3. Same question.

Package: a
Version: 2
Conflicts: a (<= 2)

Package: b
Version: 1
Depends: a (= 2)
Provides: a

Do we need to look at sequences of installations or just at subsets of installed packages? Well, packages that are part of a pre-depends cycle cannot be installed, while packages part of a depends cycle may be installable. If there is no pre-depends cycle then it's safe to look only at subsets of packages, consider `installable' and `non-breakable' as equivalent, and consider `pre-depends' and `depends' as equivalent. That's a big simplification and I'm not even 100% convinced that it's correct.

The information in meta-data can be expressed as boolean propositions. For a package x we abuse notation and consider x to be a boolean variable, true if and only if the package x is non-broken/installed.

  • x → φ ∧ ψ if φ is the CNF after pre-depends and ψ is the CNF after depends
  • xy1 ∨ ⋯ ∨ yn if y1,…,yn are all the packages that provide x
  • x → ¬ y if packages x and y have the same name but different versions
  • x → ¬ y if package x conflicts with package y, which (has a version specification) or (has a different name than x and its provided packages)
  • x → ¬ z if package x conflicts with package y, which has no version specified and has the same name as package x or one of the packages provided by package x; the package z is a package that provides y and is distinct from package x

Please let me know if you think the semantics given here conflict with (the intended meaning of) the Debian documentation.

Once everything is expressed in boolean form we can simply ask a SAT solver whether a package doesn't fit in a distribution.

Further info. A lot of this stuff is covered in an article on maintaining package dependencies. That article is accompanied by many interesting tools, such as edos-debcheck and anla. In case you prefer Python to English here is a hack that reads in package meta-data and outputs CNF boolean formulas (almost) in the DIMACS format. The result is unambiguous; the translation is not perfect.

No comments:

Post a Comment

Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.