A Tale of Three Dependency Generators

Tuesday, July 6th, 2010

Back when the world was young and life was simple, RPM’s automatic dependency generation during package build consisted essentially of two scripts, find-requires and find-provides, which were passed the entire list of files in the package. These scripts did whatever they deemed necessary, including calling other scripts and tools to figure out things like library soname dependencies from the build root contents, and output the results back to rpmbuild on a per-package level. This approach had its limitations, but it was relatively easy to customize for anybody who knew a little bit of shell scripting.

Then in late 2002, something called the “internal dependency generator”, a mixture of a C implementation (hence the “internal”) and some helper scripts, was born to replace the old way, which has since been known as the “external dependency generator”. I can only speculate on the design decisions behind this “new” dependency generator, but AFAIK one of the motivations was to enable file “coloring”, an abstraction RPM uses to figure out how to install (conflicting) 32-bit and 64-bit binaries on multilib systems. The file type dependent “color” needs to be determined and recorded on a per-file basis, so it was not possible to hook it directly into the external generator, whose output was on a per-package basis.

Whatever the motivations, the new design was problematic in numerous ways: there was no way to filter the dependencies generated by the C implementation; adding new dependency types required modifying the C code; various cross-build issues were impossible to address; and it assumed that file classification could be done solely by heuristics on file contents. The ultimate show-stopper of recent times was that the number of possible file types was limited by a 32-bit bitfield, which had been entirely used up with no room for expansion, and which was obviously unsuitable for application-specific dependencies to begin with. These problems have driven most distributions to switch their default rpmbuild configuration to the old external dependency generator, which has been provided as a compatibility fallback option all this time. That in turn has led to endless confusion about filtering and otherwise modifying the automatic dependency generation, and to mysteriously misbehaving packages on distributions where file coloring is required for correct multilib behavior (notably Fedora, RHEL and their derivatives).

It’s been a long time coming, but a couple of months ago HEAD gained what could be considered the third-generation dependency generator, replacing the former “internal” generator and designed to address all the issues it had, with plenty of room for future growth. The new generator is entirely driven by attaching abstract attributes to files via configurable and spec-overridable regular expression rulesets. A file can have an arbitrary number of attributes, and the attributes determine which dependency generator helpers get executed on it. New attributes can be added simply by placing attribute rule files into a directory. This means RPM itself doesn’t need any modification in order to introduce new dependency types. It also means the attribute rules and the scripts they invoke can live in the related packages where that makes sense: for example, generating dependencies for Ruby might be best done by a script written in Ruby, but this used to be problematic as it would have introduced a Ruby dependency to rpm-build.
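
To make the attribute idea a bit more concrete, here is a minimal Python sketch of the classification step. It is not RPM's actual code (that lives in C). The AttrRule class, the classify function, and the rule that a file gets an attribute when either its path or its libmagic description matches are all assumptions made for illustration, and %{_libdir} is shown pre-expanded to /usr/lib64.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttrRule:
    """One file attribute: a name plus optional path and magic patterns."""
    name: str
    path: Optional[str] = None   # regex matched against the file path
    magic: Optional[str] = None  # regex matched against the libmagic description


# Illustrative rules only; macros such as %{_libdir} are written pre-expanded.
RULES = [
    AttrRule("elf", magic=r"^ELF (32|64)-bit.*$"),
    AttrRule("gstreamer", path=r"^/usr/lib64/gstreamer-.*/.*\.so$"),
]


def classify(path, magic_desc, rules=RULES):
    """Return the names of all attributes whose path or magic pattern matches.

    Assumption of this sketch: one matching pattern is enough.
    """
    attrs = []
    for rule in rules:
        path_hit = rule.path is not None and re.search(rule.path, path) is not None
        magic_hit = rule.magic is not None and re.search(rule.magic, magic_desc) is not None
        if path_hit or magic_hit:
            attrs.append(rule.name)
    return attrs
```

Each attribute name returned this way would then select the provides and requires helpers to run on that file, so a GStreamer plugin ends up with both the "elf" and the "gstreamer" helpers.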

Perhaps an example or two might be in order. Bear in mind this is not yet available in any released RPM version, and so details are still subject to change. So consider this as just a sneak preview of things to come, hopefully later this year.

Case 1: ELF library and executable dependencies

Where the former “internal” dependency generator had the ELF dependency generation buried inside librpmbuild with no way to affect its output, ELF is now just another attribute files may have:

%__elf_provides     %{_rpmconfigdir}/elfdeps --provides
%__elf_requires     %{_rpmconfigdir}/elfdeps --requires
%__elf_magic        ^ELF (32|64)-bit.*$
%__elf_flags        exeonly

This means ELF requires and provides generation can be filtered (by redefining the macros to use a custom filter script), or completely overridden: for example it would be possible to compile and use a special elfdeps helper for cross-builds.
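
As a rough idea of what such a custom filter could look like, here is a small Python sketch. It assumes the generated dependencies arrive one per line and that unwanted ones can be identified with a regular expression. The libprivate pattern and the function names are hypothetical, and how the script gets wired into the redefined macro is left open here.

```python
import re

# Hypothetical exclusion pattern: drop requires on a private, bundled library.
EXCLUDE = re.compile(r"^libprivate\.so")


def filter_deps(lines, exclude=EXCLUDE):
    """Return the non-empty dependency lines that do not match the exclusion pattern."""
    return [line for line in lines if line.strip() and not exclude.search(line)]


def run(instream, outstream):
    """Read dependencies line by line from instream, write the kept ones to outstream."""
    for dep in filter_deps(line.rstrip("\n") for line in instream):
        outstream.write(dep + "\n")


# A real filter script would import sys and end with: run(sys.stdin, sys.stdout)
```

Keeping the filtering in a plain function makes it easy to test the pattern against sample dependency strings before it goes anywhere near a package build.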

Case 2: GStreamer dependencies

For a few releases now, Fedora’s RPM has generated special provides for GStreamer plugins to aid automatic multimedia plugin installation. This has required a small but conceptually very ugly patch to rpm-build plus various klunky bits and pieces in the gstreamer-devel package, as there was no way to cleanly tell RPM to do something special about files in a certain directory. With the new system, this becomes trivial and clean. GStreamer plugins are ELF shared objects and they all require gstreamer-devel to build, so it makes sense to put all the knowledge about these highly application-specific plugin dependencies there. To declare a “gstreamer” file attribute, gstreamer-devel drops something like this into the file /usr/lib/rpm/fileattrs/gstreamer.attr:

%__gstreamer_provides    %{_rpmconfigdir}/gstreamer.prov
%__gstreamer_path        ^%{_libdir}/gstreamer-.*/.*\.so$

And this is all it takes to get rpmbuild to run the GStreamer provide generator for files matching the pattern, without RPM knowing anything about GStreamer.
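
The naming convention in the rule file above is regular enough to be read mechanically. The Python sketch below illustrates that structure only: RPM itself processes these files with its own macro engine, and the parse_attr_rules function and the way the macro name is split into attribute and key are assumptions of the sketch.

```python
import re

# Matches lines such as "%__gstreamer_provides   <value>".
# The key names are the ones that appear in the examples of this post.
LINE_RE = re.compile(
    r"^%__(?P<attr>\w+)_(?P<key>provides|requires|path|magic|flags)\s+(?P<value>.+)$"
)


def parse_attr_rules(text):
    """Group rule lines by attribute name: {attr: {key: value}}."""
    rules = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rules.setdefault(match.group("attr"), {})[match.group("key")] = (
                match.group("value").strip()
            )
    return rules
```

Fed the two gstreamer lines, this yields a single "gstreamer" entry with a "provides" helper command and a "path" pattern, with macros such as %{_libdir} still unexpanded.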

Case 3: Printer driver dependencies

RPM in Fedora >= 13 generates special provides for CUPS drivers to enable automatic on-demand installation of printer drivers. This involves some gross hackery to get around the file type assumptions of the internal generator: some of these drivers are ELF executables, some are plaintext .drv files and some are PPD files, scattered here and there in the filesystem, with only the PPD files being reliably detectable as such by their contents. With the new generator, the nasty patches can be replaced with a “psdriver” attribute rule (in this case using both a path-based pattern and a libmagic identification pattern):

%__psdriver_provides    %{_rpmconfigdir}/postscriptdriver.prov %{buildroot}
%__psdriver_path        ^(/usr/lib/cups/driver/.*|%{_datadir}/cups/drv/.*\.drv)$
%__psdriver_magic       ^PPD File.*$
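
Before shipping a path rule like the one above, it can be handy to try it against a few sample paths. The Python sketch below does that under some assumptions: expand_macros is a hypothetical stand-in for RPM's macro expansion that only handles the plain %{name} form, and %{_datadir} is assumed to expand to /usr/share.

```python
import re


def expand_macros(pattern, macros):
    """Replace each %{name} with macros[name]; unknown names are left untouched."""
    return re.sub(r"%\{(\w+)\}", lambda m: macros.get(m.group(1), m.group(0)), pattern)


# The psdriver path rule from above, with %{_datadir} assumed to be /usr/share.
PSDRIVER_PATH = r"^(/usr/lib/cups/driver/.*|%{_datadir}/cups/drv/.*\.drv)$"
EXPANDED = expand_macros(PSDRIVER_PATH, {"_datadir": "/usr/share"})


def path_matches(path, pattern=EXPANDED):
    """Return True if the path matches the expanded psdriver path pattern."""
    return re.search(pattern, path) is not None
```

With these assumptions, anything under /usr/lib/cups/driver/ and any .drv file under /usr/share/cups/drv/ matches, while other files in the drv directory do not.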

Put the rule and the script into e.g. cups-devel, and the application-specific dependencies are in the hands of the application maintainer, who should be best positioned to deal with things like changes in driver directory layout that require rule updates, bug fixes and improvements to the driver provides script. RPM should not have to know anything about printer drivers, and now it does not need to.