Archive for the ‘Fedora’ Category

A Tale of Three Dependency Generators

Tuesday, July 6th, 2010

Back when the world was young and life was simple, RPM’s automatic dependency generation during package build consisted essentially of two scripts: find-requires and find-provides, which got passed the entire list of files in the package. These scripts did whatever they deemed necessary, including calling other scripts and tools to figure out things like library soname dependencies from the build root contents, and output the results back to rpmbuild on a per-package level. It had its limitations, but it was relatively easy to customize for anybody knowing a little bit of shell scripting.

Then in late 2002, something called the “internal dependency generator”, a mixture of C-implementation (hence the “internal”) with some helper scripts, was born to replace the old way, which has since then been known as the “external dependency generator”. I can only speculate on the design decisions behind this “new” dependency generator, but AFAIK one of the motivations for the internal dependency generator was to enable file “coloring”, which is an abstraction RPM uses to figure out how to install (conflicting) 32 and 64-bit binaries on multilib systems. The file type dependent “color” needs to be determined and recorded on a per-file basis, so it was not possible to directly hook it to the external generator whose output was on a per-package basis.

Whatever the motivations, the new design was problematic in numerous ways: there was no way to filter the dependencies generated by the C-implementation, adding new dependency types required modifying the C-code, various cross-build issues were impossible to address, and it assumed that file classification can be done solely by heuristics on file contents. The ultimate show-stopper of recent times was that the number of possible file types was limited by a 32-bit bitfield, which had gotten all used up with no room for expansion and was obviously completely unsuitable for application specific dependencies to begin with. These problems have driven most distributions to switch their default rpmbuild configuration to use the old external dependency generator, which has been provided as a compatibility fall-back option all this time. This in turn has led to endless confusion about filtering and otherwise modifying the automatic dependency generation, and to mysteriously misbehaving packages on distributions where the file coloring is required for correct multilib behavior (notably Fedora, RHEL and their derivatives).

It’s been a long time coming, but a couple of months ago rpm.org HEAD gained what could be considered the third-generation dependency generator, replacing the former “internal” generator and designed to address all the issues it had, with plenty of room for future growth. The new generator is entirely driven by attaching abstract attributes to files via configurable and spec-overridable regular expression rulesets. Files can have an arbitrary number of attributes, and the attributes determine which dependency generator helpers get executed on each file. New attributes can be added simply by placing attribute rule files into a directory. This means RPM itself doesn’t need any modification in order to introduce new dependency types. It also means the attribute rules and the scripts they invoke can live in the related packages in the cases where it makes sense – for example generating dependencies for Ruby might be best done by a script written in Ruby, but this used to be problematic as it would’ve introduced a Ruby dependency to rpm-build.

Perhaps an example or two might be in order. Bear in mind this is not yet available in any released RPM version, and so details are still subject to change. So consider this as just a sneak preview of things to come, hopefully later this year.

Case 1: ELF library and executable dependencies

Where the former “internal” dependency generator had the ELF dependency generation buried inside librpmbuild with no way to affect its output, ELF is now just another attribute files may have:

%__elf_provides     %{_rpmconfigdir}/elfdeps --provides
%__elf_requires     %{_rpmconfigdir}/elfdeps --requires
%__elf_magic        ^ELF (32|64)-bit.*$
%__elf_flags        exeonly

This means ELF requires and provides generation can be filtered (by redefining the macros to use a custom filter script), or completely overridden: for example it would be possible to compile and use a special elfdeps helper for cross-builds.
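
As a rough illustration of the filtering idea, here is a minimal Python sketch of the kind of logic a custom filter script could apply to generated dependency strings. Everything in it is hypothetical: the function name, the exclusion pattern and the sample dependency strings are made up for the example, and how such a filter would be wired into the redefined macro (for instance through a small wrapper around elfdeps) is left open here.

```python
import re

# Hypothetical patterns for dependencies we do not want recorded.
EXCLUDE_PATTERNS = [r"^libprivate.*\.so"]


def filter_deps(lines, patterns=EXCLUDE_PATTERNS):
    """Return the dependency strings that match none of the patterns."""
    compiled = [re.compile(p) for p in patterns]
    kept = []
    for line in lines:
        dep = line.strip()
        # Skip blank lines and anything matching an exclusion pattern.
        if dep and not any(rx.search(dep) for rx in compiled):
            kept.append(dep)
    return kept


# Made-up sample of what a requires generator might emit.
sample = ["libc.so.6()(64bit)", "libprivate-foo.so.1()(64bit)", ""]
for dep in filter_deps(sample):
    print(dep)
```

The point is only that once the generator is an ordinary macro-defined command, its output can be post-processed like any other text stream.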

Case 2: GStreamer dependencies

For a few releases now, Fedora’s RPM has generated special provides for GStreamer plugins to aid automatic multimedia plugin installation. This has required a small but conceptually very ugly patch to rpm-build plus various clunky bits and pieces in the gstreamer-devel package, as there was no way to cleanly tell RPM to do something special about files in a certain directory. With the new system, this becomes trivial and clean. GStreamer plugins are ELF shared objects and they all require gstreamer-devel to build, so it makes sense to put all the knowledge about these highly application specific plugin dependencies there. To declare a “gstreamer” file attribute, gstreamer-devel drops something like this into the /usr/lib/rpm/fileattrs/gstreamer.attr file:

%__gstreamer_provides    %{_rpmconfigdir}/gstreamer.prov
%__gstreamer_path        ^%{_libdir}/gstreamer-.*/.*\.so$

And this is all it takes to get rpmbuild to run the GStreamer provide generator for files matching the pattern, without RPM knowing anything about GStreamer.

Case 3: Printer driver dependencies

RPM in Fedora >= 13 generates special provides for CUPS drivers to enable automatic on-demand installation of printer drivers. This involves some gross hackery to get around the file type assumptions of the internal generator – some of these drivers are ELF executables, some are plaintext .drv files and some are PPD files, scattered here and there in the filesystem, with only the PPD files being reliably detectable as such by their contents. With the new generator, the nasty patches can be replaced with a “psdriver” attribute rule (in this case using both a path-based pattern and a libmagic identification pattern):

%__psdriver_provides    %{_rpmconfigdir}/postscriptdriver.prov %{buildroot}
%__psdriver_path        ^(/usr/lib/cups/driver/.*|%{_datadir}/cups/drv/.*\.drv)$
%__psdriver_magic       ^PPD File.*$

Put the rule and the script into e.g. cups-devel, and the application specific dependencies are in the hands of the application maintainer, who should be best positioned to deal with things like changes in driver directory layout requiring rule updates, fixing bugs and improving the driver provides script. RPM should not, and now does not need to, know anything about printer drivers.
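
To make the attribute idea more concrete, here is a small Python model of how the three rules above could classify files. It is a sketch under stated assumptions, not RPM’s actual implementation: the macro expansions (/usr/lib64 and /usr/share), the `classify` function and the sample file names are invented for illustration, the “exeonly” flag is ignored, and it assumes a file gets an attribute when either its path pattern or its magic pattern matches, which is what the printer driver case suggests.

```python
import re

# Assumed expansions of the RPM macros used in the rules above.
LIBDIR = "/usr/lib64"
DATADIR = "/usr/share"

# Each attribute has an optional path pattern and an optional magic pattern,
# transcribed from the three cases in this post.
RULES = {
    "elf": {"magic": r"^ELF (32|64)-bit.*$"},
    "gstreamer": {"path": r"^" + LIBDIR + r"/gstreamer-.*/.*\.so$"},
    "psdriver": {
        "path": r"^(/usr/lib/cups/driver/.*|" + DATADIR + r"/cups/drv/.*\.drv)$",
        "magic": r"^PPD File.*$",
    },
}


def classify(path, magic):
    """Return the set of attribute names whose rules match this file."""
    attrs = set()
    for name, rule in RULES.items():
        path_hit = "path" in rule and re.search(rule["path"], path)
        magic_hit = "magic" in rule and re.search(rule["magic"], magic)
        # Assumption: either pattern matching is enough to attach the attribute.
        if path_hit or magic_hit:
            attrs.add(name)
    return attrs


# Hypothetical files with libmagic-style descriptions.
print(sorted(classify("/usr/lib64/gstreamer-0.10/libgstfoo.so",
                      "ELF 64-bit LSB shared object, x86-64")))
print(sorted(classify("/usr/share/cups/drv/sample.drv", "ASCII text")))
```

A file can carry several attributes at once: in this model the GStreamer plugin is both “elf” and “gstreamer”, so both sets of helpers would run on it.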

Some RPM performance stats

Tuesday, June 23rd, 2009

It’s awfully (or should I say thankfully) easy to forget how bad things were just a year ago when you live on the git-snapshot-of-the-day edge… I recently had to give some, hmm, palliative care to RPM 4.4.x and was just astounded by how slow it really was. To get an idea how things have progressed during the last year, below is a comparison of RPM 4.4.x, 4.6.x, 4.7.x and git HEAD performing a test install of 2776 packages from Fedora 10 DVD to an empty chroot on my T60. The time is best of three consecutive runs for each version, so the packages are hot in cache:

RPM 4.4.x

[root@localhost rpm-4.4.x]# time ./rpmi -ivh --test --root /home/test/ --nosignature ~pmatilai/tmp/f10-fixed-all.txt
Preparing...                ########################################### [100%]

real    3m5.113s
user    3m3.250s
sys    0m1.202s

RPM 4.6.X

[root@localhost rpm-4.6.x]# time ./rpm -ivh --test --root /home/test/ --nosignature ~pmatilai/tmp/f10-fixed-all.txt
Preparing...                ########################################### [100%]

real    0m22.203s
user    0m21.225s
sys    0m0.955s

RPM 4.7.x

[root@localhost rpm-4.7.x]# time ./rpm -ivh --test --root /home/test/ --nosignature ~pmatilai/tmp/f10-fixed-all.txt
Preparing...                ########################################### [100%]

real    0m21.674s
user    0m20.649s
sys    0m0.890s

RPM.org HEAD

[root@localhost rpm]# time ./rpm -ivh --test --root /home/test/ --nosignature ~pmatilai/tmp/f10-fixed-all.txt
Preparing...                ########################################### [100%]

real    0m7.552s
user    0m6.648s
sys    0m0.847s

We’ve come from 3min 5sec to ~7 seconds for the same operation, no wonder 4.4.x felt like a snail thrashing about in a tar pit. It should be noted that the above relatively simple case of installing into an empty chroot doesn’t do justice to 4.7.x at all – to see the effects of the improvements there, one needs to look at upgrade/erasure cases and associated memory use. Let’s try an erase of the same package set as above, with a small patch to print out memory use at the end of the “Preparing…” stage:

RPM 4.6.x

[root@localhost rpm-4.6.x]# time ./rpm -e --nosignature --test --root /home/test/ `rpm -qa --root /home/test/`
VmRSS:      156208 kB

real    5m13.749s
user    4m10.267s
sys    1m2.776s

RPM 4.7.x

[root@localhost rpm-4.7.x]# time ./rpm -e --nosignature --test --root /home/test/ `rpm -qa --root /home/test/`
VmRSS:       88220 kB

real    0m14.224s
user    0m12.242s
sys    0m1.928s

RPM.org HEAD

[root@localhost rpm]# time ./rpm -e --nosignature --test --root /home/test/ `rpm -qa --root /home/test/`
VmRSS:       88156 kB

real    0m14.837s
user    0m12.846s
sys    0m1.937s

From 5min 13s down to ~14 seconds for the erase case, with memory usage almost halved. In case you wonder why HEAD is a bit slower than 4.7.x: it’s because erasures are now properly ordered too, which has some overhead. I’m not bothering to compare 4.4.x here as it’d be practically the same as 4.6.x behavior, except even worse due to crazy memory fragmentation which strikes in some corner cases. Also, the above case doesn’t come even close to triggering the worst case behavior of 4.4.x and 4.6.x, where rpm can easily eat gigabytes of memory, which can also mean it’s impossible to erase (not to mention upgrade) a package which you installed.
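
For those who like to see the ratios spelled out, the following Python snippet recomputes them from the “real” times and VmRSS figures quoted above. The helper names are made up for this example; only the numbers come from the runs in this post.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '3m5.113s' into seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# "real" times and memory figures copied from the runs above.
INSTALL = {"4.4.x": "3m5.113s", "4.6.x": "0m22.203s",
           "4.7.x": "0m21.674s", "HEAD": "0m7.552s"}
ERASE = {"4.6.x": "5m13.749s", "4.7.x": "0m14.224s", "HEAD": "0m14.837s"}
RSS_KB = {"4.6.x": 156208, "4.7.x": 88220, "HEAD": 88156}


def speedup(times, old, new):
    """How many times faster `new` is than `old`."""
    return to_seconds(times[old]) / to_seconds(times[new])


def memory_saved(old, new):
    """Fraction of resident memory saved going from `old` to `new`."""
    return 1 - RSS_KB[new] / RSS_KB[old]


print(f"install, 4.4.x -> HEAD:  {speedup(INSTALL, '4.4.x', 'HEAD'):.1f}x faster")
print(f"erase,   4.6.x -> 4.7.x: {speedup(ERASE, '4.6.x', '4.7.x'):.1f}x faster")
print(f"memory,  4.6.x -> 4.7.x: {memory_saved('4.6.x', '4.7.x'):.0%} less")
```

That works out to roughly a 24-fold speedup for the test install and a 22-fold speedup for the erase, with resident memory down by a bit over 40%.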

In case you’re curious where the speed advantages come from, it’s all about switching to hash tables where appropriate and in some cases improving existing hashing functions. Everything else is just minor details, which are starting to show up now that we’re talking about a few seconds instead of minutes. And to give credit where credit is due, most of the performance work in rpm by far was done by Florian Festi, cheers 🙂

News from the RPM pits

Wednesday, April 2nd, 2008

Father, it’s been a while since I last blogged…

Doesn’t mean nothing has been going on, not at all. Yesterday the RPM 4.4.2.3 maintenance update was released with a fairly long list of bugfixes. The changelog has the details, but various longstanding issues such as spaces in filenames, incorrect return codes from queries, “rpmbuild -bb --target <otherarch>” putting libraries into the wrong directories on multilib systems etc. have finally been fixed. This is the version that will be in Fedora 9, and it’s already in Rawhide.

Oh, I fully agree – maintenance updates to RPM 4.4.x aren’t particularly exciting. Maintenance releases of stable software are not supposed to be exciting! RPM isn’t the kind of software distros update lightly, it’s the very cornerstone of RPM-based distributions: if it breaks, everything stops. So getting bugfixes to the stable branch delivered in a predictable, no-surprises manner is of utmost importance. Exciting it is not.

Doesn’t mean there isn’t any exciting news though. RPM has just recently gotten more developer manpower behind it: Florian Festi and Jindrich Novy (both from Red Hat) are now working on RPM with me more or less full-time. This should give a nice boost to RPM development, cheers to Florian and Jindrich and my boss Denise who made this happen!

So, where’s the next major release lurking? Starting to be visible on the horizon, I would say. We’re still largely just clearing up ancient messes in the codebase – the tedious, hard and largely thankless job of auditing and fixing the codebase for things like type- and const-correctness (rpm is/was full of code deliberately freeing data declared const for example), eliminating potential buffer overflows, splitting 1000+ line functions into smaller verifiable pieces, rewriting obscure code to be more readable, removing nasty surprises from the API and so on. Sexy new features – assuming one thinks there are such things in the world of RPM in the first place – are few and far between at the moment, we’ve been mostly removing “features” that don’t belong in RPM or are simply unsupportable. We want to make the ground solid before starting to build anything new on it, even if it takes more time than one would like.

The time for actual new development work is closing in though, especially now with more folks actively working on RPM. Many things are in the design phase, and some have already started, like the brand new Python bindings:

Rpm-python bindings are getting a thorough rewrite + redesign. The new bindings are being developed outside rpm proper and will be differently namespaced and parallel-installable with the current ones to allow for easy transition; the old in-rpm bindings will be maintained as long as necessary. While this does mean some extra (duplicated) work, it’s IMO the only way to get rid of various misdesigns, bad APIs and limitations of the current bindings without breaking the entire package build + management stack at once. The new bindings will be a mixture of Python and C with only the low-level interface to librpm written in C, so if you want to hack on RPM stuff but only know Python, this is no problem anymore.

Other things that we’re looking at / going to look at include (but are by no means limited to) multilib/arch dependencies, generic dependency filtering and custom dependency extractor mechanisms, redesigning the ancient transaction callback interface for extensibility, transaction performance and memory consumption issues, a better API for header data handling … etc. The list is rather endless, and not all of it is going to happen now, or even this year. What I want to say here is that we’re not just sitting still on RPM 4.4.2.x, we’re working hard on delivering a solid new release, your patience is appreciated. 😉

RPM 4.4.2.2 RC1

Tuesday, August 28th, 2007

It’s only been a little over a month since 4.4.2.1 was released, and I certainly didn’t expect to put out a new maintenance release this soon but… the list of fixes has already grown long enough for it to make sense. RPM hasn’t exactly been known for quick bugfix delivery times, so it can’t hurt to at least try to change that by releasing more often – as often as it takes. So today I put out the first release candidate of the RPM 4.4.2.2 maintenance release.

Ralf Corsépius has been doing a major cleanup of the cluttered and twisted make-system of rpm and also developing an automake-based test-system in the development tree. Meanwhile, I’ve been mostly trying to stay out of his way to minimize disruption, and have been hunting through bugzilla for existing patches which have been gathering dust there for ages; those fixes make up a large part of the 4.4.2.2-rc1 contents.

As a result of these activities and getting RPM 4.4.2.1 delivered to FC6 and F7 as an update, the RPM bug-count for Fedora has gone down considerably. It’s still way too high (over 90 open bugs) but that’s roughly 33% lower than it was in May. It might be nothing to write home about, but at least progress is being made and folks are getting their bugs fixed.

Oh and naturally the RC is soon to be found in Fedora Development repositories near you, it’s already packaged up and built and waiting just for repository push.

YUM and CTRL-C

Wednesday, July 25th, 2007

To clear up any false expectations: Rahul, contrary to what you wrote, RPM 4.4.2.1 doesn’t “fix the YUM CTRL-C issue”. If you read carefully what I said, that work has been done in the rpm.org development branch, and will almost certainly be backported to the 4.4.x branch in the next maintenance release.

That said, the relevant patches are already included in Fedora Development RPM and thus will be in Fedora 8 starting from test1. But the RPM part is just an enabler; additionally YUM needs to make use of the new rpm-python methods before CTRL-C (and other normal exit signals) start working as expected. An initial patch to do this has just gone into YUM upstream but by the looks of things, F8-test1 won’t yet have it as test1 has just been frozen.

Update: Seth says that thanks to a hiccup in the test spin he was able to slip the patch into test1 after all, excellent…