Desktop search for Linux – autumn 2006

I have already written a bit about desktop search on Linux systems; you can find more in the Desktop Search section (surprise, surprise).

However, my last general look at the situation was quite a few months ago, and several things have developed since then.

The desktop old bulls

First of all, the more or less desktop-specific developments: at the moment KDE is concentrating on building a new desktop search system for KDE 4 around strigi (incorporating the results of the Nepomuk project), while the current GNOME ships with beagle.
However, both projects try to cover the needs of both desktops: beagle addresses the KDE desktop with kerry, and strigi has asked the GNOME folks for comments about a possible cooperation. So, to be fair: both would object if you called them desktop-specific!

Both projects are well maintained and under continuous development. Both released new versions this month, covering bugfixes as well as new features.
The current advantages: since beagle is part of the current GNOME, it is shipped with most distributions and is used by quite a lot of people. Strigi looks set to become one of the major parts of the future KDE 4 search and information management system and will therefore support Nepomuk, which is a standard in this field.

The best-known shortcomings: beagle reportedly (so the impression may be subjective) consumes a lot of memory and CPU, which slows down the system, while strigi currently cannot use the benefits of inotify without problems and is not part of current distributions, so you have to install it by hand.
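For readers unfamiliar with inotify: it is the Linux kernel interface that lets an indexer be told about file changes as they happen, instead of periodically rescanning the disk. The sketch below illustrates the idea in Python via ctypes. It is not code from beagle (written in C#) or strigi (written in C++); the helper names `watch_directory` and `read_events` are hypothetical, and it assumes Linux with `inotify_init`/`inotify_add_watch` available in the C library.

```python
import ctypes
import ctypes.util
import os
import struct
import tempfile

# Event bits from <sys/inotify.h> (Linux only).
IN_MODIFY = 0x00000002
IN_CLOSE_WRITE = 0x00000008
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

# Load the C library; inotify_init/inotify_add_watch are plain libc calls.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Header of struct inotify_event: int wd; uint32_t mask, cookie, len;
_EVENT_HEADER = struct.Struct("iIII")


def _check(result):
    """Raise OSError with the current errno if a libc call returned -1."""
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def watch_directory(path):
    """Hypothetical helper: return an inotify fd reporting changes in `path`.

    Note that inotify watches are not recursive; this watches one directory.
    """
    fd = _check(_libc.inotify_init())
    mask = IN_CREATE | IN_MODIFY | IN_CLOSE_WRITE | IN_DELETE
    try:
        _check(_libc.inotify_add_watch(fd, os.fsencode(path), mask))
    except OSError:
        os.close(fd)
        raise
    return fd


def read_events(fd):
    """Hypothetical helper: block until events arrive, return (name, mask) pairs."""
    data = os.read(fd, 4096)
    events = []
    offset = 0
    while offset < len(data):
        _wd, mask, _cookie, name_len = _EVENT_HEADER.unpack_from(data, offset)
        offset += _EVENT_HEADER.size
        # The name field is NUL-padded up to `name_len` bytes.
        name = os.fsdecode(data[offset:offset + name_len].rstrip(b"\0"))
        offset += name_len
        events.append((name, mask))
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fd = watch_directory(tmp)
        try:
            # Simulate a user saving a new document in an indexed folder.
            with open(os.path.join(tmp, "note.txt"), "w") as f:
                f.write("hello")
            for name, mask in read_events(fd):
                print(f"{name}: mask={mask:#x}")
        finally:
            os.close(fd)
```

Saving a file in the watched directory queues create, modify and close-write events for it, so an indexer can re-index that one file right away. A real daemon would also have to add a watch for every subdirectory, since a single watch does not cover the tree below it.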

Beyond the old bulls

Now, beyond the horizon of the current well-known solutions:
There are currently two interesting solutions (that I know of) which I have already mentioned briefly elsewhere in this blog: Tracker and leaftag.

Leaftag became famous because it was the first solution providing tagging for the Linux desktop. The screenshots look nice and go pretty much in the direction I would prefer. However, leaftag does not seem to index anything by itself; you have to assign the tags yourself. This is a bit of a problem when you want an overview of a data store with far too many files to tag by hand. The even bigger problem, however, is that the changelog hasn't seen anything new in a while. Also, I do not know of any distribution shipping it yet.

Tracker is another attempt at creating a real indexing/search engine, this time with tag support. Again, the screenshots make a nice first impression, and the changelog shows that development is ongoing. Although it comes straight from the GNOME world, Tracker tries to go the non-desktop-specific way via freedesktop.org: implementing specs from there, communicating over D-Bus, and so on. But, well, there isn't much KDE integration yet; it is more a GNOME project hosted on fd.o, with no hard GNOME dependency for the main libraries at the moment… And, as strigi will with the help of Nepomuk, Tracker is able to understand and use RDF. Standards compliance is important, I think.
And the distribution support? Well, there is one specific development that prompted this article: Ubuntu is considering replacing beagle with Tracker in the next release. The idea is that, from Ubuntu's point of view, Tracker comes with more features and fewer shortcomings. It will be interesting to see what Kubuntu does when Ubuntu switches away from beagle; maybe this will lead to strong KDE integration of Tracker, provided by the Kubuntu folks.

Even further beyond

There are, of course, other projects around, but I haven't heard much about them, or they are at an early stage. Here is a short summary of what I have heard.

  • kat is dead! No arguing about that. There is not even a project page on kde-apps anymore. Phew, I wonder what happened there…
  • GLScube seems to be struggling over whether to continue development on the current code base or restart from scratch, so in any case it is at a very early stage of development.
  • Pinot is another crawler and at least claims to handle tags. I haven't heard much about it besides its homepage and someone mentioning it in a comment.
  • Other projects investigating semantic search: there are various, such as Haystack and knowledgesearch. However, of all of them only the already mentioned Nepomuk has made real noise so far.

Conclusion

In any case, the landscape of search and index integration on the Linux desktop is boiling again, with Ubuntu, one of the biggest distributions, thinking about dropping beagle, the most widely used solution for this task. With its integration into KDE 4, strigi may become the new standard on all KDE machines. Maybe. Exciting 🙂

11 thoughts on “Desktop search for Linux – autumn 2006”

  1. My experience so far:

    Strigi – (using Kubuntu packages) it took several hours to index my selected resources into a 50 meg folder (making the PC unusable in the meantime, as it was hogging all resources and spinning the disk wildly). When complete it only found files (e.g. no emails), and all the files it found were in my .beagle folder. There was no apparent way to stop it searching that folder, and it never returned results from anywhere but .beagle. Quite unusable at the moment.

    Beagle – (using the Kubuntu package for beagle and a compiled stable release of KBeagleBar) indexed everything in the background completely unnoticeably (into a 103 meg folder). I don't know how long it took, but as far as I could tell it found the movie I searched for instantly, and after watching that it found everything AOK, even emails, executables and occasionally Kopete conversations. No real system drain noticed, but it did once run away with system resources and I needed to kill the daemon. Occasionally it doesn't find files that you know are there, and it returns results in strange orders (you expect recently created files to appear at the top, but it decides differently). It doesn't update itself quickly (though it indexes new files within the hour).

    Update – apparently it is a lot faster when extended attributes are enabled on the filesystem (this doesn't work for my FAT32 or Reiser4 partitions, so it doesn't help me that much). Also, enabling extended attributes on a filesystem isn't ideal for newbies (a pain in the arse for me really, as I had to RTFM). If you use JFS you're in luck, as it is turned on automatically. After testing, it didn't make an appreciable difference anyhow…

    Tracker (compiled from the stable release) – I only had the command-line interface, as there isn't a KDE interface AFAIK. I had to run the first index myself, but it was done within twenty minutes or so (and the PC was still usable during that period, though it still takes a lot of disk time, so don't expect it to be unnoticeable). It only finds files but uses the least resources of them all (Strigi seemed to consume the most). It always found what I wanted, much, much faster than beagle (beagle is no slouch, though, and is quite usable), and it picks up new files instantly, but unfortunately it is not practical without a GUI.

    I figure I might as well include Google's desktop search client for Windows. It took over an hour to index a lot less disk space than the Linux clients had to, and then it was up and running. It's even slower than beagle to find results, but it is completely accurate and integrates with Gmail (v. nice), as it picks out emails from there instantly. Unfortunately the GUI comes with lots of bloat, but if you have an hour to waste it can all be disabled.

    Verdict – between Google's desktop search and Beagle it's hard to pick a champ; they are both quite good, but beagle's occasional inaccuracy makes Google the winner. I'd say Tracker shows the most promise: if it gets a GUI and integrates with KMail, it will be great (and if it integrates directly with Gmail, all the better, so that it finds mails dating back to before I last rebuilt my PC). I'm sure Strigi is technically the best, but it is way off the mark. Seeing as both Beagle and Strigi use derivatives of the Lucene search engine, they should be equally capable, but beagle works and Strigi doesn't, so for the time being Strigi appears to be far from usable.

  2. Interesting experiences, especially the fact that for you beagle was the fast, non-CPU-killing one. But since you said that you were unable to exclude certain folders, I guess you used a rather old version of strigi, since that feature has been included for quite some time now.
    Also, most people I know do not share your experience of beagle running in the background without draining the system.

    The most confusing part of your report, however, is that you say beagle updated within an hour. I cannot explain that one! All current versions of beagle support inotify, as do all current kernels, so beagle records all changes instantly. Anything else must be a huge bug…?

  3. I can't say my experiences as reported were scientifically rigorous by any means, but they are unbiased and therefore have some worth, even though you might find different results under your conditions.

    In light of your comment I tested Beagle again. Yes, it does update instantly: I created a text doc on the desktop and searching found it instantly. The reason I was confused is that it sometimes just does not find files (it had previously failed the text doc experiment on Kubuntu Edgy). I put a kwalletmanager-generated file there about 40 mins ago and have rebooted since, yet Beagle does not list it. Quite peculiar, it must be a bug. I will log it as soon as I can find a reason; otherwise my conditions cannot easily be reproduced elsewhere and the report would probably be useless.

    As far as resource usage is concerned, 'beagled' is using 5.0% of mem (that's about 25 meg). I don't think there are any extra libs taking extra mem, so I think that figure is all of it. It uses an average of about 1% mem but hovers between 0.0% and 5.6%, so it isn't really that bad (consider that Amarok is sitting idle using 9.7% of mem, about 50 meg I guess).

    I would be interested to hear about your experience with Strigi. I was using the strigi daemon (CLucene) 0.3.8 and the strigi kio-slave (yuck) and couldn't find the option to stop it indexing certain folders. I may try it again, but not until after the weekend.

  4. Don't get me wrong, I do not doubt your experiences; it is just very interesting to see that they can be so different!
    About the exclusion: I rechecked, and I think you actually had to choose the directories to index, and I chose all directories except the beagle directory… I added it as a feature wish to the strigi wiki.

  5. Just to clarify:

    Beagle wasn't the lightest, as that title goes to Tracker (a really nice, fast, accurate indexer). I considered writing a GUI for Tracker

    Oh, when I said you couldn't exclude directories, what I meant was that you select the directories you want to index, it then recurses through all directories from that point, and you are unable to exclude directories from that recursion.

    I didn’t think that you doubted my experiences.

  6. Whoa there, presumptuous one! I am tempted, though, but you must remember I haven't coded in six or so years (excluding this useful tool http://www.kde-look.org/content/show.php?content=13738. Warning: it may screw up depending on resolution, monitor, fonts etc.; I've only tested it in my environment).

    Seeing as it works through D-Bus, which in turn has Python bindings, that will make things a lot simpler (I don't mind the whitespace rules that so many complain about).

    Documentation seems a little sparse (http://www.gnome.org/~jamiemcc/tracker/documentation.html seems to be down), and I can't find any documentation on the Python bindings for D-Bus, only the lower-level C stuff (does anyone know of any documentation?), but it's all stuff I would like to learn about.

    I'll sleep on that one before deciding to give up more of my social/family time to computing, but I am really tempted.
