Recently Sebastian Trüg gave a presentation about Nepomuk-KDE and kindly provided me with the slides. This post therefore extends the earlier post State and Plans of Nepomuk-KDE.
More than just files
After the last post about Nepomuk-KDE many people discussed the pros and cons of a central storage of meta data. Additionally, alternative solutions like using file system capabilities (extended attributes, etc.) were often mentioned and discussed.
However, most of these discussions missed what meta data in the context of the semantic desktop is about: it is not only about files. Instead, the main goal of the Semantic Desktop idea is to also gather all the data which cannot be mapped to a single physical file. Think of bookmarks, e-mails (in mbox format, for example, where many messages share one file) and similar things. The opposite case exists as well: a project often consists of an entire set of files which all belong together. Also, in cases like address cards it doesn’t make sense to save the meta data in the attributes of the physical file, because in the end you would have to replicate the entire file content.
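To make this more concrete, here is a minimal sketch of how such "more than files" data could be written down as subject/predicate/object statements. All identifiers and predicate names in it are made up for illustration and are not taken from Nepomuk:

```python
from collections import defaultdict

# Hypothetical statements; none of these URIs or predicate names come
# from Nepomuk itself.
TRIPLES = [
    # A bookmark: no file of its own on disk.
    ("bookmark:example", "type", "Bookmark"),
    ("bookmark:example", "url", "http://example.org/"),
    ("bookmark:example", "tag", "kde"),
    # One e-mail out of many stored in a single mbox file.
    ("mbox:/home/user/Mail/inbox#42", "type", "Email"),
    ("mbox:/home/user/Mail/inbox#42", "subject", "Slides"),
    # A project: one resource that groups several files.
    ("project:thesis", "type", "Project"),
    ("project:thesis", "hasPart", "file:///home/user/thesis/ch1.tex"),
    ("project:thesis", "hasPart", "file:///home/user/thesis/fig1.png"),
]


def describe(triples, subject):
    """Collect all properties of one resource into a dict of lists."""
    props = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            props[p].append(o)
    return dict(props)
```

Calling `describe(TRIPLES, "project:thesis")` returns the project type together with both member files, which is something no per-file attribute could express.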
Soprano and Strigi
As already mentioned, the meta data will be stored in an RDF store. Meet Soprano:
Soprano is a library which provides a Qt wrapper API for different RDF storage solutions. It features named graphs (contexts) and has a modular plug-in structure which allows the use of backends built on different RDF stores.
As the central meta data storage, Soprano will be accessible to all applications through the KDE application framework.
The storage itself will be filled in different ways. First, there is KMetaData: it provides easy-to-use functions for developers to create and read meta data in the storage. Think of applications where meta data is an essential part of the program: Digikam and Amarok are typical examples.
Second, strigi, KDE 4’s desktop search engine, will walk through the data available on the hard disk and extract the file meta data as well as the content of the files (where it makes sense). For example, audio files often carry information about the artist in their meta data, while PDF files can contain meta data about the author but of course also the text of the PDF itself.
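The two ideas mentioned here, named graphs and exchangeable backends, can be illustrated with a small sketch. This is not the Soprano API; the class and method names (`Statement`, `Backend`, `InMemoryBackend`, `Model`) are assumptions chosen only to show the principle:

```python
from typing import Iterable, NamedTuple, Protocol


class Statement(NamedTuple):
    subject: str
    predicate: str
    object: str
    context: str  # the named graph this statement belongs to


class Backend(Protocol):
    """Anything that can store and enumerate statements."""

    def add(self, statement: Statement) -> None: ...

    def all(self) -> Iterable[Statement]: ...


class InMemoryBackend:
    """Simplest possible backend; a real one would wrap an RDF store."""

    def __init__(self):
        self._statements = set()

    def add(self, statement):
        self._statements.add(statement)

    def all(self):
        return iter(self._statements)


class Model:
    """Front end used by applications; the backend is exchangeable."""

    def __init__(self, backend: Backend):
        self._backend = backend

    def add(self, subject, predicate, obj, context):
        self._backend.add(Statement(subject, predicate, obj, context))

    def find(self, subject=None, predicate=None, obj=None, context=None):
        """Return matching statements, sorted; None acts as a wildcard."""
        pattern = (subject, predicate, obj, context)
        return sorted(
            st
            for st in self._backend.all()
            if all(want is None or want == got for want, got in zip(pattern, st))
        )
```

With the context as the fourth element, a query can tell apart what the user entered from what an indexer extracted, for example by asking only for statements in a hypothetical `graph:user` context.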
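The indexing step can be pictured roughly as follows. This is only a sketch of the general idea and not how strigi is implemented; the predicate names and the `extract` and `index_tree` helpers are hypothetical, and a real indexer would use format-specific analyzers (ID3 tags, PDF info, and so on) instead of guessing from the file name:

```python
import mimetypes
import os


def extract(path, data):
    """Return (subject, predicate, object) triples for one file."""
    subject = "file://" + path
    mime, _encoding = mimetypes.guess_type(path)
    triples = [
        (subject, "fileName", os.path.basename(path)),
        (subject, "size", len(data)),
        (subject, "mimeType", mime or "application/octet-stream"),
    ]
    # Only index the content where it makes sense, here: plain text.
    if mime and mime.startswith("text/"):
        text = data.decode("utf-8", errors="replace")
        triples.append((subject, "plainTextContent", text))
    return triples


def index_tree(root):
    """Walk a directory tree and collect the triples of every file."""
    triples = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                triples.extend(extract(path, handle.read()))
    return triples
```

The triples returned by `index_tree` would then be handed to the central storage, next to the statements that applications write themselves.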
Anyway, back to Nepomuk-KDE: to give a better picture of how all the pieces like Nepomuk-KDE, Soprano, KMetaData and strigi work together, Sebastian created a chart showing the different pieces:
Integrated into the programs this could look like this mockup:
This window shows additional information about the sender of an e-mail: some of the information is aggregated directly from the contact data, like the e-mail address, the phone number and the web page, but it also lists files which the user has received from this contact, and you can see other people who are related to this person.
This is just a mockup but it gives a pretty good impression of what you can expect in the future – with much more to come.
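How such a view could be assembled from stored statements is sketched below. The data, the predicate names and the `contact_summary` helper are invented for this example; the real query interface is not covered by the slides:

```python
# Hypothetical statements about a contact and two e-mails.
TRIPLES = [
    ("contact:markus", "fullName", "Markus Mustermann"),
    ("contact:markus", "email", "markus@example.org"),
    ("contact:markus", "knows", "contact:anna"),
    ("mail:1", "sender", "contact:markus"),
    ("mail:1", "attachment", "file:///home/user/report.pdf"),
    ("mail:2", "sender", "contact:anna"),
    ("mail:2", "attachment", "file:///home/user/other.pdf"),
]


def objects(triples, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """All subjects of statements with the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def contact_summary(triples, contact):
    """Gather direct contact data plus files and people linked to it."""
    mails = subjects(triples, "sender", contact)
    return {
        "email": objects(triples, contact, "email"),
        "received_files": [
            f for mail in mails for f in objects(triples, mail, "attachment")
        ],
        "related_people": objects(triples, contact, "knows"),
    }
```

The point of the sketch is that the files are never tagged with the sender directly: they are found by following the link from the contact to the e-mails and from there to the attachments.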
KNepomuk, KMetaData and Nepomuk – about names
A last word about the naming: although I used the term KMetaData in this post (and particularly in the graphic), this name is actually not valid anymore: libnepomuk now contains KMetaData (together with Konto) and uses only one single namespace, “Nepomuk”. The old knepomuk became the new “Nepomuk::Middleware”.
The original plan was to rename KMetaData to Braid. However, this plan was dropped in favour of the restructuring, and because the name Nepomuk is already out there and fairly well known.
22 thoughts on “More about Nepomuk-KDE: Soprano and KDE integration”
great explanation! thank you
Ah, thanks for explaining the bit about “Why not store metadata in files?” That makes a lot more sense now, since lots of metadata won’t be files. (IM contacts, etc.) But please, don’t create a situation where a single corruption of “~/.nepomuk.db” (or something similar) would cause irrevocable loss of all the metadata on a machine. Please create a system with some kind of redundancy.
I am very interested – and hesitantly excited – to see if this all works out. This could be a natural evolution of desktop search and the address book, or it could be a quantum leap that changes the way we interact with data. I don’t know. But I sure am intrigued.
Very cool idea. I look forward to seeing how it gets implemented. It seems that a number of different projects have rotated around a similar concept (the now dead WinFS, the Tenor Project), but have petered out on the actual implementation. From what you describe, you have workable code. Very, very cool.
This is great work. It will be exciting to have so much related information at one’s finger tips.
I am hoping that when this gets close to production we can find friendlier terms than Semantic for the dialogs. Would Relationships or Related Information or something work better? A Plain English way of stating things will help and drive up adoption amongst users.
@Rob and the others:
Please keep in mind that I am the reporter, not the one who develops. If you are interested in more information or even in participating in the development, check the first article for the e-mail list of the project.
And about “workable code”: current KDE 4 svn versions already have a working Nepomuk implementation. It is still very basic (tagging, rating, commenting), but it works.
Good work, this will be very useful.
The recent Amarok work doesn’t seem to use Nepomuk yet; they are refactoring their own metadata database for the new Amarok 2.0.
I hope they will use it soon!
Thanks for the explanation about metadata not strictly tied to files. But I’m still curious to know how nepomuk handles metadata that *is* tied strictly to a file, if the file is moved (possibly by a “dumb” command such as mv). Is it clever enough to notice the file has moved and point the existing metadata at the new file? What if a file is deleted – can it spot the “broken link” and automatically get rid of the corresponding metadata?
At the moment nothing is committed to the SVN which allows the easy movement or copying of files. However, I’m quite sure that this will change in the future.
If you want to know more details you have to ask on the development list – they are very open for feedback there, btw. 🙂
But… Say an app saves an MP3 file with the artist info, and it also stores the metadata in Soprano. Then, when strigi indexes the MP3 and finds the metadata, will it also store it in Soprano?! If the file metadata is constrained (e.g. it limits character set or string lengths) then the metadata might not match. You want the info in both the file and in the queryable storage, it’s tricky to keep in sync. Adrian, when the file is deleted I don’t necessarily want to lose the metadata. If the store remembers info about stuff even after you get rid of it it can serve as a journal or activity history.
kevincolyer, it’s all just “information about”, does prepending “Semantic” add anything to usability?
Looking at the excellent Semantic info mockup for “Markus Mustermann”, for each object it mentions there are three “go to” actions: 1. View semantic info about that object (like a semantic browser), 2. Open the object, 3. View the object in context (open the file folder or mail folder or music collection that contains it). Hmmm.
well.. thx for the nice and very interesting article… i really wonder how they are going to solve the above mentioned problem with moving files.. and especially when you copy the file onto another pc (also with kde4)… what’s going to happen to the meta data… for example when you send someone some photos with comments.. are they going to be lost? and what about windows.. the comments are also lost or am I wrong? (will this also be platform independent like many parts of kde4?)
skierpage: First: since I’m not a developer you should not put too much weight on my answers. If you are unsure you might also want to ask on the Nepomuk e-mail list.
However, about the MP3 file example: this meta data is, in the first place, stored in the file itself. MP3 files have a specific place for storing such data. However, just as with Amarok, every music application needs (!) a database for all this metadata: it simply copies the data to a database to know which files it has. So in this case all meta data is in both the storage backend and the file.
About strigi: I think strigi will become intelligent in that it realizes which files are already in the database and which are not. But I’m not sure how far the development is yet regarding this topic.
LordBernhard, about copying files to other PCs: the Nepomuk project (the KDE integration is actually just a smaller part of it) addresses some questions and the long term aim is to provide a P2P solution for exchanging data together with their meta data.
If you just copy the files themselves, say, to a USB stick, the metadata is most likely lost (except maybe for files which already support meta data natively, like images or music files). At least that’s the current state.
About the knepomuk name: I guess you’re not using it any longer, but if you do I would like to suggest Kumopen.
Other article, other translation: I noticed a lot of people agree with the first one! If something goes wrong with this or the other, please, let me know!
furester: Wow, nice, thanks!
I will post links to them.
“Recently Sebastian Trüg held a presentation about Nepokumk-KDE”
>If the store remembers info about stuff even after you get rid of it it can serve as a journal or activity history.
sounds like self-incrimination to me!!