Phonon, GStreamer and ALSA

Yesterday the Phonon homepage was released. Several online media outlets reported on it, among others the German site pro-linux.de with an article.

In the discussion about this article it quickly became clear that two things bother users about Phonon:
* first: some of them do not know how Phonon differs from ALSA or what it is needed for
* second: some users complain that KDE does not simply take GStreamer as the default

Here is an explanation of these topics, together with some general background on how sound works under Linux. The video part is not really addressed, although I sometimes talk about “multimedia”. If you have suggestions on how to extend the text, feel free to post them.

First, there is the hardware: different sound cards and other sound hardware, and therefore different hardware drivers. To avoid implementing support for each driver in every program, we need to abstract the hardware drivers into one kind of “standard driver” which all programs can access. This abstraction layer can then wrap all the individual hardware drivers and talk to the hardware.
This abstracted hardware accepts all audio streams and commands which would otherwise go directly to the hardware. To keep things simple, audio is only accepted as raw audio data streams. The abstraction layer should not be bothered with the different music formats.
This task is done by OSS or by ALSA: ALSA is the standard on Linux, while OSS is still used in Unix variants like BSD. The advantage of ALSA here is that it can accept not just one raw audio stream but several, and mix them together. So different applications can talk to ALSA and send it streams at the same time.
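
To make the idea of mixing raw streams concrete, here is a minimal sketch in Python. It is not how ALSA is implemented; it only assumes that each application delivers signed 16-bit samples and that mixing means adding the samples and clipping the result. The function name `mix_pcm16` is made up for this illustration.

```python
def mix_pcm16(streams):
    """Mix several lists of signed 16-bit samples into one list.

    Illustrative only: real mixers also deal with sample rates,
    channels, buffering and timing, all of which are ignored here.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        # Streams that have already ended contribute silence.
        total = sum(s[i] for s in streams if i < len(s))
        # Clip to the range a signed 16-bit sample can hold.
        mixed.append(max(-32768, min(32767, total)))
    return mixed


music = [1000, -2000, 3000]
notification = [500, 500]
print(mix_pcm16([music, notification]))
```

The important point is that the mixer never needs to know whether the samples originally came from an MP3 or a WAV file. It only ever sees raw numbers.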

But, as I said, ALSA only takes raw data streams. Why? Well, imagine the number of different audio formats (MP3, OGG, WAV, AAC, WMA, RMA, etc.) and the different ways they can be delivered (stream, file, etc.). ALSA was designed as a hardware abstraction, and to keep a project alive and developing it makes sense not to put all problems into one solution. Keep it simple, you might call it. There are enough problems to fight with when you are designing a hardware abstraction layer.

So we come to the second step: the multimedia framework. This framework is the part which understands all the different media formats like the ones mentioned above. Therefore several people argue that it should support some kind of plugins, to provide an easy way of integrating new file formats.
Common multimedia frameworks are GStreamer, NMM, Xine, Helix and also the old aRts from KDE 3. Since OSS is not able to mix multiple sound streams, some of these frameworks can take over this job and are therefore also called sound servers. One of the reasons aRts was very popular in the beginning was that it addressed this problem very well.
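
The plugin idea can be sketched roughly as follows, again in Python. None of this is taken from a real framework: the registry, the `register_decoder` decorator and the toy `.raw8` format are all invented for the example. The sketch only shows that each format is handled by its own small decoder, and that every decoder produces the same kind of raw samples that the hardware abstraction expects.

```python
# Hypothetical plugin registry: file extension -> decoder function.
DECODERS = {}


def register_decoder(extension):
    """Decorator that registers a decoder plugin for one file extension."""
    def wrap(func):
        DECODERS[extension.lower()] = func
        return func
    return wrap


@register_decoder("raw8")
def decode_raw8(data):
    # Toy format invented for this example: unsigned 8-bit samples,
    # converted here to signed 16-bit samples.
    return [(byte - 128) * 256 for byte in data]


def decode(filename, data):
    """Pick the decoder plugin by file extension and return raw samples."""
    extension = filename.rsplit(".", 1)[-1].lower()
    if extension not in DECODERS:
        raise ValueError(f"no decoder plugin for .{extension}")
    return DECODERS[extension](data)


print(decode("beep.raw8", bytes([128, 129, 0])))
```

Supporting a new format then means adding one more registered decoder, without touching the rest of the framework.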

The next step is the third one: the wrapper. Imagine you are programming in a specific framework like the KDE framework: you are using your normal programming language (here C++), you are using APIs which you are used to (Qt/KDE APIs), and you are used to the style of this environment and the way problems are addressed in it.
Then it makes a lot of sense to provide you with a convenient set of APIs in your favoured language if I want you to implement a new general feature.
This feature is now multimedia, or more specifically, sound. You have to integrate some kind of multimedia framework into your program, but this framework is most likely not written especially for your desktop environment with APIs in the style you are used to. It has its own APIs and its own way of addressing problems, and is probably even written in its own programming language.
That is exactly the situation we have with GStreamer and KDE!
So there is the need to write a wrapper which provides the KDE developers with something they are used to, and here we have it: Phonon. Phonon does exactly this job. It provides APIs which the KDE people can easily integrate into their programs and make use of without learning too much new stuff. Keep it simple: developers, after all, are also just human beings.
So you need to have this wrapper, no matter whether you want to tightly integrate GStreamer and nothing else into KDE or not.

Before we take a closer look at this last sentence, let us talk about the fourth and last step of the sound architecture in Linux: the applications. For them, sending their sound somewhere should be as easy as possible.
If you are a developer used to KDE, you should use the APIs provided by KDE, so in future those provided by Phonon. If you are a developer of professional audio software you should integrate ALSA support directly into your application, probably with an option to switch to OSS (although I do not think that a professional sound application would be satisfied with OSS).
Gnome developers will certainly use GStreamer at the moment, but who knows what the future will come up with? NMM looks very, very promising, and even Helix seems to come up with some interesting stuff in the near future.

But back to Phonon: I already explained why we need Phonon, no matter whether we integrate GStreamer as tightly as possible or not. Now have a look at the history of KDE: KDE already thought once that it had found the holy grail with aRts, and it was very painful to learn that it hadn’t. Additionally, even today GStreamer is not supported by everyone; there are enough people who prefer Xine, aRts or Helix, and the distributors also have something to say. And do not forget that something new may suddenly appear and provide an astonishing new multimedia framework with functions no user has ever dared to dream of.
Another thought is binary incompatibility: even if GStreamer is the preferred solution for the next years, GStreamer will keep developing too. And there will probably be a point where it breaks binary compatibility between two versions. With Phonon that would just be a small correction (well, probably a new Phonon backend), and nothing to worry about. The same is true for all other backends.
With these thoughts in the background, and given that there had to be a wrapper in any case, the developers decided to make the extra effort to be able to switch between the different solutions. It is also a nice way to keep some backwards compatibility with KDE 3, since Phonon can also support aRts.
It just keeps KDE flexible and still gives the opportunity to support GStreamer as the main solution.
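
The wrapper-plus-backend design can be sketched in a few lines. This is not Phonon's actual API (Phonon is a C++/Qt library). All class and method names below are hypothetical, and the two backends are fakes that only return a string. The point is merely that the application talks to one stable interface while the backend behind it can be exchanged.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface every backend has to implement."""

    @abstractmethod
    def play(self, source):
        ...


class FakeGStreamerBackend(Backend):
    # Stand-in only: a real backend would call into the framework here.
    def play(self, source):
        return f"gstreamer-like backend playing {source}"


class FakeXineBackend(Backend):
    def play(self, source):
        return f"xine-like backend playing {source}"


class MediaPlayer:
    """What the application developer sees: one API, any backend."""

    def __init__(self, backend):
        self._backend = backend

    def set_backend(self, backend):
        # Swapping the backend does not change the application code.
        self._backend = backend

    def play(self, source):
        return self._backend.play(source)


player = MediaPlayer(FakeGStreamerBackend())
print(player.play("song.ogg"))
player.set_backend(FakeXineBackend())
print(player.play("song.ogg"))
```

If one framework breaks compatibility or falls out of favour, only the matching backend class has to be rewritten or replaced. The applications built on `MediaPlayer` stay as they are.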

So, if you think GStreamer is the best ever: well, don’t complain, GStreamer can be fully integrated into KDE and can be the default backend. You might object that it makes sense to support only one backend, since different backends offer different functions: you cannot support them all, and you cannot provide function x of backend y to application developers if the other backends lack it, so you can only offer a limited subset of each backend and never its full abilities. That’s right in theory, but in practice Phonon and most multimedia frameworks aim at the same target: the normal user. And the normal user only has limited needs. As mentioned above: if you want to write a highly professional audio application, nothing stands in your way, but you should stay as close to the hardware as you can, so you shouldn’t use a multimedia framework but work directly with ALSA.

So far, I hope that I have answered some questions and calmed things down a bit. Spread the word or a link to this post, and show that Phonon is not as bad as several people think. Hopefully the opposite will be the case 🙂

6 thoughts on “Phonon, GStreamer and ALSA”

  1. Hi liquidat,
    thanks a lot for all your efforts to get the correct message out. I highly appreciate that. Your article is really good (though there are a few terms mixed up here and there) and the overall content is correct and to the point.

  2. Good to hear that it is OK, that is nice to know. It’s the least I can do to help you, and it is a win-win situation: when I deal with the people who have the usual questions and doubts, you have more time for programming and I will get an even better framework 😉

    And feel free to mention everything which is mixed up or just not clear enough; I would like to (learn and) correct it. You can also contact me via Jabber or e-mail.

    Btw.: If you need any help with your web page or with writing such texts, I would be glad to help or to provide texts as well.

  3. Hmm, how about /usr/bin/play (part of the sox package)?
    Isn’t that the simplest multimedia framework you can find? After all the issues with aRts, I would definitely check the option of a simple and fast multimedia framework. For example, almost everyone might want MP3, OGG and WAV decoding, but AAC, WMA and RMA might be something not every user wants to spend lots of MB of memory on. In that case the sox package would be just perfect, IMHO.

  4. About sox:
    First, it is not a multimedia framework but an audio framework. The video part is completely missing afaik, therefore it does not fit the needs of KDE. Second, it would again need a Qt/KDE wrapper to provide KDE-style APIs, so we are back at Phonon. And third, the question is how easily sox can be extended with plugins for other file formats (new audio formats for VoIP, for example, like farsight; other audio formats for special applications, like scientific programs, etc.).
    But if you want to have /usr/bin/play: the Qt/KDE wrapper must be there for sox as well, and Phonon additionally gives you the opportunity to write a backend for sox. Write it, and you can use sox as your preferred framework (without video capabilities…).

  5. Hm, I do not like that comment:
    First, the author does not point out where he takes his data from. If it is from his own computer, he had better gather some real statistics. Sure, GStreamer is broken here as well, but that does not make me claim that it is broken everywhere in every case.
    Second, the fact that there is a business model behind a piece of software does not make the software bad. Otherwise Linux would be bad by default, and so would Gnome and KDE.
    Third, DRM is something very nasty, but blaming some free software developers is not the way it should be criticized. And the GStreamer people are not stealing, they are using. If you don’t want someone to use your code in a signed-only environment, use another licence, but not a Free Software licence.
    Fourth, the criticism of Phonon is dumb: Phonon is working already, even the API is close to stable these days, and yes, there is some support from a company. But since when is company support bad? Show me one bigger software project which does not have companies as community members. And about the DRM stuff in Phonon: the poster should provide evidence, and then it would deserve a much more detailed look.

    And, last but not least, fifth: the whole posting is only ranting. There is not a single suggestion on how to do it better or what to do instead, and none of the rants is supported by any facts. And pure rant postings are neither helpful nor interesting, they are just noise.

    But anyhow, that is directed at the one who posted the linked comment, not at you. So thanks for the suggestion, I always appreciate feedback 🙂
