The Linux user is largely an unknown species: since you can download a distribution anonymously and from anywhere, the distributors know almost nothing about their user base. Due to BitTorrent and FTP mirrors, even the user numbers are rough estimates at best. However, with Popcon and Smolt, two different approaches exist to gather more information about the user base.
Popcon is short for popularity contest, as in Debian Popularity Contest. The idea is simple: a small piece of software on the user's computer gathers data about the installed software and rough data about its usage (regularly or not at all). These data are sent to a central server, which collects them and provides them to everyone interested. All you have to do to participate is install the popularity-contest package.
And the results speak for themselves: over 50k people participated in the contest and submitted usage information.
Of course, not everyone installed the package, and those who did are most likely more technically inclined than the people who didn’t – but the data are still interesting. For example, you can check how well your package is adopted and used – or not. And you can find out whether a specific package you introduced is really used: webmin is installed on over 2k machines – but only used regularly on a couple of hundred. In contrast, clamav is installed on almost 5k machines and is regularly used on almost 3k machines.
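To make the webmin versus clamav comparison concrete, here is a minimal sketch that turns installation and regular-use counts into a usage ratio. The figures are rounded placeholders based on the numbers quoted above (the webmin "regular use" count in particular is an assumption standing in for "a couple of hundred"), and the names `packages` and `usage_ratio` are made up for this illustration; they are not part of Popcon.

```python
# Rounded, illustrative figures taken from the text above.
# The webmin "regular" count is a placeholder for "a couple of hundred".
packages = {
    "webmin": {"installed": 2000, "regular": 200},
    "clamav": {"installed": 5000, "regular": 3000},
}


def usage_ratio(installed: int, regular: int) -> float:
    """Return the fraction of installations that report regular use."""
    if installed <= 0:
        raise ValueError("installed must be positive")
    return regular / installed


# Compute the ratio for every package in the sample.
ratios = {name: usage_ratio(**counts) for name, counts in packages.items()}

# Print the packages, most actively used first.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.0%} of installations used regularly")
```

With these placeholder numbers, clamav comes out far ahead of webmin in relative terms, which is the point of looking at usage and not only at installation counts.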
Also, you can check for the general popularity of packages: totem is much more popular than Amarok (more Gnome users, I guess), but xine is much more popular than mplayer. And so on…
I must admit that I would love to have such information for Fedora, because I would also like to see hints about the adoption of the packages I maintain. But at the moment it is highly unlikely that we will see such information.
What Popcon is for installed software, Smolt is for hardware: Smolt collects hardware information from every participating client. With Fedora 7, every user has to decide whether s/he wants to take part in the data collection or not, which might increase the number of participants drastically. At the moment the database lists roughly 11k entries.
With these data at hand you can easily check which kind of hardware is used – and where to focus improved hardware integration and support if you want to please the Linux user base. Also, it might give some hints about what kind of hardware support you can expect.
For example: one third of all machines have 512 MB of RAM or less – therefore the distributor should go easy on the RAM. On the other hand, more than one third of all machines have two or more CPUs/cores. Also, almost half of the installations are marked as Desktop and 20% as Laptop (and 20% as unknown).
But you can also check for the hardware used in one category: the Fedora people tend to use ATI hardware more than NVIDIA hardware.
Besides this hardware information, Smolt also gathers basic information about the main system, like the default language, the version of the distribution and so on. This can be pretty important for distributors who want a picture of how many people are still using old versions of a distribution and what it will mean when they are forced to upgrade, for example.
I really hope that it will become normal for all distributions to collect both types of information. I would love to see cooperation here, but as usual this is unlikely in the short term.
In the meantime, every distribution goes its own way. Ubuntu, for example, has the Ubuntu Hardware Database – which has been seriously broken for months now, which is pretty disappointing. But I’m sure they will fix it eventually.
In any case, when it comes to statistics you always have to be careful and skeptical: since Popcon and Smolt both rely on volunteers, you won't get a representative profile of the user base.
Also, gathering numbers might be tricky – for example, Debian’s popcon package was reported as installed on fewer machines than the number of machines that sent information, and the network device with the largest share among the Smolt users is used by 157.7% of all users…
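A share above 100% usually points to a counting problem. One way it can happen (an assumption for illustration only, not a confirmed explanation of the Smolt figure) is dividing the number of device entries by the number of machines, even though a single machine can contain the same device more than once. The sample data and both function names below are hypothetical.

```python
# Hypothetical sample: each machine reports the network devices it contains.
machines = {
    "host-a": ["nic-x", "nic-x"],  # two identical ports on one board
    "host-b": ["nic-x", "nic-y"],
    "host-c": ["nic-x", "nic-x"],
}


def share_by_entries(reports: dict, device: str) -> float:
    """Count every device entry; this can exceed 1.0 (100%)."""
    entries = sum(devices.count(device) for devices in reports.values())
    return entries / len(reports)


def share_by_machines(reports: dict, device: str) -> float:
    """Count each machine at most once; this can never exceed 1.0."""
    owners = sum(1 for devices in reports.values() if device in devices)
    return owners / len(reports)


print(f"per entry:   {share_by_entries(machines, 'nic-x'):.1%}")
print(f"per machine: {share_by_machines(machines, 'nic-x'):.1%}")
```

In this made-up sample the per-entry share is well above 100%, while the per-machine share stays at 100%, so the choice of what to count in the numerator matters a lot when reading such statistics.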