A Can of Worms: Software volume controls (v2)

Andrew McGuinness https://github.com/andrewmcguinness

January 8, 2022

1 Problem

Here’s how I started my descent into this can of worms:

Occasionally I like to blast some music out of my PC. I have a pair of ancient stereo speakers, which in the old days I used to connect direct to my sound card. Over the years since the 8-bit SoundBlaster I originally used the speakers with, the output level produced by sound cards has gone down and down.

This year or last, I solved the problem by putting a dodgy old unused car stereo in between my sound card and my speakers. With that I was really able to blast Iron Maiden out of my office.

The problem was that while blasting Iron Maiden or Motörhead is an unqualified benefit to humanity, blasting Eric Rosen or Lock Picking Lawyer is not really so satisfying. The volume control on my old car stereo is not all that accessible behind my computer, so I need to adjust volume in YouTube.

YouTube has a volume slider. And that’s where the can of worms really begins.

My desktop runs Debian, and my choice of web browser is Firefox. Sound on Linux has been the subject of jokes for a very long time, but these days it generally works out of the box. But if it worked perfectly out of the box, I wouldn’t be writing this article.

The volume control slider in YouTube works. If you slide it all the way to the right, it is loud. If you slide it left, it gets quieter, and if you go all the way to the left, it becomes silent.

It isn’t really usable though. If you slide it nearly all the way to the left, it gets quieter, but not much quieter. Getting it from Iron Maiden levels to Eric Rosen levels is actually impossible.

2 The Audio Stack


Table 1: My PC audio stack

YouTube
Firefox
PulseAudio
ALSA
Intel HD Audio


The mixing of different sound sources at the appropriate relative volumes can actually be done by various levels of this stack, but since the Intel HD Audio is capable of it, the solution in use in my stack is to let the hardware do it.

So none of the higher levels of the stack are actually using the volume level I set with my mouse; they all just pass it on until it gets to the sound hardware.

YouTube has some arrangement of HTML that makes a slider, which goes from 0px to 40px in position and sets the “volume” property of the HTML video element from 0.0 to 1.0 in direct linear proportion.

Other video players do effectively the same.

Firefox uses a Mozilla cross-platform audio library called “cubeb”. Each HTML video element gets a stream created in cubeb, and changing the volume of the video element calls “set_volume” on the corresponding cubeb stream object, with the same value from 0.0 to 1.0.

cubeb, when working with PulseAudio, has two modes: PulseAudio can use “flat volumes”, where there is only one volume control, or the alternative, where there is a PulseAudio volume control for each stream. My system is not using flat volumes; each stream has its own independent volume control.

So cubeb, in its set_volume function, passes the volume on to PulseAudio, which presumably passes it further on through ALSA into the Intel audio hardware.

I don’t need to dig around in kernel driver code, because my problem has already happened by this point.

PulseAudio is managing the multiple streams that have been set up, and I can see and control them with the PulseAudio volume control, a GUI application called “pavucontrol”.

I can see a section for the video player I am listening to, and there is a volume slider in pavucontrol corresponding to it. It also tells me the volume setting, as a percentage, and also in decibels.

If I set the slider in YouTube to about half-way, I can see the following:

temp0.volume = 0.45

In pavucontrol, the stream shows as 77% volume, which it also reports as -6.94 dB.

If I want the video very quiet, I can do it with the pavucontrol slider. If I slide it to 10% (-60.29 dB), it indeed comes out very quiet.

However, my mouse is not sensitive enough for me to be able to get the volume in PulseAudio down to 10% by moving the YouTube slider. If I try to go that low, it just goes to zero.

I want the YouTube slider to act like the pavucontrol slider.

3 What is happening

What is happening is that cubeb is not converting the 0.0–1.0 volume linearly to the volume level handled by PulseAudio, which goes from 0 to 65536. Instead, it calls a PulseAudio function, “sw_volume_from_linear”, and that is what pushes the PulseAudio volume above the linearly scaled value. I downloaded the Firefox source code, modified the Rust code in the cubeb PulseAudio backend to just multiply the float by 65536 instead of calling that function, and my Firefox now works just the way I want it: I can adjust the volume in YouTube with no issues.
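To make the difference concrete, here is a rough Python model of the two mappings. It assumes PulseAudio’s documented cubic behaviour, i.e. that the conversion from linear (pa_sw_volume_from_linear in the C API) takes a cube root before scaling to 65536. The function names are hypothetical, and this is not the actual cubeb-pulse-rs code.

```python
PA_VOLUME_NORM = 65536  # PulseAudio's "100%" volume (0x10000)


def stock_mapping(html_volume: float) -> int:
    """Model of the stock behaviour: cube root, then scale (assumed to match sw_volume_from_linear)."""
    if html_volume <= 0.0:
        return 0  # muted
    return round(html_volume ** (1.0 / 3.0) * PA_VOLUME_NORM)


def patched_mapping(html_volume: float) -> int:
    """Model of the patched behaviour: scale the 0.0-1.0 value directly."""
    return round(max(0.0, html_volume) * PA_VOLUME_NORM)


for v in (0.45, 0.125, 0.01):
    s, p = stock_mapping(v), patched_mapping(v)
    print(f"HTML volume {v:5.3f}: stock {s / PA_VOLUME_NORM:6.1%}, patched {p / PA_VOLUME_NORM:6.1%}")
```

Under this model the YouTube setting of 0.45 maps to about 76.6% of PA_VOLUME_NORM in the stock version, which matches the 77% shown in pavucontrol, while the patched version passes 45% straight through.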

I’ve submitted my changes to Mozilla, for what it is worth.

https://github.com/mozilla/cubeb-pulse-rs/issues/74

If this were just a bug in that code, this would be boring.

4 Why is it happening

PulseAudio describes its volume control functions in its documentation at https://freedesktop.org/software/pulseaudio/doxygen/volume.html:

The volumes in PulseAudio are cubic in nature and applications shouldn’t perform calculations with them directly. Instead, they should be converted to and from either dB or a linear scale.

Cubeb is converting to a volume “from a linear scale”—and that’s why it doesn’t work well.

What is a “linear scale”?

“Linear” in this case refers to the actual amplitude of the signal sent to the speakers, which is proportional to the PCM sample values handled by the streaming part of the software.

Setting the volume on a linear scale means multiplying the amplitude of the samples by that value. As the PulseAudio document says, if you are doing calculations on volume, that or decibels are probably what you should use.
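Because amplitude factors multiply, decibels are the convenient additive unit: an amplitude factor a corresponds to 20·log10(a) dB. A minimal Python helper (the name is my own) shows that the 0.45 YouTube setting is the -6.94 dB figure pavucontrol reports.

```python
import math


def linear_to_db(amplitude: float) -> float:
    """Convert a linear amplitude factor (1.0 = unchanged) to decibels."""
    if amplitude <= 0.0:
        return -math.inf  # silence
    return 20.0 * math.log10(amplitude)


# The YouTube setting from earlier: an amplitude factor of 0.45
print(f"{linear_to_db(0.45):.2f} dB")  # prints -6.94 dB
```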

Before we got sound hardware with mixers built in, this is what software tended to do. Indeed, if we are using “flat volumes” in PulseAudio, so that cubeb isn’t able to change the volume in hardware, cubeb does exactly that: it multiplies each sample by its volume value. If you are using the old Linux OSS sound drivers instead of PulseAudio, it does the same thing. I looked at the Chromium source, and it does not pass volume changes on to PulseAudio at all; it too just multiplies the samples by the volume. So my problem is not a new problem, and it is not specific to Firefox or to PulseAudio.
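As an illustration of what “multiplying each sample by its volume value” means, here is a minimal sketch assuming signed 16-bit PCM samples. It is not code from cubeb, OSS, or Chromium; the helper name is hypothetical.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def apply_software_volume(samples: array, volume: float) -> array:
    """Scale signed 16-bit PCM samples by a linear volume factor, clamping to range."""
    out = array("h")
    for s in samples:
        scaled = round(s * volume)  # linear: amplitude is multiplied by the volume itself
        out.append(max(INT16_MIN, min(INT16_MAX, scaled)))
    return out


pcm = array("h", [0, 1000, -1000, 32767, -32768])
print(list(apply_software_volume(pcm, 0.5)))
```

Setting the volume to 0.5 halves every sample, which is only about 6 dB quieter.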

Why is there a problem? And why does pavucontrol do something else?

5 Ears

The reason is that our sense organs are much more sensitive and flexible than the software. According to some random web page, the difference in sound level between a whisper and a pneumatic drill is 70 dB, which is an amplitude ratio of roughly 3,000.
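The conversion from a level difference in decibels to an amplitude ratio is 10^(dB/20). A one-line Python helper (hypothetical name) checks the whisper-to-drill figure.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude ratio corresponding to a level difference in decibels."""
    return 10.0 ** (db / 20.0)


# Whisper to pneumatic drill: a 70 dB difference
print(round(db_to_amplitude_ratio(70)))  # prints 3162
```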

That means that to turn a sound down from Iron Maiden level to Eric Rosen level, I probably do want to set my linear volume to around 0.1% (-60 dB). And my problem is that with a 40-pixel-wide slider, that isn’t anywhere near possible: each pixel is a 2.5% step.
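The arithmetic behind that: one pixel from the left of a 40-pixel slider is 1/40 = 2.5% of full scale. This Python sketch (hypothetical helper names) compares the quietest non-zero setting when the position is used linearly and when it is cubed.

```python
import math

SLIDER_PIXELS = 40  # width of the YouTube volume slider, as measured above


def linear_taper(x: float) -> float:
    return x  # what the browser effectively does today


def cubic_taper(x: float) -> float:
    return x ** 3  # cube of the position, for comparison


def quietest_nonzero_db(taper) -> float:
    """Level in dB of the first non-zero slider position, for a given taper."""
    position = 1 / SLIDER_PIXELS  # one pixel from the left
    return 20.0 * math.log10(taper(position))


print(f"linear: {quietest_nonzero_db(linear_taper):.1f} dB")  # prints linear: -32.0 dB
print(f"cubic:  {quietest_nonzero_db(cubic_taper):.1f} dB")   # prints cubic:  -96.1 dB
```

With a linear mapping the slider cannot go below about -32 dB without jumping to silence, far short of the -60 dB needed.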

It looks like, in general, web browsers just haven’t taken this problem into account. They seem to set volume linearly.

Old-school analogue electronics always understood the issue. Hence the tapered potentiometer long used for volume controls.

PulseAudio, according to the same document quoted above, makes its volume levels cubic: the actual linear volume it uses is the cube of the integer volume level (taken as a fraction of 100%, i.e. of 65536) that pavucontrol sets from the slider, and that my modified cubeb code sets from Firefox. Therefore, to set the linear volume to 0.1%, I just need to turn the volume slider down to 10%, since $0.1^3 = 0.001$.
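A small Python model of that direction of the conversion, assuming the documented cubic mapping with 65536 as 100% (this mirrors what pa_sw_volume_to_linear is described as doing, but it is a sketch, not PulseAudio’s code):

```python
PA_VOLUME_NORM = 65536  # PulseAudio's 100% volume


def pa_volume_to_linear(pa_volume: int) -> float:
    """Linear amplitude for a PulseAudio volume, assuming the documented cubic mapping."""
    fraction = pa_volume / PA_VOLUME_NORM
    return fraction ** 3


ten_percent = round(PA_VOLUME_NORM * 0.10)  # what pavucontrol shows as roughly 10%
print(f"{pa_volume_to_linear(ten_percent):.4%}")
```

A 10% slider setting comes out at about 0.1% amplitude, and 50% comes out at exactly 12.5%.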

By calling sw_volume_from_linear, the stock cubeb code is undoing that cubing and setting the PulseAudio volume to the cube root of the input value. So if you set the slider to 12.5%, PulseAudio will set its volume to 50%, with the result that the linear amplification is 12.5%, exactly as with the old code that just multiplied sample values.
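The round trip can be checked numerically. The two functions below are hypothetical Python models of PulseAudio’s linear-to-volume (cube root, then scale) and volume-to-linear (scale, then cube) conversions, assuming 65536 as 100%. Feeding the slider value through both returns the slider value, which is why the stock behaviour ends up linear.

```python
PA_VOLUME_NORM = 65536


def model_from_linear(v: float) -> int:
    """Model of linear -> PulseAudio volume: cube root, then scale."""
    return 0 if v <= 0.0 else round(v ** (1.0 / 3.0) * PA_VOLUME_NORM)


def model_to_linear(pa_volume: int) -> float:
    """Model of PulseAudio volume -> linear amplitude: scale, then cube."""
    return (pa_volume / PA_VOLUME_NORM) ** 3


for slider in (0.125, 0.45, 0.9):
    pa_volume = model_from_linear(slider)
    amplitude = model_to_linear(pa_volume)
    print(f"slider {slider:.3f} -> PulseAudio {pa_volume / PA_VOLUME_NORM:.1%} -> amplitude {amplitude:.3f}")
```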

The exception seems to be Windows. The WASAPI backend for cubeb, like the PulseAudio one, passes the volume level on to the system, using the IAudioStreamVolume::SetAllVolumes method.

However, unlike the PulseAudio backend, it does not modify the volume before passing it on.

Does Windows apply tapering to volume inputs? Not only does it, it actually has documentation describing what it does and why:

The IAudioEndpointVolume interface manages volume controls that are audio tapered. These controls are well suited to Windows applications that display volume sliders. For a volume slider that is tied to an audio-tapered volume control, each change in the position of the slider produces a change in perceived loudness that is proportional to the distance traveled by the slider. For a particular travel distance, the amount by which the perceived loudness increases or decreases is approximately the same regardless of whether the slider movement occurs in the lower, upper, or middle portion of the slider’s range of movement. Perceived loudness varies approximately linearly with the logarithm of the audio signal power.

That’s from a whole page on Audio-Tapered Volume Controls with detailed descriptions and graphs.

6 Next Steps

My fix, then, while bringing Firefox with PulseAudio into line with Firefox on Windows and with PulseAudio’s own volume control, actually makes it behave differently from Firefox with PulseAudio and flat volumes, from the older sound backends for Linux, and from Chromium on Linux.

I think all the bits of code that apply volume changes to streams in software, by multiplying the samples, should instead multiply the samples by a non-linear function of the volume level. I may make patches for that later.
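A sketch of what such a change might look like, assuming 16-bit samples and using the cube of the volume as the taper (the same curve PulseAudio uses). The function names are hypothetical, and this is not an actual patch to any of the code mentioned.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def tapered_gain(volume: float) -> float:
    """Map a 0.0-1.0 control position to an amplitude factor using a cubic taper."""
    volume = min(max(volume, 0.0), 1.0)
    return volume ** 3


def apply_tapered_volume(samples: array, volume: float) -> array:
    """Multiply 16-bit PCM samples by the tapered gain rather than by the raw volume."""
    gain = tapered_gain(volume)
    return array("h", (max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples))


print(list(apply_tapered_volume(array("h", [10000, -10000]), 0.5)))
```

The only change from plain linear scaling is the extra tapered_gain step, so a half-way setting now gives 12.5% amplitude (about -18 dB) instead of 50%.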

7 The Real Weirdness

As I said above, this has been bothering me since I hooked an actual amplifier up to my PC. It was only this week that I got frustrated enough first to raise a bug against Firefox, and then to investigate and find out all of this and write a patch.

When I asked some other people, “Do you use YouTube on Firefox on Linux? What is the volume control like?”, they all said it was really bad, practically unusable.

It has always been this way. Nobody has ever complained. I have found no bug reports anywhere about the inappropriate use of linear volume controls.

8 Other Considerations

These are things that I don’t think are important, but if I don’t put them here then people might think I hadn’t thought of them.

Both the existing cubeb PulseAudio code and my modified version have a maximum which is the “PA_VOLUME_NORM” value, or 100%. PulseAudio can actually set the volume of a stream above 100%. I think it’s probably good that you can’t do that from within Firefox, and, while that is perhaps debatable, it’s really a separate question.

You could argue that it’s reasonable for the sound stack below a web page to treat volume as linear, and that it should be up to YouTube to manage the non-linearity of a volume slider, since that’s a user interface issue.

Unfortunately the HTML spec carefully doesn’t specify:

Return volume, interpreted relative to the range 0.0 to 1.0, with 0.0 being silent, and 1.0 being the loudest setting, values in between increasing in loudness. The range need not be linear.

In any case, assuming I’m not mistaken about the behaviour on Windows, the idea of standardising on HTML volume being linear fails on that point: for the majority of Firefox users, who are on Windows, volume setting is already done “correctly”, and it makes sense to bring Linux into line.

I also observe that other web video players, such as those based on Video.js, seem to work exactly the same way.

Are there other uses of HTML5 video that might need to assume that volume settings are linear? Again, as far as I can see they would already not work correctly on Firefox on Windows.

Exactly what function should be used is again not very important. Scientifically, the position of a volume control should ideally represent the logarithm of the amplitude, so it should basically be a linear scale of decibels, with the maximum at 0 dB and the minimum somewhere around -60 or -70 dB. That means the linear volume should be something like $e^{8(x-1)}$, which is 1 at $x = 1$ and $e^{-8} \approx 0.00034$ (about -69.5 dB) at $x = 0$. However, it’s also necessary to be able to mute, so it would need special-case handling to make zero silent (like old analogue volume controls that click to off at the fully anticlockwise position).
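A minimal Python sketch of that exponential taper with the mute special case. The function name and the parameter k (the 8 in the exponent) are my own choices for illustration.

```python
import math


def exponential_taper(x: float, k: float = 8.0) -> float:
    """Amplitude for control position x in [0, 1]: exp(k*(x-1)), with x == 0 forced to silence."""
    if x <= 0.0:
        return 0.0  # the "click to off" special case
    x = min(x, 1.0)
    return math.exp(k * (x - 1.0))


for x in (1.0, 0.5, 0.01, 0.0):
    a = exponential_taper(x)
    level = "-inf" if a == 0.0 else f"{20 * math.log10(a):.1f}"
    print(f"x={x:4.2f}  amplitude={a:.5f}  level={level} dB")
```

Each unit of travel changes the level by the same number of decibels (about 0.7 dB per 1% of travel with k = 8), and the level just above zero is close to -69.5 dB before the jump to silence.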

In practice, the electronics article I linked above says that controls are usually just two linear sections; more sensitive over the lower half of the travel, and less sensitive over the higher half. This is treated as an “approximation” of the exponential.
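A sketch of such a two-segment approximation in Python. The knee position and level are assumptions for illustration (half travel giving 10% of full output, a figure often quoted for audio-taper potentiometers, not one taken from the article), and the function name is hypothetical.

```python
def two_segment_taper(x: float, knee_out: float = 0.1) -> float:
    """Piecewise-linear approximation of an audio taper.

    Assumption: the knee sits at half travel and gives knee_out (10% by default)
    of full output there; the curve rises slowly below the knee and steeply above it.
    """
    x = min(max(x, 0.0), 1.0)
    if x <= 0.5:
        return knee_out * (x / 0.5)
    return knee_out + (1.0 - knee_out) * ((x - 0.5) / 0.5)


for x in (0.025, 0.25, 0.5, 0.75, 1.0):
    print(f"x={x:5.3f}  amplitude={two_segment_taper(x):.4f}")
```

It still reaches zero at the bottom of the travel, but one 40-pixel slider step only gets down to 0.5% (about -46 dB), so it is a rougher fit to a decibel scale than the exponential.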

I think the PulseAudio version of just using $x^3$ is probably as good as anything: simple to calculate, simple to explain, and it goes down to zero.