Often misunderstood, dithering always seems to be a subject of debate and dispute when it comes to sound quality. The truth lies somewhere in the middle, and it really comes down to the conditions. Are we applying dither to mask noise, distortion or signal degradation? Are we working with HD files with a large dynamic range? Are we making significant changes in bit resolution?
Instead of getting into a complicated technical discussion, let's look at some examples:
Because 24 bit gives you about 144 dB of dynamic range, quantization noise is no longer an issue when you have really loud playback levels, because the program material masks it. For example, if you are relatively close to an airliner's jet engine (one taking off), you won't be able to hear the footsteps of anyone nearby. Our hearing range starts at the threshold of hearing, 0 dB SPL, and goes up to the threshold of pain at around 130 dB SPL. So, at really loud levels the more noticeable problem is a relatively higher background noise level, because you are unavoidably amplifying the noise of the circuitry itself, which some people call thermal noise and some simply call "analog warmth".
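The 144 dB figure follows the usual rule of thumb of about 6 dB per bit: every extra bit doubles the number of quantization levels, and 20·log10(2) ≈ 6.02 dB. Here is a quick sketch of that arithmetic (the function name is just for illustration):

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB: 20*log10(2**bits)."""
    return 20.0 * math.log10(2 ** bits)


for bits in (7, 16, 24):
    print(f"{bits:2d} bit: {dynamic_range_db(bits):6.1f} dB")
```

This prints roughly 42.1, 96.3 and 144.5 dB. The first value is where the 42 dB figure for the 7-bit tests comes from. The signal-to-noise ratio of a full-scale sine against quantization noise is commonly quoted about 1.76 dB higher than these values.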
Obviously, these are just my opinions, but if you read on, you may find some validity in my ideas. They are valid to me, but I'm not an equipment designer or an electronics expert. I'm just presenting some interesting observations and nothing more. It's up to you to draw your own conclusions.
Here are some dithering tests. The files are small, so they shouldn't be a problem to download, and the differences should be clear enough to hear even on a laptop.
These samples were generated from a tone computed in double-precision floating point, which offers roughly 300 dB of dynamic range, and all of the math was done at that precision until the signal was converted to samples.
The first test is a 3 kHz tone lasting 4 seconds that fades linearly from full scale down to zero. The "no_dithered" file is simply truncated to 7 bits, while the "dithered" file is, of course, dithered and then truncated to 7 bits.
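Here is a sketch of how a test like this could be reproduced. It is not the script that produced the original files: the 44.1 kHz sample rate, the TPDF dither of ±1 LSB, and the helper names (fading_tone, truncate, write_wav16) are all my assumptions. Python floats are IEEE double precision, so the tone is computed at the precision described above before being truncated.

```python
import math
import random
import struct
import wave

FS = 44100                 # assumed sample rate; the tests don't state one
BITS = 7                   # target bit depth used in the tests
SCALE = 2 ** (BITS - 1)    # 64 steps between zero and full scale at 7 bits


def fading_tone(freq=3000.0, seconds=4.0, fs=FS):
    """Sine computed in double precision, fading linearly from full scale to zero."""
    n = int(seconds * fs)
    return [(1.0 - i / n) * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


def truncate(signal, dither=False, seed=0):
    """Truncate to BITS-bit integer codes (floor, like dropping two's-complement bits).

    With dither=True, TPDF dither spanning +/-1 LSB is added first
    (an assumption: the dither type used for the original files isn't stated).
    """
    rng = random.Random(seed)
    codes = []
    for s in signal:
        v = s * SCALE
        if dither:
            v += rng.random() - rng.random()  # difference of two uniforms: triangular PDF
        codes.append(max(-SCALE, min(SCALE - 1, math.floor(v))))
    return codes


def write_wav16(path, codes, fs=FS):
    """Store BITS-bit codes in a mono 16-bit PCM WAV, shifted into the top bits."""
    shift = 16 - BITS
    data = struct.pack("<%dh" % len(codes), *(c << shift for c in codes))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(data)


tone = fading_tone()
plain = truncate(tone)                   # the "no_dithered" version
dithered = truncate(tone, dither=True)   # dithered, then truncated
```

Calling write_wav16("no_dithered.wav", plain) and write_wav16("dithered.wav", dithered) (hypothetical file names) stores the 7-bit codes in ordinary 16-bit WAV files that any player can open. In the last 1% of the fade, the truncated version only ever takes the codes -1 and 0, which is the square-wave behavior you hear at the end of the non-dithered files.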
The second test is essentially the same, except that the tone sweeps linearly from 1 kHz to 5 kHz. As before, there are dithered and undithered versions, both at a 7-bit depth.
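For the sweep, only the phase changes. In a linear sweep the instantaneous frequency rises from f0 to f1 over the duration T, so the phase is its integral. A minimal sketch, assuming the same 44.1 kHz rate and the same linear fade as the 3 kHz test (the function name is hypothetical); the result can then be truncated with or without dither in the same way as the tone:

```python
import math


def fading_sweep(f0=1000.0, f1=5000.0, seconds=4.0, fs=44100):
    """Linear sweep from f0 to f1 Hz with a linear fade from full scale to zero.

    The instantaneous frequency is f(t) = f0 + (f1 - f0) * t / T, so the phase
    is its integral: 2*pi*(f0*t + (f1 - f0) * t**2 / (2*T)).
    """
    n = int(seconds * fs)
    out = []
    for i in range(n):
        t = i / fs
        phase = 2 * math.pi * (f0 * t + (f1 - f0) * t * t / (2 * seconds))
        out.append((1.0 - i / n) * math.sin(phase))
    return out


sweep = fading_sweep()
```

Integrating the frequency rather than plugging a time-varying frequency straight into sin(2π·f·t) is what keeps the sweep linear; the naive form would end up sweeping to twice the intended final frequency.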
Note that as the signal level slopes down, you can clearly hear a changing oscillation in the non-dithered versions (sounding very similar to aliasing). Also notice that near the end of the tones, as the signal gets smaller and smaller, the sine waves sound more like square waves than sine waves. In all the dithered versions, the signal is buried in dither noise, but the sine wave still sounds like a sine wave, and that is from a 7-bit file with only about 42 dB of dynamic range.
So, the bottom line is that dither doesn't get rid of quantization artifacts; it spreads them out more uniformly. Instead of being highly correlated with the signal, the error becomes steady background noise. You can clearly hear this in the sample tests: the dithered versions are noisier but preserve the character of the test tone.
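One way to put a number on "highly correlated with the signal" is to measure the correlation between a low-level sine and its total error (output minus ideal signal). The sketch below works in LSB units, uses plain truncation, and assumes TPDF dither of ±1 LSB; the function names are mine, not from any library.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)


def error_signal_correlation(amplitude_lsb, dither, freq=3000.0, fs=44100, n=44100, seed=1):
    """Correlation between a sine (in LSB units) and its total truncation error."""
    rng = random.Random(seed)
    xs, errs = [], []
    for i in range(n):
        x = amplitude_lsb * math.sin(2 * math.pi * freq * i / fs)
        d = (rng.random() - rng.random()) if dither else 0.0  # assumed TPDF, +/-1 LSB
        q = math.floor(x + d)      # truncate to an integer code
        xs.append(x)
        errs.append(q - x)         # total error: output minus the ideal signal
    return pearson(xs, errs)


print(error_signal_correlation(0.4, dither=False))  # strongly positive
print(error_signal_correlation(0.4, dither=True))   # close to zero
```

With a 0.4 LSB tone, the undithered output is just a two-level square wave, so its error follows the signal closely. With TPDF dither the error's average no longer depends on the signal, so the correlation drops to roughly zero. The error hasn't gone away; it has turned into noise.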
Going a step further, noise-shaped dithering, which has more room to work at higher sampling frequencies, rearranges the noise spectrum so that the noise is higher in the frequency bands we are unable to perceive and lower in the bands where we can.
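The simplest form of noise shaping is first-order error feedback: the error made on each sample is subtracted from the next input, so the total output error becomes e[n] - e[n-1], which has a high-pass shape. Real noise shapers use higher-order filters tuned to hearing sensitivity; this sketch (hypothetical names, TPDF dither assumed, signal in LSB units) only shows the principle, using a 16-sample moving average as a crude low-pass to measure the noise left in the low band.

```python
import math
import random


def quantize(signal, shaped, seed=2):
    """Truncate a signal given in LSB units, with TPDF dither (assumed +/-1 LSB).

    shaped=True adds first-order error feedback: each step's error is subtracted
    from the next input, so the output error is e[n] - e[n-1], which has little
    energy at low frequencies and more near half the sample rate.
    """
    rng = random.Random(seed)
    out, prev_err = [], 0.0
    for s in signal:
        v = s - prev_err
        y = math.floor(v + rng.random() - rng.random())
        if shaped:
            prev_err = y - v       # error of this step, fed back into the next
        out.append(y)
    return out


def low_band_noise_power(signal, output, width=16):
    """Variance of the error after a width-sample moving average (a crude low-pass)."""
    err = [o - s for s, o in zip(signal, output)]
    avg = [sum(err[i:i + width]) / width for i in range(len(err) - width + 1)]
    mean = sum(avg) / len(avg)
    return sum((a - mean) ** 2 for a in avg) / len(avg)


tone = [20 * math.sin(2 * math.pi * 1000 * i / 44100) for i in range(44100)]
print(low_band_noise_power(tone, quantize(tone, shaped=False)))
print(low_band_noise_power(tone, quantize(tone, shaped=True)))  # noticeably smaller
```

The shaped version leaves several times less noise in the low band, while its total noise power is actually higher. That is the trade described above. At higher sample rates there is more spectrum above the audible range to push that extra noise into.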