It's not a mathematical mistake and it has nothing to do with windowing.
That peak in the bass response might be the speaker, or it might be the room. Looking at the actual IR, it appears to be the speaker, as there are no discernible early reflections.
The frequency resolution of an IR is the sample rate divided by the number of samples in the IR. The window function has nothing to do with frequency resolution (except to make it even coarser). So a 1K IR at a 48 kHz sample rate has a frequency resolution of roughly 48 Hz. If a speaker has a resonance (formant) at, say, 80 Hz with a Q of, say, 3.0, then 48 Hz is too coarse to capture that resonance accurately. You need a frequency resolution of a few Hz to recreate that resonance accurately. I chose 80 Hz and a Q of 3 because that's what that response looks like. The Q could even be higher than that.
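To put numbers on that, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the 3 Hz target is just one reading of "several Hz". "1K" is taken to mean either 1000 or 1024 samples; both land close to the 48 Hz figure.

```python
import math

def frequency_resolution_hz(sample_rate_hz, num_samples):
    """FFT bin spacing: sample rate divided by the number of samples."""
    return sample_rate_hz / num_samples

def samples_needed(sample_rate_hz, target_resolution_hz):
    """Smallest IR length whose bin spacing is at or below the target."""
    return math.ceil(sample_rate_hz / target_resolution_hz)

# Bin spacing for a few IR lengths at 48 kHz.
for n in (1000, 1024, 2048, 8192, 16384):
    print(n, "samples:", round(frequency_resolution_hz(48000, n), 2), "Hz")

# IR length needed for an assumed 3 Hz target resolution.
print("samples for 3 Hz:", samples_needed(48000, 3))
```

At 48 kHz, getting down to a few Hz takes an IR on the order of 16K samples, not 1K.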
It doesn't take much mental energy to realize that if you have a narrow formant at a low frequency then you need fine frequency resolution to reproduce it. An 80 Hz formant with a Q of 3 is only about 27 Hz wide (bandwidth = center frequency / Q = 80 / 3). Obviously a frequency resolution of 48 Hz is not going to be able to reproduce that.
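The same comparison as a rough sketch: the -3 dB bandwidth of a resonance is its center frequency divided by Q, and dividing that by the bin spacing tells you how many FFT bins fall across the peak. The helper names are hypothetical, and the 1024 and 16384 sample lengths are example values, not measurements.

```python
def resonance_bandwidth_hz(center_hz, q):
    """Approximate -3 dB bandwidth of a resonance: center frequency / Q."""
    return center_hz / q

def bins_per_bandwidth(center_hz, q, sample_rate_hz, num_samples):
    """How many FFT bin spacings fit across the resonance bandwidth."""
    spacing_hz = sample_rate_hz / num_samples
    return resonance_bandwidth_hz(center_hz, q) / spacing_hz

print("bandwidth:", round(resonance_bandwidth_hz(80, 3), 1), "Hz")
for n in (1024, 16384):
    ratio = bins_per_bandwidth(80, 3, 48000, n)
    print(n, "samples:", round(ratio, 2), "bins across the peak")
```

With a 1K IR there is less than one bin across the whole resonance, so its shape cannot be represented. With a 16K IR there are several.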
Windowing only smooths the response even more. This is basic FFT theory. The shorter the time-domain record, the coarser the frequency-domain detail, and vice versa. This is the time-frequency uncertainty principle. I always window IRs with a Hann window.
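A small sketch of why a window smooths rather than sharpens: it measures the half-power (-3 dB) width of a window's main lobe in units of FFT bins, for no window and for a Hann window. This assumes a generic full, symmetric-in-time Hann applied to a 64-sample record purely for illustration. It is not necessarily how the window is applied to the IRs discussed here, and the helper names are invented.

```python
import cmath
import math

def rectangular(n):
    """No windowing: every sample weighted equally."""
    return [1.0] * n

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def spectrum_magnitude(window, offset_bins):
    """Magnitude of the window's spectrum at a fractional bin offset."""
    n = len(window)
    return abs(sum(w * cmath.exp(-2j * math.pi * offset_bins * i / n)
                   for i, w in enumerate(window)))

def half_power_width_bins(window, step=0.001):
    """Width of the main lobe between its -3 dB points, in bins."""
    target = spectrum_magnitude(window, 0.0) / math.sqrt(2)
    offset = 0.0
    while spectrum_magnitude(window, offset) > target:
        offset += step
    return 2 * offset

print("rectangular:", round(half_power_width_bins(rectangular(64)), 2), "bins")
print("hann:", round(half_power_width_bins(hann(64)), 2), "bins")
```

The Hann main lobe comes out noticeably wider than the unwindowed one, so each spectral feature is smeared across more bins, on top of the bin spacing already set by the IR length.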
EDIT: I broke out my impedance measurements for that Vox cabinet and the speaker resonance is 80 Hz.