Latency of amp modelers

Physical-distance latency between the listening position and the sound source is often cited in discussions about latency perception, sometimes to suggest it has less impact on feel than commonly understood. What's not often highlighted is that this distance, along with every other latency-contributing factor, adds on to total rig latency. For example: I buy my Mooer thinking 9ms will be good enough, but I sit 5 feet from my cab (+5ms) and run my favorite pedal in the Mooer's loop (+9ms or more), and now I'm at an intolerable 23ms.
It wasn't an attempt to do anything other than give a reference or scenario most could understand, test, or relate to in order to demonstrate a 3ms delay. In other words, imperceptible.
 
Mostly A/D and D/A conversion. The processing inside the modeler usually contributes less to the latency than the conversion. For example, the latency for a simple Axe-FX preset is usually more than 50% due to the conversion.
Processing is usually (much) more than the conversion. Typical converters have around 12 samples of latency (at single-speed conversion). That's 24 samples total (in + out).

Most processors use frame (also called block) processing. It's more efficient to process a block of samples at a time due to function overhead. Typical frame sizes in hardware processors are 32 or 64 samples (or more). To understand frame processing consider a simple first-order IIR filter. In C-code the processing would be:
Code:
typedef struct {
    float a, b;    // coefficients
    float s;       // state (previous output)
} FILTER;

void simpleIirFilter(FILTER *p, float *in, float *out, int N)
{
    int i;
    float a, b;    // coefficients
    float s;       // state

    a = p->a;
    b = p->b;
    s = p->s;

    for (i = 0; i < N; i++)
    {
        out[i] = a * s + b * in[i];   // y[n] = a*y[n-1] + b*x[n]
        s = out[i];
    }

    p->s = s;
}

In this simple example the processing is two multiplies and an add plus a save. In a modern DSP that could be performed in just a few clock cycles (SIMD, zero-overhead loops, etc.). The function overhead, however, is three data fetches for the function preamble, a save for the postamble plus the function call itself. The function call has to allocate the stack, push stuff onto the stack, pass the arguments and then when the function is done pop the stack and return. Depending upon the number of registers saved this can be dozens of clock cycles.

If you're processing just one sample at a time you've spent far more time in the function overhead than in the actual processing routine. Assume it's, say, 12 clock cycles for function overhead and 3 cycles of actual processing per sample. That means your function overhead is 12 of 15 cycles, or 80% (!!!). You're spending far more time in function overhead than actually processing data. Now, if you process, say, 32 samples your function overhead is still 12 cycles but your processing is 96 cycles (108 cycles total). The function overhead is now only 11%.

The latency due to the frame size is twice the frame size since you have to buffer a frame, process it and then write the results to the output. So a frame size of 32 would incur 64 samples of latency. Add the converter latency and you have 88 samples (= 1.83ms @ 48kHz).

Then you get into oversampling. Line6 uses minimum-phase oversampling/decimation which reduces latency but causes phase distortion. The Axe-Fx III uses linear-phase by default but you can use minimum-phase by setting the Oversampling Mode to Min Latency. Oversampling is only needed for the nonlinear stuff (amp and drive blocks).

I'm skeptical of any result much less than 2ms. It is extremely inefficient to do single sample processing as the function overhead often ends up being more cycles than the function itself. A dedicated pedal like the Iridium I can possibly believe it but for any multi-effect I highly doubt that is being done. You have to be very careful when measuring latency. Sometimes the latency can be so great that it appears to be much shorter than it is because what you're seeing on the o'scope is actually the previous response so the actual latency is the measured latency plus the time base.
 
 
Processing is usually (much) more than the conversion. Typical converters have around 12 samples of latency (at single-speed conversion). That's 24 samples total (in + out).
What is the minimum latency you measure with an empty preset?

Edit: Answering my own question, I measure 1.2ms for an empty preset. That goes up to 2.3 ms when I add amp and cab blocks to it. This is consistent with latency numbers I've seen other people post on the forum. (2.3-1.2)/2.3 = 48%. I'm sure there are other ways to measure it, but that's why I say the block processing accounts for less than half of the latency in a simple preset.


With a sufficiently high sample rate it's possible to get below 2ms while using a reasonable block size. I think Boss advertises a high sample rate? If so, that might account for their position on the chart. But, of course, the amount of work you can get done with your CPU goes down dramatically in that case.
 
The data is processed in blocks even in an empty preset. That's how you do this sort of thing. You buffer up N samples and then pass that buffer to the processing routine.

The latency for an empty preset in the Axe-Fx III would be 1.2 ms. Of that, less than 0.5 ms is converter latency. When Oversampling Mode is set to Best Quality adding an amp block adds another 1.0 ms. If you set Oversampling Mode to Min Latency the added latency drops to less than 0.4 ms. Cab blocks do not add any latency.

The higher the sample rate, the more processing power required. Anything over 48 kHz starts to become wasteful, especially when processing IRs. If you were to double the sample rate (96 kHz) you would quadruple the processing power required for IRs, since you would need twice as many samples in your IR and would have to process them at twice the speed. You could always downsample, process the IR at 48 kHz, and then upsample, but then your latency is no better (and probably worse) than processing at 48 kHz (unless you use min-phase resampling, which destroys phase information).

Everything is a compromise in real-time DSP. I've been doing this since DSPs first came on the market. Latency isn't everything. It's one factor among many. The good designer weighs everything and makes tradeoffs to achieve a balance. If you overemphasize latency then you waste processing power or incur aliasing or distort the phase response, etc., etc. Anything under 5ms is considered undetectable. We strive for less than 3ms in our products so that you can add devices in the loops and not exceed that 5ms threshold.
 
Yes, that's how I process IRs. You have the luxury of being able to assume a fixed sample rate. I don't :).

Latency isn't everything. It's one factor among many.

That's the bottom line here. People struggle to grasp the overall quality of complex products like this, so there is an inevitable desire to take a reductionist view and boil things down to simple numbers and checklists. The simple fact is: that will only lead to a flawed comparison.
 
Yup, especially in places like TGP where ignorance reigns supreme.
 
This thread has become very informative, even for those who, like me, know little about the subject and don't have the background (training, study) to understand 100% of what has been written in the various posts.
 
I don't trust the results. I tested the QC and it was much higher than that.
When I was testing the QC there was a preset I made where the latency was like setting the buffer size in Logic to 1024 samples. I was surprised by this, as if there were no internal limit to the processing power.
 
FWIW here are the results of the various modelers I have:
AmpliFire: 1.9 ms*
Axe-Fx III (Min Latency): 1.6 ms*
Axe-Fx III (Best Quality): 2.2 ms
FM3: 3.3 ms
Helix: 2.0 ms*
Kemper: 5.0 ms
QC: 4.8 ms **
UAD Dream: 2.4 ms*

Latency testing was performed using a pulse into the DUT. The preset in the DUT consisted of all possible locations in a single row populated, i.e., In->Drive->EQ->Amp->Cab->Delay->Reverb->etc.->Out, with all blocks but the Amp bypassed.

* Impulse response indicates devices uses minimum-phase interpolation/decimation which causes phase distortion.

** QC latency is dependent upon the number of blocks in the row and varies depending upon the effects and amp model used. The value listed was obtained from factory preset 1A. Some presets are lower, some are higher (e.g., preset 1C is 3.5 ms, preset 1E is 6.1 ms; average latency is about 5.0 ms).
 
It appears that the frame size changes as you add blocks. If you use all four "lanes" and fill them up the latency could probably easily exceed 20 ms.
 
People are going to tell me I'm crazy and that I'm hearing things (or that I can't hear this).
The first time I plugged into a QC, I downloaded some 'artist preset' from their cloud, it had a decent number of blocks in it, and I immediately thought the unit I had was broken because the latency was like nothing I'd ever experienced (when using a Fractal... or a Kemper, for that matter).
I could very much feel a disconnect between my hands and my ears.
 
From what I read above, my guess is you were not imagining anything.
 

Can you comment on the rationale behind filling up the row and bypassing all but the amp block? The OP video has a more naive approach of just adding the same pair of amp/cab blocks to each device. At first glance that would seem like a more even comparison, but I suppose your method exercises some factors that the naive method misses.
 
Because manufacturers might "cheat" and reduce the frame size if there is only a single block in the row in order to make latency measurements appear better than they are in a real-world situation.
 
His test was somewhat questionable using that Logic built-in tool. I've used it and got wildly variable responses with the same device input.
As mentioned above, he's comparing the timing of the transient peak against a reference, which AFAIK is accurate if one looks at the time scale carefully and doesn't mix up which peak is which. But I agree about Logic's loopback ping utility: it gets confused if the ping passes through any tone-changing processing along the way (it doesn't recognize the returned ping as the one it sent). It's OK if the ping stays unchanged round trip.
 
In a direct comparison between two sounds, most people can detect a one-millisecond delay. Without a direct comparison, it is possible to notice a delay or latency of 10 milliseconds; detecting anything less than 5 milliseconds is getting into the realm of the not humanly possible. 3 milliseconds of delay is approximately the extra travel time from moving a speaker about 3 feet farther from the listening position (sound covers roughly 1.1 feet per millisecond).

Having tested 120 normal-hearing individuals in the psychoacoustics lab during grad school on things like minimum gap detection, most subjects were more in the 20-30ms range, with nothing under 10-12ms observed. I'd agree sub-5ms is outside the range of human auditory perception.
 
One may feel latency that's not audible - the difference may be between a lively feel and a dead feel, versus a gap one could actually hear in either case. And, as mentioned above, modelers that can't stay under a half-dozen ms at least start becoming useless for running any kind of outboard gear.
 