Processing latency is usually (much) more than the conversion latency. Typical converters have around 12 samples of latency each (at single-speed conversion). That's 24 samples total (in + out).
Most processors use frame (also called block) processing. It's more efficient to process a block of samples at a time because of function overhead. Typical frame sizes in hardware processors are 32 or 64 samples (or more). To understand frame processing, consider a simple first-order IIR filter. In C code the processing would be:
Code:
typedef struct
{
    float a, b;   // coefficients
    float s;      // state (previous output)
} FILTER;

void simpleIirFilter(FILTER *p, float *in, float *out, int N)
{
    int i;
    float a, b;   // coefficients
    float s;      // state

    a = p->a;     // preamble: fetch coefficients and state
    b = p->b;
    s = p->s;

    for (i = 0; i < N; i++)
    {
        out[i] = a * s + b * in[i];   // y[n] = a*y[n-1] + b*x[n]
        s = out[i];
    }

    p->s = s;     // postamble: save state for the next frame
}
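For what it's worth, here's a minimal usage sketch (the coefficient values, frame size and impulse input are made up for illustration; it assumes the FILTER type and function above are in scope):
Code:
/* Hypothetical usage sketch: feed one 32-sample frame (an impulse)
   through the filter. Coefficients and frame size are made up. */
#include <stdio.h>

#define FRAME_SIZE 32

int main(void)
{
    FILTER f = { 0.5f, 0.5f, 0.0f };   /* a, b, s */
    float in[FRAME_SIZE], out[FRAME_SIZE];
    int i;

    for (i = 0; i < FRAME_SIZE; i++)
        in[i] = (i == 0) ? 1.0f : 0.0f;   /* unit impulse */

    simpleIirFilter(&f, in, out, FRAME_SIZE);

    for (i = 0; i < 4; i++)
        printf("out[%d] = %f\n", i, out[i]);   /* 0.5, 0.25, ... */
    return 0;
}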
In this simple example the processing per sample is two multiplies and an add plus a save. In a modern DSP that could be performed in just a few clock cycles (SIMD, zero-overhead loops, etc.). The function overhead, however, is three data fetches for the function preamble, a save for the postamble, plus the function call itself. The function call has to allocate stack space, push registers onto the stack, pass the arguments, and then pop the stack and return when the function is done. Depending upon the number of registers saved this can be dozens of clock cycles.
If you're processing just one sample at a time you spend far more time on the function overhead than on the actual processing. Say the per-sample processing takes 3 clock cycles and the function overhead is 12 clock cycles. For a single sample that's 15 cycles total, so the function overhead is 80% (!!!). You're spending far more time in function overhead than actually processing data. Now, if you process, say, 32 samples the function overhead is still 12 cycles but the processing is 96 cycles (108 cycles total). The function overhead is now only 11%.
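To make that arithmetic concrete, here's a trivial sketch using those same assumed numbers (3 cycles of processing per sample, 12 cycles of call overhead; the figures are illustrative, not measured from any particular DSP):
Code:
/* Overhead arithmetic sketch using the assumed numbers above:
   3 cycles of processing per sample, 12 cycles of call overhead. */
#include <stdio.h>

int main(void)
{
    const int cyclesPerSample = 3;   /* assumed per-sample work */
    const int overheadCycles = 12;   /* assumed call overhead   */
    int sizes[] = { 1, 32 };
    int k;

    for (k = 0; k < 2; k++)
    {
        int n = sizes[k];
        int total = n * cyclesPerSample + overheadCycles;
        printf("N = %2d: %3d cycles total, overhead = %.0f%%\n",
               n, total, 100.0 * overheadCycles / total);
    }
    return 0;   /* prints 80% for N = 1, 11% for N = 32 */
}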
The latency due to the frame size is twice the frame size, since you have to buffer a full frame of input, process it, and then write the results to the output while the next frame is being collected. So a frame size of 32 incurs 64 samples of latency. Add the converter latency and you have 88 samples (= 1.83 ms @ 48 kHz).
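Again, just to show the arithmetic (frame size and converter figures are the assumed values from above):
Code:
/* Latency arithmetic sketch: latency due to the frame size is twice
   the frame size, plus the assumed 24 samples (in + out) of
   converter latency from above. */
#include <stdio.h>

int main(void)
{
    const int frameSize = 32;
    const int converterSamples = 24;   /* ~12 in + ~12 out */
    const double sampleRate = 48000.0;

    int totalSamples = 2 * frameSize + converterSamples;   /* 88 */
    printf("latency = %d samples = %.2f ms\n",
           totalSamples, 1000.0 * totalSamples / sampleRate);
    return 0;   /* prints 88 samples = 1.83 ms */
}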
Then you get into oversampling. Line6 uses minimum-phase oversampling/decimation filters, which reduce latency but introduce phase distortion. The Axe-Fx III uses linear-phase filters by default, but you can use minimum-phase by setting the Oversampling Mode to Min Latency. Oversampling is only needed for the nonlinear stuff (amp and drive blocks).
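The latency difference comes from the filters: a linear-phase (symmetric) FIR has a fixed group delay of (taps - 1)/2 samples, while a minimum-phase design pushes its energy toward the front and so delays the signal less, at the cost of phase distortion. A sketch of that delay arithmetic (the 63-tap length and 8x oversampling factor are made-up examples, not either company's actual design):
Code:
/* Group-delay sketch: a linear-phase FIR delays the signal by
   (taps - 1) / 2 samples at the oversampled rate. The 63-tap
   filter and 8x factor are hypothetical examples. */
#include <stdio.h>

int main(void)
{
    const int taps = 63;              /* hypothetical FIR length   */
    const int osFactor = 8;           /* hypothetical oversampling */
    const double baseRate = 48000.0;

    double delayOs = (taps - 1) / 2.0;       /* samples at 8x rate */
    double delayBase = delayOs / osFactor;   /* samples at 48 kHz  */
    printf("linear-phase delay = %.2f samples (%.3f ms) @ 48 kHz\n",
           delayBase, 1000.0 * delayBase / baseRate);
    return 0;
}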
I'm skeptical of any result much less than 2 ms. It is extremely inefficient to do single-sample processing, as the function overhead often ends up being more cycles than the processing itself. For a dedicated pedal like the Iridium I can possibly believe it, but for any multi-effect I highly doubt that is being done. You also have to be very careful when measuring latency. Sometimes the latency is so great that it appears much shorter than it is, because what you're seeing on the o'scope is actually the previous response; the actual latency is then the measured latency plus the time base.