Diagonal Volterra kernels are pretty neat. Basically, plain convolution takes your input sequence, say x[n], and convolves it with some kernel h[m]; this kernel is the impulse response of the speaker that you load in the Axe. So your output is y = h*x, where * is the convolution operator. A Volterra expansion is basically a power series; you can think of it sort of like a Taylor expansion.
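To make the y = h*x part concrete, here's a minimal sketch with numpy. The kernel values are made up (a real cabinet IR would be thousands of samples long); only the operation itself is the point.

```python
import numpy as np

# Toy impulse response -- a stand-in for a real cabinet IR,
# which would be thousands of samples long. Values are made up.
h = np.array([1.0, 0.5, 0.25])
x = np.array([1.0, 0.0, 0.0, 2.0])  # toy input signal

# Plain linear convolution: y[n] = sum over m of h[m] * x[n-m]
y = np.convolve(h, x)
print(y)  # output length is len(h) + len(x) - 1 = 6 samples
```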
A diagonal Volterra expansion has a bunch of different kernels, h_1, h_2, h_3 and so on, up to some h_n. In the case of Nebula it's something like 9 kernels. Then you take your input x, and you also square it, cube it, and so on, and each kernel corresponds to one power term. The math notation makes it look simpler.
y = h_1*x + h_2*x^2 + h_3*x^3 + ... + h_n*x^n
(n.b. in the general Volterra case, the nth-order kernel k_n is an n-dimensional function, and its output term sums products of the input at n different time lags. The diagonal case keeps only the terms where all those lags are equal, which is what collapses each term down to a single 1-D convolution with x^n. The general case is much more complicated.)
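The whole diagonal expansion is just a loop over that sum. Here's a sketch (the function name and the toy kernel values are mine, not anything from Nebula or the Axe):

```python
import numpy as np

def diagonal_volterra(x, kernels):
    """y = h_1*x + h_2*x^2 + ... + h_n*x^n, where * is convolution.

    kernels is a list [h_1, h_2, ..., h_n]; kernel k is convolved
    with the input raised to the kth power.
    """
    out_len = len(x) + max(len(h) for h in kernels) - 1
    y = np.zeros(out_len)
    for k, h in enumerate(kernels, start=1):
        c = np.convolve(h, x ** k)   # one convolution per power term
        y[:len(c)] += c
    return y

# Toy example: a 2nd-order model (linear term + squared term).
x  = np.array([0.5, -0.25, 1.0])
h1 = np.array([1.0, 0.3])   # linear kernel (made-up values)
h2 = np.array([0.2, 0.1])   # 2nd-order kernel (made-up values)
print(diagonal_volterra(x, [h1, h2]))
```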
So you can see there are n convolutions. Raising x to a power introduces the nonlinearities, and the Volterra kernels "shape" those nonlinearities. Extracting these higher-order impulse responses h_n is very simple. The downside is that we're performing n convolutions. Think of the cabinet block right now: it performs 2 convolution operations, since it's a stereo pair. An nth-order diagonal Volterra model would take 2*n convolutions in stereo, or just n in mono. Put more simply, it would take n times the CPU it currently does, where n is something like 5 or 9. So the CPU couldn't handle it right now.
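Rough back-of-the-envelope arithmetic for that "n times the CPU" claim. The IR length and sample rate here are assumptions, just to put numbers on it; the point is only that the Volterra cost scales linearly with the number of kernels:

```python
# Multiply-accumulates per second for time-domain convolution.
# ir_length and sample_rate are assumed illustrative values.
ir_length   = 2048    # taps per impulse response (assumption)
sample_rate = 48000   # samples per second (assumption)
channels    = 2       # stereo pair, as in the cabinet block
order       = 9       # number of Volterra kernels (Nebula-style)

linear_macs   = ir_length * sample_rate * channels  # current cost
volterra_macs = linear_macs * order                 # n kernels = n times the cost

print(linear_macs, volterra_macs)
```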
There are some efficient ways to do this in the frequency domain. First of all, frequency-domain convolution is much faster for long kernels. (There's another trick that gives a further speedup, but it's sort of technical so I won't bother with it here; it's pretty simple once you see it, though.) The problem is that plain frequency-domain convolution, fast as it is, has latency: you have to buffer up a whole block of input before you can transform it. Partitioned convolution trades some of that speed for lower latency, but I don't know if that would be suitable for the Axe. I'm not sure whether the Axe does its convolution in the time domain or not; only Cliff knows that.
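For reference, the textbook frequency-domain method looks like this (this is generic FFT convolution, not a claim about what the Axe-FX actually does). You zero-pad to avoid circular wrap-around, multiply spectra, and transform back; the latency problem is visible in the structure, since the whole input block has to exist before the FFT can run:

```python
import numpy as np

def fft_convolve(h, x):
    """Linear convolution via FFT: zero-pad, multiply spectra, inverse FFT."""
    n = len(h) + len(x) - 1          # full linear-convolution length
    size = 1 << (n - 1).bit_length() # next power of two >= n
    # Multiplication in the frequency domain == convolution in time,
    # provided the FFT size covers the full output (no circular overlap).
    Y = np.fft.rfft(h, size) * np.fft.rfft(x, size)
    return np.fft.irfft(Y, size)[:n]
```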
But basically the moral is, whatever the Axe-FX does for convolution right now, doing Volterra kernels would take 5-10 times as much CPU.