Intel's upcoming AVX instructions go back to the future
29 Apr 2008 | 10:40 BST
Comment The Alpha and Omega of it all
WHEN LOOKING at those Nehalem slides - and Opteron slides a few years before - those of us with longer memories couldn't help recalling similar Alpha EV7 and EV8 stuff presented a decade earlier.
All the nice integrated multi-channel memory controllers, four memory-speed links to talk to other CPUs north, south, east and west, and immense scalability as a result. Ask those who happen to have used the mighty HP AlphaServer ES47 with such stuff - we had two of these in our old labs for a few months five years ago, and only now are other systems starting to match their Streams numbers.
This past IDF, Intel announced its next major X86 instruction set facelift - the Advanced Vector Extensions, or AVX. What's the big deal, you may ask? After all, there have been countless MMX and SSE rounds to this day.
Well, it is a nice 3-operand, RISC-like approach, so you can finally do A+B=C in a single opcode; AMD's proposed SSE5 is supposed to go along the same lines. Later, with fused multiply-adds, this could even become A*B+C=D. Then, you get a more efficient instruction format with a lot of baggage (and length) removed - again, one of the major problems of X86 compared with efficient fixed-opcode-length RISCs. No need to mention the good ship Itanic here: it can have more instruction FORMATS than some RISCs have instructions - that's how 'elegant' it is.
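To make the 3-operand point concrete, here is a minimal sketch of a toy register machine. It is not real X86 or AVX code: the register names and helper functions are made up for illustration, and it models only which registers get overwritten, not encodings or timing.

```python
# Toy register-machine model. All function and register names here are
# hypothetical; this only illustrates destructive vs non-destructive operands.

def move(regs, dst, src):
    """Copy one register into another."""
    regs[dst] = regs[src]

def add_two_operand(regs, dst, src):
    """Destructive 2-operand form: dst = dst + src, so dst's old value is lost."""
    regs[dst] = regs[dst] + regs[src]

def add_three_operand(regs, dst, a, b):
    """Non-destructive 3-operand form: dst = a + b, both sources survive."""
    regs[dst] = regs[a] + regs[b]

def fused_multiply_add(regs, dst, a, b, c):
    """4-operand multiply-add: dst = a * b + c.
    Models the data flow only; a hardware FMA would round once, while this
    Python expression rounds after the multiply and again after the add."""
    regs[dst] = regs[a] * regs[b] + regs[c]

def c_equals_a_plus_b_two_operand(regs):
    """C = A + B while keeping A and B: needs a copy first. Returns op count."""
    move(regs, "C", "A")
    add_two_operand(regs, "C", "B")
    return 2

def c_equals_a_plus_b_three_operand(regs):
    """C = A + B while keeping A and B: one operation. Returns op count."""
    add_three_operand(regs, "C", "A", "B")
    return 1
```

In this model the 2-operand route costs an extra register copy whenever both inputs must be preserved, which is the overhead the 3-operand form removes.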
Then, AVX doubles the SSE register length to 256 bits - doubling the amount of data fitting in and, matched with doubled data paths, providing twice the FP throughput per clock in the Sandy Bridge CPU some two years from now. And, one day maybe, you could fit two quad-precision 128-bit FP numbers into each of these registers. Marvellous!
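The register arithmetic is easy to check. The sketch below simply divides register width by element width; the `lanes` helper and the constant names are ours, not anything from Intel's documentation.

```python
# How many FP values fit in one vector register? Plain integer arithmetic.
# `lanes`, SSE_BITS and AVX_BITS are illustrative names, not an official API.

SSE_BITS = 128   # existing SSE register width
AVX_BITS = 256   # AVX register width as described in the article

def lanes(register_bits, element_bits):
    """Number of whole elements of element_bits that fit in register_bits."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits

if __name__ == "__main__":
    for name, bits in (("single (32-bit)", 32),
                       ("double (64-bit)", 64),
                       ("quad (128-bit)", 128)):
        print(name, "SSE:", lanes(SSE_BITS, bits), "AVX:", lanes(AVX_BITS, bits))
```

For doubles this gives two per SSE register and four per AVX register, hence the doubled per-clock FP throughput once the data paths are widened to match.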
But then, are these innovations really that new? Back at the turn of the century, there was something called EV9 - a 21464 Alpha CPU slated for somewhere in 2005. The thing was proposed to have one (possibly two) 8-way superscalar EV8 cores, each multithreaded of course, and a dedicated vector engine with 16 MB of L3 cache. Now, that was to be an interesting beast for a general-purpose CPU: a 1024-bit wide monster, with matching L3 cache width, and 32 1024-bit wide vector registers (yeah, four kilobytes of numbers in there).
The thing would have achieved 16 parallel 64-bit DP FP mul-adds per clock then, and the humongous register space coupled with ultrawide cache and 16 RDRAM channels would have ensured quite a high practical performance rate, too, something on the order of 100++ DP GFLOPs per core.
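The EV9 figures quoted above hang together arithmetically, as a quick sketch shows. Two assumptions are ours: a mul-add is counted as two floating-point operations (the usual convention), and the clock speed is not given in the text, so the code only works out what clock the 100 GFLOPs figure would imply.

```python
# Back-of-the-envelope check of the proposed EV9 vector engine numbers.
# Widths and counts come from the article; counting a mul-add as 2 FLOPs is
# an assumed convention, and the clock is derived, not a quoted spec.

VECTOR_BITS = 1024        # vector register width
DP_BITS = 64              # one double-precision value
NUM_VECTOR_REGS = 32      # vector registers
FLOPS_PER_MUL_ADD = 2     # assumption: one multiply plus one add

def lanes_per_vector():
    """Double-precision values per vector register."""
    return VECTOR_BITS // DP_BITS

def register_file_bytes():
    """Total vector register storage in bytes."""
    return NUM_VECTOR_REGS * VECTOR_BITS // 8

def flops_per_clock():
    """Peak DP FLOPs per clock with one mul-add per lane."""
    return lanes_per_vector() * FLOPS_PER_MUL_ADD

def clock_ghz_for(target_gflops):
    """Clock (GHz) that would be needed to reach target_gflops at peak."""
    return target_gflops / flops_per_clock()

if __name__ == "__main__":
    print("lanes per vector:", lanes_per_vector())
    print("register file bytes:", register_file_bytes())
    print("peak FLOPs per clock:", flops_per_clock())
    print("GHz needed for 100 GFLOPs:", clock_ghz_for(100))
```

So 16 lanes and four kilobytes of registers check out, and 100 GFLOPs per core would imply a clock a little over 3 GHz under these assumptions.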
Too bad we all know what happened to the Alpha and its original owner anyway, and that's not gonna change unless, say, Nvidia discovers that Alpha also had the world's best real-time X86 code translator for Windoze, the FX!32, making it the only remaining high-end candidate for the firm's CPU in the absence of an AMD buyout. Is Nvidia willing to do a fabless EV9 feeder for its Geforces?
Back to today, you can see that many "brand new" things in the new and upcoming X86 processors from both sides do somehow trace their start to the Alpha. And yeah, Intel AVX is great news, and will go a long way towards gradually moving the application base towards a more elegant and efficient instruction set - not exactly a RISC yet, but a step in the right direction anyway. µ
© 2007 Incisive Media Investments Ltd.