Eight-way Barcelona: can HP make it work?
At least one place with Advantage: AMD
RIGHT NOW, as we all know, the CPU landscape looks pretty bleak for AMD when comparing current products: Core 2, whether 65 nm or 45 nm, wipes the floor with the corresponding AMD offerings, whether in mobile, desktop, or workstation and small server areas.
Oh yeah, I haven't mentioned the big MP boxen with four or more CPU sockets: that's where Intel either doesn't have an offering, or the existing Caneland platform isn't exactly tuned for maximum memory and interprocessor performance compared to Opteron's multiple HyperTransport links and dedicated dual-channel DDR2 memory per CPU.
In fact, the AMD HT2 platform, with three links per Opteron 8xxx series processor, scales well enough to implement an eight-socket box, although it does get saturated at that point, with multiple hops needed between "far" CPUs accentuating the NUMA remote memory access penalty.
If, as originally expected, AMD had gone ahead and enabled all four HT links in Barcelona at full HT3 speed (yes, requiring the new socket early), we could have some really naughty eight-way boxen with incredible scalability now, leaving Intel's Caneland platform in the dust on many counts despite the anaemic 2+ GHz AMD CPUs.
For whatever reason, we are still stuck in "compatibility mode" here. The good news is that, besides Sun Micro and a small gang of Taiwanese vendors, another Tier 1 vendor is now offering an 8-way, 32-core Barcelona box: the very top Tier 1, HP.
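To make the "multiple hops" point concrete, here is a small sketch that counts HyperTransport hops between sockets. The wiring is an assumption for illustration only, not HP's or Sun's actual board layout: four sockets in a square, versus eight sockets in a hypothetical 2 x 4 "ladder" where each corner CPU uses two links for CPU-to-CPU traffic (leaving one for I/O) and each middle CPU uses all three.

```python
from collections import deque
from itertools import combinations


def ladder_topology(columns=4):
    """Hypothetical 2 x N ladder: socket (row, col) is numbered row * columns + col."""
    edges = []
    for row in range(2):
        for col in range(columns - 1):
            a = row * columns + col
            edges.append((a, a + 1))  # HT link along one rail of the ladder
    for col in range(columns):
        edges.append((col, columns + col))  # HT link on a rung between the rails
    return edges


def hop_counts(n, edges):
    """Breadth-first search from every socket; returns hops[src][dst]."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    hops = []
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        hops.append([dist[d] for d in range(n)])
    return hops


def summarize(n, edges):
    """Links used per socket, worst-case and average hops between distinct sockets."""
    hops = hop_counts(n, edges)
    pairs = [hops[a][b] for a, b in combinations(range(n), 2)]
    links_used = max(sum(1 for e in edges if i in e) for i in range(n))
    return {"max_links_used": links_used,
            "worst_case_hops": max(pairs),
            "average_hops": sum(pairs) / len(pairs)}


square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-socket square
print("4 sockets:", summarize(4, square))
print("8 sockets:", summarize(8, ladder_topology()))
```

Under these assumptions the worst case grows from 2 hops on four sockets to 4 hops on eight, and the average from about 1.33 to 2.0; every extra hop adds latency to a remote memory access, which is the NUMA penalty described above.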
Take a look at the DL785: it is a big 7U rack machine (the 7 in the name happens to match its height in U, but that is no general rule; the 4-socket DL585, for instance, is 4U, or 7 inches high). The modular board system, not unlike the one pioneered in the old AlphaServer ES40 a decade ago, manages to squeeze all the CPUs and a LOT of memory in there: each processor has eight DDR2 DIMM sockets, each taking up to an 8 GB DIMM. Multiply 8 x 8 x 8 and you get a 512 GB maximum memory capacity, so large it can hold the complete genomes of many animals and plants right in memory; it is the only machine in this class to pack that much RAM.
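The capacity arithmetic is simple enough to spell out, along with what it means per socket (the memory that is "local" in NUMA terms). The genome comparison is a rough illustration under stated assumptions: roughly 3.1 billion base pairs for a human genome, packed at 2 bits per base (A, C, G, T), with no annotations or indexes.

```python
# Figures from the article: 8 sockets, 8 DIMM slots per socket, 8 GB DIMMs.
SOCKETS = 8
DIMMS_PER_SOCKET = 8
GB_PER_DIMM = 8


def max_memory_gb(sockets, dimms_per_socket, gb_per_dimm):
    """Total installable memory in GB."""
    return sockets * dimms_per_socket * gb_per_dimm


def genome_bytes_2bit(base_pairs):
    """Bytes to store one strand at 2 bits per base, rounded up to whole bytes."""
    return (base_pairs * 2 + 7) // 8


total_gb = max_memory_gb(SOCKETS, DIMMS_PER_SOCKET, GB_PER_DIMM)
local_gb = DIMMS_PER_SOCKET * GB_PER_DIMM  # memory attached directly to each CPU
# Assumption: about 3.1 billion base pairs for a human genome.
human_gib = genome_bytes_2bit(3_100_000_000) / 2**30

print("total:", total_gb, "GB; local per socket:", local_gb, "GB")
print("human genome, 2-bit packed: %.2f GiB" % human_gib)
```

Note that only 64 GB of the 512 GB sits next to any given CPU; the rest is one or more HT hops away.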
Add to that up to 16 SAS drives, 11 PCI-e slots (three of them full x16 and another three x8, so you could do a three-way CrossFireX setup here, too), redundant PSUs and fans, and remote diagnostics.
The price to pay for this extra capacity? Well, it's much larger than Sun's older 8-way Opteron unit, the Andy Bechtolsheim-designed "Galaxy" masterpiece, which fits in just 4U.
Of course, if the scarcity of HT links is not a problem, you could use this thing to build larger supercomputer clusters with fewer nodes, i.e. simpler interconnects, to reach a specific TFLOPS performance target.
Nice machine overall, but what limits it? Besides, of course, the clock speed of the 2.3 GHz Barcelona 8356 Opterons, which only a "Shanghai" CPU upgrade at year end would address? And why only now, and not two years ago, with Opterons at the peak of their performance leadership?
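Here is a rough sizing sketch of the "fewer nodes" argument. The 10 TFLOPS target is hypothetical, and the per-core rate is an assumption: a Barcelona core retiring up to 4 double-precision FLOPs per cycle (one 128-bit SSE add plus one 128-bit SSE multiply). These are theoretical peaks, not sustained numbers.

```python
# Assumptions: Opteron 8356 at 2.3 GHz, 4 cores per socket,
# 4 double-precision FLOPs per core per cycle (theoretical peak).
CLOCK_MHZ = 2300
CORES_PER_SOCKET = 4
DP_FLOPS_PER_CYCLE = 4


def node_peak_mflops(sockets):
    """Theoretical peak of one node in MFLOPS (integer arithmetic)."""
    return sockets * CORES_PER_SOCKET * CLOCK_MHZ * DP_FLOPS_PER_CYCLE


def node_peak_gflops(sockets):
    return node_peak_mflops(sockets) / 1000


def nodes_needed(target_tflops, sockets_per_node):
    """Smallest node count whose combined peak reaches the target."""
    target_mflops = target_tflops * 1_000_000
    per_node = node_peak_mflops(sockets_per_node)
    return (target_mflops + per_node - 1) // per_node  # ceiling division


for sockets in (4, 8):
    print(sockets, "sockets:", node_peak_gflops(sockets), "GFLOPS peak per node,",
          nodes_needed(10, sockets), "nodes for a hypothetical 10 TFLOPS target")
```

Halving the node count roughly halves the number of interconnect ports needed; whether the eight-socket node actually sustains its peak depends on how badly the extra HT hops hurt the workload.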
I had a quick chat with the HP people in charge here in the Far East about the above, and why they only recently decided to get the big box out. The answer I got was that the box is a performance and feature leader in its class, many commercial and virtualisation consolidation clients want it now. There was no mention of HPC here, where those extra inter-CPU hops can hurt performance.
Faced with the "Beckton" 8-core MP Nehalems some five quarters from now, HP will still give both Intel and AMD a level playing field in this high-end space. Of course, that assumes smooth delivery by AMD of "Shanghai", "Montreal", "Istanbul" and other major cities in the meantime.
They also believe that, even with minor hitches (and the current AMD hitches weren't exactly small), most AMD-based clients will still prefer to stay in the same AMD comfort zone. Of course, those AMD customers who needed leadership performance would more likely be in a serious "discomfort zone" by now.
My point is simple: this HP box shows that AMD still has some five quarters of performance and memory capacity leadership in large 4- to 8-socket boxes in certain areas (HPC FP where the scarcity of HT links doesn't cause slowdowns, as well as some memory-bound commercial apps), but that is most likely to be over, full stop, once Becktons come out.
I really want to believe that AMD will use this time to speed up delivery of a 45 nm "Shanghai" product and maximise its presence in this small but profitable segment. µ

Comments
This is what AMD must focus on
AMD must focus on gaining ground in places where Intel isn't. AMD made a critical mistake when it thought that it could fight Intel head on in the mainstream marketplace. They don't have the resources for that and almost got crushed because of that boneheaded error. A mouse, however confident, should not challenge an elephant. Even a weakened one. Ever.
AMD must refashion themselves as a creator of specialty CPU products like this one.
585 = 4U
A DL585 is 4U in height. Likewise, a DL385 is 2U. The DL785, however, is 7U. (I've worked with hundreds of these boxes. Looking up the manufacturer pages on HP's site is left as an exercise for the reader.)
Nifty
I had one of the 8-way Sun machines mentioned in the story for a couple of months' demo. It was very, very sweet. We found that, for fairly parallel simulation code written in C, judicious use of CPU affinity calls and management of memory access patterns got us quite close to linear 1:1 scaling up to 16 threads. I'd like to try one of these monsters, but HP doesn't let me configure and buy online ;-O
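For readers curious what "CPU affinity calls" look like, here is a minimal Linux-only sketch using Python's `os.sched_getaffinity`/`os.sched_setaffinity` (the commenter used C, where the equivalent is `sched_setaffinity`). The round-robin worker plan and the helper names are hypothetical. On Linux's default local-allocation policy, pages usually land on the NUMA node of the CPU that first touches them, so pinning a worker before it initialises its data tends to keep its memory accesses local.

```python
import os


def worker_cpus(n_workers):
    """Hypothetical plan: spread n_workers round-robin over the CPUs we may use."""
    allowed = sorted(os.sched_getaffinity(0))
    return [allowed[i % len(allowed)] for i in range(n_workers)]


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU; return the old mask for restoring."""
    previous = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    plan = worker_cpus(16)  # e.g. 16 workers, as in the comment above
    print("worker -> cpu:", plan)
    old_mask = pin_to_cpu(plan[0])
    print("now allowed on:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, old_mask)  # restore the original mask
```

In a real run, each worker process would call `pin_to_cpu` on its own CPU from the plan before allocating and initialising its share of the data.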
Mice and Elephants
I don't like that analogy. If we are stuck with a monolith, then you can kiss goodbye to getting some low-cost manycore processing on my desktop now. Recall that innovation costs money; monopolies will only tweak until they are forced to innovate. I view the whole GPU vs CPU story as a long lesson on that road. If Intel had not been faced with some upstarts trying to serve a market that Intel did not think profitable enough to justify the investment and R&D costs of getting the GPU where it is now, we would all still be looking at monochrome green, 8-bit culture. Retro would certainly have been the elephant's motto. Hopefully the smaller mice can realise that the elephant is indeed afraid of them and start innovating with intensity.
Clarification needed
"....why they only recently decided to get the big box out. The answer I got was that the box is a performance and feature leader in its class, many commercial and virtualisation consolidation clients want it now..." Can someone please explain to me how this is the answer to the question?
cool...
With that three-way PCI-e x16 and 8 processors, it might run Vista Ultimate and Crysis at a decent speed... Now where's that spare generator gone... On a side note, I thought HP were pioneering smaller blades, increased efficiency and clustering. Is this just a case of "we built it because we could"? Let's see how many FRAPS it makes in WoWBFCOD10
I was waiting for some flame about how many jiggahertz the next Intel Spore3OMFG was going to run at and how it is soon to take over the world. Maybe it's just American Intel fanboys who think Intel dominates every sector of the market and AMD will go bankrupt in 6 short months. But Intel doesn't. I make the purchase decisions for the hardware in my company, and I just built 4 Opteron-based boxes because of their price and performance; AMD has a cheaper overall platform than Intel, plus a much better upgrade path. It's good to see them delivering a product that serves a market where Intel's brute-force approach isn't effective.
@hoohoo
You can't talk to a sales rep on the phone when buying a server that costs more than most nice cars? I think someone is telling tall tales of big iron ownership.
max memory
Wow!! After exiting the 8-way market and trash-talking Sun & IBM by claiming there is no market for 8-way x86, HP is now compelled to re-introduce 8-way, a full 2 1/2 years after Sun. So what advantages does this have over the Sun X4600, other than the extra I/O slots and extra local disks? The Sun 8-way has the exact same memory capacity, with 64 DIMM slots, in 3U less rack space.
Out of the Box
Wasn't it the case that HP had this thing built way back, for when the Phenom was first released, but held it back until the errata were fixed? What chipset is this thing using, nVidia?
It sounds as though it uses a dual-socket design (the biggest Phenom chipset so far) multiple times on one PCB. Why else would a board have 11 PCI-e slots?
I think AMD should be doing their own high-end chipsets and not leaving it to their competitors.
Look how VIA managed to grow just by doing chipsets.
AMD may also find that they can tweak the CPUs further.
Iwill and Tyan both made 8-socket Opteron boards; the latter (S4881/B4881) is still available and has an nVidia chipset
[most of the multi-socket AMD solutions at Tyan use nVidia chipsets]
So it would seem that for AMD to succeed in the multi-socket platform they need nVidia's help.
Until the reviews are in on this box, I don't think we should pigeonhole it with other Phenom reviews; it may just be a great rendering station, or do things we have no tests for. I don't think everything is just about speed; it's about multi-tasking.