[Beowulf] Benchmark between Dell Poweredge 1950 And 1435


Mark Hahn hahn at mcmaster.ca
Thu Mar 8 10:26:30 PST 2007


> Great thanks. That was clear and the takeaway is that I should pay attention
> to the number of memory channels per core (which may be less than 1.0)

I think the takeaway is a bit more acute: if your code is cache-friendly,
simply pay attention to cores * clock * flops/cycle.

otherwise (i.e., when your models are too large to stay in cache), pay attention
to the "balance" between observed memory bandwidth and peak flops.
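The two cases above can be sketched as a pair of small functions. This is only an illustration. The `min()` bound is the roofline-style framing, which the post does not name. The function names and the machine numbers are hypothetical: the figures are borrowed from the Opteron example in this post.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Cache-friendly case: theoretical peak = cores * clock * flops/cycle."""
    return cores * clock_ghz * flops_per_cycle


def attainable_gflops(peak: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Large-model case (roofline-style bound): a kernel can't beat peak compute,
    nor do more flops than it can apply to the bytes memory delivers."""
    return min(peak, bandwidth_gbs * flops_per_byte)


# Hypothetical machine: 4 cores at 2.8 GHz, 2 flops/cycle, 13 GB/s observed bandwidth.
peak = peak_gflops(4, 2.8, 2)

# A STREAM-triad-style loop, a[i] = b[i] + s * c[i] on 8-byte doubles, does
# 2 flops per 24 bytes moved (read b, read c, write a).
triad_intensity = 2 / 24
print(attainable_gflops(peak, 13.0, triad_intensity))
```

A triad-like kernel on this machine would reach only about 1 GFlops of the 22.4 peak. It is limited entirely by bandwidth.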

the STREAM benchmark is a great way to measure that bandwidth, and has
traditionally promulgated the "balance" argument. here's an example:

http://www.cs.virginia.edu/stream/stream_mail/2007/0001.html

basically, 13 GB/s for a 2x2 Opteron/2.8 GHz system (peak flops would
be 2*2*2*2.8 = 22.4 GFlops), so you need at least 1.7 flops per byte of
memory traffic to be happy.
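Here is the same arithmetic as a small script. Mapping the four factors of 2*2*2*2.8 to sockets, cores per socket, flops/cycle and GHz is my reading of "2x2" together with the cores * clock * flops/cycle formula.

```python
sockets, cores_per_socket = 2, 2      # "2x2" Opteron (assumed: 2 sockets x 2 cores)
flops_per_cycle, clock_ghz = 2, 2.8   # assumed: 2 flops/cycle per core at 2.8 GHz
stream_gbs = 13.0                     # observed STREAM bandwidth from the linked report

peak = sockets * cores_per_socket * flops_per_cycle * clock_ghz  # GFlops
balance = peak / stream_gbs                                      # flops per byte
print(f"peak = {peak:.1f} GFlops, balance = {balance:.2f} flops/byte")
```

This prints `peak = 22.4 GFlops, balance = 1.72 flops/byte`. A code with lower arithmetic intensity than that is bandwidth-bound on this box.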

I don't have a report handy for Core 2, but iirc, people report hitting
a wall of around 9 GB/s for any dual-FSB Core 2 system. assuming dual-core
parts like those in the paper, peak theoretical flops is 37 GFlops, for a
balance of just over 4 flops per byte. that ratio should really be called
"imbalance" ;) quad-core would be worse, of course, since twice as many
cores would share the same FSB bandwidth.
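Here is a rough check of the Core 2 numbers, taking the post's 37 GFlops and 9 GB/s as given. The quad-core line is a hypothetical extrapolation. It assumes the same clock and the same 9 GB/s wall, so peak flops double while bandwidth does not. It is not a measured result.

```python
def balance(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Flops the machine can perform per byte of memory traffic."""
    return peak_gflops / bandwidth_gbs


dual_core_peak = 37.0  # GFlops, dual-socket dual-core Core 2 figure from the post
fsb_wall_gbs = 9.0     # GB/s, reported ceiling for dual-FSB Core 2 systems

dual = balance(dual_core_peak, fsb_wall_gbs)
# Hypothetical: quad-core parts at the same clock double peak flops,
# while the FSB wall stays put.
quad = balance(2 * dual_core_peak, fsb_wall_gbs)
print(f"dual-core: {dual:.1f} flops/byte, quad-core (assumed): {quad:.1f} flops/byte")
```

Under those assumptions, quad-core parts would need roughly twice the arithmetic intensity of dual-core parts to stay compute-bound.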


