[Beowulf] Win64 Clusters!!!!!!!!!!!!
Robert G. Brown rgb at phy.duke.edu
Tue Apr 10 07:51:17 PDT 2007
- Previous message: [Beowulf] Win64 Clusters!!!!!!!!!!!!
- Next message: [Beowulf] Win64 Clusters!!!!!!!!!!!!
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, 8 Apr 2007, Joe Landman wrote:

>> 64-bit computing solves a real problem. For apps that
>> don't need the extra address space, the benefits of
>> the additional registers in x86-64 are nearly undone
>> by the need to move more bits around, so 32-bit
>> and 64-bit modes are pretty much a push. When you
>
> I would love to see your data for this. Please note that I have quite a
> bit of data that contradicts this assertion (e.g. directly measured
> performance data, wall clock specifically of identical programs
> compiled to run in 32 and 64 bit mode on the same physical machine,
> running identical input decks). This is older data, from 2004. c.f.
> http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_SI_rlsdWP1.0_.pdf
> but it is still relevant, and specifically, directly addresses the
> assertions.
>
>> add the additional difficulty of getting 64-bit drivers
>> and what-not, I don't think it's worth messing with 64-bit
>> computing for apps that don't need the address space.
...
>> One additional way 64-bit computing is being oversold
>> is that there aren't now, and maybe never will be, any
>> human written program that requires more than 32 bits
>> for the instruction segment of the program. It's simply
>
> This is a bold assertion. Sort of like the "no program will ever use
> more than 640k of memory" made by a computing luminary many moons ago.
>
>> too complex for a human, or a group of humans, to write
>> this much code. Again, note that this says nothing

I totally agree with Joe on this issue. The "ideal" computer would have
an infinite, flat address space, totally transparent to the user. Want
to address memory location FF 0A BB 79 C3 12 93 54 6A 19 1D DA? (or
simply have 2^90 \approx 10^27 data objects to manage)? The memory
should be there, flat, transparent.

Further, the "ideal" computer has a discretized binary representation of
floating point numbers that is as close as possible to the real numbers
they approximate, for a variety of excellent numerical reasons. I
remember reading in any number of places how single precision floating
point numbers were perfectly adequate for doing any sort of meaningful
computation. I remember learning the hard way just how wrong this
assertion is -- how much using double precision improves a long-running
numerical computation, both by slowing the rate of accumulation of the
inevitable round-off errors and by admitting much larger exponents
without having to manage them "by hand". I remember the joy of
discovering IEEE 80 bit arithmetic in the venerable 8087, with more
precision even than double. I remember how much FASTER native 80 bit
arithmetic and then truncating to doubles is compared to doing double
precision using library routines on top of an 8-bit or 16-bit or even
32-bit CPU.

>> about the data segment of a program. Also, people tell
>> me that there are programs that were generated by other
>> programs that are larger than 32 bits. I've never seen
>> one, but maybe they exist, and that's what I'm talking
>> about human written programs.

I don't understand how you could possibly imagine this to be true.

I do numerical spin simulations on lattices in D dimensions. An
N-dimensional spin (where N is not necessarily equal to D) is typically
represented by 1-(N-1) real numbers (e.g. spherical polar angles). In
addition, any given spin may have other internal coordinates. To
represent a spin therefore requires minimally of order 4*N bytes for an
ordinary 32-bit float representation of the spin coordinates, more
likely 8*N bytes if one sensibly uses double precision coordinates. For
3D spins, say, 24 bytes per site. One then wishes to do simulations on
the largest lattices possible.
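
The per-site byte counts above turn into total memory once they are
multiplied by the number of sites, side^D. A minimal Python sketch of
that bookkeeping follows; the function name, its defaults (3 double
precision components per site), and the lattice sizes tried are
illustrative assumptions, not code taken from any actual simulation:

```python
def lattice_bytes(side, dims, components=3, bytes_per_component=8):
    """Bytes needed to store one spin per site on a hypercubic lattice.

    side: number of sites along each edge
    dims: lattice dimension D
    components, bytes_per_component: per-site storage (assumed defaults:
    3 doubles, i.e. 24 bytes per site)
    """
    sites = side ** dims
    return sites * components * bytes_per_component


GB = 10 ** 9            # decimal gigabytes
LIMIT_32BIT = 2 ** 32   # bytes reachable with a 32-bit address

# Report the footprint of a few lattices against the 32-bit limit.
for side, dims in [(100, 3), (1000, 3), (100, 5)]:
    total = lattice_bytes(side, dims)
    fits = total < LIMIT_32BIT
    print(f"L={side}, D={dims}: {total / GB:g} GB, "
          f"fits in 32-bit address space: {fits}")
```

The point of the sketch is only the scaling: storage grows like side^D,
so the 32-bit limit is crossed at quite modest lattice sizes.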
The constraint on lattice size is generally a mix of how much memory the
machine can hold and CPU speed, noting well that for cubic lattices the
number of sites scales like L^D, where L is the cube length in units of
cartesian-indexed "sites". A 32 bit machine can address at most 4 GB of
memory; in general purpose OS implementations this is generally reduced
to 3 GB by the requirements of running the OS itself and a VM system (at
least in a single data structure, without swapping). Well, if I put my
24 byte spins on a 1000x1000x1000 lattice I'm already up to 24 GB of
memory. If I'm working on D=4 spaces or D=5 spaces, then a mere
100x100x100x100x100 lattice is 24x10^10 bytes, or 240 GB, in size. Here
the speed of doing arithmetic in 64 bits native AND the larger address
space of 64 bit machines are absolutely essential to even play the game.

This isn't an isolated (if specific) example. There is a vast range of
memory-size bound problems, some of which have modest CPU requirements
but an absolute necessity to be able to efficiently address large memory
spaces. So much so that there have been cluster computing development
efforts that focus on building very large flat memory models at the
expense of computing speed -- the Trapeze project at Duke, for example.
Here the point isn't to do lots of computation in parallel -- the
application may even be single threaded. The parallel computer exists
solely to provide the illusion of a vast, reasonably flat memory space.

There are other groups in the physics department here who would
routinely buy 16+ GB machines (which obviously require a 64 bit OS and
hardware) if only they could afford all that memory, as their
computations easily scale out that far. They generally can afford only
one or two "large memory" machines (which are still much more expensive
than 2-4 GB machines, as the price premium on really large memory sticks
persists) but they'd LOVE to go large.

Personally I "wish" that they'd done the dual core thing entirely
differently.
Instead of having two completely independent 64 bit cores per CPU, they
might have built a 128-bit core with a hardware floating point execution
pathway that permitted it to be transparently broken down into 4 32 bit
parallel pathways, 2 64 bit pathways, 1 96 bit and 1 32 bit pathway, or
1 128 bit pathway, with entirely transparent flat memory access out to
128 bits, and with hardware implementation of 128 bit integer or 128 bit
floating point arithmetic (on down). Leave it to a mix of the CPU, the
OS, the compiler, and the application to decide how to pipeline and
allocate the available ALUs, registers, cache lines, etc. to the needs
of the program.

But I'm not terribly worried. This to some extent describes the Cell
architecture, with some slop as to just where the ganging together of
smaller logic units into larger ones occurs. And lots of very smart
people are working on this -- smarter than me for sure -- and doubtless
have far better ideas.

Stating that there is no need for 64 bit architectures and that 32 bits
is enough for anyone is basically equivalent to stating "the systems
engineers working for AMD and Intel and IBM and Motorola are complete
idiots". This is simply not the case. They aren't idiots, they are
brilliant, and the simple fact of the matter is that 64 bit systems are
faster, smarter, bigger, better than 32 bit systems. When AMD's Opteron
was first released, it was noted that it was the fastest >>32 bit<<
architecture available at the time, because it was in aggregate faster
to do 32 bit arithmetic (especially where a lot of that 32 bit
arithmetic de facto involved 64 bit floats) on a 64 bit machine than it
was on a 32 bit machine.

I have watched from the days of the Z80 (8 bit, 16 bit address space)
and 8088 (16 bit internal, 8 bit external bus, 20 bit segmented address
space) through the 8086 (16 bit throughout, 20 bit segmented address
space) through the 186 (very short-lived except as a programmable device
CPU), 286, 386, 486 (including the crippled SX), Pentium, Pentium Pro,
etc... with similar progress by AMD and nearly forgotten Intel
competitors (Cyrix?) and a completely different progression by Motorola
with its FLAT 68000 memory space, right on up to the Opteron, the 64-bit
Xeon, the Athlon 64. With sundry side trips into Sparc, MIPS, and other
workstation CPU architectures on the side, BTW. The process has from the
beginning been driven by a voracious public eager to take advantage of
the bigger address spaces, faster arithmetic and so on associated with
larger data pathways.

I fully expect to see 128 bit CPUs become a standard in the next decade,
unless the Cell approach does indeed represent a paradigm shift away
from the notion of a "central" processing unit at all and we see instead
on-the-fly reconfigurable multiprocessing units that can gang together
to 128 bits (or even more) if that's what you need, or can equally well
function as a cluster of N 32 bit "thread execution units", where the OS
kernel becomes basically a cluster operating system with a dynamic
"cluster" of processing and memory resources interconnected by what
amounts to a network.

   rgb

>
> I am sorry, but I think this may be an artificial strawman.
>
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
