Thursday, April 26, 2007

A $600 Super-computer

You can purchase a Sony PlayStation 3 (PS3) for $600, install Linux on it, and get yourself a 200+ Gflop/s (single-precision) Linux supercomputer.

There is an IBM tutorial for installing Linux on the PS3 and for building applications @ www-128.ibm.com/developerworks/power/library/pa-linuxps3-1

Under Linux on the PS3, six synergistic processing elements (SPEs) are accessible for computation. (A seventh runs in a special mode dedicated to aspects of the OS and security, and an eighth is disabled to improve production yields.) Each SPE can run a different program, and the internal communication paths allow programmers to arrange the data flow in different ways, using parallel, pipelined, or streamed processing models.

A DMA engine moves data on and off the Cell. Because DMA is explicit, the programmer must manually orchestrate data movement and computation, and getting close to the peak performance numbers will take a lot of hand-written assembly code.
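One common way to orchestrate data movement and computation is double buffering: while one buffer is being processed, the next chunk is already on its way. The sketch below only simulates that schedule in plain Python. `fetch` is a hypothetical stand-in for a DMA transfer, the sum-of-squares kernel is just a placeholder workload, and nothing here is the Cell SDK API.

```python
def fetch(main_memory, start, size):
    """Hypothetical stand-in for a DMA 'get': copy one chunk into a local buffer."""
    return list(main_memory[start:start + size])


def double_buffered_sum_of_squares(main_memory, chunk_size):
    """Walk main_memory in chunks, always keeping the next chunk prefetched.

    On real hardware the transfer of chunk i+1 would overlap with the
    computation on chunk i. Here the two steps run one after the other,
    so only the ordering of the schedule is illustrated.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    total = 0.0
    buffers = [None, None]   # two local buffers
    current = 0              # index of the buffer being computed on
    offset = 0

    # Prime the pipeline: fetch the first chunk before any computation.
    buffers[current] = fetch(main_memory, offset, chunk_size)

    while buffers[current]:
        next_offset = offset + chunk_size
        # Start "transferring" the next chunk into the other buffer.
        buffers[1 - current] = fetch(main_memory, next_offset, chunk_size)

        # Compute on the chunk that is already local.
        total += sum(x * x for x in buffers[current])

        # Swap buffers and advance to the next chunk.
        current = 1 - current
        offset = next_offset

    return total
```

The loop stops when a fetch returns an empty chunk, so inputs whose length is not a multiple of `chunk_size` are handled by a final short chunk.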

One can run C on the SPEs, but performance degrades: tight loops of C will generate around 4 Gflop/s per SPE.

The PS3 offers limited memory: each SPE has only 256 KB of RAM for both program and data, so only tight loops can run on each SPE. Double-precision (64-bit) performance is also poor relative to single-precision (32-bit) performance.
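To get a feel for how tight the 256 KB limit is, here is a minimal sketch of a buffer budget. The function name, the idea of splitting the free space evenly across buffers, and the 64 KB program size in the usage note are all assumptions for illustration; the sketch also ignores stack and alignment overhead.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE memory for program and data, from the text


def max_chunk_elements(code_bytes, n_buffers=2, bytes_per_element=4,
                       local_store_bytes=LOCAL_STORE_BYTES):
    """Largest element count per buffer that fits beside the program.

    code_bytes        -- space taken by the program itself (assumed input)
    n_buffers         -- number of equally sized data buffers (2 = double buffering)
    bytes_per_element -- 4 for single precision, 8 for double precision
    """
    free_bytes = local_store_bytes - code_bytes
    if free_bytes <= 0 or n_buffers <= 0 or bytes_per_element <= 0:
        return 0
    return free_bytes // (n_buffers * bytes_per_element)
```

With a hypothetical 64 KB program, two single-precision buffers could hold 24576 elements each, and two double-precision buffers half that.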

A PS3 Cell processor can produce around 204 Gflop/s of single-precision performance but only 15 Gflop/s of double-precision performance.
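As a rough check, the single-precision peak can be reconstructed from per-SPE throughput. The sketch below assumes a 3.2 GHz clock and 8 single-precision flops per cycle per SPE (a 4-wide fused multiply-add). Neither value is stated in this post, and the function names are made up for illustration.

```python
CLOCK_GHZ = 3.2            # assumed Cell clock rate (not stated in the post)
SP_FLOPS_PER_CYCLE = 8     # assumed: 4-wide SIMD fused multiply-add per SPE
C_GFLOPS_PER_SPE = 4.0     # "tight loops of C" figure quoted in the post


def peak_sp_gflops(n_spes):
    """Theoretical single-precision peak for n_spes SPEs, in Gflop/s."""
    return n_spes * CLOCK_GHZ * SP_FLOPS_PER_CYCLE


def c_level_gflops(n_spes):
    """Estimated rate for tight C loops on n_spes SPEs, in Gflop/s."""
    return n_spes * C_GFLOPS_PER_SPE
```

Under these assumptions eight SPEs give 204.8 Gflop/s, in line with the figure quoted above, while the six SPEs accessible under Linux give about 153.6 Gflop/s at peak and about 24 Gflop/s at the C-level rate.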

If we compare the PS3 to a 4-way (dual-socket, dual-core) 2.4 GHz Opteron with 1 GB of DDR2-667 RAM, using three measures:

single-precision floating-point performance
the ratio of RAM capacity to floating-point performance (GB/Gflop)
the ratio of RAM bandwidth to floating-point performance (GB/s/Gflop)

we get the following figures:

Figure 1: Peak single-precision floating-point rate comparison.

Figure 2: Assumed "good" C performance comparison.

One can see that the PS3 is highly unbalanced and favors single-precision floating-point performance. It is clear that the Cell B.E. architecture is highly specialized for certain types of applications but not others.
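The balance measures listed earlier are simple ratios, so they are easy to recompute. The sketch below shows one way to do it. Only the 204 Gflop/s PS3 peak and the 1 GB Opteron RAM come from this post; the PS3 RAM size, both bandwidth values, and the Opteron peak are assumed inputs for illustration, and the `Machine` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    peak_sp_gflops: float       # single-precision peak, Gflop/s
    ram_gb: float               # main memory capacity, GB
    ram_bandwidth_gb_s: float   # main memory bandwidth, GB/s

    def capacity_per_gflop(self):
        """RAM capacity relative to floating-point rate (GB/Gflop)."""
        return self.ram_gb / self.peak_sp_gflops

    def bandwidth_per_gflop(self):
        """RAM bandwidth relative to floating-point rate (GB/s/Gflop)."""
        return self.ram_bandwidth_gb_s / self.peak_sp_gflops


# 204 Gflop/s is from the post; 0.25 GB and 25.6 GB/s are assumptions.
ps3 = Machine("PS3", peak_sp_gflops=204.0, ram_gb=0.25, ram_bandwidth_gb_s=25.6)

# 1 GB is from the post; 38.4 Gflop/s and 21.3 GB/s are assumptions.
opteron = Machine("Opteron", peak_sp_gflops=38.4, ram_gb=1.0, ram_bandwidth_gb_s=21.3)

for m in (ps3, opteron):
    print(f"{m.name}: {m.capacity_per_gflop():.4f} GB/Gflop, "
          f"{m.bandwidth_per_gflop():.3f} GB/s/Gflop")
```

With these inputs the PS3 comes out lower than the Opteron on both ratios, which is the imbalance described above. Different assumed inputs will shift the numbers.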

More information is available below:

Cell Processors for Scientific Computing @ ww.cs.berkeley.edu/~samw/projects/cell/CF06.pdf
Cell Workshop Slides, LANL @ www.cs.utk.edu/~dongarra/cell2006/cell-slides/04-Ken-Koch.pdf
Graph Exploration Algorithms @ hpc.pnl.gov/people/fabrizio/papers/ipdps07-graphs.pdf
LANL Newsletter, Roadrunner News Announcement @ www.lanl.gov/news/newsletter/091106.pdf
Optimizing Sweep3D @ hpc.pnl.gov/people/fabrizio/papers/ipdps07-sweep3d.pdf
Roadrunner Benchmarks @ www.c3.lanl.gov/pal/software/roadrunner.html
