Speaker: Forest Godfrey, SGI
Date: April 13, 2000
Cluster in a Box - How to build a 512 processor Linux machine
The talk will describe scaling problems in building large single-system-image machines, how some of these problems are solved in Irix, and why the same solutions may not apply to Linux. It will then cover some of the problems with clustering and one of SGI's solutions: partitioning on SN1/Intel and beyond, including both message passing (xpc + TCP/IP over the internal memory interconnect) and shared memory between partitions. Finally, the talk will discuss RAS (Reliability, Availability, and Serviceability) issues inherent in large systems, such as recovery from various types of memory errors.
Forest Godfrey, a 1999 graduate of Carnegie Mellon University, is a member of the OS Scalability group at SGI. Forest works on Irix and Linux partitioning as well as software recovery from hardware failure. As a student at CMU, he worked on a project to prototype distributed shared memory and process migration across a Linux cluster. His areas of expertise include operating systems and large system design. Outside of computers, Forest is also a professional theater technician.