Storage architecture is ready to change as a result of the synergy of five overriding factors:
Traditional distributed filesystem workloads are dominated by small random accesses to small files whose sizes are slowly growing with time. In contrast, new workloads are much more I/O-bound, including data types such as video and audio, and applications such as data mining of retail transactions, medical records, or telecommunication call records.
Disk bandwidth is increasing at a rate of 40% per year. High transfer rates have increased pressure on the physical and electrical design of drive buses, dramatically reducing maximum bus length. At the same time, people are building systems of clustered computers with shared storage. Therefore, the storage industry is beginning to encapsulate drive communication over Fibrechannel, a serial, switched, packet-based peripheral network that supports long cable lengths, more ports, and more bandwidth. NASD will evolve the SCSI command set that is currently being encapsulated over Fibrechannel to take full advantage of switched-network technology for both higher bandwidth and increased flexibility.
The increasing transistor density of inexpensive ASIC technology has lowered disk drive cost and increased performance by integrating sophisticated special-purpose functional units into a small number of chips. Figure 1 shows the block diagram for the ASIC at the heart of Quantum’s Trident drive. When drive ASIC technology advances from 0.68 micron CMOS to 0.35 micron CMOS, drive designers will be able to include a 200 MHz StrongARM microcontroller, leaving space equivalent to 100,000 gates for functions such as on-chip DRAM or cryptographic support. While this may seem like a major jump, Siemens’ TriCore integrated microcontroller and ASIC architecture promises to deliver a 100 MHz, 3-way-issue, 32-bit datapath with up to 2 MB of on-chip DRAM and customer-defined logic in 1998.
Figure 1: Quantum’s Trident disk drive features the ASIC on the left (a). Integrated onto this chip in four independent clock domains are 10 function units with a total of about 110,000 logic gates and a 3 KB SRAM: a disk formatter, a SCSI controller, ECC detection, ECC correction, spindle motor control, a servo signal processor and its SRAM, a servo data formatter (spoke), a DRAM controller, and a microprocessor port connected to a Motorola 68000 class processor. By advancing to the next higher ASIC density, this same die area could also accommodate a 200 MHz StrongARM microcontroller and still have space left over for DRAM or additional functional units such as cryptographic or network accelerators.
Scalable computing is increasingly based on clusters of workstations. In contrast to the special-purpose, highly reliable, low-latency interconnects of massively parallel processors such as the SP2, Paragon, and Cosmic Cube, clusters typically use Internet protocols over commodity LAN routers and switches. To make clusters effective, researchers have proposed low-latency network protocols and user-level access to network adapters, and a new adapter-card interface, the Virtual Interface Architecture, is being standardized. These developments continue to narrow the gap between the channel properties of peripheral interconnects and the network properties of client interconnects, and make Fibrechannel and Gigabit Ethernet look more alike than different.
Figure 2: Cost model for the traditional server architecture. In this simple model, a machine serves a set of disks to clients through a set of disk (Wide Ultra and Ultra2 SCSI) and network (Fast and Gigabit Ethernet) interfaces. Using peak bandwidths and neglecting host CPU and memory bottlenecks, we estimate the server cost overhead at maximum bandwidth as the cost of the machine plus the cost of enough disk and network interfaces to carry the disks’ aggregate bandwidth, all divided by the total cost of the disks. While the prices are probably already out of date, the basic problem of high server overhead is likely to remain. We report pairs of cost and bandwidth estimates: on the left, values for a low-cost system built from high-volume components; on the right, values for a high-performance, reliable system built from components recommended for mid-range and enterprise servers.
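To make the Figure 2 calculation concrete, the short Python sketch below computes the overhead ratio described in the caption. All prices and bandwidths in the example are illustrative placeholder assumptions, not the figures reported here.

import math

def server_cost_overhead(num_disks, disk_cost, disk_bw,
                         machine_cost,
                         disk_if_cost, disk_if_bw,
                         net_if_cost, net_if_bw):
    # Server cost overhead at maximum bandwidth:
    # (machine + enough disk and network interfaces to carry the
    #  disks' aggregate bandwidth) / total cost of the disks.
    aggregate_bw = num_disks * disk_bw                    # MB/s
    disk_ifs = math.ceil(aggregate_bw / disk_if_bw)       # disk interfaces needed
    net_ifs = math.ceil(aggregate_bw / net_if_bw)         # network interfaces needed
    overhead = machine_cost + disk_ifs * disk_if_cost + net_ifs * net_if_cost
    return overhead / (num_disks * disk_cost)

# Hypothetical low-cost configuration (placeholder values): ten $500 disks at
# 10 MB/s each, a $2,000 host, $200 Wide Ultra SCSI interfaces (40 MB/s), and
# $100 Fast Ethernet interfaces (~12 MB/s).
print(server_cost_overhead(10, 500, 10, 2000, 200, 40, 100, 12))   # ~0.70

With these assumed numbers, the server hardware adds roughly 70% to the cost of the disks it serves, which is the kind of overhead the figure is meant to expose.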