ShardFS: Replicated Directories with Sharded Files

The rapid growth of cloud storage systems calls for fast and scalable namespace processing. ShardFS scales distributed file system metadata performance by fully replicating directory entries and permissions across servers (directory lookup state), insuring each operation on a server does not need to obtain locks from multiple servers. Every metadata server contains a complete replica of this namespace information (but not the attributes of any file and not necessarily replicas of all attributes of directories), so a single RPC to the appropriate metadata server will be sufficient to complete pathname resolution without additional RPCs. File metadata is distributed by a sharding function using pathname as input. Almost all operations on file metadata or non-replicated directory metadata such as timestamps are single RPC operations. The scaling benefit that these operations get from namespace replication is the primary motivation for this technique. ShardFS reduces replication cost by specializing popular distributed transaction implementation to categories of metadata operations.

Namespace distribution in ShardFS: replicated directory lookup states with
other metadata sharded.



Garth Gibson


Lin Xiao

Software Release

The code is open sourced under PDL license, which is a BSD-style three clause license.


We thank the members and companies of the PDL Consortium: Amazon, Google, Hitachi Ltd., Honda, Intel Corporation, IBM, Jane Street, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.