Systematic Dynamic Race Detection in Kernel Space
Contact: Ben Blum, Garth Gibson
Race conditions are notoriously difficult to debug. Because of their nondeterministic nature, they frequently do not manifest at all during testing, and when they do manifest, it is difficult to reproduce them reliably enough to collect enough information to help debugging.
In kernel space, race condition debugging becomes even more difficult. Many aspects of the concurrency implementation itself are part of the system being tested. While user-space testing frameworks rely on the underlying kernel to provide common concurrency abstractions which the framework can use to drive the test, systematic exploration in the kernel itself necessarily exists underneath such abstractions.
Landslide is a tool for performing systematic exploration in kernel space with a focus on Pebbles kernels written for 15-410. In the past, we have met with students of 15-410 during the final weeks of the kernel project and guided them in using Landslide on their own kernels, resulting in several concurrency bugs of varying severity and subtlety being found and fixed.
With Landslide, we see testing a kernel as a process of manipulating test parameters in two ways: first, in the choice of test case, and second, in the user's configuration of Landslide to express which parts of the kernel are "interesting" and which are irrelevant.
In future work, we are considering modifying Landslide's architecture to be applicable on production kernels such as Linux, with potential focuses on analysing device drivers and on replacing simulation with virtualisation for performance. We are also considering focusing on Landslide's applicability as an educational tool.
- Landslide: Systematic Dynamic Race Detection in Kernel Space. Ben Blum. Carnegie Mellon University School of Computer Science MS Thesis CMU-CS-12-118. May 2012.
Abstract / PDF [1.7M]
This research was sponsored by the U.S. Army Research Office under grant number W911NF0910273.
We thank the members and companies of the PDL Consortium: Actifio, American Power Conversion, EMC Corporation, Facebook, Google, Hewlett-Packard Labs, Hitachi, Huawei Technologies Co., Intel Corporation, Microsoft Research, NEC Laboratories, NetApp, Inc., Oracle Corporation, Panasas, Samsung Information Systems America, Seagate Technology, Symantec Corporation, VMware, Inc., and Western Digital for their interest, insights, feedback, and support.