Measurement, Modelling, and Analysis of Data Access Patterns

Contacts: Prof. M. Satyanarayanan, Lily Mummert, Maria Ebling

Goals

Accurate characterization of data usage in real computer systems is an important component of the Storage and Computer Systems Integration thrust area. Specifically, our goals in this work are to

o obtain a clear understanding of data usage patterns in distributed Unix environments,
o obtain insights into a number of fundamental research questions pertaining to file access in distributed systems,
o develop a body of techniques for efficient collection, post-processing, and indexing of file reference traces, and
o influence the other projects in this thrust area of the DSSC.

In the course of pursuing these goals, we have made a number of technology advances which are important contributions in themselves:

o a validated trace format that has been demonstrated to be of sufficient generality for a large number of important analyses,
o software for the generation and collection of file reference traces,
o a collection of comprehensive, high-quality, long-term file reference traces, and
o software for synthetic generation of file references, both as workloads and as benchmarks.

We have focused our efforts on distributed Unix file systems, since that is the dominant form of data usage in our environment. We look forward to collaborating with our industrial partners in conducting similar analyses in environments where data usage differs considerably.

Accomplishments

In the three years from January 1990 to December 1992, we have met or exceeded all our original goals. Specifically, we have developed a set of tools for obtaining high-quality traces, used these tools to collect long-term traces, and analyzed a number of critical questions about distributed file systems. We have also designed and made substantial progress toward the implementation of a synthetic file reference generator, and expect to complete this effort by late 1993. In the sections below, we describe each of these accomplishments in more detail.

Tracing Tools

We have built dfstrace, a system to collect long-term file reference data at the system call level in a distributed Unix workstation environment. The design of dfstrace is unique in that it pays particular attention to efficiency, portability, and the logistics of long-term data collection in such an environment. The major components of dfstrace are a set of kernel hooks, a kernel buffer mechanism, a user-level agent, a set of collection servers, and a post-processing library.

Kernel Hooks

Trace data is generated by client workstations running kernels instrumented at the system call level. We have added hooks in the file system call code, so that tracing remains transparent to users and applications. Relevant pieces of data are passed to a logging routine, which creates a record and writes it into a circular memory buffer. The agent extracts blocks of data from the buffer through a simple device driver interface.

Collection Machinery

The data is extracted by a user-level process, or agent, buffered locally in memory, and then sent to one of a small number of data collection servers, or collectors. A collector stages the data on disk; in the background, the data is sent to tape. The data is post-processed at a later time to obtain a usable set of traces for analysis. Multiple servers may be used to balance load and maintain availability. All of the communication intelligence is in user code.
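To make the flavor of this arrangement concrete, the following user-level C sketch shows a fixed-size circular buffer of trace records, a logging routine of the kind a kernel hook would call, and an extraction routine of the kind the agent invokes through the device driver interface. The record layout and all identifiers (trace_record, trace_log, trace_extract) are illustrative assumptions, not the actual dfstrace interfaces.

/*
 * A minimal, user-level sketch of the kind of circular trace buffer
 * described above.  All names and the record layout are hypothetical;
 * the real dfstrace kernel code is not shown here.
 */
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define TRACE_BUF_RECORDS 1024          /* capacity of the in-memory ring */

struct trace_record {
    struct timeval time;                /* when the call occurred */
    int            opcode;              /* which system call (open, close, ...) */
    int            pid;                 /* calling process */
    char           path[64];            /* truncated pathname argument */
};

static struct trace_record ring[TRACE_BUF_RECORDS];
static unsigned head, tail;             /* producer and consumer cursors */
static unsigned long dropped;           /* records lost when the ring is full */

/* Called from each instrumented file system call ("kernel hook"). */
static void trace_log(int opcode, int pid, const char *path)
{
    if (head - tail == TRACE_BUF_RECORDS) {   /* ring full: count the loss */
        dropped++;
        return;
    }
    struct trace_record *r = &ring[head % TRACE_BUF_RECORDS];
    gettimeofday(&r->time, NULL);
    r->opcode = opcode;
    r->pid    = pid;
    strncpy(r->path, path, sizeof(r->path) - 1);
    r->path[sizeof(r->path) - 1] = '\0';
    head++;
}

/* Called by the user-level agent (through the device driver interface
 * in the real system) to drain a block of records. */
static int trace_extract(struct trace_record *out, int max)
{
    int n = 0;
    while (n < max && tail != head) {
        out[n++] = ring[tail % TRACE_BUF_RECORDS];
        tail++;
    }
    return n;                            /* number of records extracted */
}

int main(void)
{
    trace_log(1 /* open */, 42, "/usr/bin/cc");
    trace_log(2 /* close */, 42, "/usr/bin/cc");

    struct trace_record block[16];
    int n = trace_extract(block, 16);
    printf("extracted %d records, dropped %lu\n", n, dropped);
    return 0;
}

In the real system the buffer and logging routine live in the kernel; the sketch only illustrates the producer/consumer structure between the hooks and the agent.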
The agent and collector do not interpret the data; their operation is therefore independent of exactly what data is being collected.

Indexing of Traces

As the body of data we have collected grows larger, summary information of various kinds for each trace becomes necessary, so that a user confronted with 20 GB of this data has some idea where to begin. We have built an on-line index for the traces that contains, for each trace, information such as composition by system call, access characteristics, and activity levels. Although the traces themselves are archived on tape, the on-line index makes it relatively easy to identify which specific tape or tapes one is interested in.

Post-Processing Library

The goals of the post-processing library are to provide a convenient programmer interface to the traces and to implement common operations. The underlying structure of the trace is hidden behind a simple interface. Traces may be filtered in various ways, such as by opcode or user, and the library will present the user only with the records that fit the specification. The library is structured to accommodate traces of various formats (those of other researchers, as well as various versions of ours) while maintaining a consistent interface to the programmer. It also performs a good deal of bookkeeping on the trace, such as building and tracking process trees so that groups of processes may be studied in aggregate, and simulating the kernel open file table.

Trace-based Analysis

The trace data has already been valuable in determining the disk requirements for portable machines disconnected from the network, in estimating the resources required to resolve inconsistent replicas of files arising from network partitions, and in analyzing the geometry of disks. Over the summer of 1991, an undergraduate student funded through the NSF REU program conducted a comparative study of our data with earlier data gathered by other researchers.

Disk Requirements for Portable Computers

To obtain an understanding of the cache size requirements for disconnected operation of portable computers, we used the traces we had collected in simulations of the Coda cache manager. From our data, it appears that a disk of 50-60 MB should be adequate for operating disconnected for a typical workday. Of course, user activity that is drastically different from what was recorded in our traces could produce significantly different results. The actual disk size needed for disconnected operation has to be larger, since both the explicit and implicit sources of hoarding information are imperfect. Full details of this analysis have been reported in [Kistler92].

Log Space Requirements for Directory Resolution

We have used the traces to estimate the space requirements for log-based directory resolution. Full details of this work have been reported in [Kumar93]. Since a log grows linearly with work done during a partition, any realistic estimate of log size has to be derived from empirical data. The traces were used as input to a simulation of the logging component of the resolution subsystem. The simulator assumes that all activity in a trace occurs while partitioned, and maintains a history of log growth at 15-minute intervals for each volume in the system. At the end of the simulation, the average and peak log growth rates for each volume can be obtained from its history.
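The sketch below illustrates the kind of bookkeeping such a simulator performs: log growth is charged to 15-minute buckets for a volume, and average and peak hourly growth rates are computed from the resulting history. The record sizes, the one-day history capacity, and the identifiers are hypothetical; this is an illustration of the technique, not the simulator itself.

/*
 * Illustrative sketch (not the actual simulator) of tracking per-volume
 * log growth in 15-minute intervals and summarizing it as average and
 * peak hourly growth rates.
 */
#include <stdio.h>

#define INTERVAL_SECS   (15 * 60)       /* one history bucket = 15 minutes */
#define MAX_INTERVALS   (24 * 4)        /* enough buckets for one day */

struct volume_history {
    long bytes[MAX_INTERVALS];          /* log bytes appended per interval */
    int  used;                          /* number of intervals seen so far */
};

/* Charge 'bytes' of log growth to the interval containing 'time_secs'
 * (seconds from the start of the trace). */
static void log_growth(struct volume_history *v, long time_secs, long bytes)
{
    int i = (int)(time_secs / INTERVAL_SECS);
    if (i < MAX_INTERVALS) {
        v->bytes[i] += bytes;
        if (i + 1 > v->used)
            v->used = i + 1;
    }
}

/* Average growth rate in bytes per hour over the whole history. */
static double average_hourly(const struct volume_history *v)
{
    long total = 0;
    for (int i = 0; i < v->used; i++)
        total += v->bytes[i];
    return v->used ? (double)total * 4.0 / v->used : 0.0;
}

/* Peak growth in any window of four consecutive intervals (one hour). */
static long peak_hourly(const struct volume_history *v)
{
    long peak = 0;
    for (int i = 0; i + 4 <= v->used; i++) {
        long sum = v->bytes[i] + v->bytes[i+1] + v->bytes[i+2] + v->bytes[i+3];
        if (sum > peak)
            peak = sum;
    }
    return peak;
}

int main(void)
{
    struct volume_history vol = {0};

    /* Hypothetical mutating operations: (time in seconds, log record size). */
    log_growth(&vol, 10 * 60, 120);
    log_growth(&vol, 40 * 60, 80);
    log_growth(&vol, 3 * 3600, 256);     /* a burst later in the day */

    printf("average: %.1f bytes/hour, peak hour: %ld bytes\n",
           average_hourly(&vol), peak_hourly(&vol));
    return 0;
}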
Our analysis shows that long-term log growth is relatively low, averaging about 94 bytes per hour. Focusing only on the long-term average log growth rate can be misleading, however, since user activity is often bursty. To estimate the log length induced by peak activity, we examined the statistical distribution of hourly log growth rates for all volumes in our simulation. Over 94% of all data points observed were less than 1 KB, and over 99.5% were less than 10 KB. Since hourly growth is less than 10 KB in 99.5% of our data points, and since an hour-long partition could have straddled two consecutive hours of peak activity, we infer that a 20 KB log will be adequate for most hour-long partitions in our environment. More generally, a partition of N hours could have straddled N+1 consecutive hours of peak activity, so a log of 10(N+1) KB would be necessary. If a Coda server were to hold 100 volumes (a typical number at AFS installations), the total log space needed on the server would be 100 x 10(N+1) KB, or roughly (N+1) MB.

Analyzing Disk Geometry

We have extended the high-level traces described above to obtain low-level I/O traces, and have written a program that extracts disk geometry and performance characteristics for a wide variety of disks. This kind of tool is valuable for all measurement studies that employ disks, because it allows the performance of the disk to be diagnosed independent of the application and operating system. The kind of information that this tool can produce includes sector organization, defect spare locations, controller overhead, zone layout in disks with a variable number of sectors per track, track and cylinder skewing, and so on. The tool works by reading or writing sectors in a fixed pattern (such as 0, 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, ...) while timing the response from the disk.
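A simplified sketch of this interrogation technique is shown below: it reads sector 0 followed by sector k, for increasing k, and times each pair, so that jumps in the measured times expose track boundaries, skew, and similar features. The 512-byte sector size, the number of probes, and the use of a raw device path supplied on the command line are assumptions of the sketch, not details of the actual tool.

/*
 * Simplified sketch: read sector 0, then sector k, for k = 0, 1, 2, ...,
 * timing each pair of requests.  All parameters are illustrative.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define SECTOR_SIZE 512
#define NPAIRS      64                  /* how many (0, k) pairs to time */

static double now(void)                 /* seconds, with microsecond resolution */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static void read_sector(int fd, long sector, char *buf)
{
    if (lseek(fd, (off_t)sector * SECTOR_SIZE, SEEK_SET) < 0 ||
        read(fd, buf, SECTOR_SIZE) != SECTOR_SIZE) {
        perror("read_sector");
        exit(1);
    }
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s raw-disk-device\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror(argv[1]);
        return 1;
    }

    char buf[SECTOR_SIZE];
    for (long k = 0; k < NPAIRS; k++) {
        double start = now();
        read_sector(fd, 0, buf);        /* reference request: sector 0 */
        read_sector(fd, k, buf);        /* probe request: sector k */
        printf("pair (0,%ld): %.3f ms\n", k, (now() - start) * 1000.0);
    }
    close(fd);
    return 0;
}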
Synthetic Reference Generation

File traces, while accurately characterizing the context in which they were collected, suffer from a number of limitations. First, there is no obvious way to scale a trace so that it represents a workload of higher or lower intensity. Second, traces tend to be voluminous and often have to be stored off-line on tape because of disk storage limitations; this is especially true of long-term traces. Third, traces can only be made of actually observed workloads. There is no way to perturb a trace to represent a slightly different workload, or a radically different one. Thus the range of "what if" questions that can be answered with a trace is limited.

To address these limitations, we have designed a synthetic reference generator, SynRGen. SynRGen provides a simple and extensible mechanism for modelling a wide variety of usage environments. Both locality of reference and data sharing among users can be modelled in SynRGen. Besides its use in stress-testing a file system, SynRGen can be used as a parameterized benchmark for evaluating local or distributed Unix file systems. SynRGen is easily portable to any file system implementing Unix semantics, and to a variety of hardware architectures and Unix platforms.

SynRGen builds a program that stochastically combines micromodels of file reference sequences. The micromodels consist of actual code, written by a modeler who observes distinctive signatures of applications in actual file reference traces. The input to SynRGen for the stochastic combination of micromodels takes the form of configuration files. We use three types of configuration files: system, volume, and user class. The system configuration file describes an entire usage environment, the volume configuration file describes the physical characteristics of a particular type of volume, and the user class configuration file describes the behavior of a particular class of users.

The mksynrgen program parses the system configuration file and builds a shell script. The mkvol program parses the volume configuration files. The mkclass program parses the user class configuration file and produces a C program which models the user behaviors described in the file.

Modelling Locality

Locality of reference occurs in both time and space. The principle of temporal locality states that recently accessed data is likely to be accessed again in the near future, while the principle of spatial locality states that data near recently accessed data is likely to be referenced soon. Locality of reference to file data can be further broken down into that which occurs within a file and that which occurs between files. SynRGen allows the experimenter to incorporate arbitrary locality behavior by providing code that models such behavior. In the case of temporal locality, SynRGen supports modelling of locality across files but not within a file. There is rarely a need for the latter, since the virtual memory system typically filters out such locality from the file system. SynRGen also allows modelling of spatial locality of reference within a file. Spatial locality across files is not modelled, since the notion is itself not well-formed.

Use of SynRGen

SynRGen may be used at three different levels; each successive level requires more knowledge about SynRGen and about the environment being modelled. The first level requires little knowledge about either: the experimenter simply runs an instance of SynRGen provided in the distribution, and may vary only the run-time values of parameters in the configuration files. The second level requires more knowledge about SynRGen's configuration files and behaviors, as well as detailed knowledge of the environment being modelled; the experimenter creates a configuration file that more closely models a particular environment using the user behaviors provided in the distribution, for example by changing the probability distribution function of either the volume choices or the action choices. At the third level, a modeler can model a new user behavior or a new type of volume; this level requires the most knowledge about SynRGen, and detailed information about the new user behavior or the new type of volume is essential.

Plans

Our efforts for 1993-94 will focus on bringing our work on modelling, measurement, and analysis of file usage to a close, and on using the tools developed in this project in new work on evaluating and improving the use of prefetching for availability in disconnected and weakly-connected environments. Specifically, we plan to complete our implementation of SynRGen and to build a collection of micromodels for it. We also plan to use it for workload generation in real file systems.
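To make the notion of a micromodel concrete, the sketch below shows, in deliberately simplified form, the kind of code a micromodel consists of and how a driver can combine micromodels stochastically. The two behaviors, their probabilities, and the file names are illustrative assumptions; this is not one of SynRGen's actual micromodels.

/*
 * Purely illustrative sketch of the micromodel idea: each micromodel is
 * ordinary C code issuing a distinctive sequence of file references, and
 * a driver picks micromodels stochastically.  The behaviors, probabilities,
 * and file names below are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

/* "Editor-like" signature: read a file, then rewrite it in full. */
static void edit_micromodel(const char *path)
{
    char buf[4096];
    FILE *f = fopen(path, "r");
    size_t n = f ? fread(buf, 1, sizeof(buf), f) : 0;
    if (f) fclose(f);

    f = fopen(path, "w");               /* write the "edited" version back */
    if (f) {
        fwrite(buf, 1, n, f);
        fputs("edited\n", f);
        fclose(f);
    }
}

/* "Compiler-like" signature: read a source file, write a derived file. */
static void compile_micromodel(const char *src, const char *obj)
{
    char buf[4096];
    FILE *in = fopen(src, "r");
    size_t n = in ? fread(buf, 1, sizeof(buf), in) : 0;
    if (in) fclose(in);

    FILE *out = fopen(obj, "w");
    if (out) {
        fwrite(buf, 1, n, out);         /* stand-in for generated output */
        fclose(out);
    }
}

int main(void)
{
    srand(1);                            /* fixed seed: repeatable workload */

    for (int i = 0; i < 100; i++) {
        double p = rand() / (double)RAND_MAX;
        if (p < 0.7)                     /* 70% of actions: editing */
            edit_micromodel("demo.txt");
        else                             /* 30% of actions: compiling */
            compile_micromodel("demo.c", "demo.o");
    }
    printf("issued 100 stochastic micromodel invocations\n");
    return 0;
}

In SynRGen itself, the analogue of this driver is the C program produced by mkclass, with the probability distributions over volume and action choices coming from the configuration files rather than being hard-coded.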