The TIP Application Suite

    [ TIP Overview | TIP Cost-Benefit Analysis | SpecHint | Publications ]

    • Davidson: computational physics
    • XDataSlice: scientific visualization
    • Sphinx: speech recognition
    • Agrep: text search
    • Gnuld: object code linker
    • Postgres: relational database


    MCHF Davidson Algorithm

    The Multi-Configuration Hartree-Fock, MCHF, is a suite of computational-physics programs which we obtained from Vanderbilt University where they are used for atomic-physics calculations. The Davidson algorithm is an element of the suite that computes, by successive refinement, the extreme eigenvalue-eigenvector pairs of a large, sparse, real, symmetric matrix stored on disk. In our test, the size of this matrix is 16.3 MB.

    The Davidson algorithm iteratively improves its estimate of the extreme eigenpairs by computing the extreme eigenpairs of a much smaller, derived matrix. Each iteration computes a new derived matrix by a matrix-vector multiplication involving the large, on-disk matrix. Thus, the algorithm repeatedly accesses the same large file sequentially. Annotating this code to give hints was straightforward. At the start of each iteration, the Davidson algorithm discloses the whole-file, sequential read anticipated in the next iteration.

    XDataSlice

    XDataSlice (XDS) is an interactive scientific visualization tool developed at the National Center for Supercomputer Applications at the University of Illinois. Among other features, XDS lets scientists view arbitrary planar slices through their 3-dimensional data with a false color mapping. The datasets may originate from a broad range of applications such as airflow simulations, pollution modelling, or magnetic resonance imaging, and tend to be very large.

    It is often assumed that because disks are so slow, good performance is only possible when data is in main memory. Thus, many applications, including XDS, require that the entire dataset reside in memory. Because memory is still expensive, the amount available often constrains scientists who would like to work with higher resolution images and therefore larger datasets. Informed prefetching invalidates the slow-disk assumption and makes out-of-core computing practical, even for interactive applications. To demonstrate this, we added an out-of-core capability to XDS. To render a slice through an in-core dataset, XDS iteratively determines which data point maps to the next pixel, reads the datum from memory, applies false coloring, and writes the pixel in the output pixel array. To render a slice from an out-of-core dataset, XDS splits this loop in two. Both to manage its internal cache and to generate hints, XDS first maps all of the pixels to data-point coordinates and stores the mappings in an array. Having determined which data blocks will be needed to render the current slice, XDS ejects unneeded blocks from its cache, gives hints to TIP, and reads the needed blocks from disk. In the second half of the split loop, XDS reads the cached pixel mappings, reads the corresponding data from the cached blocks, and applies the false coloring.

    Our test dataset consists of 5123 32-bit floating point values requiring 512 MB of disk storage. The dataset is organized into 8 KB blocks of 16x16x8 data points and is stored on the disk in Z-major order. Our test renders 25 random slices through the dataset.

    Sphinx

    Sphinx is a high-quality, speaker-independent, continuous-voice, speech-recognition system. In our experiments, Sphinx is recognizing an 18-second recording commonly used in Sphinx regression testing.

    Sphinx represents acoustics with Hidden Markov Models and uses a Viterbi beam search to prune unpromising word combinations from these models. To achieve higher accuracy, Sphinx uses a language model to effect a second level of pruning. The language model is a table of the conditional probability of word-pairs and word-triples. At the end of each 10 ms acoustical frame, the second-level pruner is presented with the words likely to have ended in that frame. For each of these potential words, the probability of it being recognized is conditioned by the probability of it occurring in a triple with the two most recently recognized words, or occurring in a pair with the most recently recognized word when there is no entry in the language model for the current triple. To further improve accuracy, Sphinx makes three similar passes through the search data structure, each time restricting the language model based on the results of the previous pass.

    Sphinx, like XDS, came to us as an in-core only system. Since it was commonly used with a dictionary containing 60,000 words, the language model was several hundred megabytes in size. With the addition of its internal caches and search data structures, virtual-memory paging occurs even on a machine with 512 MB of memory. We modified Sphinx to fetch from disk the language model's word-pairs and word-triples as needed. This enables Sphinx with TIP to run on our 128 MB test machine 90% as fast as on a 512 MB machine.

    We additionally modified Sphinx to disclose the word-pairs and word-triples that will be needed to evaluate each of the potential words offered at the end of each frame. Because the language model is sparsely populated, at the end of each frame there are about 100 byte ranges that must be consulted, of which all but a few are in Sphinx's internal cache. However, there is a high variance on the number of pairs and triples consulted and fetched, so storage parallelism is often employed.

    Agrep

    Agrep, a variant of grep, was written by Wu and Manber at the University of Arizona. It is a full-text pattern matching program that allows errors. Invoked in its simplest form, it opens the files specified on its command line one at a time, in argument order, and reads each sequentially.

    Since the arguments to Agrep completely determine the files it will access, Agrep can issue hints for all accesses upon invocation. Agrep simply loops through the argument list and informs the file system of the files it will read. When searching data collections such as software header files or mail messages, hints from Agrep frequently specify hundreds of files too small to benefit from history-based readahead. In such cases, informed prefetching has the advantage of being able to prefetch across files and not just within a single file.

    In our benchmark, Agrep searches 1349 kernel source files occupying 2922 disk blocks for a simple string that does not occur in any of the files.

    Gnuld

    Gnuld version 2.5.2 is the Free Software Foundation's object code linker which supports ECOFF, the default object file format under OSF/1. Gnuld performs many passes over input object files to produce the output linked executable. In the first pass, Gnuld reads each file's primary header, a secondary header, and its symbol and string tables. Hints for the primary header reads are easily given by replicating the loop that opens input files. The read of the secondary header, whose location is data dependent, is not hinted. Its contents provide the location and size of the symbol and string tables for that file. A loop splitting technique similar to that in XDataSlice is used to hint the symbol and string table reads. After verifying that it has all the data needed to produce a fully linked executable, Gnuld makes a pass over the object files to read and process debugging symbol information. This involves up to nine small, non-sequential reads from each file. Fortunately, the previously read symbol tables determine the addresses of these accesses, so Gnuld loops through these tables to generate hints for its second pass.

    During its second pass, Gnuld constructs up to five shuffle lists which specify where in the executable file object-file debug ging information should be copied. When the second pass completes, Gnuld finalizes the link order of the input files, and thus the organization of non-debugging ECOFF segments in the executable file. Gnuld uses this order information and the shuffle lists to give hints for the final passes.

    Our test links the 562 object files of our TIP kernel. These objects file comprise approximately 64 MB, and produce an 8.8 MB kernel.

    Postgres

    Postgres version 4.2 is an extensible, object-oriented relational database system from the University of California at Berkeley. In our test, Postgres executes a join of two relations. The outer relation contains 20,000 unindexed tuples (3.2 MB) while the inner relation has 200,000 tuples (32 MB) and is indexed (5 MB). We run two cases. In the first, 20% of the outer relation tuples find a match in the inner relation. In the second, 80% find a match. One output tuple is written sequentially for every tuple match.

    To perform the join, Postgres reads the outer relation sequentially. For each outer tuple, Postgres checks the inner relation's index for a matching inner tuple and, if there is one, reads that tuple from the inner relation. From the perspective of storage, accesses to the inner relation and its index are random, defeating sequential readahead, and have poor locality, defeating caching. Thus, most of these inner-relation accesses incur the full latency of a disk read.

    To disclose these inner-relation accesses, we employ a loop-splitting technique similar to that used in XDS. In the precomputation phase, Postgres reads the outer relation (disclosing its sequential access), looks up each outer-relation tuple address in the index (unhinted), and stores the addresses in an array. Postgres then discloses these precomputed block addresses to TIP. In the second pass, Postgres rereads the outer relation but skips the index lookup and instead directly reads the inner-relation tuple whose address is stored in the array.


    PDL Home TIP Home

    © 2008.
    Last updated 17 November, 2004