ABSTRACT
Carnegie
Mellon University Parallel Data Lab Ph.D. Dissertation CMU-PDL-06-109, August
2006.
Routing, Disjoint Paths, and Classification
Shuheng Zhou
Parallel Data Laboratory
School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
http://www.pdl.cmu.edu/
In this thesis, we study two classes of problems: routing and classification. Routing
problems include those that concern the tradeoff between routing table size and
short-path forwarding (Part I), and the classic Edge Disjoint Paths problem (Part
II). Both have applications in communication networks, especially in overlay network,
and in large and high-speed networks, such as optical networks. The third
part of this thesis concerns a type of classification problem that is motivated by a
computational biology problem, where it is desirable that a small amount of genotype
data from each individual is sufficient to classify individuals according to their
populations of origin.
In hierarchical routing, we obtain “near-optimal” routing table size and path
stretch through a randomized hierarchical decomposition scheme in the metric
space induced by a graph. We say that a metric ( X,d)
has doubling dimension
dim(X)
at most α if every set of diameter D can be covered by 2α sets of diameter
D/2. (A doubling metric is one whose doubling dimension dim(X)
is a
constant.) For a connected graph G, whose shortest path distances dG induce the
doubling metric (X, dG), we show how to perform (1+τ)
-stretch routing on G for
any 0< t ≤ 1 with routing tables of size at most (α/τ)O(α) logΔlogδ bits with only (α/τ)O(α) logΔ entries, where Δ is the diameter of G and δ is the maximum degree
of G. Hence, the number of routing table entries is just τ-O(1)logΔ for doubling
metrics.
The Edge Disjoint Paths (EDP) problem in undirected graphs refers to the following:
Given a graph G with n nodes and a set Τ of pairs of terminals, connect
as many terminal pairs as possible using paths that are mutually edge disjoint.
This leads to a variety of classic NP-complete problems, for which approximability is not well understood. We show a polylogarithmic approximation algorithm
for the undirected EDP problem in general graphs with a moderate restriction on
graph connectivity: we require the global minimum cut of G to be Ω(log5n). Previously,
constant or polylogarithmic approximation algorithms were known for trees
with parallel edges, expanders, grids and grid-like graphs, and, most recently, even-degree
planar graphs. These graphs either have special structure (e.g., they exclude
minors) or there are large numbers of short disjoint paths. Our algorithm extends
previous techniques in that it applies to graphs with high diameters and asymptotically
large minors.
In the classification problem, we are given a set of 2N diploid individuals from
population P1 and P2 (with no admixture), and a small amount of multilocus genotype
data from the same set of K loci for all 2N individuals, and we aim to partition
P1 and P2 perfectly. Each population Pa, where a∈{1,2}, is characterized by a
set of allele frequencies at each locus. In our model, given the population of origin
of each individual, the genotypes are assumed to be generated by drawing alleles
independently at random across the K loci, each from its own distribution. For example,
each SNP (or Single Nucleotide Polymorphism) has two alleles, which we
denote with bit 1 and bit 0 respectively. In addition, each locus contains two bits
(one from each parent) that are assumed to be two random draws from the same
Bernoulli distribution.
We use and , for all k = 1, ..., K
to denote frequency of an allele mapping to bit
1 at locus k in P1 and P2, respectively. We use γ =
( - )2 / K as the dissimilarity
measure between P1 and P2. We compute the number of loci K that we need to perform different tasks, versus N and γ, and prove several theorems. Ultimately,
we show that with probability
1-1/poly(N), given that K=
Ω(logN log logN / Nγ2) and
K=
Ω(logN /
γ), we can recognize the perfect partition (P1,P2) from among all other
balanced partitions of the 2N individuals. We proved this theorem for two cases:
either we are given two random draws for each attribute along each dimension, or
only one.
FULL THESIS: pdf / ps


©
2008.
Last updated
13 September, 2006
|