DATE: Friday, October 5, 2001
TIME: Noon - 1 pm
PLACE: Wean Hall 7220

Vladimir I. Zadorozhny
University of Pittsburgh

Efficient Query Processing in a Mediator for Web Data Sources

I consider a mediator architecture for querying multiple Web data sources in a wide area environment. One objective of this architecture is to support declarative database-like queries to noisy Web sources with limited query capability. I present a two-phase Web query optimizer that combines a capability-based pre-optimizer with an extended relational optimizer. The pre-optimizer generates a pre-plan for a mediator query. The pre-plan identifies the Web access patterns relevant to the mediator query, as well as the restrictions imposed by the capabilities of the Web sources. The relational optimizer then uses the knowledge in the pre-plan to produce a good query execution plan. I will show that the choice of Web access patterns strongly affects the cost of the query execution plan, and consider cost-based heuristics that the optimizer should use to make a good choice.

Finally, I present a novel optimization strategy to meet performance targets for queries in a noisy wide area environment. Using access cost distributions for Web sources, the optimizer determines a cost-delay utility for a query plan. The optimizer can behave optimistically, ignoring the expected delay of accessing Web sources, or conservatively, taking this delay into account.

Vladimir Zadorozhny is an Assistant Professor in the Department of Information Science and Telecommunications, School of Information Sciences, University of Pittsburgh. He received his Ph.D. in 1993 from the Institute for Problems of Informatics, Russian Academy of Sciences, in Moscow. Before coming to the US, he was a Principal Research Fellow at the Institute of System Programming, Russian Academy of Sciences. From May 1998 he worked as a Research Associate, and then as a Research Scientist, at the University of Maryland Institute for Advanced Computer Studies at College Park. He joined the University of Pittsburgh in September 2001. His research interests include scalable architectures for wide-area environments with heterogeneous information servers, Web-based information systems, query optimization in distributed databases, semantic interoperability in heterogeneous network environments, distributed object systems, and object metamodels.

