PARALLEL DATA LAB 

PDL Abstract

Better Caching in Search Advertising Systems with Rapid Refresh Predictions

2018 World Wide Web Conference, Lyon, France — April 23 - 27, 2018.

Conglong Li, David G. Andersen, Qiang Fu†, Sameh Elnikety†, Yuxiong He†

Carnegie Mellon University
†Microsoft

http://www.pdl.cmu.edu/

To maximize profit and connect users to relevant products and services, search advertising systems use sophisticated machine learning algorithms to estimate the revenue expectations of thousands of matching ad listings per query. These machine learning computations constitute a substantial part of the operating cost, e.g., 10% to 30% of the total gross revenues. It is desirable to cache and reuse previous computation results to reduce this cost, but caching introduces approximation, which comes with potential revenue loss. To maximize cost savings while minimizing the overall revenue impact, an intelligent refresh policy is required to decide when to refresh the cached computation results. The state-of-the-art manually tuned refresh heuristic uses revenue history to assign different refresh frequencies. Using the gradient boosting regression tree algorithm with well-selected features, we introduce a rapid prediction framework that provides refresh decisions with higher accuracy than the heuristic. This enables us to build a prediction-based refresh policy and a cache that achieves higher profit without manual parameter tuning. Simulations conducted on logs from a major commercial search advertising system show that our proposed cache design reduces the negative revenue impact (0.07×) and improves the cost savings (1.41×) and the net profit (1.50∼1.70×) compared to the state-of-the-art manually tuned, heuristic-based cache design.
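
As a rough illustration of the idea of a cache whose refresh decisions come from a pluggable policy, the following minimal Python sketch may help. It is not the authors' implementation: the names RefreshCache, Entry, compute, should_refresh, and refresh_after_two_hits are all hypothetical, and the toy policy simply counts cache hits, standing in for the gradient boosting regression tree predictor described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass
class Entry:
    score: float   # cached result of the expensive computation
    hits: int = 0  # times served from the cache since the last refresh


class RefreshCache:
    """Cache that asks a policy whether a cached result should be recomputed.

    `compute` stands in for the expensive revenue estimation and
    `should_refresh` for the refresh predictor (both hypothetical).
    """

    def __init__(
        self,
        compute: Callable[[Hashable], float],
        should_refresh: Callable[[Hashable, Entry], bool],
    ) -> None:
        self.compute = compute
        self.should_refresh = should_refresh
        self.entries: Dict[Hashable, Entry] = {}
        self.computations = 0  # number of expensive computations performed

    def get(self, key: Hashable) -> float:
        entry = self.entries.get(key)
        if entry is None or self.should_refresh(key, entry):
            # Miss, or the policy asked for a refresh: pay the full cost.
            self.computations += 1
            entry = Entry(score=self.compute(key))
            self.entries[key] = entry
        else:
            # Serve the (possibly stale) cached result.
            entry.hits += 1
        return entry.score


def refresh_after_two_hits(key: Hashable, entry: Entry) -> bool:
    """Toy policy (an assumption): refresh once an entry has served two hits."""
    return entry.hits >= 2
```

A learned predictor could be swapped in by passing a different should_refresh callable. The trade-off the abstract describes shows up here as the balance between the computations counter (cost) and how stale the served scores are allowed to become (revenue impact).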
