Parallel Data Laboratory

PDL Talk Series

July 22, 2021

TIME: 12:00 noon to approximately 1:00 pm EDT
PLACE: Virtual - a Zoom link will be emailed closer to the seminar

SPEAKER: Prashanth Menon, Sr. Software Engineer
Databricks

On Building Robustness into Compilation-Based Main-Memory Database Query Engines
Relational database management systems (DBMS) are the bedrock upon which modern data processing applications are assembled. Critical to ensuring low-latency queries is the efficiency of the DBMS's query processor. Just-in-time (JIT) query compilation is a popular technique to improve analytical query processing performance. However, a compiled query cannot overcome poor choices made by the DBMS's optimizer. Garbage in, garbage out. Poor query plans arise for many reasons, and although previous work has explored techniques to compensate for inadequate plans, none work in DBMSs that rely on compiling queries.

In this talk, I will present multiple effective, practical, and complementary techniques to build runtime adaptivity into compilation-based engines with negligible overhead. First, I will propose a method that blends two otherwise disparate query processing approaches (compilation and vectorization) into one engine. Next, I will present a framework that builds upon our previous work to allow the DBMS to modify compiled queries without recompiling the plan or generating code speculatively. This technique enables larger groups of operators in a query to coordinate their optimization process. Finally, I will present a method that decomposes query plans into fragments that can be compiled and executed independently. This not only reduces compilation overhead but enables the DBMS to learn properties about data processed in an earlier phase of the query to hyper-optimize the code it generates for later phases.

Collectively, these techniques enable any compilation-based DBMS to achieve dynamic runtime robustness without incurring the overheads that such adaptivity usually entails.

BIO: Prashanth is a Senior Software Engineer at Databricks, where he is designing the next-generation execution infrastructure supporting the large-scale and diverse workloads in the Spark ecosystem. Prior to joining Databricks, Prashanth completed his PhD at CMU in 2021, working with Andy Pavlo and Todd Mowry on databases.

CONTACTS

PDL Co-Director
RMCIC 2311

PDL Co-Director
(412) 268-3064
GHC 9109

Executive Director, Parallel Data Lab
VOICE: (412) 268-5485

PDL Administrative Manager
VOICE: (412) 268-6716