PARALLEL DATA LAB

NoisePage

NoisePage is a relational database management system (DBMS) designed from the ground up for autonomous deployment. It uses integrated machine learning components to control its configuration, optimization, and tuning. The system will support automated physical database design (e.g., indexes, materialized views, sharding), knob configuration tuning, SQL tuning, and hardware capacity/scaling. Our research focuses on building the system components that support such self-driving operation with little to no human guidance. We seek to create a system that is not only able to optimize the current workload but also able to predict future workload trends and prepare itself accordingly.

The goal of the NoisePage project is to build a self-driving DBMS that will continue to operate on its own for the long term.

Our plan is for NoisePage to support the most common database tuning techniques without requiring humans to determine the right way and proper time to deploy them. It will also enable new optimizations that are important for modern high-performance DBMSs, but which are not possible today because the complexity of managing these systems has surpassed human experts' abilities.

NoisePage is intended to be a viable, open-source DBMS. Its key features include:

  • Postgres-compatible wire protocol, SQL, and catalogs.

  • Arrow-compatible columnar storage.

  • Lock-free multi-version concurrency control.

  • Just-in-time query compilation using LLVM.

  • Vectorized execution using relaxed operator fusion (ROF).

  • Integrated machine learning components to support autonomous optimizations.

  • Lock-free Bw-Tree indexes.

  • 100% open source (MIT License).
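To make the multi-version concurrency control feature more concrete, here is a minimal Python sketch of the generic timestamp-based visibility rule that MVCC systems rely on. It is not NoisePage's implementation and shows none of its lock-free machinery or storage layout: the `Version` record, its `begin_ts`/`end_ts` fields, the newest-to-oldest chain order, and the `visible_version` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Sentinel end timestamp for the current (not yet superseded) version.
INFINITY = float("inf")


@dataclass
class Version:
    """Hypothetical version record: one committed value of a single tuple."""
    value: str
    begin_ts: int              # commit timestamp of the txn that created it
    end_ts: float = INFINITY   # commit timestamp of the txn that replaced it


def visible_version(chain: List[Version], snapshot_ts: int) -> Optional[Version]:
    """Return the version visible to a reader whose snapshot is `snapshot_ts`.

    Assumes `chain` is ordered newest-to-oldest. A version is visible when
    its lifetime [begin_ts, end_ts) covers the reader's snapshot timestamp.
    Returns None if the tuple did not exist yet at that snapshot.
    """
    for version in chain:
        if version.begin_ts <= snapshot_ts < version.end_ts:
            return version
    return None


if __name__ == "__main__":
    chain = [
        Version("v3", begin_ts=30),
        Version("v2", begin_ts=20, end_ts=30),
        Version("v1", begin_ts=10, end_ts=20),
    ]
    for ts in (5, 15, 25, 35):
        v = visible_version(chain, ts)
        print(ts, v.value if v else None)
```

With versions committed at timestamps 10, 20, and 30, a reader whose snapshot is 25 sees `v2`, while a reader at 5 sees nothing. Because each reader only selects among versions that already exist, a writer can install a newer version without overwriting the one an in-flight reader is using.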

People

FACULTY

Andy Pavlo
Todd Mowry

GRAD STUDENTS

Lin Ma
Matt Butrovich
Prashanth Menon
Wan Shen Lim
Ziqi Dong
Joseph Koshakow
Arvind Sai Krishnan
Ricky Xu
Ling Zhang


Publications

  • Everything is a Transaction: Unifying Logical Concurrency Control and Physical Data Structure Maintenance in Database Management Systems. Ling Zhang, Matthew Butrovich, Tianyu Li, Yash Nannapaneni, Andrew Pavlo, John Rollinson, Huanchen Zhang, Ambarish Balakumar, Daniel Biales, Ziqi Dong, Emmanuel Eppinger, Jordi Gonzalez, Wan Shen Lim, Jianqiao Liu, Lin Ma, Prashanth Menon, Soumil Mukherjee, Tanuj Nayak, Amadou Ngom, Jeff Niu, Deepayan Patra, Poojita Raj, Stephanie Wang, Wuwen Wang, Yao Yu, William Zhang. Conference on Innovative Data Systems Research (CIDR) 2021. January 11-15, 2021. Virtual Event.
    Abstract / PDF [352K]

  • Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats. T. Li, M. Butrovich, A. Ngom, W. S. Lim, W. McKinney, and A. Pavlo. Proceedings of the VLDB Endowment, Vol. 14, No. 4, ISSN 2150-8097, pp. 534-546, 2020.
    Abstract / PDF [633K]

  • Permutable Compiled Queries: Dynamically Adapting Compiled Queries without Recompiling. Prashanth Menon, Amadou Ngom, Lin Ma, Todd C. Mowry, Andrew Pavlo. Proceedings of the VLDB Endowment, vol. 14, iss. 2, pp. 101-113, October 2020.
    Abstract / PDF [904K]

  • External vs. Internal: An Essay on Machine Learning Agents for Autonomous Database Management Systems. Andrew Pavlo, Matthew Butrovich, Ananya Joshi, Lin Ma, Prashanth Menon, Dana Van Aken, Lisa Lee, Ruslan Salakhutdinov. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 42(2): 32-46 (2019).
    Abstract / PDF [555K]

  • Query-based Workload Forecasting for Self-Driving Database Management Systems. Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, Geoffrey J. Gordon. SIGMOD/PODS '18 International Conference on Management of Data, Houston, TX, USA, June 10-15, 2018.
    Abstract / PDF [1.25M]

  • Building a Bw-Tree Takes More Than Just Buzz Words. Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, David G. Andersen. SIGMOD’18, June 10–15, 2018, Houston, TX, USA.
    Abstract / PDF [2.2M]

  • Self-Driving Database Management Systems. A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. Van Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. In CIDR 2017, Conference on Innovative Data Systems Research. January 8-11, 2017, Chaminade, CA.
    Abstract / PDF [680K]

  • Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last. Prashanth Menon, Todd C. Mowry, and Andrew Pavlo. Proceedings of the VLDB Endowment, Vol. 11, No. 1, 2017.
    Abstract / PDF [970K]

  • An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. Yingjun Wu, Joy Arulraj, Jiexi Lin, Ran Xian, Andrew Pavlo. Proceedings of the VLDB Endowment, vol. 10, iss. 7, pp. 781-792, March 2017.
    Abstract / PDF [660K]

CODE

Acknowledgements

We thank the members and companies of the PDL Consortium: Amazon, Google, Hitachi Ltd., Honda, Intel Corporation, IBM, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.