PARALLEL DATA LAB

Carnegie Mellon Database Application Catalog

The Carnegie Mellon Database Application Catalog (CMDBAC) is an online repository of open-source database applications that you can use for benchmarking and experimentation. The goal of this project is to give researchers and practitioners ready-to-run, real-world applications that go beyond the standard benchmarks.

We built a crawler that finds applications hosted on public repositories (e.g., GitHub). We then created a framework that automatically learns how to deploy and execute each application inside a virtual machine sandbox. You can then safely download an application to your local machine and execute it to collect query traces and other metrics.

The CMDBAC currently contains over 1000 applications of varying complexity. We target Web applications based on popular programming frameworks because (1) they are easier to find and (2) we can automate the deployment process. We support applications that use the Django, Ruby on Rails, Drupal, Node.js, and Grails frameworks.
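
As a rough illustration of how a crawler might decide which of these frameworks a repository uses, the sketch below checks a checked-out repository for well-known marker files. The marker list, the FRAMEWORK_MARKERS table, and the guess_framework function are hypothetical and chosen for illustration only; they are not the rules the CMDBAC crawler uses.

```python
import tempfile
from pathlib import Path

# Hypothetical marker paths per framework. These heuristics are assumptions
# made for this sketch, not the CMDBAC crawler's actual detection rules.
FRAMEWORK_MARKERS = {
    "Django": ["manage.py"],
    "Ruby on Rails": ["Gemfile", "config/routes.rb"],
    "Drupal": ["sites/default/default.settings.php"],
    "Node.js": ["package.json"],
    "Grails": ["grails-app"],
}


def guess_framework(repo_root):
    """Return the first framework whose marker paths all exist, else None."""
    root = Path(repo_root)
    for name, markers in FRAMEWORK_MARKERS.items():
        if all((root / marker).exists() for marker in markers):
            return name
    return None


if __name__ == "__main__":
    # Build a throwaway directory that looks like a Django project.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "manage.py").touch()
        print(guess_framework(tmp))
```

A file-based check like this is cheap enough to run over many repositories, but it can misclassify projects that mix frameworks, so a real pipeline would likely confirm the guess during deployment.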

People

FACULTY

Andy Pavlo

GRAD STUDENTS

Dana Van Aken
Zeyuan Shang, Tsinghua University


Acknowledgements

The work in this project is based on research supported in part by the DoE, under award number DE-FC02-06ER25767 (PDSI), by the NSF under grant CCF-1019104, and by the MSR-CMU Center for Computational Thinking.

We thank the members and companies of the PDL Consortium: Amazon, Google, Hitachi Ltd., Honda, Intel Corporation, IBM, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.