Thursday, May 3, 2018
TIME: 12:00 - 1:00 pm
PLACE: RMCIC - 4th floor (Panther Hollow Room)
SPEAKER: Ashish Motivala, Snowflake Computing
TITLE: Automatic Clustering at Snowflake
For partitioned tables, maintaining good clustering properties for frequently accessed dimensions is critical for partition pruning performance. Naive methods of clustering maintenance can be expensive, especially when the clustering dimensions differ from the dimensions along which the data is loaded. Approximate clustering, on the other hand, is cheaper to maintain while still resulting in good pruning performance. In this talk, I will present Snowflake's clustering capabilities, including our algorithm for incremental maintenance of approximate clustering of partitioned tables, as well as our infrastructure to perform such maintenance automatically. I will also cover some real-world problems we have run into and our solutions to them.
Ashish Motivala has worked on databases for the last 12 years. Over that period he has built three different database products from the ground up. He currently works at Snowflake Computing, a cloud-analytics SQL database, where he builds core database features. Prior to that, he was a founding member of the Oracle TimesTen distributed in-memory engine. He also built a document database as part of an internal Oracle venture. Ashish graduated with a Master's in CS from Cornell University.
SEMINAR HOST: Andy Pavlo
SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/