DATE: Thursday, February 5, 2015
TIME: 12:00 - 1:00 pm
PLACE: RMCIC 4th Floor Panther Hollow Room

SPEAKER: Chris Jermaine, Rice University

TITLE: Large-Scale Machine Learning with the SimSQL System

ABSTRACT:
In this talk, I'll describe the SimSQL system, which is a platform for writing and executing statistical codes over large data sets, particularly for machine learning applications. Codes that run on SimSQL can be written in a very high-level, declarative language called Buds. A Buds program looks a lot like a mathematical specification of an algorithm, and statistical codes written in Buds are often just a few lines long.

At its heart, SimSQL is a relational database system, and like other relational systems, it is designed to support data independence. That is, a single declarative code for a particular statistical inference problem can be used regardless of data set size, compute hardware, and physical data storage and distribution across machines. One concern is that a platform supporting data independence will not perform well. However, we have done extensive experimentation and found that SimSQL performs as well as other competitive platforms that support writing and executing machine learning codes for large data sets.

BIO:
Chris Jermaine is an associate professor of computer science at Rice University. He is the recipient of an Alfred P. Sloan Foundation Research Fellowship, a National Science Foundation CAREER award, and an ACM SIGMOD Best Paper Award. In his spare time, Chris enjoys outdoor activities such as hiking, climbing, and whitewater boating. In one particular exploit, Chris and his wife floated a whitewater raft (home-made from scratch using a sewing machine, glue, and plastic) over 100 miles down the Nizina River (and beyond) in Alaska.

VISITOR HOST: Andy Pavlo

VISITOR COORDINATOR: Samantha Dinardo, sdinardo@cs.cmu.edu, 8-7660

SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 8-6716, or visit www.pdl.cmu.edu/SDI/

A joint seminar with MCDS.