DATE: Thursday, March 22, 2012
TIME: Noon to 1 pm
PLACE: ISTC Panther Hollow Room

SPEAKER: Michael Franklin, UC Berkeley

TITLE: AMPLab: Making Sense at Scale with Algorithms, Machines and People

ABSTRACT:
As organizations collect more and more data, they require analytics systems that can scale with data volumes, but the challenge of "Big Data" analytics is more than simply one of data size. Rather, as the scope of data analysis widens, issues such as data integration, data cleaning, and dealing with ambiguity and incompleteness in both queries and the underlying data are exacerbated. This combination of size and complexity makes it difficult for users to obtain answers to their data-driven questions within their time, cost and quality constraints. To address this problem, a group of researchers from machine learning, systems, databases, and networking at Berkeley started a new five-year research effort called the AMPLab, where AMP stands for "Algorithms, Machines, and People". The project is embarking on a rethinking of data analytics to seamlessly incorporate scalable machine learning, warehouse-scale computing and human computation in a way that dynamically optimizes this time/cost/quality tradeoff. We are developing a new data analytics stack that implements this vision. AMPLab's research is supported in part by 18 leading technology companies, including founding sponsors Google and SAP. We work with these sponsors and with university-based application partners in data-rich areas such as participatory sensing, urban planning, cancer genomics, and network security to evaluate and validate our technologies. In this talk, I will give an overview of the broader AMPLab research agenda, and then focus on some of our early results in developing scalable programming frameworks for data analysis and crowdsourced query processing.

BIO:
Michael Franklin is a Professor of Computer Science at UC Berkeley, focusing on new approaches for data management and data analysis. His recent research projects have focused on data stream processing and continuous analytics, scalable query processing, large-scale sensing environments, data integration, and hybrid human/computer data processing systems. At Berkeley he directs the Algorithms, Machines and People Laboratory (AMPLab), a cross-disciplinary collaboration addressing the Big Data analytics problem. He is founder and CTO of Truviso, Inc., a real-time data analytics company that enables customers to quickly make sense of diverse, high-speed, continuous streams of information. He is a Fellow of the Association for Computing Machinery, and a recipient of the National Science Foundation CAREER award, the ACM SIGMOD "Test of Time" award, and the 2011 Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. He is currently serving as a committee member on the US National Academy of Sciences study on Analysis of Massive Data. He received his Ph.D. from the University of Wisconsin in 1993.

VISITOR HOST: Anthony Tomasic, ISR

VISITOR COORDINATOR: Jennifer Lucas (jmlucas@andrew.cmu.edu)

SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/