DATE: Thursday, February 6, 2020
TIME: 12:00 - 1:00 pm
PLACE: RMCIC Panther Hollow Conference Room, 4th Floor
SPEAKER: Carlo Curino, Principal Scientist Lead, Microsoft
TITLE: Cloudy with High Chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios, including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, and complex financial predictions, to name a few. Meanwhile, as the value of data is increasingly recognized and monetized, concerns about securing valuable data and risks to individual privacy have been growing. Consequently, rigorous data management has emerged as a key requirement in enterprise settings. How will these trends (ML's growing popularity and stricter data governance) intersect? What are the unmet requirements for applying ML in enterprise settings? What are the technical challenges for the DB community to solve? In this paper, we present our vision of how ML and database systems are likely to come together, and the early steps we are taking towards making this vision a reality.
Carlo Curino is the lead of the Gray Systems Lab (GSL). Before this, Carlo was a Principal Scientist in the Cloud and Information Services Lab (CISL), working on large-scale distributed systems, with a focus on scheduling for BigData clusters; this line of research was co-developed with several team members and open-sourced as part of Apache Hadoop/YARN. Intrinsically, this research work enables us to operate the largest YARN clusters in the world (deployed on 250k+ servers within Microsoft).
Prior to joining Microsoft, Carlo was a Research Scientist at Yahoo!, primarily working on entity deduplication at scale and mobile+cloud platforms. Carlo spent two years as a Post Doc Associate at CSAIL MIT working with Prof. Samuel Madden and Prof. Hari Balakrishnan. At MIT he also served as the primary lecturer for the course on databases CS630, taught in collaboration with Mike Stonebraker.
Carlo received a Bachelor's degree in Computer Science at Politecnico di Milano. He participated in a joint project between University of Illinois at Chicago (UIC) and Politecnico di Milano, obtaining a Master's degree in Computer Science at UIC and the Laurea Specialistica (cum laude) at Politecnico di Milano. During his PhD at Politecnico di Milano, Carlo spent two years as a visiting researcher at UCLA, working with Prof. Carlo Zaniolo (UCLA) and Prof. Alin Deutsch (UCSD).
Research interests: ML-for-Systems and Systems-for-ML, large-scale distributed systems, performance tuning, and scheduling.
Previous research work: mobile+cloud platforms, entity dedup at scale, relational databases and cloud computing, workload management and performance analysis, schema evolution, and temporal databases.
SEMINAR HOST: Greg Ganger
VISITOR COORDINATOR: Karen Lindenfelser
SDI SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/