DATE: Thursday, April 14, 2016
TIME: 12:00 pm - 1:00 pm
PLACE: RMCIC 4th Floor Panther Hollow Room

SPEAKER: Yi Pan, LinkedIn

TITLE: Building a Lambda-less Stream Processing System using Local States and Windowing

ABSTRACT:
This talk will provide an overview of LinkedIn's distributed stream processing platform, including Samza, Kafka, and Databus. It will first cover the high-level scenarios for stream processing at LinkedIn, followed by detailed requirements around scalability, re-processing, accuracy of results, and ease of programmability. We will then focus on the requirements of stateful stream processing applications and explain how Samza's state management allows us to build applications that meet all of the above requirements. The key concepts, architecture, and usage in LinkedIn's stream processing pipeline will be explained, including state management in Samza and the use and configuration of Kafka and Databus as input/output and as a change log. We will also discuss in detail how we leverage a reliable, replayable messaging system (i.e., Kafka) together with durable state management in Samza to build a Lambda-less stream processing platform. The key mechanism for achieving a unified processing model across batch and real-time streams is windowing. We will also dive into the requirements for, and our solutions to, windowing a real-time stream.

BIO:
Yi Pan graduated from UCI with a Ph.D. in Computer Science in 2008. Since then, he has worked on distributed platforms for Internet applications for 8 years. He started at Yahoo!, working on Yahoo!'s NoSQL database project and leading the development of multiple features, such as real-time notification of database updates, secondary indexes, and live migration from legacy systems to NoSQL databases. Later, he joined and led the development of the Cloud Messaging System, which is used heavily as a pub-sub service and transaction log for distributed databases at Yahoo!. In 2014, he joined LinkedIn and quickly became the lead of the Apache Samza team, which provides a scalable stream processing service for the whole company.

VISITOR HOSTS: Majd Sakr, Garth Gibson

VISITOR COORDINATOR: Majd Sakr, msakr@cs.cmu.edu, 412-268-1161

SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/

*partially funded by