PDL CONSORTIUM SPEAKER SERIES
A ONE-AFTERNOON SERIES OF SPECIAL SDI TALKS BY
PDL CONSORTIUM VISITORS
DATE: Tuesday, May 9, 2017
TIME: 1:30 pm to 4:45 pm
PLACE: RMCIC Panther Hollow Room - RMCIC 4th Floor
All talks are held in the RMCIC Panther Hollow Conference Room, 4th Floor.
SPEAKER: Leif Walsh, Two Sigma
Flint: Locality-aware Time-series Analytics on Spark
We present Flint, a time-series analysis library built on Spark. Flint leverages the ordering property of time-series data to optimize joins and window aggregation operations in Spark, and provides a rich set of reshaping and analysis tools including time-aware joins and time-interval window operations. I will describe how Flint’s programming model fits into Spark’s, how we extend it, what’s available today, and where we’re headed.
BIO: Leif is building Two Sigma’s distributed, time-series data analysis platform, leveraging pandas, Spark, Mesos, Parquet, Arrow, and some TS-originated open source tools: Flint and Cook.
SPEAKER: Daniel Feldman, Veritas Technologies
From Theory to Practice to Theory Again: Advanced Deduplication 2011-2017
In 2011, Symantec Research Labs (then part of the same corporate structure as Veritas) proposed an advanced distributed deduplication system for enterprise storage. It was published at USENIX ATC '11 and discussed in many other venues, including this event at PDL in 2013. For more than four years, I worked on the team responsible for productizing this deduplication system and introducing it to enterprise customers, including some of the largest enterprise data centers in the world. I'll talk about what went right and what went wrong in the process of productizing a research technology, several research projects that Veritas currently has in the pipeline, and how we intend to release them effectively and efficiently.
BIO: Daniel Feldman is a senior software engineer in the Advanced Development office at Veritas Technologies. He has published at ACM CCS and IEEE SecureComm, and worked on numerous Veritas products including NetBackup, SureScale, and Velocity. He is chair of the MinneAnalytics Big Data Tech conference (with over 1000 attendees), and sits on the boards of the IoT FUSE Foundation and the Analyze This data science training meetup group.
SPEAKERS: Roger MacNicol & Danica Porobic, Oracle
Storing Data in Rows and Columns Across the Information Lifecycle
Roger will talk about why the tired old dichotomy between row-major and column-major is dead: enterprise data needs both formats, often at the same time. Oracle has invested heavily in offering both formats at the same time across the Information Lifecycle, from the hottest In-Memory Store to the coldest ZFS and HDFS stores. An interesting aspect of this story is the disruptive changes in hardware technology, both those occurring now and those coming shortly, that impact the design decisions we are making.
Danica will take a few minutes to share her experience as a fresh PhD grad who joined the Oracle Database team a few months ago.
BIOS: Roger is a Software Architect in Oracle's Data Storage Technologies group. Prior to that, he was the Architect for the query engine and optimizer of the world's first commercially successful pure columnar database, and chair of TPC-H. After a DPhil at Oxford, he has spent the last 27 years working on various query engine implementations.
Danica works on data management challenges posed by modern hardware. She recently obtained a PhD from EPFL, working with Professor Anastasia Ailamaki in the DIAS lab. Her thesis research focused on scaling up transaction processing systems on non-uniform hardware architectures.
SPEAKER: Nat Wyatt, Salesforce
The Salesforce Architecture
I will talk about the Salesforce multi-tenant architecture, how it uses databases today, and some of the things we are looking for in the future.
BIO: Nat Wyatt is a Principal Architect working on advanced projects in the Salesforce database group.
SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/