PDL CONSORTIUM SPEAKER SERIES

A ONE-AFTERNOON SERIES OF SPECIAL SDI TALKS BY
PDL CONSORTIUM VISITORS

DATE: Tuesday, May 8, 2018
TIME: 12:00 pm to 5:15 pm
PLACE: RMCIC Panther Hollow Room - RMCIC 4th Floor


SPEAKERS:

12:00 - 12:40 pm Shao-Wen Yang, Intel
12:40 - 1:20 pm Zahra Khatami, Oracle
1:20 - 1:40 pm break
1:40 - 2:20 pm Gosia Steinder, IBM Research
2:20 - 3:00 pm Xiaoyong Liu, Alibaba
3:00 - 3:15 pm break
3:15 - 3:55 pm Larry Rudolph, Two Sigma
3:55 - 4:35 pm Weiwei Gong, Oracle
4:35 - 5:15 pm Rob Johnson, VMware

All talks located in RMCIC Panther Hollow Conference Room, 4th Floor.


SPEAKER: Shao-Wen Yang, Intel
Orchestration in Visual Fog Computing
The visual fog paradigm envisions tens of thousands of heterogeneous, camera-enabled edge devices distributed across the Internet, providing live sensing for a myriad of different visual processing applications. The scale, computational demands, and bandwidth needed for visual computing pipelines necessitate intelligent offloading to distributed computing infrastructure, including the cloud, Internet gateway devices, and the edge devices themselves. We focus the presentation on two aspects of visual fog orchestration: offloading and scheduling. Offloading is a mechanism for realizing (live) workload migration, whereas scheduling is the problem of assigning the visual computing tasks to various devices to optimize network utilization. In our pioneering study, we demonstrate sub-minute computation time to optimally schedule 20,000 tasks across over 7,000 devices, and just 7-minute execution time to place 60,000 tasks across 20,000 devices. By showing that our approach is ready to meet the scale challenges, we argue that visual fog is a feasible and viable paradigm for scaling out video analytics systems.

BIO: Shao-Wen Yang is a senior staff research scientist at Intel Labs, Intel Corporation. He received his PhD degree in computer science from National Taiwan University in 2011. He joined Intel Labs in Taiwan in 2013 as a resident scientist in the Intel-NTU Connected Context Computing Center and focused on research and development of Internet of Things technology and middleware. In 2016 he moved to California and started to focus on the Internet of Video Things. He served on the technical program committee of the Design Automation Conference and has served as a guest editor for several international journals. His current research interests include various aspects of visual fog computing, especially ease-of-use frameworks for workload creation, and partitioning and orchestration of visual fog workloads. He has over 20 technical publications, and over 60 patents and pending patent applications.


SPEAKER: Zahra Khatami, Oracle
Supporting SPDK in Oracle Database
SPDK has been successful in enabling a large class of high-performance user-mode storage applications and appliances. SPDK provides direct access to local NVMe SSDs as well as access to remote storage targets using NVMeoF. SPDK provides a highly concurrent and asynchronous runtime with no locking in the I/O path. High throughput and low latency are realized by directly polling the hardware queues for completions. The DPDK toolkit is used for memory management and lock-free message passing between compute threads for efficient scale-out designs.

In this talk we will go over the challenges faced in incorporating SPDK within a complex enterprise-class application: Oracle RDBMS. There is a significant impedance mismatch between the deployment models of classical SPDK-enabled applications and the Oracle process and memory model. Oracle Database consists of a large number of processes (over 10K processes) per node and a large System Global Area (SGA) that is symmetrically mapped into each process. Oracle RDBMS implements a comprehensive memory management infrastructure spanning the SGA, process private memory (PGA), and a Managed Global Area (MGA) optimized for efficient data transfer over high-performance networks such as InfiniBand and RoCE.

Most NVMe SSDs contain a limited number of hardware IO queues. To enable high-performance IO dispatch and completion from a large number of processes to local NVMe drives, Oracle has implemented a dispatcher model. The IO dispatcher provides lightweight dispatch of IOs using shared-memory lock-free queues and polling for completions. The Oracle dispatcher implements various scheduling policies to optimize for overall throughput and latency depending on the workload, as well as QoS-aware scheduling of IOs.
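
The dispatcher model described above can be sketched roughly as follows. This is a toy, single-threaded Python simulation and not Oracle's or SPDK's implementation: the names FakeDevice and Dispatcher, the round-robin policy, and the queue depth are all assumptions made for illustration, and an ordinary deque stands in for the shared-memory lock-free queue.

```python
from collections import deque


class FakeDevice:
    """Hypothetical stand-in for an NVMe drive with a few hardware queues."""

    def __init__(self, num_hw_queues, depth):
        self.queues = [deque() for _ in range(num_hw_queues)]
        self.depth = depth

    def submit(self, qid, request):
        # Reject the request when the hardware queue is full.
        if len(self.queues[qid]) >= self.depth:
            return False
        self.queues[qid].append(request)
        return True

    def poll_completions(self, qid, max_n):
        # The toy device "finishes" every queued request as soon as it is polled.
        done = []
        while self.queues[qid] and len(done) < max_n:
            done.append(self.queues[qid].popleft())
        return done


class Dispatcher:
    """Funnels requests from many clients onto a few hardware queues."""

    def __init__(self, device):
        self.device = device
        self.submission = deque()  # stand-in for the shared submission queue
        self.completed = {}        # client id -> list of finished request ids
        self.next_q = 0            # round-robin cursor (assumed policy)

    def enqueue(self, client_id, req_id):
        self.submission.append((client_id, req_id))

    def run_once(self):
        n = len(self.device.queues)
        stalled = 0
        # Move pending requests onto hardware queues until every queue is full.
        while self.submission and stalled < n:
            if self.device.submit(self.next_q, self.submission[0]):
                self.submission.popleft()
                stalled = 0
            else:
                stalled += 1
            self.next_q = (self.next_q + 1) % n
        # Poll every hardware queue for completions instead of waiting on interrupts.
        for qid in range(n):
            for client_id, req_id in self.device.poll_completions(qid, max_n=8):
                self.completed.setdefault(client_id, []).append(req_id)
```

In this sketch, 100 clients can share a device with only two hardware queues by calling enqueue() and letting a loop invoke run_once() until all requests complete. A scheduling policy other than round-robin (for example one that is QoS-aware) would replace the cursor logic in run_once().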

For seamless integration with the memory model and RDMA data transfer optimizations available in Oracle, we have worked with the SPDK community to decouple the storage libraries from the underlying DPDK runtime that provides the memory management, threading, and messaging primitives. Oracle has implemented an SPDK environment library (ORAENV) that provides features similar to the DPDK RTE environment using Oracle runtime services. The ORAENV library provides dynamic allocation of shared memory from SGA, PGA, and MGA pools that is optimized for local storage IO as well as remote network IO with NVMeoF.

High-performance NVMeoF communication using ORAENV is accomplished through optimizations such as Shared Protection Domain, which allows a single memory mapping to be registered with the RDMA adapter for use by all Oracle processes. This significantly reduces Memory Translation Table (MTT) cache thrashing on the RDMA adapter, which has been shown to become a bottleneck for IO throughput as cache misses increase.

BIO: Zahra Khatami is a Member of Technical Staff in the Virtual Operating System (VOS) group at Oracle. She received her PhD and Master's degrees in computer science from Louisiana State University. She is currently developing a framework for supporting SPDK in Oracle Database.


SPEAKER: Gosia Steinder, IBM Research
Overlooked Problems in Cloud Application Placement

Application placement remains a hot problem in clouds and data centers. Our team has developed a new technique for computing the mapping of applications, containers, or VMs to the underlying infrastructure. In this talk I will discuss how this approach can help solve some often overlooked challenges in resource management: topology-awareness for GPU workloads, placement policy adaptation, resource overbooking, workload preemption, and resiliency to unknown bottlenecks.

BIO: Gosia Steinder is an IBM Fellow at IBM Research, Yorktown Heights, NY. In recent years she has been working on resource management and orchestration in clouds, container management, and container security.


SPEAKER: Xiaoyong Liu, Alibaba
Maximizing Machine Learning Performance with Heterogeneous Computing Resources
Mainstream research focuses on machine learning accuracy; however, research on machine learning efficiency is equally critical. To achieve efficiency, machine learning on modern computing platforms requires programs to be parallel and to run on heterogeneous hardware. This presentation will explain the importance of improving the performance of machine learning and how Alibaba approaches it, with an introduction to why Alibaba’s use case is special, as well as our challenges, strategies, and interests.

BIO: Xiaoyong Liu is currently a Senior Staff Engineer and Chief Architect of heterogeneous computing for machine learning at the Alibaba Infrastructure Service Group. Prior to joining Alibaba, Xiaoyong was a senior staff engineer at Xilinx, leading the SDAccel development environment for enabling FPGAs' performance/watt advantage in data-center application acceleration. Before that, he worked for Synopsys, leading the performance effort for its VCS-MX HDL Simulator. Xiaoyong Liu received B.S. and M.Sc. degrees in Computer Science from the University of Science and Technology of China.


SPEAKER: Larry Rudolph, Two Sigma
Security In the Cloud
Trust is difficult for the security conscious. To minimize trust in a cloud provider, bare-metal provisioning may be part of the answer. Another part may be avoiding storing sensitive data in the cloud while using in-RAM caching to avoid expensive out-of-cloud communication. The talk will cover two projects: Bolted, an approach being pursued at the Mass Open Cloud to provide a higher level of security via attestation to the tenant, and Fridge, a scalable in-memory data object cache that is particularly suitable for large-scale use of the spot market.

BIO: Larry Rudolph is a Senior Research Scientist at Two Sigma, an affiliate of MIT/CSAIL, and an academic advisor to the Massachusetts Open Cloud. Larry joined the "Labs" team at Two Sigma, where he continues his academic engagements, publishing and co-advising students at various institutions. He has worked and published in the area of parallel processing, from theory to practice. He has been on the faculty of CMU, The Hebrew University, and MIT.


SPEAKER: Weiwei Gong, Oracle
Oracle Database In-Memory: Accelerating Joins and Aggregations
Oracle Database In-Memory dual format was first introduced in 12c in 2013. It optimizes both analytics and mixed-workload OLTP, delivering outstanding performance for transactions while simultaneously supporting real-time analytics, business intelligence, and reports. In this talk, I will go over different features in Oracle Database In-Memory and describe how we accelerate joins and aggregations in Oracle Database In-Memory.

BIO: Weiwei Gong leads the Vector Flow Analytic team in the Data and In-Memory Technologies group, which is responsible for designing and developing the data storage and processing engine for the Oracle database. She joined Oracle in 2014 and has since been working on leveraging new hardware technologies to accelerate SQL processing. She obtained a Ph.D. in Computer Science from the University of Massachusetts Boston and holds an M.Sc. from Renmin University of China.


SPEAKER: Rob Johnson, VMware
Space-Efficient Maps for Small Key-Value Pairs and their Applications
This talk will describe our recent work extending the quotient filter to be a general-purpose space-efficient map for small keys and values. The quotient filter was originally designed as a replacement for the Bloom filter, but we have found that many Bloom-filter applications really just need a space-efficient map data structure. As a result, it is often possible to build faster, smaller, and simpler applications by using a space-efficient map than by combining a Bloom filter with other off-the-shelf data structures. The talk will describe several example applications in computational biology.

The talk will conclude by discussing potential future work in space-efficient maps and other potential applications of these data structures.
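
The quotienting idea behind the quotient filter mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the speaker's data structure: a real quotient filter stores remainders in a flat array with a few metadata bits per slot, whereas this sketch uses plain Python lists per bucket for readability and is therefore not actually space-efficient. The class name QuotientingMap, the bit widths, and the choice of BLAKE2b as the hash are assumptions made for illustration.

```python
import hashlib


class QuotientingMap:
    """Toy map that stores only part of each key's hash fingerprint."""

    def __init__(self, q_bits=8, r_bits=8):
        self.q_bits = q_bits
        self.r_bits = r_bits
        # One bucket per possible quotient; the quotient itself is never stored.
        self.buckets = [[] for _ in range(1 << q_bits)]

    def _split(self, key):
        # Hash the key, keep (q_bits + r_bits) bits, and split them into
        # a quotient (bucket index) and a remainder (stored explicitly).
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        mask = (1 << (self.q_bits + self.r_bits)) - 1
        fingerprint = int.from_bytes(digest, "big") & mask
        return fingerprint >> self.r_bits, fingerprint & ((1 << self.r_bits) - 1)

    def put(self, key, value):
        q, r = self._split(key)
        bucket = self.buckets[q]
        for i, (rem, _) in enumerate(bucket):
            if rem == r:
                bucket[i] = (r, value)  # same fingerprint: overwrite the value
                return
        bucket.append((r, value))

    def get(self, key, default=None):
        q, r = self._split(key)
        for rem, value in self.buckets[q]:
            if rem == r:
                return value
        return default
```

Because only the remainder is stored, two different keys with the same fingerprint are indistinguishable, so get() can return a value for a key that was never inserted. Widening r_bits lowers that false-positive rate at the cost of more bits per entry, which is the same trade-off a Bloom filter exposes.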

BIO: Rob Johnson is a Senior Researcher at VMware and Research Assistant Professor at Stony Brook University. He developed BetrFS, invented the quotient filter, developed the Squeakr and Mantis tools for large-scale computational biology, created the foundational theory of cache-adaptive analysis, broke the High-bandwidth Digital Content Protection (HDCP) crypto-system, and co-authored CQual, a static analysis tool that has found dozens of bugs in the Linux kernel.



SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/