PDL CONSORTIUM SPEAKER SERIES

A SERIES OF SPECIAL SDI TALKS BY
PDL CONSORTIUM VISITORS

DATE: Wednesday, May 4, 2022
TIME: see times below
PLACE: NSH 3305

SPEAKERS: Wednesday May 4, 2022

12:00 pm - 12:45 pm Cross Cluster Replication in OpenSearch
Kiran Reddy, Principal Engineer, Amazon Web Services
12:45 pm - 1:30 pm Building a Next-generation Serverless Platform for Converged AI/ML and HPC Workflows
Carlos H. A. Costa, Principal Research Staff Member, IBM
1:30 pm - 1:45 pm BREAK
1:45 pm - 2:30 pm Reinventing Amazon Redshift
Sanket Hase, Software Development Manager, Amazon Web Services
2:30 pm - 3:15 pm Oracle Large-Object Storage: Scaling for OLTP
Fan Wu, Software Development Manager, Oracle
3:15 pm - 3:30 pm BREAK
3:30 pm - 4:15 pm Data-Centric Computing with Emerging Memory and Storage Devices
Rekha Pitchumani, Sr. Research Manager, Samsung
4:15 pm - 5:00 pm Real World Challenges in the Oracle Database Cloud
Vikramraj Sitpal, Senior Member of Technical Staff, Oracle


Cross Cluster Replication in OpenSearch
Kiran Reddy, Principal Engineer, Amazon Web Services
Abstract: The AWS OpenSearch services team shipped cross cluster replication in the fall of 2021. The feature serves as a foundation block for customer capabilities like high availability, disaster recovery, and data proximity. In this talk, we will go over the design choices and tradeoffs we made while building cross cluster replication. We will also go over some of the future work planned for the feature as well as overall OpenSearch.
Bio: Kiran is a Principal Engineer at Amazon Web Services. His work at Amazon has been focused on distributed systems. In the past, he has worked on services like S3, DynamoDB, and EBS. He is currently with the OpenSearch team at AWS. In a previous life, he was an academic and received his PhD from Harvard University.


Building a Next-generation Serverless Platform for Converged AI/ML and HPC Workflows
Carlos H. A. Costa, Principal Research Staff Member, IBM
Abstract: In this talk, I will discuss new challenges from emerging workflows where AI and data analytics workloads are coupled to enable faster and more accurate results or radically new solutions. I will share an overview of our efforts in pushing the boundaries of distributed computing with the evolution of serverless platforms to enable and transform these new use cases. I will review emerging open source technologies and discuss the development of CodeFlare, IBM Research's new open source framework for AI/ML workflows.
Bio: Dr. Costa is a Principal Research Staff Member at IBM T. J. Watson Research Center, where he leads efforts to build a next-generation serverless platform for AI/ML and HPC workflows. His research is mainly focused on system software, programming models and middleware for next-generation distributed systems, working at the intersection of traditional HPC and emerging distributed computing paradigms. He has been involved in multiple projects in the areas of HPC and analytics, including the BlueGene/Q system, the Active Memory Cube (AMC) architecture for in-memory processing, and DoE ORNL’s Summit and LLNL’s Sierra supercomputer systems, among other projects with clients and academic partners.


Reinventing Amazon Redshift
Sanket Hase, Software Development Manager, Amazon Web Services
Abstract: In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale cloud data warehouse solution. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools. This launch was a significant leap from the traditional on-premises data warehousing solutions, which were expensive, rigid (not elastic), and needed a lot of tribal knowledge to operate. Unsurprisingly, customers embraced Amazon Redshift and it went on to become the fastest growing service in AWS. Today, tens of thousands of customers use Amazon Redshift in AWS’s global infrastructure of 25 launched Regions and 81 Availability Zones (AZs) to process exabytes of data daily. The success of Amazon Redshift inspired a lot of innovation in the analytics industry, which in turn has benefited consumers. In the last few years, the use cases for Amazon Redshift have evolved and, in response, Amazon Redshift has delivered a series of innovations that continue to delight customers. In this talk, we take a peek under the hood of Amazon Redshift and give an overview of its architecture. We focus on the core of the system and explain how Amazon Redshift maintains its differentiating industry-leading performance and scalability.
Bio: Sanket Hase is a software development manager at AWS Redshift. Redshift is Amazon's fully managed, petabyte-scale data warehouse service. Previously, Sanket worked in the areas of real-time analytics in distributed databases, data reduction (compression, deduplication) in distributed storage, and highly available storage systems across various companies. Sanket received his Master's degree from the INI department at Carnegie Mellon University, and is a proud PDL alumnus.


Oracle Large-Object Storage: Scaling for OLTP
Fan Wu, Software Development Manager, Oracle
Abstract: The Oracle SecureFiles LOB data format was introduced in 2009 (version 11g). It enables the storage of unstructured files in relational tables and supports full transaction operations. The underlying storage system was designed to optimize for write-once-read-many use cases, with the single file size ranging from hundreds of KBs to several GBs. Since the initial implementation, application demands have evolved and new challenges have emerged. Small files like JSON and XML documents have become popular. Applications are now running OLTP workloads on the LOB data format with latency constraints. In this talk, I will describe the problem-solving process we underwent at Oracle to make the LOB data format applicable to a latency-sensitive OLTP workload.
Bio: Fan Wu is a Software Development Manager in the Data-Space-Transaction (DST) group at Oracle. She joined Oracle in 2017, and since then she has been working on the storage layer for the large-object format for RDBMS. She has been engaged in various efforts, including improving the scalability of OLTP workloads for one of Oracle's cloud services and leveraging persistent memory to provide efficient storage in distributed systems. Fan Wu received her Master's degree in the field of electrical engineering from the University of Wisconsin-Madison.


Data-Centric Computing with Emerging Memory and Storage Devices
Rekha Pitchumani, Sr. Research Manager, Samsung
Abstract: Modern data-centric applications are triggering an architectural change, leading to data-centric computing. Emerging cache-coherent interconnects such as CXL are enabling memory expansion beyond the DIMM slots, and connecting both memory and storage devices to the same physical slots. Compute offload to and acceleration via purpose-built compute engines are also in high demand, and their programming model is expected to change with CXL adoption. In this talk, I will discuss the changes happening and what Samsung is doing to make this industry-wide data-centric computing vision a reality.
Bio: Dr. Rekha Pitchumani is a Sr. Research Manager in the Systems Technology Group, Memory Solutions Lab (https://samsungmsl.com/), Samsung, in San Jose, California. Her work at Samsung focuses on the systems design and device architecture of next-generation memory devices and NAND flash SSDs, such as CXL memory/storage devices, Key Value SSDs and Computational Storage Devices. Her prior work was on key-value data management for Shingled Magnetic Recording disks, and she holds a doctoral degree in Computer Science from the University of California, Santa Cruz.


Real World Challenges in the Oracle Database Cloud
Vikramraj Sitpal, Senior Member of Technical Staff, Oracle
Abstract: Oracle's autonomous DB cloud offering has posed many new and interesting technical challenges to all the teams within Oracle, including VOS, as we are at the lower layers of the Database software stack. In this talk, I will describe the design and research challenges a development team like Virtual OS (VOS) faces in the Cloud environment. The challenges in this talk span fields like system security, efficient resource management, performance at scale, debugging (at enterprise scale) and more.
A little about VOS: The Virtual Operating System (VOS) development group provides a portable and high-performance platform for the Oracle Database. It is the platform upon which the database and the database cloud are built. We define abstractions and develop infrastructure modules that provide process/thread/user-threads management and scheduling, memory management, heap management, synchronization support, CPU/GPU and I/O resource allocation, placement and management, cloud interfacing and management, bin packing, machine learning, inter-cluster and inter-process communication, high-performance file and storage I/O, event model, statistics, data representation, compression, encryption, persistent memory programming model and more for autonomous cloud, hybrid cloud, cloud at customer, and on-premises environments.
Bio: I am a Senior MTS at Oracle in the Database Virtual OS (VOS) group. I received my MS from CMU in 2021 in the field of Computer Systems. I currently work on CPU Resource Management, I/O (direct NFS) and Memory management.



SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/