PDL CONSORTIUM SPEAKER SERIES
A ONE-AFTERNOON SERIES OF SPECIAL SDI TALKS
DATE: Tuesday, May 7, 2019
SPEAKERS:
High Performance Analytics Toolkit (HPAT) is a compiler-based big data framework that automatically compiles Python analytics code to optimized binaries with MPI, providing Python productivity and HPC efficiency simultaneously. HPAT uses several domain-specific compiler techniques to achieve this goal, including a new auto-parallelization algorithm that detects the underlying map/reduce parallel pattern. Furthermore, HPAT performs high-level optimizations (e.g., fusion of operators) by treating analytics APIs (Pandas/NumPy) as a domain-specific language (DSL), but without imposing DSL limitations on the programmer. Performance evaluation of HPAT shows up to 2000x speedup over Spark on common benchmarks. In addition, several real applications demonstrate the benefits of the HPAT approach, including deployment from cloud to edge without any code rewrite. HPAT is under development as a software product to enable the next generation of data-centric applications.

BIO: Ehsan Totoni is a Research Scientist at Intel Labs, working on programming systems for large-scale big data analytics that provide high programmer productivity as well as high performance on modern hardware. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2014. During his Ph.D. studies, he was a member of the Charm++/AMPI team, working on performance and energy efficiency of HPC systems.
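A minimal sketch of the programming model described above, assuming HPAT's Numba-style @hpat.jit decorator (the released toolkit's API may differ):

```python
# Sketch of the HPAT programming model (assumption: HPAT exposes a
# Numba-style @hpat.jit decorator; exact details may differ).
import numpy as np
import hpat

@hpat.jit
def mean_of_squares(n):
    # HPAT detects the map/reduce pattern in this NumPy code and
    # generates MPI code that partitions `a` across ranks, so the
    # same script can run unmodified from laptop to cluster.
    a = np.random.ranf(n)
    return (a ** 2).sum() / n

print(mean_of_squares(10**8))
```

The decorated function is plain NumPy code; the compiler, not the programmer, decides how to partition data and insert communication.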
BIO: Luis Remis is a member of the Systems and Software Research Group at Intel Labs, where his current research involves cloud systems for visual data. He has been working on the Visual Data Management System project since its inception. He holds an M.S. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC), where he was a Research Assistant working on graph processing using heterogeneous platforms. His industry experience includes being part of the Modeling team in the Aerospace Division at INVAP from 2012 to 2014, where he worked on R&D for radar signal processing using graphics accelerators, and being part of the Autopilot team at Tesla Motors in 2015.
CacheLib is an embedded caching engine that addresses Facebook's diverse caching needs with a unified API for building caches across many hardware media. CacheLib transparently combines volatile and non-volatile storage in a single caching abstraction. To meet these varied demands, CacheLib provides a flexible, high-performance solution for many different services at Facebook. In this talk, we describe CacheLib's design, its challenges, and several lessons learned.

BIO: I have been a software engineer on the Cache Infrastructure team at Facebook since 2012. Cache Infrastructure develops and operates services that provide efficient, online access to social graph data. It encompasses services like TAO and Memcache, and libraries like CacheLib that enable building cache services at Facebook. I graduated with a master's degree in Computer Science from the University of Wisconsin–Madison.
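To make the "single caching abstraction" idea concrete, here is a hypothetical Python sketch of a cache that spills from a volatile to a non-volatile tier behind one interface. This is not CacheLib's actual API (CacheLib is a C++ library); all names here are invented for illustration:

```python
# Hypothetical sketch of a unified cache over volatile and
# non-volatile tiers. NOT CacheLib's real API.
from collections import OrderedDict

class HybridCache:
    """LRU cache in DRAM that spills evicted items to a flash tier."""

    def __init__(self, dram_capacity, flash_capacity):
        self.dram = OrderedDict()    # fast volatile tier
        self.flash = OrderedDict()   # larger non-volatile tier
        self.dram_capacity = dram_capacity
        self.flash_capacity = flash_capacity

    def put(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)
        if len(self.dram) > self.dram_capacity:
            # Transparently demote the coldest DRAM item to flash.
            cold_key, cold_val = self.dram.popitem(last=False)
            self.flash[cold_key] = cold_val
            if len(self.flash) > self.flash_capacity:
                self.flash.popitem(last=False)

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.flash:
            # Promote a flash hit back into DRAM; callers never see
            # which medium served the request.
            return self.put(key, self.flash.pop(key)) or self.dram[key]
        return None
```

The point mirrored from the abstract: callers use a single get/put interface, while placement across hardware tiers is handled internally.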
BIO: Jim Cipar is a software engineer on Facebook's MLX team, focusing on applying machine learning to networking and infrastructure challenges. In addition to caching, the MLX team works on live database queries, service stress testing, load balancing, and build/test automation. He earned his Ph.D. from Carnegie Mellon and is a PDL alum.
BIO: Aurosish Mishra is a Software Development Manager in the Oracle Database Engine group. His team is responsible for building the storage engine for the next-generation, cloud-scale Oracle Autonomous Database, leveraging innovative technologies such as NVM storage, RDMA access, and SIMD processing. He also leads the development of key features for Oracle's flagship Database In-Memory engine, which provides real-time analytics at the speed of thought. Aurosish holds a Master's degree in Computer Science from Cornell University, and Master's and Bachelor's degrees in Computer Science from IIT Kharagpur.
BIO: Pat Helland has been working in distributed systems, databases, transaction processing, scalable systems, and fault tolerance since 1978. For most of the 1980s, Pat worked at Tandem Computers as the Chief Architect for TMF (Transaction Monitoring Facility), the transaction and recovery engine under NonStop SQL. After 3+ years designing a cache-coherent non-uniform memory multiprocessor for HaL Computers (a subsidiary of Fujitsu), Pat moved to the Seattle area to work at Microsoft in 1994. There he was the architect for Microsoft Transaction Server, Distributed Transaction Coordinator, and a high-performance messaging system called SQL Service Broker, which ships with SQL Server. He then moved to Amazon, working on the product catalog and other distributed systems projects, including contributing to the original design for Dynamo. After returning to Microsoft in 2007, Pat worked on a number of projects including Cosmos, the scalable Big Data plumbing behind Bing. While working on Cosmos, Pat architected both a project to integrate database techniques into its massively parallel computations and a very high-throughput event processing engine. Since early 2012, Pat has worked at Salesforce.com in San Francisco, focusing on multi-tenanted database systems, scalable and reliable storage infrastructure, and software-defined networking.