NOTE SPECIAL DAY - FRIDAY
NOTE SPECIAL TIME - 10:30 am

DATE: Friday, February 3, 2017
TIME: 10:30 am - 11:30 am
PLACE: RMCIC 4th Floor Panther Hollow Room

SPEAKER: Ehsan Totoni, Intel

TITLE: HPAT: High Performance Data Analytics with Scripting Ease-of-Use

ABSTRACT:
Big data analytics requires high programmer productivity and high performance simultaneously on large-scale clusters. However, current big data analytics frameworks (e.g., Apache Spark) have prohibitive runtime overheads since they are library-based. We introduce an auto-parallelizing compiler approach that exploits the characteristics of the data analytics domain and is accurate, unlike previous auto-parallelization methods. We build the High Performance Analytics Toolkit (HPAT), which parallelizes high-level scripting (Julia) programs automatically, generates efficient MPI/C++ code, and provides resiliency. Furthermore, HPAT provides automatic optimizations for scripting programs, such as fusion of array operations. As a result, HPAT is 369x to 2033x faster than Spark on the Cori supercomputer at LBL/NERSC and 20x to 256x faster on Amazon AWS for machine learning benchmarks.

We also propose a compiler-based approach for integrating data frames into HPAT to build HiFrames. It automatically parallelizes and compiles relational operations along with other array computations in end-to-end data analytics programs, and generates efficient MPI/C++ code. HiFrames is 3.6x to 70x faster than Spark SQL for basic relational operations and can be several orders of magnitude faster for advanced operations.

BIO:
Ehsan Totoni is a Research Scientist at Intel Labs. He develops programming systems for large-scale HPC and big data analytics applications with a focus on productivity and performance. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2014. During his Ph.D. studies, he was a member of the Charm++/AMPI team working on performance and energy efficiency of HPC applications using adaptive runtime techniques.

SEMINAR HOST: Mike Kozuch

SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/