DATE: Thursday, April 17, 2014
TIME: Noon - 1:00 pm
PLACE: TBA
SPEAKER: Lee Sheng, WibiData
TITLE: Exploring Enron Email Dataset with Kiji and Hive
ABSTRACT:
    Apache Hive is a data warehousing system for large volumes of data stored in Hadoop that provides SQL-based access for exploring datasets. KijiSchema provides evolvable schemas of primitive and compound types on top of HBase. Integrating the two offers the best of both worlds: ad hoc SQL-based querying over datasets that use evolvable schemas containing complex objects. This talk will present examples of queries that use this integration for exploratory analysis of the Enron email corpus. Delving into topics such as email responder pairs and sentiment analysis can expose many of the interesting points in the rise and fall of Enron.
BIO: 
    Lee is an engineer at WibiData who builds tools for developing Big Data applications. He holds a BS in Computer Science from Carnegie Mellon University. Previous stints include developing systems for making strategic buying decisions at Amazon.com as well as distributed simulation frameworks for the Department of Defense.
VISITOR HOST: Andy Pavlo
VISITOR COORDINATOR: Jenn Landefeld, jennsbl@cs.cmu.edu
SDI / ISTC SEMINAR QUESTIONS?
    Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/ 
*partially funded by 
