PARALLEL DATA LAB 

PDL People

Rashmi Vinayak


Contact:
www |
Office:
GHC 7007
Mailing Address: School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891
Position: Assistant Professor, CS and ECE
Projects: Big data systems


Research Interests:

My research interests lie in the broad area of computer and networked systems with a current focus on big data systems. I am interested in the fault tolerance, scalability, and performance challenges that arise in all layers of the big data stack -- storage/caching, networking, execution, and applications.

Research Overview

The bulk of my past research has focused on the storage/caching layer and, in part, on the application (specifically, machine learning) layer:

  • Storage/caching: My research focus here has been on fault tolerance, scalability, load balancing, and reducing latency in large-scale distributed data storage and caching systems. We designed coding-theory-based solutions that we showed are provably optimal. We also built systems and evaluated them on Facebook's data-analytics cluster and on Amazon EC2, showing significant benefits over the state of the art. Our solutions are now a part of Apache Hadoop 3.0 and are also being considered by several companies such as NetApp and Cisco.
  • Machine learning: My research focus here has been on the generalization performance of a class of learning algorithms that are widely used for ranking. We designed an algorithm that builds on Multiple Additive Regression Trees and, through empirical evaluation on real-world datasets, showed significant improvement across classification, regression, and ranking tasks. The algorithm we proposed is now deployed in production in Microsoft's data-analysis toolbox, which powers the Azure Machine Learning product.