PDL NEWS

2019

January 2019
Lorrie Faith Cranor Named New Director of Carnegie Mellon University's CyLab

Lorrie Faith Cranor has been named the next director of CyLab, Carnegie Mellon University's security and privacy institute, effective January 15. As Director, Cranor will be assuming the chair as the Bosch Distinguished Professor in Security and Privacy Technologies. CyLab, founded in 2003, brings together security and privacy experts from all schools across Carnegie Mellon with the vision of creating a world in which technology can be trusted.

"I'm honored and thrilled to serve as CyLab's next director," Cranor said. "I look forward to supporting CyLab's ongoing success and bolstering research aimed at making our increasingly digital world safe and trustworthy."

Cranor is the FORE Systems Professor of Computer Science and of Engineering and Public Policy, and directs the CyLab Usable Privacy and Security (CUPS) Laboratory. She is a co-director of Carnegie Mellon's Privacy Engineering master's program, and served as Chief Technologist at the Federal Trade Commission (FTC) in 2016.

An internal committee conducted a rigorous international search for director candidates. Cranor was selected for her leadership in the field and for her vision of the next phase of CyLab's growth.

"Lorrie's extensive leadership experience and background, as well as her recent government experience as the FTC's Chief Technologist, make her an exceptional choice as CyLab's new director," said Jon Cagan, interim dean of Carnegie Mellon's College of Engineering.

Having played a key role in building the usable privacy and security research community, Cranor co-edited the seminal book Security and Usability and founded the Symposium On Usable Privacy and Security (SOUPS). She is a co-founder of Wombat Security Technologies, Inc., a security awareness training company.

Cranor has authored over 150 research papers on online privacy, usable privacy and security, and other topics. Her current research projects include password usability and security, privacy for the Internet of Things, and development of meaningful and usable privacy notices and consent experiences.

Before joining the Carnegie Mellon faculty, Cranor received her doctorate degree from Washington University in St. Louis and was a member of the secure systems research group at AT&T Labs-Research. She is a Fellow of both the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE), and she is a member of the ACM CHI Academy.

Cranor's appointment follows that of Douglas Sicker, Head of Carnegie Mellon's Engineering and Public Policy department, who has served as CyLab's Interim Director since September 1, 2017. Sicker stepped in after the previous director, Electrical and Computer Engineering professor David Brumley, took a leave of absence to help grow his startup company, ForAllSecure.
-- Daniel Tkacik, CyLab Security and Privacy Institute News, Jan 14, 2019

January 2019
Joshi Optimizes Computing Systems for IBM's Watson

Machine learning has grown dramatically in engineering and computer science in recent years with the explosion of interest in artificial intelligence. In machine learning, humans — engineers and computer scientists — feed large data sets into a neural network model to train the model to learn from data and eventually identify and analyze patterns and make decisions.

Carnegie Mellon University's Gauri Joshi is researching the analysis and optimization of computing systems. Joshi, assistant professor of Electrical and Computer Engineering (ECE), has been named a recipient of a 2018 IBM Faculty Award for her research in distributed machine learning. Faculty Award recipients are nominated by IBM employees in recognition of a specific project that is of significant interest to the company and receive a cash award in support of the selected project.

Joshi's research is about distributing deep learning training algorithms. The datasets used to train neural network models are massive in size, so a single machine is not sufficient to handle the amount of data and the computing required to analyze the data. Therefore, datasets and computations are typically divided across multiple computing nodes (i.e. computers, machines, or servers), with each node responsible for one part of the data set.

In a distributed machine learning system with data sets divided across nodes, researchers use an algorithm called stochastic gradient descent (SGD), which is at the center of Joshi's research. The algorithm runs in parallel across the nodes and drives the model's error on the data as low as possible, but in its standard form it requires the nodes to synchronize exactly at every step, which can lead to delays.

"My work is about trying to strike the best balance between the error and the delay in distributed SGD algorithms," Joshi said. "In particular, this framework fits well with the IBM Watson machine learning platform; I will be working with the IBM Watson Machine Learning vision; I will be working with the IBM Research AI team."

In every iteration of the SGD, a central server is required to communicate with all of the nodes. If any of the nodes slow down, then the entire network slows down to wait for that node, which can significantly reduce the overall speed of the computation. Efficiency and speed of computation are the two main things Joshi aims to improve, both without risking the accuracy of the network.
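
The synchronous pattern described above can be pictured with a small simulation. The sketch below is purely illustrative and is not Joshi's algorithm or IBM's implementation; the worker count, toy model, and delay distribution are all assumptions made for the example.

    # Minimal sketch of synchronous distributed SGD with simulated stragglers.
    # Illustrative only: the model, worker count, and delays are hypothetical,
    # not Joshi's algorithm or the IBM Watson platform.
    import numpy as np

    rng = np.random.default_rng(0)
    num_workers = 4
    lr = 0.1
    w = np.zeros(2)                          # parameters held by the central server

    # Toy linear-regression data, split into one shard per worker.
    X = rng.normal(size=(400, 2))
    y = X @ np.array([3.0, -1.0]) + rng.normal(scale=0.1, size=400)
    shards = np.array_split(np.arange(400), num_workers)

    def worker_gradient(w, idx):
        """Each worker computes a gradient on its own shard of the data."""
        Xi, yi = X[idx], y[idx]
        return 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)

    total_time = 0.0
    for step in range(100):
        delays = rng.exponential(scale=1.0, size=num_workers)  # per-worker compute/communication delay
        grads = [worker_gradient(w, shards[k]) for k in range(num_workers)]
        w -= lr * np.mean(grads, axis=0)     # server updates only after every worker reports
        total_time += delays.max()           # each synchronous step costs as much as its slowest worker

    print("learned parameters:", w)          # should approach [3, -1]
    print("simulated wall-clock time:", round(total_time, 1))

In this toy model the wall-clock cost of every step is set by the slowest worker, which is exactly the delay-versus-error tradeoff that asynchronous or straggler-tolerant variants of SGD aim to relax.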

"When you have a distributed system, communication and synchronization delays in the system always affect the proponents of the algorithm. I'm trying to design robust algorithms that work well on unreliable computing nodes," she said.

Prior to joining Carnegie Mellon's College of Engineering in fall 2017, Joshi was a research staff member at IBM's Thomas J. Watson Research Center. Because of her past experience, she was aware of the specific research projects that are relevant to the company's interests.

The funding provided by the Faculty Award will be used to support Joshi's students, who are working on the theoretical analysis for this project. In the future, she hopes to release an open source implementation of the new algorithm they have developed. Joshi plans to work with IBM to make this method available to anybody who wants to train their own machine learning algorithms using distributed SGD.
-- Marika Yang, Carnegie Mellon University News, January 9, 2019.

2018

December 2018
Mor Harchol-Balter Named an IEEE Fellow

Mor Harchol-Balter has been elevated to fellow status in the Institute of Electrical and Electronics Engineers (IEEE), the world's largest technical professional organization. Fellow status is a distinction reserved for select members who have demonstrated extraordinary accomplishments in an IEEE field of interest. Mor, a professor in CSD since 1999, was cited "for contributions to performance analysis and design of computer systems." Her work on designing new resource-allocation policies includes load-balancing policies, power-management policies and scheduling policies for distributed systems. She is heavily involved in the SIGMETRICS/PERFORMANCE research community and is the author of a popular textbook, "Performance Analysis and Design of Computer Systems."
-- The Piper, CMU Community News, Dec. 12, 2018

December 2018
PDL Team Designing Record-breaking Supercomputing File System Framework at Los Alamos National Lab

Trinity occupies a footprint the size of an entire floor of most office buildings, but its silently toiling workers are not flesh and blood. Trinity is a supercomputer at Los Alamos National Laboratory in New Mexico, made up of row upon row of CPUs stacked from the white-tiled floor to the fluorescent ceiling.

The machine is responsible for helping to maintain the United States’ nuclear stockpile, but it is also a valuable tool for researchers from a broad range of fields. The supercomputer can run huge simulations, modeling some of the most complex phenomena known to science.

However, continued advances in computing power have raised new issues for researchers.

“If you find a way to double the number of CPUs that you have,” says George Amvrosiadis, “you still have a problem of building software that will scale to use them efficiently.” He’s an assistant research professor in Carnegie Mellon’s Parallel Data Lab.

Amvrosiadis was part of a team that included Professors Garth Gibson and Greg Ganger, Systems Scientist Chuck Cranor, and Ph.D. student Qing Zheng. The team recently lent a hand to a cosmologist from Los Alamos struggling to simulate complex plasma phenomena. The problem wasn’t that Trinity lacked the power to run the simulations, but rather that it was unable to create and store the massive amounts of data quickly and efficiently. That’s where Amvrosiadis and the DeltaFS team came in.

DeltaFS is a file system designed to alleviate the significant burden placed on supercomputers by data-intensive simulations like the cosmologist’s plasma simulation. When it comes to supercomputing, efficiency is the name of the game. If a task can’t be completed within the amount of time allotted, the simulation is left incomplete, and precious time will have been wasted. With researchers vying for limited computing resources, any time wasted is a major loss.

DeltaFS was able to streamline the plasma simulation, bringing what had once been too resource-demanding a task within the supercomputer’s capabilities by tweaking a couple of aspects of how Trinity processed and moved the data.

First, DeltaFS changed the size and quantity of files the simulation program created. Rather than taking large snapshots encompassing every particle in the simulation—which numbered more than a trillion—at once, DeltaFS created a much smaller file for each individual particle. This made it much easier for the scientists to track the activity of individual particles.

Through DeltaFS, Trinity was able to create a record-breaking trillion files in just two minutes. Additionally, DeltaFS was able to take advantage of the roughly 10% of simulation time that is usually spent storing the data created, during which Trinity’s CPUs are sitting idle. The system tagged data as it flowed to storage and created searchable indices that eliminated hours of time that scientists would have had to spend combing through data manually. This allowed the scientists to retrieve the information they needed 1,000–5,000 times faster than prior methods.
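
The indexing idea can be pictured with a small, self-contained sketch. This is not DeltaFS code and does not use its API; it only mimics the concept of tagging each record with its particle ID as it streams toward storage, so that a later query reads one small index entry instead of scanning full snapshots. The class and record format below are hypothetical.

    # Illustrative sketch of per-particle indexing on the write path.
    # NOT DeltaFS or its API; names and formats are made up for the example.
    from collections import defaultdict

    class IndexedWriter:
        def __init__(self):
            self.log = []                      # append-only stream standing in for storage
            self.index = defaultdict(list)     # particle_id -> offsets into the log

        def append(self, particle_id, record):
            """Tag each record with its particle ID as it flows to storage."""
            self.index[particle_id].append(len(self.log))
            self.log.append(record)

        def query(self, particle_id):
            """Fetch one particle's records without scanning the whole log."""
            return [self.log[off] for off in self.index[particle_id]]

    writer = IndexedWriter()
    for step in range(3):                      # hypothetical simulation output
        for pid in range(5):
            writer.append(pid, {"step": step, "x": pid * 0.1 + step})

    print(writer.query(2))                     # only particle 2's trajectory is read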

The team could not have been more thrilled with the success of DeltaFS’ first real-world test run and are already looking ahead to the future. “We're looking to get it into production and have the cosmologist who originally contacted us use it in his latest experiment,” says Amvrosiadis. “To me that's more of a success story than anything else. Often a lot of the work ends with just publishing a paper and then you're done; that’s just anticlimactic.”

But he and the rest of the team aren’t just looking to limit their efforts to cosmological simulations. They’re currently looking at ways to expand DeltaFS for use with everything from earthquake simulations to crystallography. With countries across the globe striving to create machines that can compute at the exascale, meaning 10^18 calculations per second, there’s a growing need to streamline these demanding processes wherever possible.

The trick to finding a one-size-fits-all (or at least most) replacement for the current purpose-built systems in use is designing the file system to be flexible enough for scientists and researchers to tailor it to their own specific needs.

“What researchers end up doing is stitching a solution together that is customized to exactly what they need, which takes a lot of developer hours,” says Amvrosiadis. “As soon as something changes they have to sit back down to the drawing board and start from scratch and redesign all their code.”

Amvrosiadis and the team have already demonstrated a couple of ways that efficiency can be improved, such as indexing or altering file size and quantity. Now they’re looking into further ways to eliminate potential inefficiencies, like using in-process analysis to discard unneeded data before it ever reaches storage or compressing information in preparation for transfer to other labs.
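
One way to picture that kind of in-process analysis is a write path that filters and compresses records before they ever reach disk. The sketch below is an assumption-laden illustration of the general idea, not DeltaFS functionality; the filter predicate, record format, and output file name are all invented for the example.

    # Illustrative sketch of filtering and compressing data in-process,
    # before it reaches storage. Not DeltaFS functionality; the predicate,
    # record format, and file name are hypothetical.
    import gzip
    import json

    def keep(record):
        # Hypothetical filter: drop low-energy particles the scientists do not need.
        return record["energy"] > 1.0

    records = [{"id": i, "energy": i * 0.5} for i in range(10)]   # fake simulation output

    with gzip.open("filtered_output.json.gz", "wt") as out:
        for rec in records:
            if keep(rec):                      # analyze in-process, before storage
                out.write(json.dumps(rec) + "\n")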

Solutions like these center around repurposing CPU downtime to perform tasks that will contribute back into the information pipeline and creating smarter ways to organize and store data, increasing overall efficiency. The idea is to let the expert scientists identify the areas where they have room for improvement or untapped resources, and to take advantage of the toolkit and versatile framework DeltaFS can provide.

As the world moves toward exascale computing, the pace that software development must maintain to keep up with hardware improvements will only increase. Amvrosiadis even hopes that one day more advanced AI techniques could be incorporated to do much of the observational work performed by scientists, cutting down on observation time and freeing them to focus on analysis and study. But for him and the rest of the DeltaFS team, all of that starts with finding little solutions to improve huge processes.

“I don’t know if there’s one framework to rule them all yet—but that’s the goal.”
-- Dan Carroll, CMU Engineering News, December 1, 2018.

