Parallel Data Laboratory

PAST PDL PROJECTS

Our research addresses a broad spectrum of storage-related challenges, including storage security, emerging technologies, disk characterization and modeling, efficient storage access, storage networking, and network-attached storage clusters.

Abacus - Dynamic Function Placement for Data-Intensive Cluster Computing
ABLE - Attribute Based Learning Environments
Active Disks - Remote Execution for Network-Attached Storage
Active Storage Networks - enable flexible construction of sophisticated storage and file-system functionality that can migrate to the most appropriate location in the system
Argon (Storage QoS) - performance insulation for shared storage servers
Astro-DISC - new algorithms, data structures, and software tools for the analysis of massive astronomical and cosmological datasets.
ATLAS - analyzes and models the operation of Los Alamos National Lab supercomputer clusters, and uses these models to develop techniques that improve the clusters’ operational efficiency
Attribute-Based Naming - techniques to gather attributes based on context analysis
Batchactive Scheduling - Cluster Scheduling for Explicitly-speculative Tasks
Cloud Scheduling (TetriSched) - maximize resource efficiency and utilization via a scheduler that accepts resource requests in the form of utility functions
Database I/O - optimizing database performance
Data-Intensive Supercomputing (DISC) - research to extend the type of computing systems used for Internet search to a larger range of applications
dbug - exploring an alternative method to stress testing called systematic testing, which controls the order in which certain concurrent events occur
DiskReduce - a framework for integrating RAID into replicated storage systems to lower storage capacity overhead
DiskSim - an efficient, accurate, highly-configurable disk system simulator.
DIXtrac - a program for disk extraction used to characterize over 100 performance-critical parameters
Elastic Storage (SpringFS)
eScience - PDL projects that are data-intensive and thus heavily invested in the use of computers for advancement
Expressive Storage Interfaces - increase the cooperation between device firmware and OS software to significantly increase the end-to-end performance and system robustness
Failure Data Analysis - to better understand what makes systems unreliable, i.e., what failures in today's large-scale production systems look like
Fates Database Storage - the Fates architecture offers efficient execution at all levels of memory hierarchy and optimizes data layout to improve performance, by exploiting the unique characteristics available at each level
FAWN - fast arrays of wimpy nodes
Fingerpointing - problem diagnosis in distributed systems
Freeblock Scheduling - a new approach to utilizing more of disks' potential media bandwidths
File System Virtual Appliances (FSVA) - a new approach for third-party FSs, leveraging virtual machines to decouple the OS version in which the FS runs from the OS version used by the user’s applications
Hadoop Workload Analysis - to better understand data scientists' use of the Hadoop system through workload analysis
Home Storage - data management for the home
Incast - addressing catastrophic TCP throughput collapse in storage server networks
IndexFS - scaling file system metadata for performance
Informed Prefetching - applications should issue hints which disclose their future I/O accesses
Landslide - Systematic dynamic race detection in kernel space
MEMS-Based Storage - a new technology that could provide significant performance gains over current disk drive technology and at lower cost
NASD - Network Attached Secure Disks: storage systems that exhibit the following properties: direct client-drive data transfer in a networked environment; asynchronous oversight by the high-level filesystem; cryptographic support for the integrity of requests; storage self-management opportunities derived from a more abstract and independent role for storage systems; and the ability to extend the feature set of a NASD for the purposes of applications as well as the client operating system
Non-Volatile Memory Technologies - examining the use of NVM technologies as part of main memory, accessed directly using load/store instructions, in order to overcome the challenges associated with building a DRAM-only main memory
N-Store - studying NVMs to understand their performance characteristics in the context of big data systems and build the groundwork for new DBMS architectures.
Otus - improving resource attribution through a monitoring system implementation
PASIS Survivable Storage - decentralized storage systems whose availability and security policies can survive component failures and successful malicious attacks
PDL vCloud - replacing a multitude of single-purpose clusters, managed and underutilized by individual groups, with an IaaS private cloud for class projects, simulations, data analyses, and cluster and data-intensive computing activities
Peloton - a relational database management system designed for fully autonomous optimization of hybrid workloads.
Petascale Data Storage Institute (PDSI) - addressing the challenges of petascale computing for scientific discovery on information storage capacity, performance, concurrency, reliability, availability, and manageability
PLFS - Parallel Log-Structured File System: an interposed layer inserted into the existing storage stack that rearranges problematic access patterns to achieve much better performance from the underlying parallel file system
PRObE - Parallel Reconfigurable Observational Environment -- a one-of-a-kind computer facility dedicated to large-scale systems research, which allows hands-on operation of very large compute resources
Problem Analysis - analyzing performance and reliability problems in deployed large-scale systems
pWalrus - a storage service layer that integrates parallel file systems effectively into cloud storage
RAID - Redundant Arrays of Independent Disks
RAIDframe - A Rapid Prototyping Tool for RAID Systems
Self-Securing Devices - systems with security functionality equally distributed among physically distinct system components
Self-Securing Storage - storage devices that prevent successful intruders from undetectably tampering with or permanently deleting stored data
Self-* Storage - a new storage architecture that integrates automated management functions and simplifies the human administrative task. Self-* systems are self-configuring, self-organizing, self-managing, etc.
ShardFS - Replicated Directories with Sharded Files
SIO - focuses on improving I/O performance for massively parallel and large multiprocessor systems, extended to include other forms of parallel computing such as networks of workstations
Storage QoS (PriorityMeister) - Providing storage QoS in dynamic heterogeneous networks and storage environments in the face of workload interference
Tetriscope - a combination of the application scheduler TetriSched and the visualization tool Atlas.
//TRACE - an approach for extracting and replaying traces of parallel applications to automatically discover inter-node data dependencies and inter-request compute times for each node (process) in an application
Video - NASD video server: storage and retrieval of digital video
Workload Characterization - Data mining meets traffic modeling
YCSB++ - Yahoo! Cloud Serving Benchmark (YCSB) with a set of extensions to improve performance understanding and debugging
Other - e.g., Transparent Informed Prefetching and Caching (TIP), Scotch Parallel File Systems
