PARALLEL DATA LAB

PAST PDL PROJECTS

Our research addresses a broad spectrum of storage-related challenges, including storage security, emerging technologies, disk characterization and modeling, efficient storage access, storage networking, and network-attached storage clusters.

  • Abacus - Dynamic Function Placement for Data-Intensive Cluster Computing
  • ABLE - Attribute Based Learning Environments
  • Active Disks - Remote Execution for Network-Attached Storage
  • Active Storage Networks - enable flexible construction of sophisticated storage and file-system functionality that can migrate to the most appropriate location in the system
  • Argon (Storage QoS) - performance insulation for shared storage servers
  • Astro-DISC - new algorithms, data structures, and software tools for the analysis of massive astronomical and cosmological datasets
  • ATLAS - analyzes and models the operation of Los Alamos National Lab supercomputer clusters, and uses these models to develop techniques that improve clusters’ operational efficiency
  • Attribute-Based Naming - techniques to gather attributes based on context analysis
  • Batchactive Scheduling - Cluster Scheduling for Explicitly-speculative Tasks
  • Cloud Scheduling (TetriSched) - maximize resource efficiency and utilization via a scheduler that accepts resource requests in the form of utility functions
  • Database I/O - optimizing database performance
  • Data-Intensive Supercomputing (DISC) - research to extend the type of computing systems used for Internet search to a larger range of applications
  • dbug - exploring systematic testing, an alternative to stress testing that controls the order in which certain concurrent events occur
  • DiskReduce - a framework for integrating RAID into replicated storage systems to lower storage capacity overhead
  • DiskSim - an efficient, accurate, highly-configurable disk system simulator
  • DIXtrac - a program for disk extraction used to characterize over 100 performance-critical parameters
  • Elastic Storage (SpringFS)
  • eScience - PDL projects that are data-intensive and thus heavily invested in the use of computers for advancement
  • Expressive Storage Interfaces - increase the cooperation between device firmware and OS software to significantly increase the end-to-end performance and system robustness
  • Failure Data Analysis - to better understand what makes systems unreliable, i.e., what failures in today's large-scale production systems look like
  • Fates Database Storage - the Fates architecture offers efficient execution at all levels of memory hierarchy and optimizes data layout to improve performance, by exploiting the unique characteristics available at each level
  • FAWN - fast arrays of wimpy nodes
  • Fingerpointing - problem diagnosis in distributed systems
  • Freeblock Scheduling - a new approach to utilizing more of disks' potential media bandwidths
  • File System Virtual Appliances (FSVA) - a new approach for third-party FSs, leveraging virtual machines to decouple the OS version in which the FS runs from the OS version used by the user’s applications
  • Hadoop Workload Analysis - to better understand data scientists' use of the Hadoop system through workload analysis
  • Home Storage - data management for the home
  • Incast - addressing catastrophic TCP throughput collapse in storage server networks
  • IndexFS - scaling file system metadata for performance
  • Informed Prefetching - applications should issue hints which disclose their future I/O accesses
  • Landslide - Systematic dynamic race detection in kernel space
  • MEMS-Based Storage - a new technology that could provide significant performance gains over current disk drive technology and at lower cost
  • NASD - Network Attached Secure Disks: all storage systems that exhibit the following properties: direct client-drive data transfer in a networked environment; asynchronous oversight by the high-level filesystem; cryptographic support for the integrity of requests; storage self-management opportunities derived from a more abstract and independent role for storage systems; and the ability to extend the feature set of a NASD for the purpose of applications as well as for the client operating system
  • Non-Volatile Memory Technologies - examining the use of NVM technologies as part of main memory, accessed directly using load/store instructions in order to overcome the challenges associated with building a DRAM-only main memory
  • N-Store - studying NVMs to understand their performance characteristics in the context of big data systems and lay the groundwork for new DBMS architectures
  • Otus - improving resource attribution through a monitoring system implementation
  • PASIS Survivable Storage - decentralized storage systems whose availability and security policies can survive component failures and successful malicious attacks
  • PDL vCloud - replacing a multitude of single-purpose clusters, managed and underutilized by individual groups, with an IaaS private cloud for class projects, simulations, data analyses, and cluster and data-intensive computing activities
  • Peloton - a relational database management system designed for fully autonomous optimization of hybrid workloads
  • Petascale Data Storage Institute (PDSI) - addressing the challenges of petascale computing for scientific discovery on information storage capacity, performance, concurrency, reliability, availability, and manageability
  • PLFS - Parallel Log-Structured File System, an interposed layer inserted into the existing storage stack that rearranges problematic access patterns to achieve much better performance from the underlying parallel file system
  • pNFS - considers the problem of limited bandwidth to NFS servers
  • PRObE - Parallel Reconfigurable Observational Environment -- a one-of-a-kind computer facility dedicated to large-scale systems research, which allows hands-on operation of very large compute resources
  • Problem Analysis - analyzing performance and reliability problems in deployed large-scale systems
  • pWalrus - a storage service layer that integrates parallel file systems effectively into cloud storage
  • RAID - Redundant Arrays of Independent Disks
  • RAIDframe - A Rapid Prototyping Tool for RAID Systems
  • Self-Securing Devices - systems with security functionality equally distributed among physically distinct system components
  • Self-Securing Storage - storage devices that prevent successful intruders from undetectably tampering with or permanently deleting stored data
  • Self-* Storage - a new storage architecture that integrates automated management functions and simplifies the human administrative task. Self-* systems are self-configuring, self-organizing, self-managing, etc.
  • ShardFS - Replicated Directories with Sharded Files
  • SIO - improving I/O performance for massively parallel and large multiprocessor systems, extended to include other forms of parallel computing such as networks of workstations
  • Storage QoS (PriorityMeister) - Providing storage QoS in dynamic heterogeneous networks and storage environments in the face of workload interference
  • Tetriscope - a combination of the application scheduler TetriSched and the visualization tool Atlas
  • //TRACE - an approach for extracting and replaying traces of parallel applications to automatically discover inter-node data dependencies and inter-request compute times for each node (process) in an application
  • Video - NASD video server: storage and retrieval of digital video
  • Workload Characterization - Data mining meets traffic modeling
  • YCSB++ - Yahoo! Cloud Serving Benchmark (YCSB) with a set of extensions to improve performance understanding and debugging
  • Other - e.g., Transparent Informed Prefetching and Caching (TIP), Scotch Parallel File Systems