Parallel Data Laboratory

PDL Talk Series

July 9, 2026

TIME: 12:00 noon to approximately 2:00 pm EDT
PLACE: Virtual - a Zoom link will be emailed closer to the seminar

SPEAKERS: Gary Grider & Brian Atkinson, Los Alamos National Laboratory

BIO: Gary Grider is currently the Deputy Division Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory. In that role, Gary is responsible for all aspects of High Performance Computing technologies at Los Alamos. He is also responsible for conducting and sponsoring R&D that keeps the new technology pipeline full and provides solutions to problems in the Lab's HPC environment. Gary is the past national co-coordinator for the High End Computing Interagency Working Group (HECIWG) File Systems and I/O (FSIO) advisory team, which guides and coordinates all government spending on HEC FSIO R&D. In addition, he is the Director of the Los Alamos Information Science and Technology Institute (ISTI), the LANL/UCSC Institute for Scientific Scalable Data Management, and the LANL/CMU Institute for Reliable High Performance Information Technology. He is also the LANL PI for the Petascale Data Storage Institute, a DOE SciDAC2 Institute.

BIO: Brian Atkinson is an HPC Engineer at Los Alamos National Laboratory in the High Performance Computing Division. Brian’s work has focused on building fast storage end points and evaluating new technologies to integrate into the Lab’s future storage systems. Brian has made contributions to various open-source file system projects and was the lead developer on Direct I/O integration into OpenZFS. Brian is currently the HPC-DES storage team lead, where he guides his team's research efforts in distributed storage systems. Brian received his B.S. in Computer Science from Coastal Carolina University and his M.S. in Computer Engineering from Clemson University.

TALK 1: The Grand Unified File Index (GUFI), a Global Metabase
LANL developed GUFI nearly a decade ago to fill a gap: no commercial file index served both users and storage admins while scaling to billions or trillions of files. GUFI is the most complete file indexing technology that can store any amount of metadata about files and folders and provide SQL queries over that data, while only letting users see the metadata derived from files they would be able to see under bucket/traversal access control and file read access control. Given this unique capability, GUFI can enable findability across a site's holdings and turn millions to trillions of files into a source for RAG, and soon GraphRAG, without having to move the data to a data lakehouse. In fact, GUFI could generate a virtual lakehouse over any subset of data in the holdings, enabling analytics. GUFI is open source and is embedded in nearly a dozen products. Research is under way into extracting neighborhoods of information, mapping knowledge graphs on top of access control graphs, and building compact metadata representations of large-scale simulation output files. The talk will present what GUFI can do, and questions on how it works are welcome.

TALK 2: PNFS-Lattice
LANL has been instrumental in leading the efforts towards PNFS since 2001. A PNFS parallel file system solution can be built using the standard PNFS Linux client and the standard NFS Linux data server, but you have to buy a metadata server. PNFS-Lattice, a joint effort between Peak:AIO and LANL, has produced the first open source PNFS metadata server, and it has been a parallel metadata service from the beginning. Furthermore, it lives in user space, making it possible for industry and universities to participate in the development. The service separates the protocol service from the catalog service and from other services such as policy. This allows for innovation and research on all or just parts of the PNFS metadata server. The motivation and architecture for the PNFS-Lattice open source user space Linux metadata service will be presented.

TALK 3: D-Threads (and D-Mutex)
The memory industry has produced CXL (Compute Express Link), which allows memory to be attached to one or multiple hosts via CXL or other technologies. Currently the main use of CXL memory is for cloud providers to “sell” memory use by carving up large pools of memory, which allows for easier packing of processes onto cloud resources by separating memory and processing. Recently, work has been done to allow a shared virtual address space between multiple processes on multiple hosts, allowing load/store access from multiple hosts. The missing piece in taking existing shared memory programs and distributing them across machines that can see the same physical memory is largely thread management and synchronization between threads on different hosts. D-Threads is a Pthreads-like package that we hope will allow Pthreads/Mutex programs to be ported easily to D-Threads, making distribution of load/store-style shared memory access possible. A short overview of this effort will be presented.

CONTACTS

PDL Co-Director
RMCIC 2208

PDL Co-Director
(412) 268-3064
GHC 9109

Executive Director, Parallel Data Lab
(412) 268-5485

PDL Administrative Manager
(412) 268-6716