TIME: 12:00 noon to approximately 1:00 pm EDT
PLACE: Virtual - a Zoom link will be emailed closer to the seminar
SPEAKER: Ezra Hoch, Jane Street
Depot: Multi-DC Storage for AI/ML Workloads
As the importance of AI/ML workloads grows, so does the number of GPUs. Power and availability constraints have led Jane Street to run GPU workloads in multiple data centers. Managing datasets across the estate becomes a challenge, especially in Jane Street's agile and dynamic environment.
I'll talk about Depot, a storage metadata layer that we're building to address Jane Street's use cases: the issues we've seen with using NFS's directory structure as a metadata layer, the API trade-offs we've considered, the API we landed on (a middle ground between S3 and a filesystem), and Depot's architecture.
BIO: Ezra Hoch is a software engineer at Jane Street, working on distributed storage systems. His previous roles at Google include TL-ing GCP’s file solutions, developing GCP’s networking stack, and leading Effingo, Google’s global replication system. Prior to Google, he was chief architect at Elastifile, a startup developing a scale-out SSD-optimized filesystem (acquired by Google). His focus is on large-scale distributed infrastructure systems. He received his BSc in Computer Science and his MSc and PhD in distributed algorithms from the Hebrew University of Jerusalem.