Implementing AI Data Center Networks Workshop
This advanced three-day workshop gives students the knowledge needed to build and work with Juniper Apstra in an artificial intelligence data center (AI DC). Attendees gain the background necessary to understand how the four networks described in the AI DC Juniper Networks validated design (JVD) are used: the out-of-band (OOB), frontend, backend graphics processing unit (GPU), and backend storage networks.
Students will learn to train AI models using the PyTorch framework on a single server with one GPU, a single server with multiple GPUs (with discussions on NVLink and NVSwitch), and multiple servers each having multiple GPUs. Students will gain familiarity with the Nvidia CPU (Grace), GPUs (Blackwell, Hopper, Ada Lovelace), and compute platform architectures (DGX A100 and H100). Students will receive an overview of the Nvidia SuperPOD, Hewlett-Packard (HP) AI DC, and Weka AI DC reference designs, as well as a deep dive into Juniper’s AI DC JVD.
For the backend GPU network, students will learn that using remote direct memory access (RDMA), specifically RDMA over Converged Ethernet version 2 (RoCEv2), together with a rail-optimized network design ensures an optimal communication path for the Nvidia Collective Communications Library (NCCL) collective operations. For both backend networks, students will learn how to use Data Center Quantized Congestion Notification (DCQCN) and dynamic load balancing (DLB) to ensure lossless data transfer over an Ethernet-based network. Students will learn how to use Terraform and Apstra to deploy the AI DC networks and how to orchestrate the training cluster using Slurm. Finally, students will learn how to use their trained model to make predictions.
Students are assumed to have a background in Python and to have attended the Data Center Automation using Juniper Apstra (APSTRA) course or to have similar foundational knowledge of Apstra.
This workshop is delivered through lecture only. Students will learn how an AI DC based on Juniper’s AI DC JVD is deployed and used to train and deploy an AI model.
Certification Track: Data Center
Difficulty Level: Advanced
Additional Details
- Course eBook included.
- Live online training is delivered using Zoom, which supports real-time closed captions in up to 23 languages.
- For live, location-based, in-person training, you will receive the address and classroom location details within the registration confirmation email.
- See complete terms & conditions.
- Course pricing may vary based on Juniper Networks Authorized Education Partner (JNAEP) offerings and locations.
- Regional pricing is available in APAC. Please contact our Education APAC Sales Team.
- Contact Education Services Sales: AMER Sales | APAC Sales | EMEA Sales
- Need Support? Please contact us
Click on a date below to register. Please contact us if you wish to schedule a private training event.
No upcoming dates were found for this activity.