NVIDIA AI Infrastructure certification (NCP-AII) exam questions:
1. You need to configure persistent network settings on your BlueField SmartNIC after deploying BlueField OS. Which file should you modify to ensure these settings are applied after each reboot, assuming a Debian-based distribution?
A) /etc/sysconfig/network-scripts/ifcfg-
B) /etc/resolv.conf
C) /etc/dhcp/dhclient.conf
D) /etc/hostname
E) /etc/network/interfaces
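For reference, on Debian-style systems that use ifupdown, persistent settings in /etc/network/interfaces (option E) are written as per-interface stanzas. The sketch below only renders such a stanza as text. The interface name `eth0` and the 192.0.2.x addresses are placeholders chosen for illustration, not values from BlueField documentation, and the helper name `render_static_stanza` is hypothetical.

```python
def render_static_stanza(iface: str, address: str, netmask: str, gateway: str) -> str:
    """Render an ifupdown stanza for a statically addressed interface."""
    lines = [
        f"auto {iface}",                 # bring the interface up at boot
        f"iface {iface} inet static",    # IPv4 with a fixed address
        f"    address {address}",
        f"    netmask {netmask}",
        f"    gateway {gateway}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values from the documentation address range (192.0.2.0/24).
    print(render_static_stanza("eth0", "192.0.2.10", "255.255.255.0", "192.0.2.1"), end="")
```

The rendered text would be appended to /etc/network/interfaces (or a file it sources) so that it is re-applied on every boot.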
2. You're deploying a multi-GPU training job on a cluster using Slurm. You need to ensure that the GPUs allocated to the job are healthy and functioning correctly before the training starts. What's the MOST effective approach to pre-validate the GPU hardware?
A) Check the output of 'nvidia-smi' to ensure all GPUs are listed and have the expected memory.
B) Monitor the GPU temperature using 'nvidia-smi' during the first few minutes of the training job.
C) Allocate all available GPUs to the job and assume they are healthy.
D) Execute the NVIDIA Data Center GPU Manager (DCGM) diagnostic suite on the allocated GPUs.
E) Run a simple CUDA vector addition program on each GPU and check for errors.
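As an illustration of option D, a pre-job check (for example, one called from a Slurm prolog script) could wrap the `dcgmi diag` command. This is a minimal sketch, not a definitive implementation: it assumes `dcgmi` is installed on the node, that run levels 1 to 4 are accepted by `-r`, and that a non-zero exit status signals a failed diagnostic. The function names and the injectable `runner` parameter are hypothetical and exist only to keep the example testable.

```python
import subprocess


def dcgm_diag_command(run_level: int = 2) -> list[str]:
    """Build the argv for a DCGM diagnostic run at the given level."""
    if run_level not in (1, 2, 3, 4):
        raise ValueError("run_level must be between 1 and 4")
    return ["dcgmi", "diag", "-r", str(run_level)]


def gpus_pass_diag(run_level: int = 2, runner=subprocess.run) -> bool:
    """Return True if the diagnostic exits with status 0.

    Assumption: a non-zero exit status means at least one test failed.
    """
    result = runner(dcgm_diag_command(run_level), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Exit non-zero so the caller (e.g. a prolog script) can reject the node.
    raise SystemExit(0 if gpus_pass_diag(run_level=2) else 1)
```

Higher run levels take longer but exercise the hardware more thoroughly, so the level is left as a parameter.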
3. You are tasked with setting up a secure environment for running GPU-accelerated machine learning workloads in Docker containers. The security requirements dictate that containers should have minimal privileges and access only the necessary resources. Which of the following security measures are most relevant when using NVIDIA GPUs with Docker?
A) Use AppArmor or SELinux profiles to restrict the capabilities of the Docker containers, limiting their access to system resources.
B) Grant the Docker containers direct access to the host's hardware devices, including the GPU, to maximize performance.
C) Implement network segmentation and firewalls to isolate the Docker containers from other services and the internet.
D) Run the Docker daemon in rootless mode to reduce the risk of privilege escalation.
E) Regularly scan Docker images for vulnerabilities using tools like Clair or Trivy and rebuild images with patched dependencies.
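To make the "minimal privileges" requirement concrete, the sketch below assembles a `docker run` command line that exposes a single GPU while dropping Linux capabilities and blocking privilege escalation. It assumes the NVIDIA Container Toolkit is installed so that `--gpus` works. The image name `example/trainer:latest` and the helper name are hypothetical, and `--read-only` may need to be relaxed for workloads that write to the container filesystem.

```python
def hardened_gpu_run_command(image: str, gpu_device: str = "0") -> list[str]:
    """Build a docker run argv with one GPU and reduced privileges."""
    return [
        "docker", "run", "--rm",
        "--gpus", f"device={gpu_device}",        # expose only the requested GPU
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block setuid-style escalation
        "--read-only",                           # mount the root filesystem read-only
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image name, used only for illustration.
    print(" ".join(hardened_gpu_run_command("example/trainer:latest")))
```

These flags complement, rather than replace, the host-level measures listed in the options, such as AppArmor or SELinux profiles and image scanning.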
4. You are tasked with automating the BlueField OS deployment process across a large number of SmartNICs. Which of the following methods is MOST suitable for this task?
A) Manually flashing each SmartNIC using the 'bfboot' utility on a workstation.
B) Using a network boot (PXE) server to deploy the BlueField OS image over the network. This allows centralized management and scalability.
C) Utilizing the 'dd' command to directly copy the image to each SmartNIC's flash memory.
D) Utilizing a custom-built Python script to flash each individual card, controlled from a central server. This method supports parallel flashing.
E) Creating a custom ISO image with the BlueField OS and booting each SmartNIC from a USB drive.
5. You're debugging performance issues in a distributed training job. 'nvidia-smi' shows consistently high GPU utilization across all nodes, but the training speed isn't increasing linearly with the number of GPUs. Network bandwidth is sufficient. What is the most likely bottleneck?
A) CUDA Graphs is not being utilized.
B) NCCL is not configured optimally for the network topology, leading to high communication overhead.
C) The learning rate is not adjusted appropriately for the increased batch size across multiple GPUs.
D) The global batch size has exceeded the optimal point for the model, reducing per-sample accuracy and slowing convergence.
E) Inefficient data loading and preprocessing pipeline, causing GPUs to wait for data.
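"Not increasing linearly" can be quantified as scaling efficiency: measured multi-GPU throughput divided by the ideal of N times the single-GPU throughput. The sketch below is a hedged illustration; the function name and the throughput figures are made up for the example and do not come from any benchmark.

```python
def scaling_efficiency(single_gpu_throughput: float,
                       multi_gpu_throughput: float,
                       num_gpus: int) -> float:
    """Return measured throughput as a fraction of ideal linear scaling."""
    if single_gpu_throughput <= 0 or num_gpus <= 0:
        raise ValueError("throughput and GPU count must be positive")
    ideal = single_gpu_throughput * num_gpus
    return multi_gpu_throughput / ideal


if __name__ == "__main__":
    # Hypothetical numbers: 1000 samples/s on 1 GPU, 6000 samples/s on 8 GPUs.
    print(scaling_efficiency(1000.0, 6000.0, 8))  # 0.75
```

An efficiency well below 1.0 while GPU utilization stays high suggests that time is going to communication or other overhead rather than useful computation.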
Questions and answers:
Question #1 answer: E | Question #2 answer: D | Question #3 answers: A, C, D, E | Question #4 answer: B | Question #5 answers: B, C, D, E