NVIDIA Generative AI Multimodal certification NCA-GENM exam questions:
1. You are building a multimodal generative AI application that uses CLIP to align text and image embeddings. You observe that the generated images lack detail and fidelity to the text prompt. Which of the following strategies would be MOST effective in improving image quality, and how could prompt engineering and Triton Inference Server play a role?
A) Using a larger batch size during CLIP training and increasing the learning rate. Triton is not directly involved in model training.
B) Increasing the CLIP model's text encoder's hidden layer size and using more aggressive data augmentation during CLIP training. Triton can be used to serve the augmented CLIP model at scale.
C) Training a separate image super-resolution model to enhance the generated images after they are produced by the CLIP-guided generator. Triton can manage the concurrent execution of the generator and super-resolution models.
D) All of the above
E) Refining the text prompts to be more descriptive and specific, incorporating stylistic details and relevant keywords. Triton can optimize the prompt embedding process.
2. A research team has developed a novel multimodal model that fuses text, image, and audio data. They want to quantitatively evaluate the model's performance in comparison to several existing state-of-the-art models. Which of the following evaluation metrics would be MOST appropriate to assess the model's ability to generate coherent and relevant text descriptions based on the combined multimodal input?
A) Frechet Inception Distance (FID).
B) Structural Similarity Index Measure (SSIM).
C) BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation).
D) Inception Score.
E) Perplexity.
3. You're building a multimodal model that predicts customer satisfaction based on their written reviews and associated call center audio recordings. You've pre-trained separate text and audio encoders. What's the MOST effective strategy to fuse these modalities for the final prediction task?
A) Train a separate attention mechanism to weigh the contributions of each modality before concatenation.
B) Concatenate the final hidden states from both encoders and feed them into a fully connected layer.
C) Fine-tune only the text encoder, keeping the audio encoder frozen.
D) Average the output embeddings of both encoders element-wise.
E) Add the hidden states of both encoders element-wise.
4. Consider a system that generates captions for images, and a key metric is BLEU score. You observe that while the BLEU score is high, the generated captions often lack detailed descriptions of the objects and relationships within the image. Which of the following strategies would you employ to improve the descriptive richness of the generated captions?
A) Train the model to minimize cross-entropy loss between predicted and ground truth captions.
B) Increase the beam size during decoding to explore a wider range of possible captions.
C) Fine-tune the model using Reinforcement Learning with a reward function that encourages detailed descriptions, such as CIDEr or SPICE.
D) Implement early stopping based solely on BLEU score during training.
E) Reduce the size of the vocabulary to focus on the most common words.
5. You're developing a multimodal model that combines text and audio for sentiment analysis. The text component is performing well, but the audio component contributes very little to the overall accuracy. What's the MOST likely reason and how could you address it?
A) The audio data is too large. Downsample the audio data to reduce computational cost.
B) The audio data is irrelevant. Remove the audio component entirely.
C) The audio features are not properly aligned with the text features. Use a cross-modal attention mechanism to improve alignment.
D) The audio data is not preprocessed correctly. Apply aggressive noise reduction techniques.
E) The text component is simply too dominant. Reduce the weight given to the text component in the final prediction.
Questions and answers:
| Question # 1 Answer: C, E | Question # 2 Answer: C | Question # 3 Answer: A | Question # 4 Answer: C | Question # 5 Answer: C |
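The fusion strategy named in question 3, option A (weighting each modality with an attention mechanism before concatenation) can be sketched roughly as follows. This is a minimal illustration in plain Python, not a reference implementation: the function name `attention_fuse`, the scoring vectors `w_text` and `w_audio`, and the toy embeddings are all hypothetical, and in a real model the scoring parameters would be learned jointly with the prediction head.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(text_emb, audio_emb, w_text, w_audio):
    """Score each modality, normalize the scores with softmax,
    scale each embedding by its weight, then concatenate.

    w_text and w_audio are hypothetical scoring vectors standing in
    for learned attention parameters.
    """
    scores = [dot(w_text, text_emb), dot(w_audio, audio_emb)]
    a_text, a_audio = softmax(scores)
    fused = [a_text * x for x in text_emb] + [a_audio * x for x in audio_emb]
    return fused, (a_text, a_audio)


# Toy embeddings (assumed values, for illustration only).
text_emb = [0.5, -1.0, 2.0]
audio_emb = [1.0, 0.0]
w_text = [0.1, 0.2, 0.3]
w_audio = [0.4, 0.4]

fused, weights = attention_fuse(text_emb, audio_emb, w_text, w_audio)
print(len(fused), weights)
```

The fused vector would then feed a fully connected layer. Compared with plain concatenation (option B), the per-modality weights let the model down-weight a noisy modality on a per-example basis.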
