Frequently asked CUDA questions and answers in Artificial Intelligence and Machine Learning to enhance your skills and knowledge of the topic. We have compiled the best CUDA interview questions and answers, trivia quizzes, MCQ questions, and viva questions to help you prepare. Download the CUDA FAQs in PDF form online for academic courses, job preparation, and certification exams.
Question-1. How do you manage multiple GPUs in CUDA?
Answer-1: Multiple GPUs can be managed using device selection functions (cudaSetDevice) and separate streams for concurrent execution.
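For illustration, a minimal sketch of iterating over all visible devices, with one stream per device (error checking omitted):

    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);              // subsequent CUDA calls target this GPU
        cudaStream_t stream;
        cudaStreamCreate(&stream);       // per-device stream for concurrent work
        // ... allocate memory and launch kernels for this device ...
        cudaStreamDestroy(stream);
    }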
Question-2. What is the role of the CUDA driver API?
Answer-2: The CUDA driver API provides low-level access to GPU resources and enables more control over kernel execution and memory management than the runtime API.
Question-3. What are streams in CUDA?
Answer-3: Streams in CUDA allow concurrent execution of tasks like memory transfers and kernel execution, improving performance by overlapping operations.
Question-4. What is PTX in CUDA?
Answer-4: PTX (Parallel Thread Execution) is an intermediate, assembly-like instruction set that CUDA compilers emit; the GPU driver translates it into the device's native machine code.
Question-5. What is the difference between cudaMemcpy and cudaMemcpyAsync?
Answer-5: cudaMemcpy is synchronous, blocking until the transfer is complete, while cudaMemcpyAsync is asynchronous, allowing the program to continue execution immediately.
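A short sketch of the difference (h_data, d_data, and N are placeholder names; true overlap with cudaMemcpyAsync requires the host buffer to be pinned):

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Blocking: the host waits until the transfer completes.
    cudaMemcpy(d_data, h_data, N * sizeof(float), cudaMemcpyHostToDevice);

    // Non-blocking: returns immediately; the copy is queued on the stream.
    cudaMemcpyAsync(d_data, h_data, N * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    cudaStreamSynchronize(stream);   // wait for the queued copy to finish
    cudaStreamDestroy(stream);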
Question-6. What is the CUDA profiler?
Answer-6: The CUDA profiler is a tool for analyzing and optimizing the performance of CUDA applications, providing insights into memory usage, kernel execution, and bottlenecks.
Question-7. What is CUDA?
Answer-7: CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on GPUs (GPGPU).
Question-8. Who developed CUDA?
Answer-8: CUDA was developed by NVIDIA.
Question-9. What programming languages can be used with CUDA?
Answer-9: CUDA supports languages like C, C++, Fortran, and Python (via libraries like PyCUDA).
Question-10. What is a CUDA kernel?
Answer-10: A CUDA kernel is a function written in CUDA C/C++ that runs on the GPU in parallel across multiple threads.
Question-11. What is a thread in CUDA?
Answer-11: A thread is the smallest unit of execution in CUDA, responsible for executing a kernel function on a single data element.
Question-12. What is a thread block in CUDA?
Answer-12: A thread block is a collection of threads that execute the same kernel and can share memory and synchronize with each other.
Question-13. What is a grid in CUDA?
Answer-13: A grid is a collection of thread blocks, enabling large-scale parallelism by organizing many threads across the GPU.
Question-14. What is the role of the GPU in CUDA programming?
Answer-14: The GPU acts as the compute device, executing parallel tasks offloaded by the CPU.
Question-15. What is shared memory in CUDA?
Answer-15: Shared memory is a fast, on-chip memory shared among threads in the same block.
Question-16. What is global memory in CUDA?
Answer-16: Global memory is the main memory on the GPU, accessible by all threads but slower than shared memory.
Question-17. What is local memory in CUDA?
Answer-17: Local memory is memory private to a thread, stored in global memory, and used when registers are insufficient.
Question-18. What is constant memory in CUDA?
Answer-18: Constant memory is a small, read-only memory area accessible by all threads, optimized for broadcasting the same data across threads.
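As a brief sketch (coeffs and h_coeffs are placeholder names), constant memory is declared with __constant__ and filled from the host with cudaMemcpyToSymbol:

    __constant__ float coeffs[16];   // read-only on the device, cached and
                                     // broadcast efficiently to all threads

    // Host side, before launching kernels that read coeffs:
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(coeffs));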
Question-19. How do you define a CUDA kernel?
Answer-19: A CUDA kernel is defined using the __global__ keyword, e.g., __global__ void kernelName(...) { }.
Question-20. How do you launch a CUDA kernel?
Answer-20: CUDA kernels are launched with the execution-configuration syntax kernelName<<<numBlocks, threadsPerBlock>>>(arguments), where numBlocks sets the grid size and threadsPerBlock sets the block size.
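Putting Questions 19 and 20 together, a minimal vector-add sketch (d_a, d_b, d_c, and n are placeholder device pointers and size):

    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) c[i] = a[i] + b[i];                  // guard the tail
    }

    // Launch enough 256-thread blocks to cover n elements.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();   // wait for the kernel to finish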
Question-21. What is a warp in CUDA?
Answer-21: A warp is a group of 32 threads in CUDA that execute instructions in lockstep.
Question-22. What is coalesced memory access in CUDA?
Answer-22: Coalesced memory access occurs when threads in a warp access consecutive memory addresses, improving memory access efficiency.
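A sketch contrasting the two access patterns (in, out, n, and stride are placeholders):

    // Coalesced: consecutive threads touch consecutive addresses,
    // so a warp's loads combine into few memory transactions.
    __global__ void copyCoalesced(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    // Strided: each thread jumps by `stride`, scattering the warp's
    // accesses across many transactions and wasting bandwidth.
    __global__ void copyStrided(const float* in, float* out, int n, int stride) {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
        if (i < n) out[i] = in[i];
    }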
Question-23. What are CUDA streams?
Answer-23: CUDA streams are sequences of operations (kernels, memory copies, etc.) that execute in issue order within a stream, while operations in different streams may run concurrently.
Question-24. What is the difference between synchronous and asynchronous memory operations in CUDA?
Answer-24: Synchronous operations block until completion, while asynchronous operations allow the host and device to execute other tasks concurrently.
Question-25. How do you measure GPU performance in CUDA?
Answer-25: GPU performance is typically measured using metrics like kernel execution time, memory throughput, and floating-point operations per second (FLOPS).
Question-26. What is the purpose of the cudaMemcpy function?
Answer-26: cudaMemcpy transfers data between the host and device memory.
Question-27. What is the difference between cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost?
Answer-27: cudaMemcpyHostToDevice transfers data from the host (CPU) to the device (GPU), while cudaMemcpyDeviceToHost transfers data in the opposite direction.
Question-28. What is pinned memory in CUDA?
Answer-28: Pinned memory (page-locked memory) is a region of host memory that cannot be swapped out, enabling faster data transfers between host and device.
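A minimal sketch (N is a placeholder size):

    float* h_buf = nullptr;
    cudaMallocHost(&h_buf, N * sizeof(float));  // page-locked host memory
    // ... use h_buf as the host side of (async) transfers ...
    cudaFreeHost(h_buf);                        // must be freed with cudaFreeHost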
Question-29. What is the purpose of the __shared__ keyword in CUDA?
Answer-29: The __shared__ keyword declares shared memory that can be accessed by all threads within a block.
Question-30. What is the purpose of the __device__ keyword in CUDA?
Answer-30: The __device__ keyword defines functions that are executed only on the GPU and callable only from other GPU functions or kernels.
Question-31. What is the __syncthreads() function used for in CUDA?
Answer-31: __syncthreads() synchronizes all threads in a block, ensuring they reach the same point in execution before continuing.
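Questions 29 and 31 often appear together; here is a block-level reduction sketch, assuming a power-of-two block size of 256:

    __global__ void blockSum(const float* in, float* out) {
        __shared__ float tile[256];                 // one slot per thread
        int t = threadIdx.x;
        tile[t] = in[blockIdx.x * blockDim.x + t];  // stage into shared memory
        __syncthreads();                            // all writes now visible

        // Tree reduction within the block.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (t < s) tile[t] += tile[t + s];
            __syncthreads();                        // synchronize each step
        }
        if (t == 0) out[blockIdx.x] = tile[0];      // one partial sum per block
    }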
Question-32. What is bank conflict in CUDA shared memory?
Answer-32: A bank conflict occurs when multiple threads in a warp access different addresses within the same shared-memory bank, forcing the accesses to be serialized and reducing performance.
Question-33. What is a CUDA context?
Answer-33: A CUDA context is the environment within which CUDA kernels and memory operations execute, associated with a specific GPU.
Question-34. How does CUDA handle error reporting?
Answer-34: CUDA provides error codes and helper functions like cudaGetErrorString() to report and interpret errors.
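A common pattern is to wrap API calls in a checking macro; a sketch (d_ptr, bytes, and myKernel are placeholders):

    #include <cstdio>

    #define CUDA_CHECK(call)                                          \
        do {                                                          \
            cudaError_t err = (call);                                 \
            if (err != cudaSuccess)                                   \
                fprintf(stderr, "CUDA error %s at %s:%d\n",           \
                        cudaGetErrorString(err), __FILE__, __LINE__); \
        } while (0)

    CUDA_CHECK(cudaMalloc(&d_ptr, bytes));    // check an API call
    myKernel<<<blocks, threads>>>(d_ptr);
    CUDA_CHECK(cudaGetLastError());           // check the kernel launch itself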
Question-35. What is the purpose of the cudaMalloc function?
Answer-35: cudaMalloc allocates memory on the GPU device.
Question-36. What is the purpose of the cudaFree function?
Answer-36: cudaFree deallocates memory previously allocated on the GPU.
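A typical allocate/copy/free sequence (h_data and N are placeholders; error checking omitted):

    float* d_data = nullptr;
    cudaMalloc(&d_data, N * sizeof(float));   // allocate device memory
    cudaMemcpy(d_data, h_data, N * sizeof(float), cudaMemcpyHostToDevice);
    // ... launch kernels that use d_data ...
    cudaFree(d_data);                         // release the allocation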
Question-37. What are CUDA device properties?
Answer-37: CUDA device properties include characteristics like number of CUDA cores, shared memory size, global memory size, etc., retrievable using cudaGetDeviceProperties().
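A short sketch querying device 0:

    #include <cstdio>

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);        // fill prop for device 0
    printf("Device: %s\n", prop.name);
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);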
Question-38. How does CUDA achieve parallelism?
Answer-38: CUDA achieves parallelism by executing multiple threads in parallel on GPU cores, organized into blocks and grids.
Question-39. What is the difference between __global__ and __device__ in CUDA?
Answer-39: __global__ functions are called from the host and run on the device, while __device__ functions are called only from the device and execute on the device.
Question-40. What is texture memory in CUDA?
Answer-40: Texture memory is a cached, read-only memory optimized for 2D spatial locality, often used in graphics or imaging applications.
Question-41. What is unified memory in CUDA?
Answer-41: Unified memory is a shared memory space accessible by both the CPU and GPU, simplifying memory management.
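A minimal sketch (myKernel, blocks, threads, and N are placeholders):

    float* data = nullptr;
    cudaMallocManaged(&data, N * sizeof(float));  // visible to CPU and GPU
    for (int i = 0; i < N; ++i) data[i] = 1.0f;   // initialize on the host
    myKernel<<<blocks, threads>>>(data, N);       // use directly on the GPU
    cudaDeviceSynchronize();                      // before the host reads it back
    cudaFree(data);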
Question-42. How do you optimize CUDA kernels?
Answer-42: CUDA kernels can be optimized by improving memory access patterns, minimizing divergence, maximizing occupancy, and reducing shared memory bank conflicts.
Question-43. What is warp divergence in CUDA?
Answer-43: Warp divergence occurs when threads in a warp follow different execution paths due to conditional statements, reducing performance.
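A tiny sketch of a divergent branch (x is a placeholder array):

    __global__ void divergent(float* x) {
        int i = threadIdx.x;
        if (i % 2 == 0) x[i] *= 2.0f;  // even lanes take this path
        else            x[i] += 1.0f;  // odd lanes take the other; the warp
                                       // executes both paths one after another
    }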
Question-44. What is the purpose of the cudaEvent API in CUDA?
Answer-44: The cudaEvent API is used to measure execution time, synchronize streams, and handle events between the host and device.
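A timing sketch (myKernel and its launch configuration are placeholders):

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    myKernel<<<blocks, threads>>>(d_data);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);              // wait until `stop` is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);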
Question-45. What is CUDA Thrust?
Answer-45: CUDA Thrust is a high-level C++ template library for parallel programming, simplifying operations like sorting, searching, and reductions on GPUs.
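A brief sketch (h_vec is assumed to be a std::vector<int> on the host):

    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>

    thrust::device_vector<int> d_vec(h_vec.begin(), h_vec.end());
    thrust::sort(d_vec.begin(), d_vec.end());                 // parallel sort
    int sum = thrust::reduce(d_vec.begin(), d_vec.end(), 0);  // parallel sum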
Question-46. What is dynamic parallelism in CUDA?
Answer-46: Dynamic parallelism allows GPU kernels to launch other kernels directly, enabling recursive algorithms and dynamic workloads.
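A minimal sketch (requires compute capability 3.5 or higher and compilation with -rdc=true; childKernel is a placeholder):

    __global__ void childKernel(int* data) {
        // ... work on data ...
    }

    __global__ void parentKernel(int* data) {
        if (threadIdx.x == 0) {
            childKernel<<<1, 32>>>(data);  // kernel launched from the device
        }
    }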
Question-47. What is the maximum number of threads per block in CUDA?
Answer-47: The maximum number of threads per block is hardware-dependent, typically 1024 for modern GPUs.
Question-48. What are compute capabilities in CUDA?
Answer-48: Compute capabilities describe the features supported by a GPU, including maximum threads, memory sizes, and supported instructions.
Question-49. What is occupancy in CUDA?
Answer-49: Occupancy refers to the ratio of active warps to the maximum number of warps that can be run on an SM, indicating how well GPU resources are utilized.
Question-50. What are CUDA atomic operations?
Answer-50: CUDA atomic operations are operations that guarantee exclusive access to a memory location, preventing race conditions when multiple threads access the same memory.
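A histogram sketch (assumes every value in data is a valid bin index):

    __global__ void histogram(const int* data, int* bins, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(&bins[data[i]], 1);  // safe concurrent increment
    }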