Deep Learning Inference Software Intern - 2025
Job Description
We are now looking for a Deep Learning Inference-Kernel and Performance Software Engineer Intern!
We are rapidly growing our research and software development for Inference. We seek excellent Software Engineers and Senior Software Engineers to join our team. We specialize in developing GPU-accelerated Deep Learning software. Researchers around the world are using NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions. Collaborate with the deep learning community to implement the latest algorithms for public release in TensorRT.
What you’ll be doing:
Develop deeply optimized deep learning kernels for inference.
Conduct performance analysis and modeling to understand the performance limiters of the current software stack and the underlying hardware architecture.
Collaborate with different teams to improve both the software and architectures to extend the state of the art in performance, efficiency, reliability and programmability.
Work with cross-functional teams across automotive, image understanding, and speech understanding to develop creative solutions.
What we need to see:
Pursuing a BS or higher degree in Computer Engineering, Computer Science, Electrical Engineering, or a related computing-focused field.
Agile software development skills are helpful.
Excellent C/C++ programming and software design skills.
Python experience is a plus.
Experience with performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.
GPU programming experience (CUDA or OpenCL) desired.
Expertise in characterizing and modeling system-level performance, executing comparison studies, and documenting and publishing results.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and talented people in the world working for us. If you're creative and autonomous, we want to hear from you!