AI Studio is part of Nebius Cloud, one of the world's largest GPU clouds, running tens of thousands of GPUs. We are building an inference platform that makes every kind of foundation model (text, vision, audio, and emerging multimodal architectures) fast, reliable, and effortless to deploy at massive scale.
Responsibilities:
- Develop and optimize low-level kernels and runtime components for AI inference
- Improve the performance of inference engines on GPU platforms
- Profile and debug system-level and hardware-level performance issues
- Integrate support for new hardware architectures (Hopper, Blackwell, Rubin)
- Collaborate with ML and backend teams to optimize end-to-end execution
Required Qualifications:
- Strong proficiency in C++ or expertise in GPU programming, with a focus on low-level, high-performance code and memory management
- Experience in GPU programming or systems-level software development, e.g. operating system internals, kernel modules, or device drivers
- Hands-on experience with profiling and debugging tools to identify performance issues on both CPUs and GPUs, and the ability to optimize code based on those findings
- Solid understanding of CPU/GPU architecture and memory hierarchy
Preferred Qualifications:
- Experience with GPU programming frameworks and libraries: CUDA, ROCm, CUTLASS, CuTe, ThunderKittens, Triton, Pallas, Mosaic GPU
- Familiarity with ML inference runtimes (e.g. TensorRT, TVM)
- Knowledge of Linux internals, drivers, or compiler toolchains
- Experience with tools like perf, VTune, Nsight, or ROCm profiler
- Familiarity with popular inference engines (e.g. vLLM, SGLang, TGI)
What we offer:
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within Nebius.
- Hybrid working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.