Role details
- Location: Paris
- Employment Type: Full-time
- Location Type: Hybrid
- Department: Research team
- Compensation: $100K – $150K • Offers Equity • Relocation package
What you'll do
- Own inference infrastructure end-to-end: optimize latency, throughput, and cost across our model fleet.
- Build and scale model serving with TensorZero, vLLM/SGLang/TensorRT, and Kubernetes.
- Design and maintain vector search pipelines on top of vector stores (see the vector-search sketch after this list).
- Define service-health KPIs and track them against SLAs, drawing on support metrics such as first-contact resolution (FCR) and deflection.
- Turn research into product: take experimental models from the research team, assess what's production-ready, and ship it, from output formatting and sampling parameters to deployment (see the serving sketch after this list).
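For a concrete flavor of that last bullet, here is a minimal sketch of offline generation with vLLM from the stack below. The model name and sampling parameters are illustrative placeholders, not our production configuration.

```python
# Minimal vLLM sketch: load a model and generate with explicit sampling
# parameters. The model name is a placeholder, not what we actually serve.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model
sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Summarize paged attention in two sentences."], sampling)
print(outputs[0].outputs[0].text)
```

vLLM also ships an OpenAI-compatible server for the deployment side; tuning sampling defaults like these is part of what "shipping a model" means here.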
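And a sketch of the vector-search side. The posting doesn't name a specific vector store, so FAISS stands in here as an assumption; swap in whichever store the pipeline actually uses.

```python
# Toy vector-search pipeline: index normalized embeddings, then query by
# cosine similarity (inner product on unit vectors). Sizes are made up.
import faiss
import numpy as np

dim = 768  # placeholder embedding size
index = faiss.IndexFlatIP(dim)

docs = np.random.rand(10_000, dim).astype("float32")
faiss.normalize_L2(docs)  # unit vectors => inner product == cosine similarity
index.add(docs)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, k=5)  # top-5 nearest documents
print(ids[0], scores[0])
```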
Who you are
- 3+ years shipping high-performance ML systems in production, not just training notebooks
- Deep hands-on experience with inference optimization: you've debugged latency spikes and know the difference between theoretical and real-world throughput
- Comfortable across the stack: from CUDA kernels to Kubernetes manifests to Grafana dashboards
- A big plus: experience with Rust, custom Triton kernels, and performance benchmarking (see the kernel sketch below)
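To calibrate the Triton bonus point, here is a toy elementwise-add kernel in the style of the Triton tutorials; shapes and block size are illustrative only.

```python
# Toy Triton kernel: elementwise add over a 1D tensor, one program per block.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # which block this program owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```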
Tech stack
- TensorZero
- vLLM/SGLang/TensorRT
- Kubernetes
- CUDA
- Rust (bonus)
- Custom Triton kernels (bonus)
- Grafana dashboards
Benefits & perks
- Salary of $100,000 to $150,000 + equity
- 20 days of paid vacation
- Work from Paris (hybrid) + relocation package
- Top-tier private medical insurance in France
- All the hardware, tools, and services you need
- Covered subscriptions for AI agents and IDEs
- Team off-sites twice a year, in the Alps and Saint-Tropez
Ready to apply for this role?
Apply Now →