Role & seniority: Staff-level MTS Software System Design Engineer focused on GPU compute/AI validation, debug & performance (lead technical authority in validation and performance initiatives).

Stack/tools: GPU architecture and parallel compute models; GPU drivers/runtimes; Linux & Windows; Python, Groovy, GitHub; CI/CD; GPU profiling/debug tools; hardware debug (JTAG, crash/log analysis); AMD/architecture collaboration; dashboards for data-driven decisions.

Top 3 responsibilities

Own end-to-end validation strategy for GPU compute and AI workloads (HPC, ML, DL) and ensure feature readiness.
Lead post-silicon validation, silicon bring-up, advanced debug, and root-cause analysis across HW/FW/drivers/runtimes/OS.
Lead performance characterization/optimization, identify bottlenecks, drive workload-aware improvements, and validate performance-per-watt and scalability; architect automation and integrate tests into CI/CD.

Must-have skills

8+ years in GPU compute/AI validation, debug, or performance
Deep GPU architecture knowledge and parallel compute models
Experience with AI/HPC workloads; drivers/runtimes; Linux and Windows
Hands-on GPU profiling/debugging; Python, Groovy; CI/CD; test development
Strong technical leadership and communication; mentoring/design reviews

Nice-to-haves

ROCm or similar compute stacks; compiler/runtime optimizations for AI workloads
Power, thermal, reliability (RAS) knowledge; hardware-software

Full Description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

MTS SOFTWARE SYSTEM DESIGN ENGINEER

THE ROLE

We are looking for a Staff-level GPU Compute / AI Validation, Debug & Performance Engineer to lead validation, deep debug, and performance optimization for next-generation GPU compute and AI platforms. This role requires strong expertise in GPU architecture, parallel computing, and AI workloads, along with the ability to drive cross-functional technical initiatives across a global, multinational organization.
The ideal candidate will own complex validation areas, act as a technical authority for GPU compute/AI debug and performance, and influence architecture and design decisions through data-driven insights.

KEY RESPONSIBILITIES

GPU Compute / AI Validation Leadership
Own end-to-end validation strategy for GPU compute and AI workloads (HPC, ML, DL).
Define validation scope, coverage, and success metrics for compute pipelines.
Lead post-silicon validation, silicon bring-up, and feature readiness for GPU compute.
Ensure functional correctness across drivers, firmware, runtime, and frameworks.

Advanced Debug & Root Cause Analysis
Act as debug lead for complex GPU compute/AI issues spanning HW, FW, drivers, runtimes, and OS.
Debug GPU hangs, page faults, ECC errors, memory corruption, and scheduler failures.
Analyze failures using GPU traces, register dumps, crash dumps, JTAG, logs, WinDbg, counters, and AMD's profiler and debugger tools.
Work directly with architecture, RTL, and design teams to influence fixes and mitigations.

Performance Analysis & Optimization
Lead performance characterization and optimization for AI and compute workloads.
Identify bottlenecks across compute units, memory bandwidth, cache, interconnect, and power.
Drive workload-aware optimizations for training and inference use cases.
Validate performance-per-watt and scalability against product and architectural goals.
