Product Test Engineer - Machine Learning Hardware, RRL Technical Engineering

Amazon

Florence, Kentucky, United States

onsite
Posted Dec 4, 2025

Role & seniority: Product Test Engineer (mid to senior level) focused on system-level validation for ML acceleration hardware.

Stack/tools: ML acceleration products; hardware + firmware + software integration; Linux (5+ yrs); scripting/programming (Python, Java, Perl, PHP, Ruby, Bash/Shell); test infrastructure; 24/7 production environment familiarity; TCP/IP and web services knowledge (nice-to-have).

Top 3 responsibilities

  1. Design and implement system-level test strategies and scalable functional/performance tests for complete ML systems.

  2. Build and maintain scalable test infrastructure; implement product bring-up and first-boot test procedures; analyze test data; debug complex hardware/software interactions.

  3. Collaborate with hardware and software teams to ensure end-to-end validation, improve test coverage, and document procedures.

Must-have skills

  • 4+ years in SRE, systems engineering, or related operations roles; 5+ years Linux experience; 5+ years systems engineering experience.

  • BS in Systems Engineering, CS, or related field (or equivalent work experience).

  • Proficiency with Linux and at least one major scripting/programming language (e.g., Python, Java, Bash/Shell).

Nice-to-haves

  • Networking knowledge (TCP/IP, HTTP, DNS); automation scripting to reduce operational burden; experience in 24/7 production environments; experience with SOA/web services.

Location & work type: Florence, Kentucky, United States; on-site; full-time.

Full Description

Shape the future of AI infrastructure! Join RRL and develop system-level test solutions for our innovative ML acceleration hardware, deployed across our global server fleet.

We're seeking highly skilled and motivated Product Test Engineers to join our RRL test & repair operation. In this role, you'll be at the forefront of validation, ensuring that test infrastructure is deployed and operating at the required scale. You'll be responsible for designing and implementing comprehensive system-level test strategies that cover the full spectrum of our ML acceleration products, from individual components to fully integrated systems. This position requires a unique blend of hardware knowledge, software expertise, and systems thinking, as you'll be working at the intersection of custom silicon, complex firmware, and high-performance ML workloads.

You'll collaborate closely with cross-functional teams including hardware designers, software engineers, and operations specialists to develop robust test solutions that can scale to meet the demands of RRL's global infrastructure. Your work will be crucial in identifying and resolving integration issues, optimizing system performance, and ultimately ensuring that our ML acceleration products meet the highest standards of reliability and efficiency in real-world data center environments. If you're passionate about pushing the boundaries of ML hardware testing and have a knack for solving complex system-level challenges, we want you on our team.

Key job responsibilities

  • Design and implement system-level test strategies for ML acceleration products
  • Develop comprehensive functional and performance tests for complete ML systems
  • Create and maintain scalable test infrastructure for high-volume product validation
  • Implement product bring-up and first-boot test procedures
  • Drive improvements in test coverage, product quality, and manufacturing efficiency
  • Collaborate with hardware and software teams to ensure end-to-end product validation
  • Analyze system-level test data to identify and resolve integration issues
  • Debug complex hardware/software interactions in a production environment
  • Develop and maintain documentation for system test procedures and manufacturing processes
  • Optimize test workflows to balance thoroughness with production efficiency

Basic Qualifications

  • 4+ years of site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration experience

  • 5+ years of Linux experience

  • 5+ years of systems engineering experience

  • Bachelor's degree in Systems Engineering, Computer Science, or related field, or relevant work experience

  • Experience in site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration

  • Experience working with Linux

  • Experience in systems engineering

  • Experience in any of the following: Python, Java, Perl, PHP, Ruby, Bash, Shell, or equivalent

Preferred Qualifications

  • Knowledge of TCP/IP and networking protocols such as HTTP and DNS

  • Experience designing and developing scripts to automate operational burdens, and reviewing scripting changes to ensure they meet the standards for maintainability, scalability, and security

  • Experience working in a 24/7 production environment

  • Experience with service-oriented architecture and web services

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $94,000/year in our lowest geographic market up to $207,900/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.
