AI Reliability Engineer (AI SRE)

Contract Type: Contract
Location: Campbell - CA
Industry: Information Technology
Contact Name: Kiana Parham
Contact Email: kparham@dewintergroup.com
Contact Phone: (669) 246-6265
Date Published: 03-30-2026
Salary: $50.00 - $175.00 Per Hour
Job ID: 38815

Title: AI Reliability Engineer (AI SRE)
Job Type: Contract
Contract Length: 12 Months
Pay Range: $50/hr – $175/hr
Start Date: ASAP
Location: Remote

About the Opportunity:

Our client, a leader in AI testing and Generative AI solutions, is looking for a skilled AI Reliability Engineer (AI SRE) to join their team for a 12-month engagement. This project involves ensuring the reliability, availability, and performance of mission-critical AI systems by defining SLOs, implementing automated resilience measures, and leading incident response. This is a high-impact role that requires a self-motivated professional who can hit the ground running and deliver results quickly.

Key Responsibilities & Deliverables:

This role is focused on the successful completion of specific tasks and deliverables. Your responsibilities will include:

Defining and maintaining Service Level Objectives (SLOs) for AI inference latency and availability.
Building automated "circuit breakers" and fallback logic (e.g., switching to a smaller model if the primary fails).
Leading incident response and root-cause analysis (RCA) for complex AI system failures.
Developing stress-testing and chaos engineering scenarios specifically for AI agent swarms.
Optimizing the "cold start" and scaling time for serverless AI functions.

Required Skills & Experience:

We are looking for someone with a proven track record of successful contract engagements. The ideal candidate will have:

4+ years of experience in Site Reliability Engineering (SRE).
Deep expertise in system monitoring, incident management, and cloud resilience. This isn't a learning role—you need to be a subject matter expert.
Demonstrated ability to work autonomously and manage your own time effectively to meet project goals.
Experience with Python/Go, Kubernetes, and observability stacks (Datadog, New Relic).
Strong communication skills to provide clear and concise status updates to the project team.

DeWinter Group and Maris Consulting is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. We post pay scales which are based on our client pay ranges. DeWinter, Maris, and our clients have the right to modify the requirements of the role which can impact the pay ranges posted.

(408) 297-7500