Site Reliability Engineer

Spectraforce Technologies
United States, Washington, Seattle
Nov 26, 2025
Job Title: Sr. Systems Reliability Engineer

Location: Seattle, WA

Duration: 12 Months CTH (contract-to-hire)


Key Responsibilities:

  • Contribute to the SRE strategy and establish best practices for release management, automation, and system reliability.
  • Mentor and guide SRE, Engineering, and Product teams in adopting core SRE principles such as service ownership, reducing toil, and continuous improvement.
  • Lead initiatives across SLIs/SLOs, observability, incident management, and postmortem practices, ensuring insights and learnings are captured and acted upon.
  • Champion SRE practices by implementing repeatable templates for logging, monitoring, and alerting frameworks.
  • Drive observability and monitoring excellence using tools such as Grafana, AppDynamics (AppD), and Sumo Logic, ensuring proactive detection and resolution of issues.
  • Partner with engineering to design reliable, fault-tolerant systems and reduce operational toil through automation.
  • Implement and leverage the Ansible Automation Platform to help teams automate infrastructure provisioning, configuration management, and event-driven workflows.
  • Enable teams to automate operational events and infrastructure changes, reducing manual intervention and improving system resilience.
  • Exercise sound judgment to ensure operational compliance with security, privacy, audit, disaster recovery, and other company requirements.



Job-Specific Skills, Experience & Education

  • Minimum of 5 years of experience in Site Reliability Engineering, IT operations, or related fields.
  • Bachelor's degree in computer science or engineering, or equivalent experience (2 additional years of experience in lieu of a degree).
  • Technical expertise in system reliability, scalability, application design, and performance.
  • Hands-on experience with observability and monitoring tools such as Grafana, AppDynamics, and Sumo Logic.
  • Experience with automation platforms, particularly Ansible, for infrastructure and event-driven automation.
  • Proven ability to mentor and guide engineers in adopting SRE practices and principles.
  • Excellent communication and collaboration skills across diverse teams and vendors.
  • Strong judgment and problem-solving capabilities.
  • Experience working in multi-cloud environments.
  • Strong interpersonal, organizational, and customer service skills.


Preferred

  • Experience applying ITIL, SRE, and IT process best practices.
  • Experience in tracking major incidents, rollbacks, and hotfixes; leading root cause analysis (RCA) processes; and ensuring resolution and completion of action items.
  • Experience with technical engineering in IT operations.
