Site reliability engineering manager

Juárez, Chih

Apex Systems

Posted March 10

Description

Role overview

Hiring a senior site reliability engineer (SRE) to join the platform engineering team. This role is responsible for the shared AWS infrastructure that supports the client's core products, including a large monolithic web application and a growing set of microservices. This is not a "ticket-based ops" role; it is about building and evolving the platform that engineering relies on.

Core responsibilities

Infrastructure & reliability
- Own and manage shared AWS infrastructure used across the company
- Maintain and operate EKS clusters
- Ensure reliability, scalability, and performance of production systems
- Monitor infrastructure health and proactively address issues

Observability & monitoring
- Own monitoring, logging, and alerting across infrastructure and applications
- Heavy use of: Grafana, OpenSearch clusters
- Design alerts that detect infrastructure and application issues early and are actionable (not noisy)
- Drive observability standards across teams

CI/CD & automation
- Design, build, and maintain CI/CD pipelines
- Improve deployment safety, speed, and consistency
- Automate infrastructure and development workflows
- Partner closely with engineering and QA to support reliable releases

Must-have experience
- Senior-level experience in SRE, DevOps, or platform engineering
- Strong AWS experience
- Infrastructure as code (Terraform preferred)
- Kubernetes / EKS in production environments
- Designing and operating CI/CD pipelines
- Hands-on experience with observability tooling: monitoring, logging, and alerting (Grafana or similar)
