Site Reliability Engineer

Pyramid Consulting, Inc

Posted June 13

Description

Project outline:

We are looking for a site reliability engineer with experience in incident response. In this role, you will help Shipt understand where it can improve stability and reliability. The focus is on the intersection of systems engineering and data science: building the tooling and culture needed to turn raw incident logs into actionable reliability strategies.

Skill requirements:

- Engineering background: 4+ years in SRE, DevOps, or systems engineering roles managing production environments at scale.

- Data proficiency: strong experience with SQL and data analysis.

- Coding skills: expertise in one or more programming languages such as Go, Java, Python, or C++.

- Observability expertise: deep understanding of alerting systems, distributed tracing, structured logging, and metrics collection.

- Systems design: experience with container orchestration (Kubernetes) and cloud infrastructure (GCP).

Experience requirements:

- Statistical mindset: experience applying statistical methods (e.g., outlier detection, regression analysis) to system performance data.

- The "human factor": a passion for resilience engineering and for understanding how human decision-making affects system reliability.

- Communication: ability to translate complex technical failures into clear, non-technical business impact reports.
