Senior SRE Lead

Xico, Mexico

Spin

Posted July 31

Description

Objective of the role

The Senior SRE Lead is responsible for driving the strategic direction, operational excellence, and continuous evolution of site reliability engineering (SRE) practices across critical systems and services. This role leads a team of SRE engineers and drives complex initiatives, ensuring high availability, scalability, and performance. The Senior SRE Lead fosters cross-functional collaboration, anticipates future infrastructure needs, and aligns SRE practices with business and product priorities. The role also cultivates a culture of ownership, automation, and resilience and partners with engineering teams to achieve operational excellence.

Main responsibilities

* Build, lead, and inspire high-performing SRE teams, fostering a culture of operational ownership, engineering excellence, and continuous learning.
* Define and execute the strategic SRE roadmap, integrating best practices in reliability, incident management, observability, and infrastructure automation in alignment with business and product goals.
* Elevate observability across the stack by designing and enforcing standards for telemetry, structured logging, distributed tracing, and service-level dashboards. Working with the engineering teams, ensure 100% coverage of business-critical systems with actionable metrics and alerting.
* Act as the technical escalation point for the most complex production issues, leading hands-on incident response and deep root cause analysis in large-scale, low-latency, event-driven architectures.
* Champion automation-first infrastructure practices, enforcing infrastructure as code (IaC), immutable deployments, and auto-remediation patterns that reduce manual intervention and accelerate delivery.
* Drive architectural and operational improvements through close partnership with product engineering, platform, security, and architecture teams. Proactively identify and mitigate systemic reliability risks and performance bottlenecks.
* Lead the definition, adoption, and review of SLIs, SLOs, and error budgets, ensuring they are embedded in engineering and product decision-making.
* Operationalize change management, chaos engineering, and disaster recovery (DR) strategies, validating readiness through frequent simulations and failover exercises.
* Mentor and develop SRE leads and senior engineers, scaling internal capabilities and reinforcing technical depth across the organization.
* Represent SRE in architecture boards and business reviews, aligning engineering reliability strategies with company-wide objectives.
* Promote a culture of autonomy and proactive engineering, encouraging teams to own their services end to end with accountability and resilience thinking.
* Serve as a cultural leader within Spin, fostering psychological safety, ownership, and a sense of mission to serve millions of people across LATAM with secure, reliable financial technology.

Required knowledge and experience

* Bachelor's degree in computer science, software engineering, or a related field (or equivalent experience).
* 10+ years of experience in SRE, DevOps, or software engineering roles, including at least 4 years in leadership roles.
* Strong experience leading distributed SRE or platform teams in complex, production-scale environments.
* Deep understanding of reliability engineering principles, cloud-native infrastructure on AWS, observability, and incident response.
* Hands-on experience with infrastructure as code, CI/CD pipelines, containers, and orchestration tools.
* Strong architectural and performance optimization skills across cloud and hybrid infrastructure.
* Demonstrated ability to influence and collaborate across engineering, product, and business teams.
* Familiarity with regulatory and security frameworks relevant to infrastructure reliability.
* Excellent communication and leadership skills, with experience presenting to senior stakeholders.
* Strategic thinking, systems-level problem solving, and a proactive approach to continuous improvement.
