Job overview
The role of a Site Reliability Engineer (SRE) is to ensure the reliability and performance of IT systems, pipelines, and applications. The ideal candidate will have a strong focus on innovation, experience working with business partners and vendors, and advanced knowledge of modern DevOps tooling.
Key responsibilities:
1. Taking ownership of problems and tasks, driving solutions, and continuously improving processes
2. Establishing end-to-end monitoring and alerting for critical aspects of supported pipelines
3. Managing and troubleshooting AWS EKS clusters to ensure reliability and performance
4. Improving team practices and ensuring technical solutions meet quality, security, and compliance requirements
5. Partnering with other SREs on configuration management at scale
6. Working with software engineers to define release processes and promote a culture of shared responsibility
A key aspect of this role is collaborating with cross-functional teams to deliver high-quality products and services that meet customer needs and expectations.