Overview
The IC3 SRE Engineer is responsible for supporting and enhancing the reliability, availability, and performance of the company's IT infrastructure and applications. This semi-senior role focuses on improving system stability and efficiency through advanced monitoring, automation, and incident response, contributing to the overall success of IT operations and strategic initiatives.
Main responsibilities
* Advanced system monitoring: Implement and maintain advanced monitoring solutions to ensure the health and performance of infrastructure and applications.
* Incident response: Lead incident response activities, diagnose and resolve system reliability issues, and conduct post-incident reviews.
* Automation and scripting: Develop and implement automation scripts and tools to improve system reliability and operational efficiency.
* Performance analysis: Collect, analyze, and interpret performance data to identify trends, anomalies, and potential issues, and provide actionable insights.
* Documentation: Maintain accurate and up-to-date documentation of system configurations, processes, and procedures.
* Collaboration: Work closely with other IT team members and departments to support reliability engineering projects and initiatives.
* Mentorship: Provide guidance and support to junior engineers, helping them enhance their technical skills and knowledge.
* Security compliance: Implement and enforce security measures to protect systems and ensure compliance with security policies.
* Continuous improvement: Drive continuous improvement initiatives, exploring new technologies and methodologies to enhance system reliability.
* Autonomous work culture: Actively contribute to an autonomous work culture by taking initiative, being self-motivated, and collaborating effectively in an agile and lean environment.
* Spin culture ambassador: Embody and promote Spin's values in every action, fostering a positive and inclusive work environment.
* Disaster recovery: Develop and maintain disaster recovery plans to ensure business continuity in case of system failures.
Required knowledge and experience
* Bachelor's degree in computer science, information technology, or a related field, or equivalent work experience.
* At least 5 years of experience in site reliability engineering or related fields.
* Strong understanding of system reliability concepts, including monitoring, automation, and incident response.
* Proficiency with scripting languages and automation tools.
* Strong problem-solving and troubleshooting skills.
* Excellent communication and teamwork skills.
* Willingness to learn and adapt to new technologies and processes.
* Data-driven mindset.
* English level: intermediate to advanced.
Spin is committed to a diverse and inclusive workplace. We are an equal opportunity employer and do not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, disability, age, or any other legally protected status. If you would like to request an accommodation, please notify your recruiter.
Seniority level
* Mid-senior level
Employment type
* Full-time
Job function
* Engineering and information technology