Site Reliability Engineer (SRE)
Responsibilities
* Design, implement, and maintain scalable and highly available infrastructure.
* Monitor and ensure the performance and reliability of production systems.
* Implement automation for recurring tasks and operational processes.
* Collaborate with development teams to improve continuous delivery and code deployment.
* Respond to incidents and conduct post-mortem analysis to prevent future issues.
* Optimize resource usage and manage system capacity.
Requirements
* Experience in a similar role.
* Knowledge of Unix/Linux operating systems.
* Experience with monitoring and log management tools (Prometheus, Grafana, Splunk, ELK Stack).
* Scripting and automation skills (Python, Bash, Go, shell).
* Experience with cloud platforms (AWS, GCP, Azure).
* Knowledge of containers and orchestration (Docker, Kubernetes).
* Familiarity with CI/CD tools (Jenkins, GitLab CI/CD, CircleCI).
* Experience with configuration management tools (Ansible, Puppet, Chef).
* Knowledge of SQL and NoSQL databases (MySQL, PostgreSQL, MongoDB).
* Experience with cloud storage (S3, Google Cloud Storage).
* Familiarity with security tools (Vault, OSSEC, or any SIEM).
* Experience with infrastructure as code (Terraform, CloudFormation).
* Knowledge of networking and load balancing (NGINX, HAProxy, F5).
* Experience with messaging and data flow systems (Apache Kafka).
* Problem-solving skills and ability to work under pressure.
* Excellent communication and teamwork skills.
* Good written and oral English language skills.
Benefits
* A dynamic and collaborative work environment.
* Opportunities for professional growth and development.
* Flexible work arrangements and remote work possibilities.
* Competitive salary.