Key responsibilities:
- Design, implement, and maintain Kubernetes clusters and associated infrastructure.
- Manage and monitor Linux servers and AWS resources.
- Develop automation scripts and tools in Python to streamline operational tasks.
- Implement and maintain observability tools and practices to ensure system reliability and performance.
Key requirements:
- Proven experience in Linux system administration and AWS cloud services.
- Proficiency in Python scripting for automation and tool development.
- Experience with CI/CD tools such as GitHub Actions and Jenkins.
- Knowledge of observability stacks such as Prometheus, Grafana, and the ELK Stack.
- Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed environment.