Overview
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
Responsibilities
* Shift: Monday – Thursday, 8 AM – 7 PM PST (11 AM – 10 PM EST), with rotating on-call.
* On-call shifts: every 6 weeks, one week as primary responder followed by one week as secondary responder.
* Manage alerts daily, check systems, and escalate issues as needed.
* Be part of a team that provides 24×7 on-call support for critical SaaS events.
* Be available in case of emergencies when team members are not available or need help.
* Document issues and remediation steps.
* Proactively create appropriate monitors in the EKS/K8s ecosystem.
* Deploy to EKS/K8s clusters using Terraform and Helm.
* Learn and maintain existing infrastructure running under Docker Swarm.
* Improve existing infrastructure health by implementing checks and scripts to correct known issues.
* Maintain and develop deployment code.
* Automate manual tasks.
* Implement/integrate new technologies in our cloud infrastructure.
* Collaborate with other teams and departments to provide the highest level of support and assistance.
* Keep the customer front of mind when planning deployments and updates, and consider the impact on them before making changes.
* Work closely with Support, Customer Success, Migration, and Professional Services teams to provide best-in-class SaaS service to our customers.
* Perform RCA and take necessary corrective actions to prevent recurrence of issues.
* Create and assign alert-related actions to the appropriate team after the investigation.
* Handle support requests for environment-specific actions.
* Identify and provide automation requirements to improve RCA.
Must haves
* 2+ years of professional experience.
* Experience working with Datadog.
* Hands-on experience as an AWS cloud engineer.
* Working knowledge of EKS/Terraform/Helm.
* Working experience with Docker and Docker Swarm.
* Good understanding of AWS IAM roles and policies.
* Experience logging and monitoring AWS resources using CloudWatch Logs.
* Experience working in a Linux environment.
* Proficient in Bash and/or Python scripting.
* A strong understanding of web technologies such as REST APIs.
* Working experience with monitoring solutions, such as Grafana and Prometheus.
* Excellent oral and written communication skills; customer-facing communication skills to explain issues and RCAs.
* Experience in product/application support for SaaS-based products.
* Understanding of APIs, databases, systems architecture, and design.
* Experience designing, implementing, and operating in a DevSecOps environment.
* Excellent communication skills, both written and verbal.
* Ability to work independently as well as within a collaborative environment.
* A technical aptitude with the desire to learn new and evolving technologies.
* Upper-intermediate English level.
Nice to have
* Experience with GCP or Azure.
* Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified Advanced Networking – Specialty.
Perks and benefits
* Professional growth: accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
* Competitive compensation: we match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
* A selection of exciting projects: join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands.
* Flextime: tailor your schedule for an optimal work-life balance, with options for working from home or going to the office, whichever makes you happiest and most productive.