Job description
AgileEngine is an Inc. * company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries.
We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.

Why join us
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!

About the role
We are looking for a Middle SRE Operations Engineer to maintain reliability across a cloud-based SaaS platform.
You'll handle live incidents, improve observability, and reduce toil through automation using Kubernetes, Terraform, Grafana, and AWS.
The role is hands-on and execution-focused, with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.

What you will do
- Monitor and support production and staging environments to ensure availability, performance, and stability;
- Respond to incidents, perform triage and root cause analysis, and contribute to remediation efforts;
- Participate in on-call rotations with defined SLAs;
- Handle operational requests from internal teams;
- Maintain and improve monitoring, alerting, dashboards, logs, and metrics;
- Support CI/CD pipelines, production releases, and GitOps workflows;
- Contribute to automation initiatives to reduce operational overhead;
- Maintain and improve Kubernetes-based infrastructure and containerized workloads;
- Support infrastructure as code practices and environment improvements.

Must haves
- 2+ years of experience in site reliability engineering, DevOps, or production operations;
- Experience with AWS supporting production environments;
- Experience supporting production SaaS applications;
- Strong understanding of CI/CD systems (GitHub Actions, Jenkins, CircleCI);
- Experience with GitOps and Git fundamentals;
- Experience using GitHub, Jira, and Confluence;
- Experience with Kubernetes (EKS, kops, or similar);
- Experience with Docker and containerization;
- Experience with observability tools (Grafana, Prometheus, Loki, PagerDuty);
- Proficiency in scripting (Bash, Python, or Go);
- Experience with infrastructure as code (Terraform, Helm);
- Ability to work within structured operational processes and SLAs;
- Strong written and verbal English communication skills;
- Self-driven with a growth mindset.

Nice to haves
- AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
- Experience with multi-tenant SaaS environments;
- Experience working in globally distributed teams;
- Familiarity with ChatOps practices;
- Experience improving monitoring quality and reducing alert fatigue.

Perks and benefits
- Professional growth: mentorship, tech talks, and personalized growth roadmaps.
- Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
- Exciting projects: modern solutions with Fortune 500 and top product companies.
- Flextime: flexible schedule with remote and office options.

Meet our recruitment process
It includes the following main stages: application → coding challenge → video interview → technical interview or interview with the hiring manager(s).
Each step helps us understand your skills and overall fit.
If it's a match, you'll receive an offer.

Requirements
- 2–3+ years of experience in growth or marketing performance roles, ideally in the B2B SaaS or tech services space.
- Understanding of the AI market landscape, with the ability to promote AI development services and translate technical capabilities into marketing messages.
- Proven experience running successful marketing campaigns and managing performance metrics.
- Familiarity with CRM and sales engagement platforms (e.g. Instantly, Reply.io, Apollo).
- Strong analytical skills with the ability to interpret data and generate actionable insights.
- Excellent organizational skills and attention to detail.
- Strong written and verbal communication skills in English.