Responsibilities
This role is responsible for ensuring the overall stability of production applications, including the reliability, availability, scalability, and efficiency of our production systems and platforms. The operations engineer will collaborate with cross-functional teams (including software engineering, service reliability, infrastructure, and business operations) to streamline processes, manage day-to-day operations, monitor system health, and quickly resolve incidents.
As an App Sustain & Ops Engineer, your scope would consist of:
System reliability & availability
Ensure production systems, applications, and infrastructure are reliable, performant, and available within agreed SLAs/OLAs.
Incident & problem management
Lead troubleshooting of critical incidents and drive timely resolution as part of incident management. Ensure root cause analysis is performed and help coordinate the timely implementation of permanent fixes.
Analyze priority incidents to generate insights and identify gaps in the alerting mechanisms.
Analyze market-specific issues and conduct comparative studies to determine why certain problems occur only in specific markets.
Monitoring & alerting
Partner with the service reliability engineering (SRE) team to identify, develop, and maintain proactive monitoring, alerting, and health checks that detect and prevent issues before they cause business impact.
Assist the SRE team in identifying critical health checks for order flow, order journey, and user journeys to enable dedicated notifications for key steps.
Deployment & change operations
Partner with the software engineering team to support safe, efficient deployments and configuration changes, ensuring minimal disruption to business operations.
Provide insights on system performance and capacity trends, and recommend improvements to the software engineering team for scalability and efficiency.
Automation & continuous improvement
Identify manual operational tasks and automate processes to increase efficiency, reduce errors, and improve response times.
Identify recurring data anomalies through analysis and assist in determining effective technical and process-related solutions.
Review the L2 team’s manual processes to uncover automation opportunities and implement technology-specific solutions that improve productivity.
Collaboration with engineering & product teams
Partner with development, infrastructure, and reliability engineering teams to design and deliver operable, scalable, and resilient solutions.
Operational excellence & documentation
Maintain runbooks, SOPs, and technical documentation; uphold IT controls, compliance, and audit readiness.
Risk & security management
Enforce operational security best practices, support vulnerability remediation, and contribute to disaster recovery and business continuity planning.
Qualifications
Bachelor’s degree in computer science, information technology, engineering, or a related field (or equivalent experience).
5+ years of experience in operations engineering, site reliability engineering, or systems administration.
Fluent in English and Spanish.
Strong knowledge of Linux/Unix and/or Windows Server environments.
Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog, Splunk, Nagios, AppDynamics, FullStory, ignio).
Proficiency in at least one scripting/programming language (e.g., Python, Bash, PowerShell).
Familiarity with CI/CD pipelines, deployment automation, and configuration management (e.g., Jenkins, Ansible, Puppet, Chef).
Database experience: MySQL, MongoDB, Cassandra, Couchbase.
Understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls).
Hands‑on experience with cloud platforms (AWS, Azure, GCP).
Experience working with ServiceNow.
Benefits
Opportunities to learn and develop every day through a wide range of programs.
Internal digital platforms that promote self‑learning.
Development programs tailored to leadership skills.
Specialized training tailored to the role.
Learning experiences with internal and external providers.
We love to celebrate success, which is why we have recognition programs for seniority, behavior, leadership, life milestones, and more.
Financial wellness programs that will help you reach your goals in all stages of life.
A flexibility program that will allow you to balance your personal and work life, adapting your working day to your lifestyle.
Family benefits such as our wellness line, thousands of agreements and discounts, scholarship programs for your children, and aid plans for different life milestones, among others.
We are an equal opportunity employer and value diversity at our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We respect and value diversity in our workforce as a source of innovation for the organization.