GOC Engineer – Global Operations Center
Schedule: 9:00 a.m. – 6:00 p.m.
Operating model: 24x7 operational support (on-call availability required for critical incidents outside standard hours).
Role Overview
The GOC Engineer is a key position within the Global Operations Center. The role is responsible for ensuring the availability, stability, and observability of production platforms and services in a mission-critical environment.
This position provides first- and second-line (L1/L2) technical support for incident detection, analysis, initial containment, and structured escalation. The role requires timely, well-documented responses aligned with defined SLAs and SLOs. It also requires sound technical judgment and the ability to collaborate effectively with engineering, SRE, security, and cloud teams.
Key Responsibilities
Proactively monitor infrastructure, applications, and security events using observability platforms such as Datadog, Splunk, Prometheus, or equivalent tools.
Perform advanced technical triage of alerts and incidents, assessing impact, severity, and potential root causes.
Resolve L1 incidents and execute L2 containment actions following established SOPs, runbooks, and operational procedures.
Escalate complex or high-severity incidents to the appropriate engineering teams, providing complete technical context, supporting evidence, and clear timelines.
Conduct daily system health checks and post-deployment validations, and provide operational support during scheduled change windows.
Accurately log all actions, findings, and decisions in ticketing and incident management systems.
Actively participate in shift handovers to ensure operational continuity and clear communication between teams.
Identify recurring alert and incident patterns and propose improvements to monitoring configurations, alert thresholds, and documentation.
Contribute to the continuous improvement of runbooks, SOPs, and GOC operational processes.
Required Technical Qualifications
3–5 years of experience in NOC, GOC, IT operations, production support, or other 24x7 operational environments.
Hands-on experience with monitoring and observability tools such as Datadog, Splunk, Prometheus, Grafana, or similar platforms.
Strong working knowledge of Linux, including command-line operations, log analysis, process and service management, and resource utilization.
Functional understanding of cloud infrastructure (AWS and/or GCP), including compute, basic networking, storage, and high-availability concepts.
Experience operating under SLAs, incident severity models, and structured escalation processes.
Availability to work in a rotating shift model and to provide 24x7 on-call coverage for critical incidents.
Preferred Qualifications
Experience with ITSM platforms such as ServiceNow.
Exposure to SRE, DevOps, or SaaS production support environments.
Basic knowledge of operational security and event management.
Ability to analyze performance, capacity, and availability metrics.
Experience collaborating with globally distributed teams.
Core Competencies
Strong sense of operational ownership and accountability.
Analytical mindset with a structured approach to problem resolution.
Solid technical documentation and written communication skills.
Ability to prioritize incidents and make sound decisions under pressure.
Continuous improvement mindset with strong process orientation.
What We Offer
Competitive benefits package above market standards.