L3 Support Engineer

Lagos de Moreno, Jalisco

Bonbloc

Posted on February 15

Description

Role mission
The primary mission of this role is to ensure the high availability and stability of enterprise applications running on IBM WebSphere and Red Hat OpenShift.
This is a support-centric position focused on proactive monitoring, rapid incident resolution (L3), and the continuous optimization of production environments to meet strict service level agreements (SLAs).
Key responsibilities
Incident & problem management (L3): act as the final point of technical escalation for complex outages involving WebSphere Application Server (WAS) and OpenShift clusters.
Production stability: monitor environment health 24/7 using enterprise observability tools and execute immediate recovery actions during critical failures.
Automation & scripting: develop and maintain Bash and Python scripts to automate repetitive support tasks, log collection, and automated health checks across the platform.
Root cause analysis (RCA): lead deep-dive investigations into JVM memory leaks, thread contention, and pod crashes to provide permanent fixes.
Patching & lifecycle: execute platform upgrades, security patching, and configuration synchronization for both WAS (Base/ND) and OpenShift environments.
Observability: configure and maintain dashboards (Grafana, Prometheus, or APM tools) to track cluster performance and application health.
On-call rotation: participate in technical coverage for business-critical applications during high-priority incidents.
Technical stack & requirements
Must-have (technical core):
Container support: 3+ years troubleshooting Red Hat OpenShift (v4.x), including SDN, ingress/routes, and persistent volumes.
Middleware administration: expert knowledge of IBM WebSphere (Base, ND, Liberty), including profile management, SSL/TLS certificates, and IHS.
Advanced scripting: proven ability to create production-grade scripts in Bash or Python that interface with the OpenShift API or the oc CLI and automate middleware tasks.
Linux systems: deep knowledge of RHEL (Red Hat Enterprise Linux) kernel parameters, networking diagnostics (tcpdump), and system performance tools.
Observability tools: experience with monitoring stacks such as ELK, Dynatrace, AppDynamics, or Datadog.
Nice-to-have (added value):
Experience with Ansible for configuration management and automated patching.
Familiarity with ITIL frameworks (incident, problem, and change management).
Knowledge of F5 BIG-IP or similar enterprise load balancers.
Soft skills & competencies
Sense of urgency: ability to remain effective and lead "war rooms" during high-priority production outages.
Analytical thinking: methodical approach to isolating issues within complex, multi-layered architectures.
Technical communication: capacity to translate complex infrastructure events into clear status updates for management.
