Principal Service Reliability Applications Developer
Join to apply for the Principal Service Reliability Applications Developer role at Oracle.
Own and scale mission‑critical ERP/SaaS services while building intelligent, cloud‑native capabilities. This role requires an SRE mindset combined with AI/ML expertise and strong application engineering skills across public and private cloud environments.
Responsibilities
* End‑to‑end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.
* Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data‑driven prevention and continuous improvement.
* AI/ML and GenAI delivery: design and integrate solutions with LLMs, RAG, agentic workflows, and conversational AI; build low‑latency model serving and retraining pipelines.
* Automation: eliminate toil by automating operational workflows, recovery procedures, code delivery, and configuration management; build internal tools and reusable scripts/services to accelerate delivery and reduce errors.
* Observability: define and implement monitoring, logging, alerting, and tracing strategies; establish SLOs/SLIs/error budgets; improve diagnostics and performance visibility for rapid triage.
* Cross‑functional collaboration: partner with product, operations, and data teams to translate requirements into secure, scalable solutions; communicate effectively with technical and non‑technical stakeholders.
Qualifications
* BS/MS in computer science or a related field; 10+ years of software engineering in cloud environments.
* Strong in distributed systems/microservices using Java/Python; SQL/data modeling; Python for AI/automation.
* SRE/DevOps expertise: systems and networking fundamentals, application security, observability, performance analysis, and incident response.
* Proven SDLC excellence: code quality, reviews, version control, CI/CD, testing, and release engineering.
Preferred/technical skills
* AI/ML/GenAI: experience with foundational models, RAG, agentic architectures; model deployment, optimization, monitoring, and retraining.
* Cloud and containers: experience with containerization, orchestration, and resilient, fault‑tolerant microservices.
* Observability: hands‑on experience designing dashboards, alerts, traces, logs, and metrics; defining SLOs/SLIs and error budgets; on‑call readiness and runbook quality.
* Operations: performance tuning across Java/Python and SQL for large‑scale enterprise applications; strong Linux/Unix expertise; capacity planning and reliability reviews.
* Automation and scripting: proficiency in scripting to automate operational workflows, build tooling, and CI/CD tasks (e.g., shell scripting, Python, configuration‑as‑code, task runners).
* Familiarity with enterprise ERP applications and standard DevOps tooling and practices.
Seniority level
Mid‑senior level
Employment type
Full-time
Job function
Engineering and finance
Industries
IT services and IT consulting, financial services, and software development