Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders across various sectors. The company is a pioneer of global distributed collaboration, with 1200+ colleagues in 75+ countries and very few office-based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.
The company is founder-led, profitable, and growing.
We are hiring a Senior Site Reliability / GitOps Engineer for our Information Systems (IS) team. This role offers an opportunity for an "automation-first" senior technologist with a passion for Linux to build a career with Canonical and contribute to the success of those leveraging Ubuntu and open source products. If you have experience in IT operations automation, infrastructure as code, and a passion for technology, you will enjoy working with some of the best minds in the industry at Canonical.
Job summary
The IS team at Canonical supports and maintains all of Canonical's IT production services, which are used by over 60 million Ubuntu users.
As a Senior SRE & GitOps Engineer, you'll drive operations automation to the next level, both in our private clouds and in public clouds, utilizing open source infrastructure as code software, CI/CD practices, and Canonical’s leading automation products.
You will also provide feedback to developers on product operation at scale, contribute to bug reports and pull requests, and collaborate on design and implementation within the company.
You will be part of a global team of SREs supporting services for our company, customers, and the Ubuntu community.
Responsibilities
1. Lead automation and GitOps development as an embedded tech lead.
2. Collaborate with IS architects to align solutions with the overall architecture vision.
3. Design and architect services as products for internal use.
4. Develop and enhance infrastructure as code practices, increasing automation and process efficiency.
5. Automate software operations for reusability and consistency across clouds, considering distributed system complexities.
6. Maintain operational responsibility for core services, networks, and infrastructure.
7. Develop troubleshooting, capacity planning, and performance investigation skills, utilizing observability tools like Prometheus, Grafana, and Elasticsearch.
8. Assist and collaborate with globally distributed engineering, operations, and support teams.
9. Dedicate time for larger projects and automation initiatives.
10. Share expertise through design sessions, mentorship, and collaborative work.
11. Handle time-critical escalations responsibly.
Candidate profile
* Experience with modern hosting architecture, driven by infrastructure as code across private and public clouds.
* Product mindset focused on developing products rather than just solutions.
* Extensive Python development experience, especially with large projects.
* Experience with Kubernetes or similar container orchestration systems.
* Proven ability to manage and deploy cloud infrastructure via code.
* Practical Linux networking, routing, and firewall knowledge.
* Familiarity with Linux storage solutions such as Ceph, or with databases.
* Hands-on experience administering enterprise Linux servers.
* Deep understanding of cloud computing concepts and technologies.
* Bachelor's degree or higher in computer science or related field.
* Effective communication skills in English, both written and verbal.
* Motivation and skill in troubleshooting from the kernel to the web layer.
* Flexibility, quick learning, and adaptability to fast-changing environments.
* Comfort working within distributed teams.
* Passion for open source, especially Ubuntu or Debian.