Operations Engineer (High Availability & Incident Management)
Boost.ai
- Deadline: 18.02.2026
- Employment type: Permanent
Do you love creating reliable, scalable infrastructure and making sure systems run seamlessly every day?
Are you passionate about building robust, scalable infrastructure — and keeping it running with near-zero downtime?
We’re a fast-growing conversational AI company with headquarters in Norway, on a mission to transform how people interact with technology. Our Operations team is expanding, and we’re looking for a high-caliber Operations Engineer to join us in Sandnes, Oslo, or Copenhagen.
This role goes beyond keeping systems running: you will play a key role in delivering a 99.99% availability SLA for our conversational AI platform, ensuring reliability even under high load, rapid change, and real-world complexity.
Why You’ll Love Working Here
You’ll be part of a team that blends software engineering, systems operations, and reliability engineering at the cutting edge of cloud and AI technology. Your work will have a direct impact on how millions of users experience mission-critical conversational services.
We’re building something ambitious, and we’re looking for someone who thrives in an environment where uptime, resilience, and operational excellence truly matter.
What you will be doing
Core Operations & Infrastructure
- Design, build, and improve infrastructure and systems using DevOps best practices and Infrastructure as Code.
- Automate and streamline processes to improve deployment speed, reliability, and scalability.
- Collaborate from design to deployment to improve the full lifecycle of services.
- Own and continuously improve service availability, with a clear goal of 99.99% uptime across our conversational AI platform.
- Design and operate systems with fault tolerance, redundancy, graceful degradation, and fast recovery in mind.
- Proactively identify and eliminate single points of failure across infrastructure, application, and operational processes.
- Drive improvements in monitoring, alerting, and observability to detect issues before users are impacted.
- Lead and evolve our incident management process, including:
  - Clear on-call structures and escalation paths.
  - Well-defined incident severity levels.
  - Fast triage, mitigation, and communication during incidents.
- Act as a technical lead during major incidents, coordinating response and ensuring rapid restoration of service.
- Run post-incident reviews and blameless retrospectives, turning incidents into concrete reliability improvements.
- Define and track operational metrics such as SLA, SLOs, error budgets, MTTR, and incident frequency.
- Support and troubleshoot customer environments, ensuring stability during upgrades and integrations.
- Work closely with product and engineering teams to ensure operational readiness for new features and releases.
What we are looking for
You’ll thrive in this role if you have:
- BSc/MSc in Computer Science or equivalent hands-on experience.
- Strong Linux system administration and optimization skills.
- Experience with programming/scripting languages (e.g. Python, Go, Bash).
- Cloud provider experience (preferably AWS).
- Familiarity with container technologies such as Docker, Kubernetes & Helm.
- Configuration management expertise (Terraform, Ansible, etc.).
- Experience with zero-downtime deployment of web applications.
- Knowledge of CI/CD principles and tools.
- Experience with relational databases (PostgreSQL, MySQL).
- Solid understanding of monitoring, alerting, and observability tools.
- A strong interest in DevOps, reliability engineering, and operational best practices.
Additionally, we believe you will succeed if you
- Are proactive, solution-oriented, and calm under pressure.
- Have a strong sense of ownership — you care deeply about uptime and user impact.
- Are comfortable making decisions during incidents and communicating clearly with stakeholders.
- Are equally comfortable working independently and as part of a collaborative team.
- Are able to prioritize effectively in environments where not all problems are equal.
What’s in it for you?
- Impact: Operate AI infrastructure that must be available all the time — and make it better every day.
- Growth: A steep career trajectory with opportunities to shape our reliability and incident management strategy.
- Innovation: Freedom to improve processes, tooling, and architecture in pursuit of world-class availability.
- People: A highly motivated team with a shared goal of operational excellence.
- Environment: A supportive and dynamic workplace culture, both professionally and socially.
- Rewards: Competitive salary and exciting benefits.
Sounds good?
Please submit your application using the appropriate form. We look forward to hearing from you and learning what you can bring to our company!
Please note:
- During the recruitment process, we interview suitable candidates quickly and on a rolling basis until we find the right candidate. We recommend that you submit your application as soon as possible.
- The position requires being able to work on-site in Stavanger or Oslo, Norway, or Copenhagen, Denmark.
About boost.ai
Boost.ai is the trusted leader in AI-powered customer experience solutions for regulated industries. Built for security, speed, and scale, the platform enables fast deployment, high resolution rates, and full hybrid control through seamless orchestration of traditional NLU and LLMs. With over 600 live virtual agents and more than 150 million automated conversations, boost.ai helps enterprises around the world resolve with confidence, automate at scale, and trust every conversation.
Proven performance and enterprise-grade reliability make boost.ai the partner of choice for leading brands across the world, including Nordea, Credit Union of Colorado, Sage, DNB, Trading 212, and more. Boost.ai is recognized as a Leader in Gartner’s 2025 Magic Quadrant™ for Conversational AI Platforms. Learn more at boost.ai.
Our core values—trust, innovation, teamwork, and fun—are central to everything we do. Building a supportive environment that fuels our growth, ensuring collaboration and achieving our goals while having a fun and vibrant culture is important to us. These values provide a strong foundation that empowers our team to excel.
Our success is driven by a diverse and dedicated team. We are focused on helping every employee reach their full potential by fostering a culture of trust, responsibility, and equal opportunity for all. It is our policy that all eligible persons shall have equal opportunity for employment and advancement in the company based on their ability, qualifications, and aptitude for the work. We welcome all qualified candidates to apply for this position regardless of gender, gender identity, religious beliefs, sexual orientation, age, or disability.
Skills (AI-generated)
- Active follow-up
- Automation
- CI/CD (Continuous Integration and Continuous Delivery)
- Experience operating Linux servers and infrastructure
- Infrastructure as code (IaC)
- K8s
- Configuration management
- Programming languages
- Sector: Private
- Location: 4313 Sandnes
- Remote work: Partly remote
- Industry: IT, IT - software
- Job function: Cloud developer, Database
Keywords
AWS, PostgreSQL
Ad information
- FINN code 447084866
- Last modified