Senior Site Reliability Engineer

Date: 25 May 2026

Company: IT & Digital Solutions

Job Purpose

Responsible for ensuring the stability, reliability, availability, and performance of airline reservation and related operational systems. The role supports production environments through proactive monitoring, troubleshooting, root cause analysis, defect resolution, and continuous improvement of system reliability and operational efficiency. The role also contributes to automation initiatives, CI/CD processes, observability enhancements, and containerized infrastructure management in alignment with business continuity and operational excellence objectives.

Key Result Responsibilities

  • Troubleshoot and resolve complex production incidents and system performance issues across airline reservation and associated enterprise applications.
  • Perform detailed root cause analysis (RCA) for application failures, outages, and recurring incidents; ensure corrective and preventive actions are identified and implemented.
  • Analyze defects and either implement fixes directly or provide detailed technical recommendations and inputs to development teams for permanent resolution.
  • Monitor application health, system availability, and operational metrics to ensure service reliability and uptime targets are consistently achieved.
  • Improve observability through enhanced monitoring, alerting, logging, and dashboarding solutions using industry-standard tools and practices.
  • Support and maintain CI/CD pipelines, deployment automation, and release management activities to ensure smooth and reliable software delivery.
  • Work closely with development, infrastructure, DevOps, database, and support teams to improve application resiliency, scalability, and operational efficiency.
  • Support and manage containerized application environments using Docker and Kubernetes.
  • Participate in incident management, on-call support, and problem management activities as required to ensure timely resolution of critical issues.
  • Review application logs, database performance, and system integrations to proactively identify reliability risks and operational bottlenecks.
  • Contribute to automation initiatives, operational runbooks, standard operating procedures, and technical documentation.
  • Ensure compliance with organizational IT governance, cybersecurity, change management, and operational standards.
  • Continuously identify opportunities to optimize application performance, infrastructure utilization, deployment processes, and operational workflows.

Qualifications (Academic, training, languages)

  • Bachelor’s Degree in Computer Science, Software Engineering, Information Technology, or equivalent discipline.
  • Fluent in English.
  • Strong hands-on expertise in Java and the Spring Boot framework.
  • Experience working with both microservices architecture and monolithic applications.
  • Strong troubleshooting, debugging, and analytical problem-solving capabilities.
  • Basic scripting and automation knowledge; Python experience preferred.
  • Proficient in MS Office.
  • Strong SQL knowledge with experience in database troubleshooting and performance analysis; Oracle Database experience is an advantage.
  • Familiarity with monitoring, observability, and logging tools such as Prometheus, Grafana, Elasticsearch, and Datadog.

Work Experience

  • 4–7 years of experience supporting Java-based enterprise applications in production environments.
  • Experience with CI/CD tools and deployment pipelines (such as Jenkins) and with GitOps practices.
  • Hands-on experience with Docker and Kubernetes in enterprise production environments is mandatory.
  • Experience with the JBoss application server is an advantage.
  • Experience in airline, travel technology, reservation systems, or high-availability enterprise environments is preferred.