
Service Engineer

Microsoft
United States, Washington, Redmond
Oct 31, 2025
Overview

Are you passionate about cloud computing, driven by customer experience, and guided by an AI-oriented, data-driven mindset? Do you thrive in high-stakes, live-site environments and want to play a pivotal role in ensuring the reliability of Microsoft's cloud platform? If so, the Azure Customer Experience (CXP) team is looking for you.

Microsoft Azure is one of the most exciting and strategic products at Microsoft, powering mission-critical workloads for enterprises, governments, and startups around the world. Azure delivers on-demand, hyper-scale infrastructure and platforms via Microsoft's global data centers, enabling customers to build, host, and scale their applications with confidence.

The Customer Reliability Engineering (CRE) team within Azure CXP is a top-level pillar of Azure Engineering. It is responsible for world-class live-site management, customer reliability engagements, and modern customer-first experiences for scale, and it drives deep customer insights and empathy into the broader Azure Engineering organization. Our "no dead-ends" philosophy ensures that every customer, regardless of size or scale, can realize their full potential through the Microsoft Cloud.

We're looking for a Service Engineer who blends operational rigor with analytical insight to strengthen the reliability and performance of Microsoft services. This role focuses on turning service telemetry, incident data, and system signals into operational clarity and action. You'll drive improvements across service health, incident management, and automation, partnering closely with engineering and operations to make data-informed decisions that scale reliability across complex systems.

The ideal candidate brings a balance of technical curiosity, operational discipline, and collaboration skills. You work well under pressure, communicate clearly, coordinate seamlessly with internal stakeholders, and can translate patterns in data into tangible engineering actions.
Every day, our customers stake their business and reputation on the cloud. You can help #AzCXP provide our customers with the world-class cloud services they need to succeed. #azcre
Responsibilities

Use metrics to assess operational effectiveness, platform health, and the impact of reliability improvements.
Analyze customer-impacting signals from telemetry, support cases, and feedback to identify root causes, lead incident reviews (RCAs/PIRs), and drive preventative improvements.
Analyze incident data to identify recurring themes, patterns, and systemic issues impacting customers.
Leverage telemetry and customer feedback to generate actionable insights and identify opportunities for proactive service enhancements.
Develop and promote operational playbooks aligned with incident response and customer-impact scenarios.
Build and maintain reliable, data-driven dashboards and reports to monitor key performance indicators, system health, and incident trends across Azure services.
Partner with Azure service teams to share insights, coordinate incident response actions, and ensure customer and operational needs are met.
Apply data analytics to detect anomalies, uncover root causes, and drive improvements in service reliability and performance.
Drive continuous improvement by integrating learnings from live-site events and customer feedback into service design, monitoring, and incident management frameworks.
Bring an engineering mindset to data operations, balancing agility, scalability, and technical excellence to solve operational challenges.
Exhibit strong cross-team collaboration, an engineering mindset, and results-oriented execution under pressure.