Responsibilities
Billions of users across the world rely on our products, and to meet this demand we design and implement world-class distributed systems.
As a Software Engineer in one of our Azure SRE teams, you will be responsible for improving the reliability of key Azure products.
The Azure SRE key focus areas are:
- Building reusable automation and processes that help multiple teams meet their reliability goals.
- Influencing product architecture and roadmaps to ensure customer-experienced reliability is a core design principle.
- Contributing directly to product code to achieve reliability outcomes.
- Leveraging AI to proactively detect anomalies, predict incidents, and automate operational workflows, scaling reliability efforts across complex systems.
We are looking for engineers passionate about the above areas who are also interested in:
- Providing technical leadership across multiple Azure teams.
- Mentoring others on SRE principles, practices, and tools, as well as on AI usage to boost software development productivity.
- Designing and developing large-scale distributed software services and solutions.
- Delivering “best-in-class” engineering by ensuring services are modular, secure, reliable, testable, diagnosable, observable, and reusable.
- Collaborating with internal and external partners to support team goals.
- Balancing pragmatism with vision, driving continuous improvements in process and codebase.
- Building automation to prevent or remediate service issues before they impact users.
- Driving innovation in large-scale operations by applying cutting-edge AI tools and techniques to reduce operational toil and scale reliability engineering across complex systems.
- Gaining a working understanding of Microsoft businesses and contributing to cohesive, end-to-end user experiences.
Qualifications
Required Qualifications
- Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- OR Master's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- OR equivalent experience working with large-scale distributed systems (e.g., cloud computing providers or SaaS services, ideally with millions or billions of users) or similarly complex environments.
- Awareness of, and ability to reason about, modern distributed software design patterns and cloud systems architecture, including microservices, containers, load-balancing, queuing, caching.
- Experience with C#/Java/C/C++/Golang.
- Experience in building, shipping and operating reliable solutions.
- Operated large-scale distributed systems with high availability requirements
- Built or evolved monitoring and alerting systems that drive actionable insights, not alert fatigue
- Delivered cross-team influence, mentoring engineers, and shaping engineering culture
- Navigated production incidents and led postmortems that changed operational practices
- Partnered closely with product, data science, and AI teams to ship reliable, intelligent services
- Cloud Infrastructure (Azure, AWS, GCP) — design, provisioning, scaling, cost optimization
- Kubernetes & Container Orchestration — advanced deployment patterns, service mesh, troubleshooting
- CI/CD Engineering — implementing resilient pipelines, artifact management, automated testing frameworks
- Familiarity with modern distributed software design patterns and cloud systems architecture, including microservices, containers, load balancing, queuing, caching.
- Experience as a technical lead or engineering manager.
- Experience working on large and unfamiliar codebases (millions of lines of code).
- Experience with open-source projects, Kubernetes, Linux and containers is desired.
- Proven track record in building, shipping, and operating reliable solutions.
- Proficiency in programming languages like C#/Java/Python.
- Experience with data technologies (SQL/NoSQL/etc.).
- Experience with Azure is a plus.
- Experience in AI adoption with tools like GitHub Copilot, Azure OpenAI and custom copilots to streamline development and reduce toil.
- Systems thinking: sees how architecture, process, and people interplay
- Comfort with ambiguity and the ability to set technical direction under uncertainty
- Mentorship: coaching senior engineers to raise the team’s bar
- Strong communication to align stakeholders at multiple levels
Other Qualifications
- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role.
- These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.