Job Description
Our client is seeking a highly motivated and self-driven Site Reliability Engineer (SRE).
Requirements:
- Exchange Online, archiving technologies (Barracuda or Veritas), network file shares, backups, patching, the M365 stack, cloud (Azure and AWS), Windows Server OS, and Windows Server troubleshooting
- Modern email infrastructure such as M365 and SendGrid
- Microsoft 365 – Teams including Channels, OneDrive
- Full life-cycle testing and deployment of Microsoft-released patches across all environments
- Support for EDR across enterprise storage and cloud repositories
- Microsoft Windows Server ecosystem
- Microsoft Entra ID user and access management and third-party integrations.
- Backup solutions (Rubrik On-Prem and RSC, Veritas)
- Azure MFA/SSO/SSPR
- PingFederate and PingDirectory for SSO and SSPR
- Monitor and maintain server, storage, and network systems to ensure high availability and performance.
- Manage incident, problem, and change management processes for infrastructure components.
- Perform root cause analysis (RCA) on recurring or high-impact infrastructure incidents.
- Execute infrastructure health checks, capacity planning, and performance tuning.
- Maintain system documentation, operational procedures, and configuration records.
- Respond to monitoring alerts and perform first- and second-level troubleshooting.
- Collaborate with other IT teams (e.g., application support, security, DevOps) to resolve cross-functional issues.
- Participate in on-call rotation and provide after-hours support as needed.
- Identify and implement automation opportunities to improve operational efficiency.
- Excellent communication and documentation skills
- Strong sense of ownership and operational awareness
- Able to work cross-functionally across dev, infrastructure, and legacy support teams
- Comfortable maintaining reliability for systems you didn’t originally build