Full-time

Site Reliability Engineer (Toronto)

Posted on 01 May 25 by Shina Sharma

  • Toronto, Ontario

Job Description

We are seeking a proactive Site Reliability Engineer (SRE) to drive reliability, performance, and efficiency across our systems and platforms. You'll work closely with Application Development, QA, Product, and Data Engineering teams to champion a DevOps/SRE culture rooted in automation, observability, and continuous improvement.


Key Responsibilities:

  • Collaborate cross-functionally to promote SRE and DevSecOps best practices across the organization.
  • Build and maintain reliable, scalable systems with a focus on availability, performance, and resiliency.
  • Establish and monitor SLOs/SLIs, and develop comprehensive dashboards to support decision-making from both technical and business perspectives.
  • Lead efforts to reduce toil through automation, self-healing systems, and advanced monitoring (e.g., synthetic monitoring, RUM).
  • Apply observability and reliability testing practices from architecture through operations, leveraging Agile and product-based models.
  • Drive the adoption of cutting-edge tools in observability, automation, platform engineering, AIOps, and MLOps.
  • Contribute to and lead Communities of Practice (CoP) and SRE Office Hours to foster knowledge sharing and continuous improvement.

Qualifications:

SRE & DevOps Expertise:

  • Strong experience in observability, toil reduction, incident response, and performance optimization.
  • Proficient with monitoring tools such as Dynatrace, CloudWatch, and Azure Monitor.
  • Skilled in IaC, CaC, JSON, and scripting with Python, Node.js, Ruby, PowerShell, and Shell.
  • Deep understanding of Dynatrace advanced features: DT Guardian, RUM, Synthetic Monitoring, AI-based event correlation.

Cloud & Automation:

  • Expert in AWS Cloud services: CDK, Lambda, CloudWatch, EKS, EC2, ELB, S3, SSM.
  • Experience with log ingestion pipelines (AWS Firehose, Dynatrace OpenPipeline) and operational dashboards.
  • Hands-on experience with Ansible Tower, AWS SSM, Bitbucket/GitHub, and CI/CD workflows.

Orchestration & Data:

  • Familiarity with orchestration tools like Step Functions, Apache Airflow, and container platforms.
  • Knowledge of data pipelines, data lakes, and databases (Redshift, RDS, Aurora, PostgreSQL, SQL Server, Oracle).

Leadership & Communication:

  • Strong problem-solving and knowledge management skills.
  • Effective communicator who bridges technical and business teams.
  • Collaborative, inclusive leader who builds high-performing teams and fosters a culture of growth and recognition.

Job Information

Rate / Salary

$ - $

Sector

IT Managed Services

Category

IT

Skills / Experience

IT

Benefits

Not Specified

Our Reference

JOB-21889

Job Location