Job Description
One of our clients is seeking a seasoned Principal Data Engineer to architect, build, and optimize scalable, near real-time data pipelines using Azure Databricks and Azure Data Lake Storage (ADLS). You will play a key leadership role in delivering a robust data platform that powers analytics, reporting, and data science across the enterprise.
Key Responsibilities:
- Lead the end-to-end architecture and implementation of a modern data platform on Azure using Databricks, ADLS, and Delta Lake
- Design and deploy incremental data ingestion pipelines leveraging Azure Data Factory (ADF), Change Data Capture (CDC), and Databricks Auto Loader
- Develop and optimize PySpark workflows for both batch and streaming data processing
- Manage a Delta Lake medallion architecture (Bronze, Silver, and Gold layers) to ensure data quality and consistency
- Conduct performance tuning and cost optimization for Databricks and Azure SQL workloads
- Implement data security, access controls, and governance policies to maintain compliance and integrity
- Collaborate with data scientists, analysts, and business teams to align data architecture with business goals
- Provide technical leadership and mentorship to data engineering teams; establish coding standards and best practices
Required Qualifications:
- 10+ years of experience in data engineering, with strong hands-on expertise in the Azure cloud ecosystem
- Proficiency in Databricks, PySpark, Delta Lake, and Azure Data Factory
- Deep knowledge of Azure SQL, CDC mechanisms, and data pipeline design
- Expertise in both batch and real-time data processing architectures
- Advanced SQL skills, with experience in query performance tuning and optimization
- Solid understanding of data governance, cloud security, and compliance frameworks
- Proven experience leading teams in an Agile environment
Preferred Qualifications:
- Experience with Terraform for infrastructure as code
- Certifications in Azure Data Engineering, Databricks, or Terraform
- Background working in large-scale enterprise data environments
- Familiarity with Power BI integration with Databricks and ADLS