Summary: The role of Databricks Data Engineer focuses on designing, building, and optimizing scalable data pipelines and lakehouse solutions using Databricks and AWS services. The engineer will implement robust batch and streaming data solutions while ensuring high performance, scalability, and security. Collaboration with BI teams and business stakeholders is essential to support analytics and AI/ML data requirements. Strong hands-on experience with Databricks and AWS is required for this position.
Salary (Rate): Negotiable
City: undetermined
Country: undetermined
Working Arrangements: remote
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
We are looking for a hands-on Databricks Data Engineer with strong AWS experience to design, build, and optimize scalable data pipelines and lakehouse solutions. The role focuses on implementing robust batch and streaming data solutions using Databricks, Delta Lake, and AWS cloud-native services, ensuring high performance, scalability, and security.
Key Responsibilities (Data Engineering & Pipeline Development)
- Build and maintain end-to-end data pipelines using Databricks, Delta Lake, and AWS services
- Develop batch, real-time, and streaming data processing workflows
- Implement data ingestion, transformation, curation, and storage pipelines
- Build and optimize large-scale PySpark and SQL-based jobs in Databricks
- Enable real-time data processing using Kafka, AWS Kinesis, or similar streaming tools
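To make the streaming responsibilities above concrete, here is a minimal, illustrative sketch of a Structured Streaming job that ingests JSON events from Kafka and appends them to a Delta Lake table; the broker address, topic name, schema, and S3 paths are hypothetical placeholders, not details from this posting.

```python
# Illustrative sketch only: ingest JSON events from Kafka and append them
# to a Delta Lake table. Broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                      # placeholder topic
    .load()
)

events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

(
    events.writeStream.format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events")  # placeholder path
    .outputMode("append")
    .start("s3://example-bucket/bronze/events")                              # placeholder path
)
```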
Data Lakehouse Implementation
- Work on Databricks-based lakehouse architecture using Delta Lake
- Implement scalable and optimized data storage and processing frameworks
- Ensure data quality, consistency, and reliability across pipelines
- Support metadata management, data lineage, and governance implementation
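As an illustrative sketch of the curation and consistency work described above, a common lakehouse pattern is an idempotent upsert into a Delta table with MERGE; the bronze and silver table names and the join key below are hypothetical.

```python
# Illustrative sketch only: idempotent upsert of curated records into a Delta
# table with MERGE. Table names and the join key are hypothetical.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

updates = spark.table("bronze.events_deduped")        # placeholder source
target = DeltaTable.forName(spark, "silver.events")   # placeholder target

(
    target.alias("t")
    .merge(updates.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```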
Cloud & Platform Engineering (AWS)
- Work with AWS services such as S3, Glue, Lambda, Kinesis, and Redshift
- Ensure pipelines are scalable, secure, and cost-optimized in AWS environments
- Implement security controls including RBAC, encryption, and data masking
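A minimal sketch of one such control, data masking, is shown below; it assumes a hypothetical customers table with an email column and hashes the sensitive value before publishing a curated copy for downstream consumers.

```python
# Illustrative sketch only: mask a sensitive column before publishing a curated
# copy for downstream consumers. Table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import sha2, col

spark = SparkSession.builder.getOrCreate()

customers = spark.table("silver.customers")  # placeholder table
masked = (
    customers
    .withColumn("email_hash", sha2(col("email"), 256))  # irreversible hash of the raw value
    .drop("email")
)
masked.write.format("delta").mode("overwrite").saveAsTable("gold.customers_masked")  # placeholder
```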
Optimization & Best Practices - Tune Spark jobs for performance and cost efficiency
- Monitor and troubleshoot data pipeline issues in production
- Follow CI/CD and DevOps practices for deploying data engineering solutions
- Ensure adherence to data engineering standards and best practices
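The following is a minimal sketch of the kind of settings typically reviewed when tuning Spark jobs on Databricks; the values shown are placeholders to adjust per workload, not recommendations from this posting.

```python
# Illustrative sketch only: Spark settings commonly reviewed when tuning
# Databricks jobs for performance and cost. Values are placeholders to adjust
# per workload, not recommendations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.sql.adaptive.enabled", "true")                     # adaptive query execution
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
spark.conf.set("spark.sql.shuffle.partitions", "200")                    # size to data volume and cluster
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")   # fewer small files on Delta writes
```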
Collaboration
- Work closely with BI teams and business stakeholders
- Support analytics and AI/ML data requirements through curated datasets
- Collaborate with architects to ensure alignment with AWS-based data strategy
- Required Skills & Qualifications:
- Strong hands-on experience with Databricks.
- Proficiency in Python, PySpark, and SQL
- Strong experience in AWS cloud services (S3, Glue, Lambda, Kinesis, Redshift)
- Experience building ETL/ELT data pipelines
- Strong understanding of Delta Lake and lakehouse concepts
- Experience with streaming and batch data processing
- Knowledge of CI/CD tools and Git
- Strong troubleshooting and performance tuning skills
Desired Qualifications:
- IaC (Terraform/CloudFormation)
- Data quality & observability frameworks
- Experience with advanced Databricks features (Delta Live Tables, Unity Catalog, Workflows)
- Deeper security and compliance expertise
- Hands-on experience with DevOps tooling
- Strong leadership and communication skills