Databricks Data Engineer (AWS)

Posted 3 days ago by Isoftech Inc

Negotiable

Remote

Summary: The role of Databricks Data Engineer focuses on designing, building, and optimizing scalable data pipelines and lakehouse solutions using Databricks and AWS services. The engineer will implement robust batch and streaming data solutions while ensuring high performance, scalability, and security. Collaboration with BI teams and business stakeholders is essential to support analytics and AI/ML data requirements. Strong hands-on experience with Databricks and AWS is required for this position.

Salary (Rate): undetermined

City: undetermined

Country: undetermined

Working Arrangements: remote

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

We are looking for a hands-on Databricks Data Engineer with strong AWS experience to design, build, and optimize scalable data pipelines and lakehouse solutions. The role focuses on implementing robust batch and streaming data solutions using Databricks, Delta Lake, and AWS cloud-native services, ensuring high performance, scalability, and security.

Key Responsibilities (Data Engineering & Pipeline Development)

  • Build and maintain end-to-end data pipelines using Databricks, Delta Lake, and AWS services
  • Develop batch, real-time, and streaming data processing workflows
  • Implement data ingestion, transformation, curation, and storage pipelines
  • Build and optimize large-scale PySpark and SQL-based jobs in Databricks
  • Enable real-time data processing using Kafka, AWS Kinesis, or similar streaming tools
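The ingestion and curation work described above ultimately comes down to record-level validation and normalization. As an illustration only (not part of this posting's requirements), the sketch below shows one such step in plain Python; on Databricks the same logic would typically be expressed as PySpark DataFrame operations or a UDF. All field names are hypothetical:

```python
from datetime import datetime, timezone
from typing import Optional

def curate_event(raw: dict) -> Optional[dict]:
    """Normalize one raw ingested record for the curated layer.

    Returns None for records that fail basic validation so the caller
    can route them to a quarantine/dead-letter location instead of
    silently dropping them. Field names are illustrative.
    """
    if not raw.get("event_id") or "ts" not in raw:
        return None
    try:
        # Epoch seconds -> timezone-aware UTC timestamp
        ts = datetime.fromtimestamp(int(raw["ts"]), tz=timezone.utc)
    except (ValueError, TypeError, OverflowError, OSError):
        return None
    return {
        "event_id": str(raw["event_id"]),
        "event_time": ts.isoformat(),
        "payload": raw.get("payload", {}),
    }
```

In a Spark job, the same validation and normalization would normally be pushed into `filter`/`withColumn` expressions so it runs distributed rather than row by row in Python.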

Data Lakehouse Implementation

  • Work on Databricks-based lakehouse architecture using Delta Lake
  • Implement scalable and optimized data storage and processing frameworks
  • Ensure data quality, consistency, and reliability across pipelines
  • Support metadata management, data lineage, and governance implementation
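The data-quality bullet above can be made concrete with a small sketch. The function below evaluates named expectations over a batch of rows and returns both the passing rows and per-rule failure counts, mirroring in miniature what Delta Live Tables expectations or standalone data-quality frameworks provide. It is an illustration under simplified assumptions, not a prescribed implementation:

```python
def check_expectations(rows, expectations):
    """Evaluate simple row-level expectations over a batch.

    `expectations` maps a rule name to a predicate applied per row.
    Returns (passed_rows, failed_counts) so the caller can decide
    whether to fail the pipeline or only record quality metrics.
    """
    passed = []
    failed_counts = {name: 0 for name in expectations}
    for row in rows:
        ok = True
        for name, predicate in expectations.items():
            if not predicate(row):
                failed_counts[name] += 1
                ok = False
        if ok:
            passed.append(row)
    return passed, failed_counts
```

Keeping per-rule counts (rather than a single pass/fail flag) is what makes quality trends observable across pipeline runs.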

Cloud & Platform Engineering (AWS)

  • Work with AWS services such as S3, Glue, Lambda, Kinesis, and Redshift
  • Ensure pipelines are scalable, secure, and cost-optimized in AWS environments
  • Implement security controls including RBAC, encryption, and data masking

Optimization & Best Practices
  • Tune Spark jobs for performance and cost efficiency
  • Monitor and troubleshoot data pipeline issues in production
  • Follow CI/CD and DevOps practices for deploying data engineering solutions
  • Ensure adherence to data engineering standards and best practices
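For the Spark tuning work described above, a few commonly adjusted settings are shown below as a spark-defaults-style fragment. The values are illustrative placeholders, not recommendations; appropriate settings depend entirely on the workload and cluster:

```
# Adaptive query execution: let Spark re-plan shuffle partitioning at runtime
spark.sql.adaptive.enabled                       true
spark.sql.adaptive.coalescePartitions.enabled    true
# Delta-specific write optimizations available on Databricks
spark.databricks.delta.optimizeWrite.enabled     true
spark.databricks.delta.autoCompact.enabled       true
```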

Collaboration

  • Work closely with BI teams and business stakeholders
  • Support analytics and AI/ML data requirements through curated datasets
  • Collaborate with architects to ensure alignment with AWS-based data strategy

Required Skills & Qualifications

  • Strong hands-on experience with Databricks
  • Proficiency in Python, PySpark, and SQL
  • Strong experience in AWS cloud services (S3, Glue, Lambda, Kinesis, Redshift)
  • Experience building ETL/ELT data pipelines
  • Strong understanding of Delta Lake and lakehouse concepts
  • Experience with streaming and batch data processing
  • Knowledge of CI/CD tools and Git
  • Strong troubleshooting and performance tuning skills

Desired Qualifications

  • IaC (Terraform/CloudFormation)
  • Data quality & observability frameworks
  • Experience with advanced Databricks features (DLT, Unity Catalog, Workflows)
  • In-depth knowledge of security and compliance practices
  • Hands-on experience with DevOps tooling
  • Strong leadership and communication skills