Senior SDE – Data Engineering

Position Summary:

We are seeking a Senior Software Development Engineer – Data Engineering with 5-8 years of experience to design, develop, and optimize data pipelines and analytics workflows using Snowflake, Databricks, and Apache Spark. The ideal candidate will have a strong background in big data processing, cloud data platforms, and performance optimization, and will apply it to deliver scalable, data-driven solutions.

Key Roles & Responsibilities:

  • Design, develop, and optimize ETL/ELT pipelines using Apache Spark, PySpark, Databricks, and Snowflake.
  • Implement real-time and batch data processing workflows in cloud environments (AWS, Azure, GCP).
  • Develop high-performance, scalable data pipelines for structured, semi-structured, and unstructured data.
  • Work with Delta Lake and Lakehouse architectures to improve data reliability and efficiency.
  • Optimize Snowflake and Databricks performance, including query tuning, caching, partitioning, and cost optimization.
  • Implement data governance, security, and compliance best practices.
  • Build and maintain data models, transformations, and data marts for analytics and reporting.
  • Collaborate with data scientists, analysts, and business teams to define data engineering requirements.
  • Automate infrastructure provisioning and pipeline deployments using Terraform, Airflow, or dbt.
  • Monitor and troubleshoot data pipeline failures, performance issues, and bottlenecks.
  • Develop and enforce data quality and observability frameworks using Great Expectations, Monte Carlo, or similar tools.

Basic Qualifications:

  • Bachelor’s or Master’s degree in Computer Science or Data Science.
  • 5-8 years of experience in data engineering, big data processing, and cloud-based data platforms.
  • Hands-on expertise in Apache Spark, PySpark, and distributed computing frameworks.
  • Strong experience with Snowflake (virtual warehouses, Streams, Tasks, Snowpipe, query optimization).
  • Experience in Databricks (Delta Lake, MLflow, SQL Analytics, Photon Engine).
  • Proficiency in SQL, Python, or Scala for data transformation and analytics.
  • Experience working with data lake architectures and storage formats (Parquet, Avro, ORC, Iceberg).
  • Hands-on experience with cloud data services (Amazon Redshift, Azure Synapse, Google BigQuery).
  • Experience with workflow orchestration tools such as Apache Airflow, Prefect, or Dagster.
  • Strong understanding of data governance, access control, and encryption strategies.
  • Experience with CI/CD for data pipelines using GitOps, Terraform, dbt, or similar technologies.

Preferred Qualifications:

  • Knowledge of streaming data processing (Apache Kafka, Apache Flink, Amazon Kinesis, Google Cloud Pub/Sub).
  • Experience in BI and analytics tools (Tableau, Power BI, Looker).
  • Familiarity with data observability tools (Monte Carlo, Great Expectations).
  • Experience with machine learning feature engineering pipelines in Databricks.
  • Contributions to open-source data engineering projects.