Senior SDE – Data Engineering
Position Summary:
We are seeking a Senior Software Development Engineer – Data Engineering with 5-8 years of experience to design, develop, and optimize data pipelines and analytics workflows using Snowflake, Databricks, and Apache Spark. The ideal candidate will have a strong background in big data processing, cloud data platforms, and performance optimization to enable scalable data-driven solutions.
Key Roles & Responsibilities:
- Design, develop, and optimize ETL/ELT pipelines using Apache Spark, PySpark, Databricks, and Snowflake.
- Implement real-time and batch data processing workflows in cloud environments (AWS, Azure, GCP).
- Develop high-performance, scalable data pipelines for structured, semi-structured, and unstructured data.
- Work with Delta Lake and Lakehouse architectures to improve data reliability and efficiency.
- Optimize Snowflake and Databricks performance, including query tuning, caching, partitioning, and cost optimization.
- Implement data governance, security, and compliance best practices.
- Build and maintain data models, transformations, and data marts for analytics and reporting.
- Collaborate with data scientists, analysts, and business teams to define data engineering requirements.
- Automate infrastructure provisioning and pipeline deployments using Terraform, Airflow, or dbt.
- Monitor and troubleshoot data pipeline failures, performance issues, and bottlenecks.
- Develop and enforce data quality and observability frameworks using Great Expectations, Monte Carlo, or similar tools.
Basic Qualifications:
- Bachelor’s or Master’s degree in Computer Science or Data Science.
- 5-8 years of experience in data engineering, big data processing, and cloud-based data platforms.
- Hands-on expertise in Apache Spark, PySpark, and distributed computing frameworks.
- Strong experience with Snowflake (Warehouses, Streams, Tasks, Snowpipe, Query Optimization).
- Experience in Databricks (Delta Lake, MLflow, SQL Analytics, Photon Engine).
- Proficiency in SQL and in Python or Scala for data transformation and analytics.
- Experience working with data lake architectures and storage formats (Parquet, Avro, ORC, Iceberg).
- Hands-on experience with cloud data services (Amazon Redshift, Azure Synapse, Google BigQuery).
- Experience in workflow orchestration tools like Apache Airflow, Prefect, or Dagster.
- Strong understanding of data governance, access control, and encryption strategies.
- Experience with CI/CD for data pipelines using GitOps, Terraform, dbt, or similar technologies.
Preferred Qualifications:
- Knowledge of streaming data processing (Apache Kafka, Flink, Kinesis, Pub/Sub).
- Experience in BI and analytics tools (Tableau, Power BI, Looker).
- Familiarity with data observability tools (Monte Carlo, Great Expectations).
- Experience with machine learning feature engineering pipelines in Databricks.
- Contributions to open-source data engineering projects.