Senior SDE – Data Engineering
Position Summary:
We are seeking a Senior Software Development Engineer – Data Engineering with 5-8 years of experience to design, develop, and optimize data pipelines and analytics workflows using Snowflake, Databricks, and Apache Spark. The ideal candidate will have a strong background in big data processing, cloud data platforms, and performance optimization to enable scalable data-driven solutions.
Key Roles & Responsibilities:
- Design, develop, and optimize ETL/ELT pipelines using Apache Spark, PySpark, Databricks, and Snowflake.
- Implement real-time and batch data processing workflows in cloud environments (AWS, Azure, GCP).
- Develop high-performance, scalable data pipelines for structured, semi-structured, and unstructured data.
- Work with Delta Lake and Lakehouse architectures to improve data reliability and efficiency.
- Optimize Snowflake and Databricks performance, including query tuning, caching, partitioning, and cost optimization.
- Implement data governance, security, and compliance best practices.
- Build and maintain data models, transformations, and data marts for analytics and reporting.
- Collaborate with data scientists, analysts, and business teams to define data engineering requirements.
- Automate infrastructure provisioning and pipeline deployments using Terraform, Airflow, or dbt.
- Monitor and troubleshoot data pipeline failures, performance issues, and bottlenecks.
- Develop and enforce data quality and observability frameworks using Great Expectations, Monte Carlo, or similar tools.
Basic Qualifications:
- Bachelor’s or Master’s degree in Computer Science or Data Science.
- 5-8 years of experience in data engineering, big data processing, and cloud-based data platforms.
- Hands-on expertise in Apache Spark, PySpark, and distributed computing frameworks.
- Strong experience with Snowflake (Warehouses, Streams, Tasks, Snowpipe, Query Optimization).
- Experience in Databricks (Delta Lake, MLflow, SQL Analytics, Photon Engine).
- Proficiency in SQL and in Python or Scala for data transformation and analytics.
- Experience working with data lake architectures and storage formats (Parquet, Avro, ORC, Iceberg).
- Hands-on experience with cloud data services (Amazon Redshift, Azure Synapse, Google BigQuery).
- Experience in workflow orchestration tools like Apache Airflow, Prefect, or Dagster.
- Strong understanding of data governance, access control, and encryption strategies.
- Experience with CI/CD for data pipelines using GitOps, Terraform, dbt, or similar technologies.
Preferred Qualifications:
- Knowledge of streaming data processing (Apache Kafka, Flink, Kinesis, Pub/Sub).
- Experience in BI and analytics tools (Tableau, Power BI, Looker).
- Familiarity with data observability tools (Monte Carlo, Great Expectations).
- Experience with machine learning feature engineering pipelines in Databricks.
- Contributions to open-source data engineering projects.