Data Engineer
Pune, IN
Purpose of the Role
A Data Engineer in the Data Engineering & Infrastructure team is crucial for designing, building, and maintaining robust data pipelines and infrastructure on Google Cloud Platform. This role ensures reliable data flow across the organization, enabling real-time analytics and data-driven decision-making. Without this role, the company would lack the technical foundation to process, store, and deliver data efficiently, resulting in data silos, poor data quality, and an inability to scale data operations.
Build and maintain scalable data pipelines and ETL processes
Ensure data quality and consistency across systems
Implement cloud-native data solutions on GCP
Optimize data storage and processing costs
Enable real-time data processing and analytics capabilities
Responsibilities & Key Deliverables
Data Pipeline Development:
Design, develop, and maintain scalable ETL/ELT pipelines using GCP services like Cloud Dataflow, Cloud Composer (Apache Airflow), and Cloud Functions. Build real-time and batch data processing solutions to handle diverse data sources and formats.
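By way of illustration, a minimal Apache Beam batch pipeline of the kind this role builds, runnable locally or on the Cloud Dataflow runner; the bucket, project, table, and schema names are hypothetical placeholders, not an actual company pipeline:

```python
# Minimal batch ETL sketch with Apache Beam. Run locally with DirectRunner,
# or pass --runner=DataflowRunner --project=... to execute on Cloud Dataflow.
# All resource names below are illustrative.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_csv(line):
    # Split a "user_id,amount" CSV line into a typed record.
    user_id, amount = line.split(",")
    return {"user_id": user_id, "amount": float(amount)}

options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://example-bucket/orders.csv")
        | "Parse" >> beam.Map(parse_csv)
        | "Write" >> beam.io.WriteToBigQuery(
            "example-project:analytics.orders",
            schema="user_id:STRING,amount:FLOAT",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```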
Cloud Infrastructure Management:
Architect and implement data infrastructure on Google Cloud Platform, including BigQuery, Cloud Storage, Cloud SQL, Cloud Spanner, and Bigtable. Optimize performance, cost, and reliability of cloud resources.
Data Integration & Orchestration:
Integrate data from various sources including APIs, databases, IoT devices, and third-party systems. Implement data orchestration workflows using Cloud Composer and ensure seamless data flow across systems.
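A minimal sketch of such an orchestration workflow as a Cloud Composer (Airflow) DAG; the DAG id, bucket, dataset, and the API extraction step are illustrative assumptions:

```python
# Daily Airflow DAG sketch: land API data in GCS, then load it into BigQuery.
# All identifiers are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

def extract_from_api(**context):
    # Placeholder: call a source API and write newline-delimited JSON to GCS.
    ...

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_api)
    load = GCSToBigQueryOperator(
        task_id="load",
        bucket="example-landing-bucket",
        source_objects=["orders/{{ ds }}.json"],
        destination_project_dataset_table="analytics.orders",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )
    extract >> load
```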
Data Quality & Governance:
Implement data quality checks, validation rules, and monitoring systems. Ensure compliance with data governance policies and security standards using Cloud DLP, Cloud IAM, and other GCP security services.
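For example, a simple quality gate of this kind, sketched with the google-cloud-bigquery client; table and column names are assumptions:

```python
# Minimal data-quality gate: fail the run if a key column contains NULLs
# or duplicates. Table and column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

checks = {
    "null_keys": "SELECT COUNT(*) AS n FROM `analytics.orders` WHERE user_id IS NULL",
    "dup_keys": """
        SELECT COUNT(*) AS n FROM (
          SELECT order_id FROM `analytics.orders`
          GROUP BY order_id HAVING COUNT(*) > 1
        )
    """,
}

for name, sql in checks.items():
    n = next(iter(client.query(sql).result())).n
    if n > 0:
        raise ValueError(f"Data quality check '{name}' failed: {n} offending rows")
```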
Real-time Data Processing:
Build streaming data pipelines using Cloud Pub/Sub, Cloud Dataflow, and BigQuery streaming inserts. Develop solutions for real-time analytics and event-driven architectures.
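A compact streaming sketch with Apache Beam, reading from a hypothetical Pub/Sub subscription and streaming decoded events into BigQuery:

```python
# Streaming sketch: Pub/Sub -> Beam -> BigQuery. With an unbounded source,
# WriteToBigQuery defaults to streaming inserts. Names are placeholders.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            subscription="projects/example-project/subscriptions/events-sub"
        )
        | "Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Stream" >> beam.io.WriteToBigQuery(
            "example-project:analytics.events",
            schema="event_id:STRING,ts:TIMESTAMP,payload:STRING",
        )
    )
```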
Performance Optimization:
Optimize query performance in BigQuery, implement partitioning and clustering strategies. Monitor and improve pipeline performance using Cloud Monitoring and Cloud Logging.
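Partitioning and clustering of the sort referenced here can be expressed directly in BigQuery DDL; a minimal sketch issued through the Python client, with illustrative table and column names:

```python
# Create a date-partitioned, clustered copy of a table. Partitioning on the
# event date plus clustering on common filter columns prunes scanned bytes.
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
    CREATE TABLE IF NOT EXISTS `analytics.events_partitioned`
    PARTITION BY DATE(ts)
    CLUSTER BY user_id, event_type AS
    SELECT * FROM `analytics.events`
""").result()
```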
SAP Integration (Preferred):
Design and implement data integration solutions with SAP systems (SAP ECC, S/4HANA, BW/4HANA, and allied systems such as SuccessFactors, Ariba, and Concur). Build connectors and pipelines to extract data from SAP modules for analytics and reporting.
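A hedged sketch of one common low-level extraction path, calling RFC_READ_TABLE through the open-source pyrfc connector (which requires the SAP NetWeaver RFC SDK); the connection details and example table are placeholders, and production extractions typically favor ODP-based interfaces:

```python
# Read rows from an SAP table over RFC via pyrfc. Credentials, host, and the
# example table (MARA, material master) are illustrative placeholders.
from pyrfc import Connection

conn = Connection(
    ashost="sap.example.com", sysnr="00",
    client="100", user="EXTRACT_USER", passwd="***",
)

result = conn.call(
    "RFC_READ_TABLE",
    QUERY_TABLE="MARA",
    DELIMITER="|",
    ROWCOUNT=100,
)
rows = [r["WA"].split("|") for r in result["DATA"]]
```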
Experience
3-4 years of hands-on experience as a Data Engineer with a strong focus on Google Cloud Platform. Experience in building and maintaining production-grade data pipelines and infrastructure on GCP.
Qualifications
Bachelor's or Master's degree in Statistics or Applied Statistics
Primary Skill Requirements
Primary Skills:
Google Cloud Platform Expertise:
1. Advanced proficiency in BigQuery (SQL, DML/DDL, optimization techniques)
2. Experience with Cloud Dataflow for batch and streaming data processing
3. Hands-on experience with Cloud Composer/Apache Airflow for orchestration
4. Cloud Storage, Cloud SQL, Cloud Spanner, and Bigtable implementation
5. Cloud Pub/Sub for event-driven architectures
6. Cloud Functions and Cloud Run for serverless computing
7. Dataproc for managed Spark/Hadoop workloads
Programming & Tools:
1. Strong programming skills in Python, Java, or Scala
2. Proficiency in SQL and NoSQL databases
3. Experience with the Apache Beam SDK for data processing
4. Infrastructure as Code using Terraform or Cloud Deployment Manager
5. Version control with Git and CI/CD pipelines
Data Engineering Concepts:
1. ETL/ELT design patterns and best practices
2. Data modeling (dimensional, normalized, denormalized)
3. Data warehousing and data lake architectures
4. Stream processing and real-time analytics
5. Data partitioning, sharding, and optimization strategies
Security & Governance:
1. GCP IAM, VPC, and security best practices
2. Data encryption and privacy implementation
3. Understanding of compliance frameworks (GDPR, HIPAA)
Secondary Skill Requirements
SAP Knowledge (Preferred):
1. Understanding of SAP architecture and data models
2. Experience with SAP HANA, BW/4HANA, or S/4HANA
3. SAP data extraction using ODP, BAPI, or RFC
4. Knowledge of SAP integration tools and connectors
Additional Nice-to-Have:
Experience with other cloud platforms (AWS, Azure)
Knowledge of containerization (Docker, Kubernetes/GKE)
Understanding of ML/AI pipelines on GCP (Vertex AI, ML Engine)
Experience with data visualization tools (Looker, Tableau, Data Studio)
Behavioural Competencies/ Skills
1. Strong problem-solving and analytical thinking
2. Excellent communication skills for technical and non-technical stakeholders
3. Ability to work in cross-functional teams
4. Proactive approach to identifying and solving data challenges
5. Continuous learning mindset for evolving cloud technologies
6. Attention to detail and commitment to data quality
7. Ability to manage multiple projects and prioritize effectively