Sr Advanced AI Data Engineer

Monterrey, N.L.

Honeywell

Posted on May 28

Description

As a Senior Advanced Data Engineer here at Honeywell, you will play a crucial role in designing, developing, and maintaining advanced data solutions that drive business insights and support decision-making processes. You will leverage your expertise in data engineering to build scalable data pipelines, optimize data storage, and ensure data quality and integrity.

Your ability to work with cross-functional teams and translate business requirements into technical solutions will be key to your success in this role.

In this role, you will impact the business by enabling data-driven decision-making, optimizing data processes, and improving overall data management. Your work will contribute to increased operational efficiency, cost savings, and enhanced customer satisfaction.

At Honeywell, our people leaders play a critical role in developing and supporting our employees to help them perform at their best and drive change across the company. Help build a strong, diverse team by recruiting talent, identifying and developing successors, driving retention and engagement, and fostering an inclusive culture.

AI-Ready Data Platform

1. Design and implement end-to-end ingestion pipelines from heterogeneous sources (including Snowflake, SQL Server, Excel, REST APIs, and unstructured data) into Azure Databricks
2. Architect and enforce Medallion Architecture (Bronze → Silver → Gold) ensuring data arrives clean, validated, and fit for purpose at each layer
3. Build Delta Live Tables (DLT) pipelines with declarative data quality expectations, schema evolution, and automated lineage tracking
4. Implement incremental loading patterns using CDC (Change Data Capture), watermarking, and Delta Lake MERGE/UPSERT for efficient, scalable ingestion
5. Enable structured and unstructured data processing (documents, Excel files, JSON, Parquet), building the foundation for AI and ML consumption
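
For illustration, the watermark-plus-upsert pattern named in item 4 can be sketched in plain Python. This is a minimal sketch, not the platform's implementation: the function name `incremental_upsert`, the `id` key, and the `updated_at` column are hypothetical, and an in-memory dict stands in for a Delta table that would normally be updated with a MERGE statement.

```python
from datetime import datetime


def incremental_upsert(target, source_rows, watermark, key="id", ts="updated_at"):
    """Apply source rows newer than `watermark` to `target` and return the new watermark.

    `target` is a dict keyed by the business key (a stand-in for a Delta table).
    Rows at or before the watermark are skipped; newer rows are inserted or
    overwrite the existing row, which mirrors MERGE "matched / not matched" logic.
    """
    new_watermark = watermark
    for row in source_rows:
        if row[ts] <= watermark:
            continue  # already ingested by a previous run
        existing = target.get(row[key])
        if existing is None or row[ts] >= existing[ts]:
            target[row[key]] = dict(row)  # insert, or update with the newer version
        new_watermark = max(new_watermark, row[ts])
    return new_watermark


# Hypothetical data: one row already loaded, one update, one new row, one stale row.
table = {1: {"id": 1, "status": "open", "updated_at": datetime(2024, 1, 1)}}
batch = [
    {"id": 1, "status": "closed", "updated_at": datetime(2024, 1, 3)},
    {"id": 2, "status": "open", "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "status": "open", "updated_at": datetime(2023, 12, 31)},
]
last_loaded = incremental_upsert(table, batch, watermark=datetime(2024, 1, 1))
```

After the run, `last_loaded` would be persisted so that the next execution only reads rows changed since then.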

Data Modeling & Semantic Layer

6. Design and implement the Engineering data model (dimensional models, fact/dimension tables, and domain-specific data marts) serving analytics, BI, ML, and AI use cases
7. Build a governed, reusable semantic layer on top of the Gold layer, enabling self-service analytics through Power BI and GCP-connected consumers
8. Ensure data models are documented, versioned, and aligned to business domains within the VECE COE

Orchestration and Data Ops

9. Build and manage Databricks Workflows with multi-task dependencies, SLA monitoring, retry logic, and alerting
10. Implement CI/CD pipelines for Databricks using Azure DevOps and GitHub Actions, including Python Wheel packaging for reusable utility libraries deployed across the platform
11. Apply software engineering best practices: version control, unit testing, modular code design, and automated deployment to Dev/QA/Prod environments
12. Manage cluster right-sizing, DBU consumption, Delta table optimization (VACUUM, compaction), and cost monitoring across Azure Databricks and GCP

Data Governance & Quality

13. Implement and manage Unity Catalog for centralized data governance: three-level namespace (catalog → schema → table), fine-grained RBAC, data masking, and audit logging
14. Build data quality frameworks (rule-based validation, deduplication, reconciliation, and anomaly detection), ensuring data arrives fit for AI/ML consumption
15. Establish data lineage tracking across ingestion, transformation, and serving layers
16. Govern data delivery to GCP, ensuring secure, validated, schema-consistent outputs consumed by downstream data science and analytics teams
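
As a rough illustration of the rule-based validation and deduplication mentioned in item 14, the sketch below uses plain Python. The rule names, the `order_id` and `amount` columns, and both helper functions are hypothetical; on the platform described here, such checks would more likely be declared as pipeline data quality expectations.

```python
def validate(rows, rules):
    """Split rows into (valid, rejected); each rejected entry lists the rules it failed."""
    valid, rejected = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            rejected.append((row, failed))
        else:
            valid.append(row)
    return valid, rejected


def deduplicate(rows, key):
    """Keep one row per key value; when a key repeats, the last row seen wins."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())


# Hypothetical rules: each maps a rule name to a predicate over a single row.
RULES = {
    "order_id_present": lambda r: r.get("order_id") is not None,
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}

rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": 12.5},   # duplicate key, later version
    {"order_id": None, "amount": 5.0},  # fails order_id_present
    {"order_id": 2, "amount": -3.0},    # fails amount_non_negative
]
valid, rejected = validate(rows, RULES)
clean = deduplicate(valid, key="order_id")
```

Keeping the rejected rows together with the names of the failed rules makes it possible to route them to a quarantine table and to reconcile counts between layers.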

AI & Proactive Analytics Foundation

17. Design pipelines that are AI-ready from day one: supporting structured ML feature pipelines, embedding generation, and future Vector DB integrations
18. Build the data infrastructure that enables the shift from descriptive dashboards to proactive, predictive analytics
19. Collaborate with Data Scientists and Analytics Engineers to ensure the Gold layer supports model training, feature stores, and real-time inference pipelines

YOU MUST HAVE

20. Databricks: 4+ years hands-on (PySpark, Delta Lake, Workflows, Unity Catalog)
21. Demonstrated expertise in data strategy, for example Medallion Architecture, Domain Data Modeling, and Functional Data Architecture
22. Data Quality Frameworks (e.g., rule-based validation, anomaly detection)
23. Data Pipelines: incremental loading, CDC, CI/CD, Observability
24. Advanced Python/PySpark and Advanced SQL
25. Strongly preferred: DLT, UC, GCP, Azure, Kafka.
26. Highly valued: Databricks Certified Professional
27. 7+ years of overall data engineering experience
28. 4+ years of hands-on Azure Databricks experience in production environments
29. Proven experience building platforms, not just maintaining them: greenfield builds, migrations, framework development
30. Experience with financial, engineering, enterprise, or industrial-scale datasets preferred
31. Demonstrated ability to own technical decisions end-to-end: from architecture to production deployment

#LI-Hybrid
