Overview
As a Senior Advanced Data Engineer here at Honeywell, you will play a crucial role in designing, developing, and maintaining advanced data solutions that drive business insights and support decision-making. You will leverage your data engineering expertise to build scalable data pipelines, optimize data storage, and ensure data quality and integrity. Your ability to work with cross-functional teams and translate business requirements into technical solutions will be key to your success. In this role, you will impact the business by enabling data-driven decision-making, optimizing data processes, and improving overall data management. Your work will contribute to increased operational efficiency, cost savings, and enhanced customer satisfaction.
Responsibilities
* AI-Ready Data Platform: Design and implement end-to-end ingestion pipelines from heterogeneous sources, including Snowflake, SQL Server, Excel, REST APIs, and unstructured data, into Azure Databricks.
* Data Modeling & Semantic Layer: Architect and enforce the medallion architecture (Bronze → Silver → Gold), ensuring data is clean, validated, and fit for purpose at each layer.
* Delta Lake & DataOps: Build Delta Live Tables (DLT) pipelines with declarative data quality expectations, schema evolution, and automated lineage tracking; implement incremental loading patterns using change data capture (CDC), watermarking, and Delta Lake MERGE (upsert) for efficient ingestion.
* Data Processing: Enable processing of structured and unstructured data (documents, Excel files, JSON, Parquet) to build the foundation for AI and ML consumption.
Orchestration & DataOps
* Build and manage Databricks Workflows with multi-task dependencies, SLA monitoring, retry logic, and alerting.
* Implement CI/CD pipelines for Databricks using Azure DevOps and GitHub Actions, including Python wheel packaging for reusable utility libraries deployed across the platform.
* Apply software engineering best practices: version control, unit testing, modular code design, and automated deployment to dev/QA/prod environments.
* Manage cost and performance: cluster right-sizing, DBU management, Delta table optimization (compaction, VACUUM), and cost monitoring across Azure Databricks and GCP.
Data Governance & Quality
* Implement and manage Unity Catalog for centralized data governance: the three-level namespace (catalog → schema → table), fine-grained RBAC, data masking, and audit logging.
* Build data quality frameworks (rule-based validation, deduplication, reconciliation, and anomaly detection) to ensure data is fit for AI/ML consumption.
* Establish data lineage tracking across the ingestion, transformation, and serving layers.
* Govern data delivery to GCP, ensuring secure, validated, schema-consistent outputs for downstream data science and analytics teams.
AI & Proactive Analytics Foundation
* Design pipelines that are AI-ready from day one, supporting structured ML feature pipelines, embedding generation, and future vector database integrations.
* Build the data infrastructure that enables the shift from descriptive dashboards to proactive, predictive analytics.
* Collaborate with data scientists and analytics engineers to ensure the Gold layer supports model training, feature stores, and real-time inference pipelines.
Qualifications
* Databricks: 4+ years of hands-on experience with PySpark, Delta Lake, Workflows, and Unity Catalog.
* Demonstrated expertise in data strategy, e.g., medallion architecture, domain data modeling, and functional data architecture.
* Experience with data quality frameworks (e.g., rule-based validation, anomaly detection).
* Data pipelines: incremental loading, CDC, CI/CD, and observability.
* Advanced Python/PySpark and advanced SQL.
* Strongly preferred: DLT, Unity Catalog (UC), GCP, Azure, Kafka.
* Databricks Certified Professional certification is highly valued.
* 7+ years of overall data engineering experience.
* 4+ years of hands-on Azure Databricks experience in production environments.
* Proven experience building platforms, not just maintaining them: greenfield builds, migrations, and framework development.
* Experience with financial, engineering, enterprise, or industrial-scale datasets preferred.
* Demonstrated ability to own technical decisions end-to-end, from architecture to production deployment.