Senior Data Architect (ADF & Databricks)

InfoVision Inc.

Posted June 7

Description

About the role

As the Senior Data Architect, you will lead the architecture, implementation, and technological direction of our enterprise data platforms. Reporting directly to the Senior Director of Data & AI, you will ensure our cloud technology infrastructure supports our critical business data mission. You will operate as a lead advisor and engineer, collaborating with cross-functional leaders across the organization to deliver scalable, secure, and cost-optimized data environments.

Key responsibilities

1. Databricks architecture & platform ownership

* Design and implement enterprise-grade Databricks lakehouse architecture across bronze, silver, and gold layers.
* Own security and governance frameworks within Databricks using Unity Catalog, access controls, and data lineage.
* Establish scalable cluster strategies, job orchestration frameworks, and workspace organization.
* Lead the architecture of Delta Lake design patterns, including partitioning, optimization, and data lifecycle management.
* Define and enforce data engineering standards, naming conventions, and architectural patterns across all production pipelines.
* Evaluate and implement new Databricks capabilities to ensure continuous alignment with enterprise data strategy.

2. Data pipeline development & orchestration

* Design, build, and optimize robust, end-to-end ETL/ELT pipelines using Azure Data Factory (ADF) and Azure Databricks.
* Develop ingestion frameworks for batch and streaming data from APIs, databases, SaaS platforms, and internal systems.
* Create scalable, architecturally sound data transformation frameworks using Delta Lake, Spark, and SQL, aligned with enterprise lakehouse standards.
* Implement CI/CD parameterization, triggers, and pipeline automation best practices.

3. Azure data platform engineering & cost governance

* Architect, manage, and optimize enterprise data environments across ADLS, Azure SQL, and Databricks, including cluster design, cost governance, and workload isolation strategies.
* Conduct advanced performance tuning, cluster scaling, proactive monitoring, and cloud cost optimization.
* Implement comprehensive DataOps practices, including automated testing, version control, monitoring, and documentation.

4. Data quality, governance & analytics enablement

* Build rigorous data validation, auditing, and error-handling frameworks to ensure absolute data accuracy and consistency.
* Troubleshoot complex data issues and deliver sustainable, long-term technical solutions.
* Partner directly with BI analysts, data scientists, and operational teams to deliver curated, high-performance datasets.
* Build reusable data models optimized for business dashboards, predictive analytics, and AI use cases.
* Prepare training datasets and feature tables for machine learning pipelines (preferred).

Required qualifications & core skills

* Experience: 6+ years of hands-on data architecture and enterprise engineering experience, operating at a Databricks architect level, designing and implementing enterprise-scale data platforms.
* Azure Databricks & analytics: deep expertise in Azure Databricks architecture (notebooks, Spark, PySpark, Delta Lake, workflow orchestration).
* Azure Data Factory: extensive experience building and managing ADF pipelines, mapping data flows, and integration runtime (IR) management.
* SQL mastery: command of complex SQL logic, performance optimization, advanced analytics queries, and stored procedures.
* Data architecture: deep expertise in lakehouse architecture (medallion: bronze/silver/gold) and Delta Lake optimization techniques.
* Governance & security: strong understanding and practical execution of Databricks Unity Catalog, data governance, access controls, and data lineage models.
* Platform operations: proven skill in cluster design, workload isolation, performance tuning, and cloud cost optimization across ADLS and Azure SQL.
* DataOps & CI/CD: robust experience implementing DataOps practices (testing, monitoring, version control, documentation) and CI/CD automation using Azure DevOps, GitHub Actions, or Databricks Repos.
* Data ingestion: proven proficiency building scalable cloud ETL/ELT solutions that manage batch and streaming ingestion from APIs, databases, SaaS platforms, and internal systems.

Highly preferred qualifications (major plus)

* Healthcare data domain: direct experience with healthcare data environments, standards, and systems (EHR/EMR, HL7, FHIR, claims, or revenue cycle management - RCM).
* AI/ML integration: experience supporting AI/ML workflows, feature engineering, or model enablement.
* Environment strategy: experience building and maintaining multi-workspace or multi-environment Databricks strategies (dev/test/prod).
* Data warehousing: familiarity with Azure Synapse Analytics or equivalent cloud warehousing technologies.
* Real-time processing: familiarity with real-time distributed processing (Structured Streaming) within Databricks.
