Role summary
Own the design and operation of reliable, secure, and cost‑efficient data pipelines built
with Apache Airflow (2.x) and Python. You'll deliver batch and streaming ingestion,
transformations, and curated datasets that power connected and infotainment
experiences. You'll lead Python-based ETL/ELT, DAG orchestration, and data platform
reliability, security, and observability across our cloud environments.
We are an Android app development team looking for someone to own and lead our cloud
data engineering in close partnership with mobile, backend, and product teams. This
ownership includes taking abstract requirements, refining and defining them, creating
development tasks, and then implementing those tasks.
Responsibilities
• Design, build, and maintain Airflow DAGs using TaskFlow, dynamic DAGs,
deferrable operators, providers, and the secrets backend; manage cross‑DAG
dependencies and SLAs (see the DAG sketch after this list).
• Develop Python ETL/ELT code to ingest from APIs, object storage, message buses,
and databases; package code as reusable libraries.
• Operate Airflow on managed or self‑hosted platforms (e.g., Azure, Kubernetes
deployments); implement blue/green or canary DAG releases.
• Implement data quality checks and testing, with unit tests for operators/hooks and
DAG validation in CI (see the test sketch after this list).
• Build event‑driven pipelines for near‑real‑time processing; manage schemas and
compatibility.
• Model and manage data stores across SQL and blob storage; design partitioning,
clustering, and retention.
• Observability & lineage: instrument metrics/logs, set SLAs/alerts, and drive
post‑incident reviews and reliability improvements.
• Security & governance: apply least‑privilege IAM, secrets management, PII
handling, and data contracts; enforce RBAC in Airflow and warehouses.
• CI/CD & IaC: build pipelines to lint/test/deploy DAGs and Python packages;
provision infrastructure with Terraform/Helm; containerize with Docker.
• Cost & performance: tune task parallelism, autoscaling, storage formats, and
compute footprints to optimize cost/performance.
• Collaboration: work closely with Android/backend teams to define interfaces and
data contracts; document decisions and operational runbooks.
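For illustration, a minimal sketch of the kind of TaskFlow DAG this role would own (assumes Airflow 2.4+, where `schedule` replaces `schedule_interval`; the pipeline, task, and field names are invented for the example):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="telemetry_ingest",  # hypothetical pipeline name
    schedule="@hourly",         # Airflow 2.4+ keyword; older 2.x releases use schedule_interval
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["telemetry", "example"],
)
def telemetry_ingest():
    @task
    def fetch_events() -> list[dict]:
        # Stand-in for pulling from an API, object store, or message bus.
        return [{"vehicle_id": "demo", "speed_kph": 42}]

    @task
    def load_curated(events: list[dict]) -> int:
        # Stand-in for loading a curated table in the warehouse.
        return len(events)

    load_curated(fetch_events())


telemetry_ingest()
```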
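And a sketch of the kind of DAG validation that could run in CI (assumes pytest as the test runner and DAG files in a local dags/ folder; the tagging policy is a hypothetical example):

```python
from airflow.models import DagBag


def test_dags_import_without_errors():
    # Parsing the DAG folder in CI surfaces broken imports before deployment.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"


def test_every_dag_is_tagged():
    # Hypothetical team policy: every DAG carries at least one tag.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags, f"{dag_id} has no tags"
```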
Skills and qualifications
• 8+ years in data engineering or backend engineering with strong Python expertise.
• 2+ years of Airflow 2.x expertise (operators, hooks, sensors, TaskFlow, scheduler
tuning).
• Proven experience designing reliable ETL/ELT at scale (batch and streaming) with
robust testing and monitoring.
• Strong SQL and data modeling skills; hands‑on with one or more data warehouses
(BigQuery, Redshift, Snowflake) and relational systems (PostgreSQL/MySQL).
• Familiarity with security best practices (RBAC, OAuth2/OIDC for service
integrations), API gateways, and secrets management (Vault/AWS Secrets
Manager/GCP Secret Manager).
• Comfortable operating in production: monitoring, troubleshooting, and
performance tuning.
• Excellent written and verbal communication; clear trade‑off communication and
autonomous execution with well‑documented decisions.
Nice to have
• Proficiency with CI/CD, Git, code reviews, and infrastructure as code (Terraform);
containerization with Docker and orchestration on Kubernetes.
• Spark (PySpark) or Beam for large‑scale processing.
• Automotive/IoT telemetry domain exposure.
• Experience with Kafka (or Event Hubs/Pub/Sub equivalents), schema registry, and
CDC patterns.
• dbt for transformations and testing; Delta Lake/medallion patterns; feature stores.
Logistics
• Contract role; collaborate across multi‑disciplinary teams in a fast‑moving,
target‑oriented environment.
• Motivated team player with high attention to detail and a creative, pragmatic
approach to problem solving.