Role summary
Own the design and operation of reliable, secure, and cost‑efficient data pipelines built
with Apache Airflow (2.x) and Python. You'll deliver batch and streaming ingestion,
transformations, and curated datasets that power connected and infotainment
experiences. You'll lead Python-based ETL/ELT, DAG orchestration, and data platform
reliability, security, and observability across our cloud environments.
We are an Android app development team looking for someone to own and lead our cloud
data engineering in close partnership with mobile, backend, and product teams. This
ownership includes taking abstract requirements, refining and defining them, creating
development tasks, and then implementing those tasks.
Responsibilities
• Design, build, and maintain Airflow DAGs using TaskFlow, dynamic DAGs,
deferrable operators, providers, and the secrets backend; manage cross‑DAG
dependencies and SLAs (see the DAG sketch after this list).
• Develop Python ETL/ELT code to ingest from APIs, object storage, message buses,
and databases; package code as reusable libraries.
• Operate Airflow on managed or self‑hosted platforms (e.g., Azure, Kubernetes
deployments); implement blue/green or canary DAG releases.
• Implement data quality checks and testing, with unit tests for operators/hooks and
DAG validation in CI (see the test sketch after this list).
• Build event‑driven pipelines for near‑real‑time processing; manage schemas and
compatibility.
• Model and manage data stores across SQL and blob storage; design partitioning,
clustering, and retention.
• Observability & lineage: instrument metrics/logs, set SLAs/alerts, and drive
post‑incident reviews and reliability improvements.
• Security & governance: apply least‑privilege IAM, secrets management, PII
handling, and data contracts; enforce RBAC in Airflow and warehouses.
• CI/CD & IaC: build pipelines to lint/test/deploy DAGs and Python packages;
provision infrastructure with Terraform/Helm; containerize with Docker.
• Cost & performance: tune task parallelism, autoscaling, storage formats, and
compute footprints to optimize cost/performance.
• Collaboration: work closely with Android/backend teams to define interfaces and
data contracts; document decisions and operational runbooks.
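For illustration, a minimal sketch of the kind of TaskFlow DAG this role would own (assumes Airflow 2.4+, where `schedule` replaces `schedule_interval`; the pipeline, task, and field names are invented for the example):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="telemetry_ingest",  # hypothetical pipeline name
    schedule="@hourly",         # Airflow 2.4+ keyword; older 2.x releases use schedule_interval
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["telemetry", "example"],
)
def telemetry_ingest():
    @task
    def fetch_events() -> list[dict]:
        # Stand-in for pulling from an API, object store, or message bus.
        return [{"vehicle_id": "demo", "speed_kph": 42}]

    @task
    def load_curated(events: list[dict]) -> int:
        # Stand-in for loading a curated table in the warehouse.
        return len(events)

    load_curated(fetch_events())


telemetry_ingest()
```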
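And a sketch of the kind of DAG validation that could run in CI (assumes pytest as the test runner and DAG files in a local dags/ folder; the tagging policy is a hypothetical example):

```python
from airflow.models import DagBag


def test_dags_import_without_errors():
    # Parsing the DAG folder in CI surfaces broken imports before deployment.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"


def test_every_dag_is_tagged():
    # Hypothetical team policy: every DAG carries at least one tag.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags, f"{dag_id} has no tags"
```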
Skills and qualifications
• 8+ years in data engineering or backend engineering with strong Python expertise.
• 2+ years of Airflow 2.x expertise (operators, hooks, sensors, TaskFlow, scheduler
tuning).
• Proven experience designing reliable ETL/ELT at scale (batch and streaming) with
robust testing and monitoring.
• Strong SQL and data modeling skills; hands‑on with one or more data warehouses
(BigQuery, Redshift, Snowflake) and relational systems (PostgreSQL/MySQL).
• Familiarity with security best practices (RBAC, OAuth2/OIDC for service
integrations), API gateways, and secrets management (Vault/AWS Secrets
Manager/GCP Secret Manager).
• Comfortable operating in production: monitoring, troubleshooting, and
performance tuning.
• Excellent written and verbal communication; clear trade‑off communication and
autonomous execution with well‑documented decisions.
Nice to have
• Proficiency with CI/CD, Git, code reviews, and infrastructure as code (Terraform);
containerization with Docker and orchestration on Kubernetes.
• Spark (PySpark) or Beam for large‑scale processing.
• Automotive/IoT telemetry domain exposure.
• Experience with Kafka (or Event Hubs/Pub/Sub equivalents), schema registry, and
CDC patterns.
• dbt for transformations and testing; Delta Lake/medallion patterns; feature stores.
Logistics
• Contract role; collaborate across multi‑disciplinary teams in a fast‑moving,
target‑oriented environment.
• Motivated team player with high attention to detail and a creative, pragmatic
approach to problem solving.