Role summary

Own the design and operation of reliable, secure, and cost-efficient data pipelines built with Apache Airflow (2.x) and Python. You'll lead Python-based ETL/ELT, DAG orchestration, and data platform reliability, security, and observability across our cloud environments. We are an Android app development team looking for someone to own and lead our cloud data engineering in close partnership with mobile, backend, and product teams.

Responsibilities
• Design, build, and maintain Airflow DAGs using TaskFlow, dynamic DAGs, deferrable operators, providers, and the secrets backend (see the DAG sketch after this list).
• Develop Python ETL/ELT code to ingest from APIs, object storage, message buses, and databases.
• Operate Airflow on managed or self-hosted platforms (e.g., Azure, Kubernetes deployments).
• Implement data quality and testing with unit tests for operators/hooks, and DAG validation in CI (see the validation sketch after this list).
• Model and manage data stores across SQL and blob storage.
• Security & governance: apply least-privilege IAM, secrets management, PII handling, and data contracts; enforce RBAC in Airflow and warehouses.
• CI/CD & IaC: build pipelines to lint, test, and deploy DAGs and Python packages; provision infrastructure with Terraform/Helm; containerize with Docker.
• Cost & performance: tune task parallelism, autoscaling, storage formats, and compute footprints to optimize cost/performance.
• Collaboration: work closely with Android/backend teams to define interfaces and data contracts.
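A minimal sketch of the kind of TaskFlow DAG this role owns. The endpoint, task names, field names, and load target below are illustrative assumptions, not part of our stack:

```python
# Airflow 2.x TaskFlow sketch: extract from an API, transform, load.
# The endpoint, field names, and load target are all hypothetical.
import pendulum
import requests
from airflow.decorators import dag, task

@dag(
    schedule="@daily",  # the `schedule` arg is Airflow 2.4+; older 2.x uses schedule_interval
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
)
def orders_etl():
    @task
    def extract() -> list[dict]:
        # Real code would take the base URL and credentials from an
        # Airflow connection / the secrets backend, not a literal.
        resp = requests.get("https://api.example.com/orders", timeout=30)
        resp.raise_for_status()
        return resp.json()

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Keep only the columns the target table expects.
        return [{"id": r["id"], "amount": r["amount"]} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        # Placeholder: a provider hook (e.g., PostgresHook) would write here.
        print(f"would load {len(rows)} rows")

    load(transform(extract()))

orders_etl()
```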
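And a sketch of DAG validation in CI: a pytest module asserting that every DAG file imports cleanly. `DagBag` is standard Airflow; the `dags/` folder is an assumption about repository layout:

```python
# CI guard: fail the build if any DAG fails to import or the bundle is empty.
from airflow.models import DagBag

def test_dags_import_cleanly():
    bag = DagBag(dag_folder="dags/", include_examples=False)
    # import_errors maps file path -> traceback for every broken DAG file.
    assert bag.import_errors == {}, bag.import_errors

def test_bundle_is_nonempty():
    bag = DagBag(dag_folder="dags/", include_examples=False)
    assert len(bag.dags) > 0, "no DAGs found under dags/"
```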
Skills and qualifications

• 8+ years in data engineering or backend engineering with strong Python expertise.
• 2+ years with Airflow 2.x.
• Proven experience designing reliable ETL/ELT at scale (batch and streaming) with robust testing and monitoring.
• Strong SQL and data modeling skills; hands-on with one or more data warehouses (BigQuery, Redshift, Snowflake) and relational systems (PostgreSQL/MySQL).
• Familiarity with security best practices (RBAC, OAuth2/OIDC for service integrations), API gateways, and secrets management (Vault/AWS Secrets Manager/GCP Secret Manager); a sketch follows this list.
• Comfortable operating in production: monitoring, troubleshooting, and performance tuning.
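For a sense of level: a hedged sketch of pulling a credential from AWS Secrets Manager, one of the backends named above. The secret ID and the JSON layout of the payload are conventions we assume for illustration:

```python
# Fetch a warehouse password from AWS Secrets Manager via boto3.
# The secret ID and the JSON shape of the payload are hypothetical.
import json
import boto3

def get_warehouse_password(secret_id: str = "prod/warehouse/credentials") -> str:
    client = boto3.client("secretsmanager")
    resp = client.get_secret_value(SecretId=secret_id)
    # SecretString is the raw payload; we assume a JSON blob with a
    # "password" key, which is a team convention, not an AWS requirement.
    return json.loads(resp["SecretString"])["password"]
```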
Nice to have
• Proficient with CI/CD, Git, code reviews, and infrastructure as code (Terraform); containerization with Docker and orchestration on Kubernetes.
• Spark (PySpark) or Beam for large-scale processing (see the sketch after this list).
• Experience with Kafka (or Event Hubs/Pub/Sub equivalents), schema registry, and CDC patterns.
• dbt for transformations and testing.
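As a rough illustration of the PySpark item above, a minimal batch rollup; the paths and column names are invented for the example:

```python
# PySpark sketch: daily revenue rollup over a hypothetical orders dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_rollup").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")  # hypothetical path
daily = (
    orders.withColumn("day", F.to_date("created_at"))
    .groupBy("day")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("order_count"))
)
daily.write.mode("overwrite").parquet("s3://example-bucket/rollups/daily/")
spark.stop()
```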
Logistics

• Contract role.