Experience & Skill Requirements
5+ years implementing AI solutions in cloud environments, with a focus on AI services and MLOps
3+ years hands‑on experience with ML model development and production infrastructure
Proven track record delivering production ML systems in enterprise environments
ML & Deep Learning: PyTorch, TensorFlow, distributed training, LLM fine‑tuning, transformer architectures, model optimization, ONNX, vLLM
Data & Processing: Python, SQL, PySpark, Apache Spark, Airflow, Kinesis, feature stores, model serving frameworks
Development & Operations: Streaming/batch architectures at scale, DevOps, CI/CD (GitHub Actions, CodePipeline), monitoring (CloudWatch, Prometheus, MLflow)
End‑to‑end ML systems experience from research to production
Strong communication and collaboration skills
Ability to work independently with minimal supervision
Enterprise security and compliance experience
Local to Mexico City and able to work on site two days per week, or open to relocating to Mexico City
Nice to Have Skills & Experience
Experience with recommendation systems, NLP applications, or real‑time inference systems
MLOps platform development and feature store implementations
Job Description
Design and optimize machine learning models including deep learning architectures, LLMs, and specialized models (BERT‑based classifiers)
Implement distributed training workflows using PyTorch and other frameworks
Fine‑tune large language models and optimize inference performance using compilation and serving tools (AWS Neuron compiler, ONNX Runtime, vLLM)
Optimize models for hardware targets (GPU, TPU, AWS Inferentia/Trainium)
Design AI services and architectures for real‑time streaming and offline batch optimization use cases
Lead ML infrastructure implementation including data ingestion pipelines, feature processing, model training, and serving environments
Build scalable inference systems for real‑time and batch predictions
Deploy models across compute environments (EC2, EKS, SageMaker, specialized inference chips)
Implement and maintain MLOps platform including Feature Store, ML Observability, ML Governance, Training and Deployment pipelines
Create automated workflows for model training, evaluation, and deployment using infrastructure‑as‑code
Build MLOps tooling that abstracts complex engineering tasks for data science teams
Implement CI/CD pipelines for model artifacts and infrastructure components
Monitor and optimize ML systems for performance, accuracy, latency, and cost
Conduct performance profiling and implement observability solutions across the ML stack
Partner with data engineering to ensure data is delivered in the optimal format and cadence
Collaborate with data architecture, governance, and security teams to meet required standards
Provide technical guidance on modeling techniques and infrastructure best practices
Seniority Level
Mid-Senior level
Employment Type
Contract
Job Function
Information Technology
Information Services