S&P Enterprise Data Organization
Lead Data Scientist
The Team: As a member of the EDO, Collection Platforms & AI – Cognitive Engineering team, you will build GenAI-driven and ML-powered products and capabilities that power natural language understanding, data extraction, information retrieval, and data sourcing solutions for S&P Global. You will define AI strategy, mentor others, and drive production-ready AI products and pipelines while leading by example in a highly engaging work environment. You will work in a truly global team that encourages thoughtful risk-taking and self-initiative.
What’s in it for you:
* Be a part of a global company and build solutions at enterprise scale
* Lead and grow a highly skilled, hands-on technical team (including mentoring junior data scientists)
* Contribute to solving high-complexity, high-impact problems end-to-end
* Architect and oversee production-ready pipelines from ideation to deployment
Responsibilities:
* Define the AI roadmap, tooling choices, and best practices for model building, prompt engineering, fine-tuning, and vector retrieval systems
* Architect, develop, and deploy large-scale ML and GenAI-powered products and pipelines
* Own all stages of the data science project lifecycle, including:
  * Identification and scoping of high-value data science and AI opportunities
  * Partnering with business leaders, domain experts, and end-users to gather requirements and align on success metrics
  * Evaluation, interpretation, and communication of results to executive stakeholders
* Lead exploratory data analysis, proof-of-concepts, model benchmarking, and validation experiments for both ML and GenAI approaches
* Establish and enforce coding standards, perform code reviews, and optimize data science workflows
* Drive deployment, monitoring, and scaling strategies for models in production (including both ML and GenAI services)
* Mentor and guide junior data scientists; foster a culture of continuous learning and innovation
* Manage stakeholders across functions to ensure alignment and timely delivery
Technical Requirements:
* Hands-on experience with large language models (e.g., OpenAI, Anthropic, Llama), prompt engineering, fine-tuning/customization, and embedding-based retrieval
* Expert proficiency in Python (NumPy, pandas, spaCy, scikit-learn, PyTorch/TensorFlow 2, Hugging Face Transformers)
* Deep understanding of ML & Deep Learning models, including architectures for NLP (e.g., transformers), GNNs, and multimodal systems
* Strong grasp of statistics, probability, and the mathematics underpinning modern AI
* Ability to survey and synthesize current AI/ML research, with a track record of applying new methods in production
* Proven experience on at least one end-to-end GenAI or advanced NLP project: custom NER, table extraction via LLMs, Q&A systems, summarization pipelines, OCR integrations, or GNN solutions
* Familiarity with orchestration, deployment, and dashboarding tools: Airflow, Redis, Flask/Django/FastAPI, SQL, R Shiny/Dash/Streamlit
* Openness to evaluating and adopting emerging technologies and programming languages as needed
* Master’s or Ph.D. in Computer Science, Statistics, Mathematics, or a related field (Bachelor’s degree at minimum)
* 6+ years of relevant experience in Data Science/AI, with at least 2 years in a leadership or technical lead role
* Prior experience in economics or the financial industry, especially with market-intelligence or risk analytics products
* Public contributions or demos on GitHub, Kaggle, StackOverflow, technical blogs, or publications
Location: Mexico City, Santa Fe (2 days a week onsite)