Data Scientist, AI/ML Model Quality

Austin, US · onsite · mid · 3-6 years · Work permit required

Description

Would you like to contribute to Machine Learning and Generative AI technologies? Are you passionate about the integrity of the data that powers AI systems at scale? Do you believe that trustworthy data is the foundation of every great model? We truly believe it is!

We are defining what exceptional data quality looks like for machine learning across Wallet, Payments, and Commerce. As a Data Scientist, AI/ML Model Quality, you will build and maintain intelligent systems, validation frameworks, and monitoring pipelines that keep our data ecosystem healthy - ensuring that every model we build is trained, evaluated, and deployed on data we can trust. Your work sits at the foundation of every ML feature that reaches hundreds of millions of users.

You'll work at the intersection of statistical rigor and production systems, collaborating closely with ML Engineering, Data Engineering, Privacy, and Legal teams. This unique opportunity puts you at the center of ML and AI quality - owning the health of training and validation datasets, defining and analyzing observability metrics to surface actionable product insights, and leading telemetry analysis across GenAI workflows - ensuring Apple's financial features are built on the highest-quality data, whether powering conventional ML models or the latest generative AI systems.

The ideal candidate is a detail-obsessed data scientist who understands that model quality starts long before training - it starts with the data. You have strong statistical instincts, know how silent degradation and data drift manifest in production systems, and can translate raw quality signals into insights that drive real decisions.

You will own the health of the data ecosystem that underpins ML and GenAI features across Wallet, Payments, and Commerce - building validation frameworks, defining observability metrics, and leading telemetry analysis that keeps every model trained, evaluated, and monitored on data teams can trust.

Responsibilities:

- Curate, analyze, and maintain gold-standard ground-truth datasets for model evaluation and continuous validation across both ML and GenAI systems.
- Audit training data for systemic bias and fairness gaps prior to model deployment; establish ongoing analytical checks to catch bias introduced by data drift over time.
- Define, track, and report key data quality metrics - completeness, accuracy, timeliness, validity - for engineering and leadership audiences.
- Design and define automated data quality rules and thresholds, partnering with Data Engineering to ensure these checks are integrated into model development and CI/CD workflows.
- Define and own ML observability metrics - model performance, output distributions, training-serving skew, silent degradation, and feature drift - translating raw production signals into actionable insights for engineering and product teams.
- Design and develop observability dashboards and reporting workflows that give stakeholders a consistent, real-time view of model health across both conventional ML and GenAI systems.
- Define and analyze telemetry across GenAI workflows, tracking quality signals such as output coherence, latency, task completion rates, and regression patterns.
- Identify degradation patterns and domain-specific failure modes in GenAI systems through systematic telemetry analysis, translating findings into concrete recommendations for model and data teams.

Preferred Qualifications

- Experience with data visualization and dashboarding tools (e.g., Tableau, Apache Superset, Databricks) to present complex ML telemetry.
- Familiarity with LLM evaluation frameworks (e.g., LangSmith) or techniques like LLM-as-a-judge.
- Experience with Bayesian or causal graph-based approaches to synthetic data generation.
- Familiarity with confidence calibration techniques and uncertainty quantification.
- Experience with ML monitoring or observability platforms (e.g., MLflow, Weights & Biases, or equivalent).
- Experience working with privacy-constrained data or under regulatory compliance frameworks (GDPR, DMA).
- Background in financial services, fintech, or consumer payment products.

Minimum Qualifications

- A Bachelor's degree with exceptional hands-on experience in ML/AI model quality or applied research, or an M.S. or Ph.D. in Machine Learning, Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field, is strongly preferred.
- 3+ years of experience in data science or a closely related analytical role, with a strong focus on data quality, model evaluation, or ML observability in production environments.
- Proficiency in Python (Pandas, NumPy, Scikit-learn) and SQL for complex data analysis, metric creation, and validation.
- Experience querying and analyzing large-scale datasets using distributed computing frameworks (e.g., PySpark, Spark, or distributed SQL).
- Solid understanding of statistical methods - hypothesis testing, distribution a

Required skills

Data and Analytics

Tech stack

Python

This role may require work authorization in the US

Check with the employer about specific visa or work permit requirements before applying.

About Austin, US

Cost of living

medium

Avg tech salary

$110K-$180K USD

Remote work

Growing tech hub, mix of remote and onsite

Posted 6 days ago · Source: The Muse

Similar roles

Platform Engineer - Security

Apple · Austin, US

$1 – $130K (~₹83 – ₹1.1Cr)

senior

Front-End CAD Methodology Engineer

Apple · Austin, US

senior

Identity Architect

Apple · Seattle, US

senior

Haptics Hardware Electrical Engineer

Apple · Cupertino, US

mid

Quality & Automation Engineer- Supply Chain Innovation

Apple · Hyderabad, India

About Apple

Glassdoor rating: 4.2/5

Company size: Enterprise

Industry: Developer Tools

Open roles: 11

Work-life balance: 3.8/5

Avg tenure: 4.0 yrs

Stage: Public (AAPL)

Size: 164,000+

HQ: Cupertino, CA

Founded: 1976

Company Insights

Hiring behavior

Avg response time

21 days

Ghost rate

30% (Moderate)

Interview to offer

20%

Interview rounds

Interview style

secretive

Hiring speed

moderate