Apple · LLM Evaluation · Multimodal AI

Designing AI that sees, thinks & acts.

Machine Learning Data Scientist at Apple, working on next-generation AI for Apple devices. Previously with Adobe Research, Intuit, and Morgan Stanley. Columbia University, M.S. Data Science.

San Francisco Bay Area · Available for collaboration
Prakhar Dungarwal
Apple Park · 2026
LLM Evaluation · Multimodal AI · Agentic Frameworks · RAG Systems · Visual Intelligence · MLLM-as-a-Judge
01 · About

A researcher and engineer working at the frontier of AI intelligence.

I build evaluation infrastructure for production AI — the layer between a frontier model and a shipped product.

My work spans Morgan Stanley, Columbia's AC4 Lab, Intuit, and Adobe Research — building RAG systems, multi-agent frameworks, and multimodal MLLM-as-a-Judge pipelines. Today at Apple, I design evaluation systems for Visual Intelligence: scalable, statistically rigorous, and grounded in real-world behavior.

Less interested in "can the model do it?" than in "how do we know it did?"

02 · Experience

Places I've built things.

2026 — Present

Apple

ML Data Scientist · LLM Eval & Multimodal AI
  • Build evaluation systems for Visual Intelligence across Apple devices using LLM-as-a-Judge, enabling scalable assessment of generative model outputs.
  • Develop agentic evaluation frameworks for multimodal agents and multi-turn conversational AI, improving robustness and real-world performance.
  • Design statistically rigorous pipelines for generative and multimodal model validation.
Aug 2025 — Dec 2025

Adobe Research

Generative AI Researcher · Capstone
  • Developed personalized multimodal LLM frameworks using GPT-5 mini and GPT-Image-1, defining 10+ evaluation factors to assess image edits with 95% consistency across datasets.
  • Built automated MLLM-as-a-Judge pipelines leveraging RAG, reasoning traces, and agentic tool binding, improving model judgment accuracy by 20%.
May 2025 — Aug 2025

Intuit

AI & Data Science Intern
  • Developed an Agentic AI framework with LangGraph and a ReAct multi-agent system, cutting ground-truth annotation time by 50% through agent orchestration.
  • Used GPT-4.5 and Claude-3.5-Sonnet to extract business quality markers from call transcripts, improving recall by 30% with chain-of-thought (CoT) prompting and LLM-as-a-Judge evaluation.
Aug 2024 — Present

Columbia AC4 Lab

AI Researcher · Climate School
  • Fine-tuned RoBERTa on GoEmotions to classify YouTube videos by emotion (respect vs. contempt), integrating DeepSeek-R1 and GPT-4o for reasoning.
  • Built cloud backend on AWS (S3, EC2), LangChain, and YouTube APIs for a Chrome extension analyzing watch history.
Jan 2022 — Aug 2024

Morgan Stanley

Senior Data Scientist · ML & AI
  • Led development of a RAG chatbot built on Vicuna (a fine-tuned Llama 2 model) with all-mpnet-base-v2, cutting query resolution time by 60% and annual operations costs by 30%.
  • Developed a hybrid CNN + Exponential Smoothing + Prophet forecasting model, improving accuracy by 20% and delivering $1M in savings.
  • Optimized RAG training and inference with distributed PyTorch + LangChain pipelines, reducing deployment latency by 40%.
03 · Selected Projects

Things I've built on the side.

A mix of agentic systems, multimodal research, and practical ML tools — built outside of work, shared in public.

Agentic RAG · 2025

CineGraphRAG — MCP Agent

A fully agentic RAG system over a movie knowledge graph, exposed through an MCP server. Combines semantic retrieval, graph traversal, and tool-using agents to answer multi-hop cinematic queries with verifiable citations.

LangGraph · MCP · Neo4j · OpenAI · Python
Multimodal · 2024

CART — Context-Aware Rendition with Transformers

A transformer pipeline that adapts generated renditions to user context using few-shot prompting and retrieval-augmented personalization. Evaluated against human preferences across 10+ coherence and faithfulness factors.

PyTorch · Transformers · CLIP · RAG
NLP · Research

Peace Speech — Chrome Extension

A browser extension that analyzes YouTube watch history for peace-vs-contempt signals. It uses RoBERTa fine-tuned on GoEmotions, DeepSeek-R1 reasoning traces, and an AWS (S3/EC2) backend, and was deployed as part of Columbia's AC4 peace research.

RoBERTa · DeepSeek-R1 · AWS · LangChain
LLM Agents · 2025

Transit Planner — LLM-Enhanced

An agentic trip planner that reasons over real transit APIs, traffic, weather, and user preferences to produce multi-modal routes with natural-language explanations and counterfactual alternatives.

GPT-4o · LangGraph · Tool Use · FastAPI
04 · Research

Published work.

2026 · arXiv preprint

Measuring and Fostering Peace through Machine Learning and Artificial Intelligence

Columbia University, AC4 · Climate School

Co-authored. Proposes an AI-driven framework for quantifying and promoting peace outcomes — combining NLP-based emotion modeling with decision-support tooling.

2021 · IRJET

Handwritten Text Recognition using Deep Learning for Automated Paper Checking

International Research Journal of Engineering and Technology

A deep-learning HTR system for automating student paper grading — end-to-end pipeline covering segmentation, sequence decoding, and evaluation alignment.

05 · Toolkit

What I work with.

ML & Modeling

  • PyTorch
  • TensorFlow
  • Transformers
  • scikit-learn
  • XGBoost

LLM / Agentic

  • LangGraph
  • LangChain
  • MCP
  • ReAct
  • RAG pipelines

Models

  • Apple Foundation Models (AFM)
  • GPT-5 mini
  • GPT-Image-1
  • Gemini 3.1 Pro
  • Gemini 2.5 Pro
  • Claude 3.5 Sonnet
  • Llama-2 / Vicuna
  • DeepSeek-R1

Infra & Data

  • AWS (S3, EC2)
  • Python / SQL
  • FastAPI
  • Spark
  • Neo4j
06 · Education

Where I learned.

Columbia University

M.S. Data Science · School of Engineering & Applied Science
Select coursework
Machine Learning · Applied Deep Learning · Natural Language Processing · Applied Data Science · Algorithms for Data Science · Probability & Statistics · Statistical Inference · Causal Inference
2024 — 2025 · GPA 3.85 / 4.0 · New York
Vellore Institute of Technology

B.Tech, Computer Science & Engineering
Select coursework
Data Structures & Algorithms · Machine Learning · Artificial Intelligence · Database Management · Operating Systems · Computer Networks · Computer Vision · Software Engineering
2018 — 2022 · GPA 3.96 / 4.0 · India
07 · Recognition

Leadership & certifications.

Technology Innovation Program Finalist

Morgan Stanley · 2022

TAP Graduate

Morgan Stanley Technology Analyst Program · 2022

Azure AI Fundamentals

Microsoft Certified

ML/AI Training SME & Mentor

Morgan Stanley Intern Training

Tech Trainer

Katalyst · equity-in-tech nonprofit

PiJam Member

Community of Python educators & learners

08 · Contact

Let's build something useful.

Based in

San Francisco Bay Area

Status

Open to research collaboration