I am a researcher in computer science, specializing in databases and human-AI interaction. In 2027, I will join Carnegie Mellon University as an Assistant Professor in the Computer Science Department (CSD) and, by courtesy, the Human-Computer Interaction Institute (HCII).

My research goal is to build great AI data analysts and scientists for humans to work with. Some fun projects include:

  • Databases and tools for unstructured data (DocETL)
  • Interfaces for humans to effectively collaborate with AI (DocWrangler, EvalGen)
  • "Recipes" for AI to perform different stages of the data lifecycle, generalizable across domains and user expertise (AI Evals Course, AI Evals Book)

I received my PhD in EECS from UC Berkeley, where I built the DocETL stack (3.7k+ GitHub stars, used by public defenders, climate scientists, and more). I did my undergrad in computer science at Stanford.

Academic Service

Reviewer: VLDB (2027–), UIST (2024–), CHI (2024–), NeurIPS (2021, 2022)

Organizer: DEEM Workshop at SIGMOD (2023–2025)

Current Mentees

  • Andrew Cheng (undergrad)
  • Sasha Singh (undergrad)

Past Mentees

  • Parth Asawa (undergrad → PhD student @ Berkeley; CRA Undergraduate Award Honorable Mention)
  • Ruiqi Chen (MS → PhD student @ University of Michigan CSE)
  • Ankush Garg (MS → Senior Data Scientist @ Clarkson Consulting)
  • Rachel Lin (undergrad, MS → Software Engineer @ Opto)
  • Aditi Mahajan (undergrad → Google)
  • Nikhil & Vinay Rao (high school → undergrads @ UC Berkeley EECS)
  • Quentin Romero Lauro (undergrad → CEO @ Inspector, YC 2025; CRA Undergraduate Award Winner)
  • Reya Vir (undergrad → PhD student @ Columbia; NSF GRFP recipient)
  • Yujie Wang (undergrad → Google)
  • Lindsey Wei (undergrad → PhD student @ UC Berkeley EECS; CRA Undergraduate Award Honorable Mention)

Publications

🏆 Won an award
Co-first author is my mentee