I’m a Ph.D. candidate in Computer Science at the University of Massachusetts Lowell, advised by Professor Hadi Amiri, where I study data-efficient LLM training and training dynamics through linguistic complexity signals. In 2025, I was a Research Intern at Google DeepMind working on multilingual factuality evaluation for Gemini. Previously, I received my BS in Computer Science from KAIST.

My work connects two themes: using linguistic complexity to make training more efficient and stable, and enabling fine-grained linguistic control over model outputs.

  • Data-efficient LLM training: data ordering and valuation.
  • Training dynamics & interpretability: scaling laws, learning phases, and difficulty signals.

Selected Publications

Full publication list →

News

  • [Jun 2026] Co-organizing the Medical Decision Extraction, Analysis, and Classification Task (MedExACT) at the ACL 2026 BioNLP Workshop.
  • [Apr 2026] Received the Computer Science Outstanding Graduate Research Award.
  • [Dec 2025] Released a preprint on curriculum learning for LLM pretraining (learning dynamics analysis).
  • [Nov 2025] Linguistically-Controlled Paraphrase Generation presented at EMNLP 2025.
  • [Jul 2025] MedDecXtract presented at the ACL 2025 Demo Track.
  • [May 2025] Joined Google DeepMind as a Research Intern (Gemini multilingual factuality).
  • [Oct 2024] Released the P-Masking / LingGen preprint on multi-attribute controlled generation.
  • [Dec 2023] Presented Ling-CL at EMNLP 2023.