Shubhashis Roy Dipta

PhD Researcher
Reasoning, Tool-calling LLM, Multimodal Understanding

sroydip1@umbc.edu

✍🏻 I actively take notes on my research (My Zettelkasten)


Amazon Science (Alexa)
Applied Scientist Intern (Incoming)
Summer 2026
Manager: Dr. Lichao Wang
Seattle, WA
Amazon Science (Alexa)
Applied Scientist Intern
Summer 2025
Manager: Dr. Lichao Wang
Seattle, WA
Mentor: Dr. Daniel Bis
Scale AI
Machine Learning Research Intern
Summer 2024
Manager: Dr. Adrian Lam
San Francisco, CA
Mentor: Vijay Kalmath
University of Maryland, Baltimore County
Ph.D. in Computer Science
Fall 2023 - Present
Advisor: Dr. Frank Ferraro
Grade: 4.00/4.00
Publications: See Here (From 2023)
University of Maryland, Baltimore County
M.Sc. in Computer Science
Spring 2021 - Spring 2023
Awards: Phi Kappa Phi
Grade: 4.00/4.00
Morgan State University
Research Assistant
2017 - 2019
Advisor: Dr. Iman Dehzangi
Publications: 4 Journal
UniShopr.com
Founder
2017 - 2021

Upcoming Travel

  • ACL 2026 in San Diego, CA (Jul 3-7)
Previous
  • ✅ NeurIPS 2025 in San Diego, CA (Dec 2-7)
  • ❌ AACL 2025 in Mumbai, India (Dec 20-24) (canceled)
👋 I'm open to meeting! Email me to schedule a chat!

Peer Review

Reviewed 28+ papers across top venues (2023–2025).

Conferences
*SEM, COLING, NAACL, ACL, NeurIPS
Workshops
W-NUT, SemEval, SRW, TrustNLP, ELVM
Journals
Computational and Structural Biotechnology Journal, Plant Methods, Scientific Reports, BMC Bioinformatics

Shubhashis is a final-year Ph.D. researcher in Computer Science at the University of Maryland, Baltimore County (UMBC), advised by Dr. Frank Ferraro. His work spans Natural Language Processing, Computer Vision, and multimodal reasoning, with a focus on decomposition-based reasoning — making complex problems more reliable by structuring them into atomic, verifiable subproblems.

His research thesis is that complex problems become more reliable when broken into structured, atomic, verifiable subproblems — a principle he has carried across reasoning regimes, modalities, and languages. Recent work includes presupposition-free question decomposition for robust claim verification [1], query-to-event decomposition for zero-shot multilingual text-to-video retrieval [2], and curriculum-driven GRPO for low-resource Bengali mathematical reasoning [3]. His earlier work on Hierarchical Variational Autoencoders for Event Representation Learning [4] underpins downstream applications in summarization, question answering, and counterfactual reasoning.

A recurring theme across his projects is robustness and efficiency over raw accuracy. He builds lightweight, fact-aware reference-free evaluators for video captions [5], designs benchmarks that isolate modality reliance and probe calibrated abstention in omni-modal systems [6], reformulates mathematical reasoning as distractor-aware computational graphs that match reasoning-specialized models at 89% lower token cost [7], and trains tool-calling agents to recall relevant business policies on demand — outperforming in-context baselines by 16 points while using 40% fewer tokens [8].

In Summer 2026, he will return to Amazon Science (Alexa Team) as an Applied Scientist Intern.

In Summer 2025, he interned at Amazon Science (Alexa Team) as an Applied Scientist. He showed how the Deliberative Alignment technique can be used to reduce token usage in tool-calling LLMs and lower inference cost (Paper).

In Summer 2024, he interned at Scale AI (Enterprise Team) as a Machine Learning Researcher. He explored how RL can improve text-to-SQL generation by using a small LLM-as-a-Judge inside the reinforcement loop (Scale AI Blog).

Graduating Spring 2027 — actively seeking Research Scientist roles in NLP / Multimodal AI. Please reach out if you have an opening.

Beyond Research

Shubhashis founded UniShopr (2017–2021), a cross-border e-commerce platform serving consumers in Bangladesh.

He has also competed internationally in robotics and algorithms. He placed 9th at the University Rover Challenge 2015 (Utah, USA) and 22nd at the European Rover Challenge 2016 (Poland); ranked 8th out of 300+ teams at the 2018 ACM ICPC Asia Dhaka Regional, alongside multiple regional and national placements; and reached the top 70 on Kaggle 🥉 in the Cornell Birdcall Identification competition. See more.

Recent News

Apr 7, 2026 Successfully defended my PhD proposal. I am now officially ABD (All But Dissertation). Slides of the proposal can be found here.
Apr 6, 2026 🥳 3 of my papers got accepted to ACL 2026 (VC Inspector, GanitLLM, and Survey on Multimodal Unlearning).
Jan 28, 2026 🥳 2 of my mentored papers got accepted at the LoResLM Workshop at EACL 2026 [ 1, 2 ]
Dec 3, 2025 ✈️ Heading to San Diego for NeurIPS 2025. Email me to schedule a chat!
Nov 21, 2025 🥳 3 of my mentored papers got accepted at the BLP Workshop at AACL 2025 [ 1, 2, 3 ]
Nov 7, 2025 👨🏻‍💻 Participated in the Synthetic Data AI Agents & OpenEnv Challenge (Certificate of Completion)
Nov 5, 2025 👨🏻‍🏫 Joined the Bangla Artificial Intelligence Research Lab (BARTA) as a mentor.

Featured Publications

Check out Google Scholar for a full list of my publications.

  1. PA3: Policy-Aware Agent Alignment through Chain-of-Thought
    Shubhashis Roy Dipta, Daniel Bis, Kun Zhou, Lichao Wang, Benjamin Z. Yao, Chenlei Guo, and Ruhi Sarikaya
    Preprint 2026
  2. Omni-Modal Dissonance Benchmark: Systematically Breaking Modality Consensus to Probe Robustness and Calibrated Abstention
    Zabir Al Nazi*, Shubhashis Roy Dipta*, and Md Rizwan Parvez
    Preprint 2026
    * Equal contribution
  3. GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO
    Shubhashis Roy Dipta, Khairul Mahbub, and Nadia Najjar
    ACL 2026
  4. Advancing Reference-free Evaluation of Video Captions with Factual Analysis
    Shubhashis Roy Dipta, Tz-Ying Wu, and Subarna Tripathi
    ACL 2026
  5. Multimodal Unlearning Across Vision, Language, Video, and Audio: Survey of Methods, Datasets, and Benchmarks
    Nobin Sarwar, Shubhashis Roy Dipta, Zheyuan Liu, and Vaidehi Patil
    ACL 2026
  6. Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
    Shubhashis Roy Dipta and Francis Ferraro
    AACL 2025
  7. If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition
    Shubhashis Roy Dipta and Francis Ferraro
    *SEM 2025
  8. Semantically-informed Hierarchical Event Modeling
    Shubhashis Roy Dipta, Mehdi Rezaee, and Francis Ferraro
    *SEM 2023