Subrat PP

I am a PhD candidate in Computer Science at NTU Singapore, with broad research interests in Trustworthy Reinforcement Learning and its applications to safety-critical domains. My work focuses on enhancing the transparency, safety, and verifiability of RL policies. Currently, I am exploring Neurosymbolic approaches, particularly programmatic policies, which enable seamless integration of human and domain knowledge and strengthen trustworthiness through their inherent interpretability and amenability to formal verification.

In parallel, I have a strong interest in Neurosymbolic AI, particularly in developing end-to-end methods that support explicit knowledge representation and reasoning, integrating both System 1 (intuitive) and System 2 (deliberative) processes. To deepen my conceptual understanding of AI, I enjoy studying foundational mathematical topics such as learning theory and statistics.

Before joining NTU, I completed my Master’s degree in Computer Science at the Indian Statistical Institute (ISI), Kolkata, India, and my Bachelor’s degree in Electronics and Instrumentation at NIT Silchar, India.

Coming soon: Giving Back, a series of notebooks implementing NeSy/GenAI models from scratch.

Updates

[June 2026] Our paper “Neurosymbolic Reasoning with Incremental Knowledge for Sample Efficient Hierarchical Reinforcement Learning” has been accepted to ECML-PKDD 2026. In this work, we show that learning the abstract world model from incremental knowledge, instead of learning it upfront, improves sample efficiency. To enable the integration of knowledge that creates dependencies between states, we propose Belief World Tree Search (BWTS). Paper and code will be made public soon.
[May 2026] Our paper “Approximation-Free Differentiable Oblique Decision Trees” has been accepted to JMLR Vol. 27 (2026). In this work, we propose a Top-k-inspired method to handle regression tasks and continuous RL environments.
[Feb 2026] RL Fine-Tuning Notebook: notebooks that train Andrej Karpathy's nanoGPT model on a digit addition dataset, followed by GRPO-based fine-tuning for subtraction.
[Jan 2026] F1Tenth: Learned ROS and implemented a basic version, up to pure pursuit, on the F1Tenth car.
[Aug 2025] Appointed Vice President of the Graduate Students’ Club (CCDS).
[Apr 2025] AI4X Workshop paper on Programmatic Reinforcement Learning for Trustworthy Microgrid Management, which discusses the ∂PRL approach for energy management in microgrids.
[Aug 2024] Our DTSemNet paper, “Vanilla Gradient Descent for Oblique Decision Trees”, which proposes a novel method to train oblique decision trees using gradient descent, has been accepted to ECAI 2024.
[Jan 2024] We organized the first Deep Learning Bootcamp at NTU.
[Aug 2023] Appointed Director of Career and Development of the Graduate Students’ Club (SCSE).
[May 2023] Blog on Interpretable Reinforcement Learning is out.
[Jan 2023] I am starting my PhD (CS) at NTU Singapore. I will be working under the supervision of Arvind and Blaise on interpretable RL policies. My PhD is funded by the Descartes Program.

Subrat Prasad Panda
