Viraj Mehta
I recently completed a Ph.D. at the Robotics Institute at Carnegie Mellon University, advised by Jeff Schneider. As a researcher, I am broadly interested in reinforcement learning, generative models, and dynamical systems. In particular, I work on solving control problems in science with machine learning in regimes where the data-generating process is expensive. Much of my work is motivated by the problem of plasma control for nuclear fusion; the difficulties we face there frequently inspire more general machine learning problems we can solve as computer scientists. More recently, I’ve applied similar techniques to improve the efficiency of the alignment of language models.
Prior to CMU, I worked at KKR, helping jump-start their efforts with alternative data and analytics, and at Hum Capital (formerly Capital Technologies), doing initial work on automating parts of capital allocation in private finance.
I completed a B.S. in Mathematics and an M.S. in Computer Science at Stanford University, where I conducted research on 3D vision and robot learning, advised by Silvio Savarese.
I’m from Austin, Texas, live in NYC, and bounce around between there, the Bay Area, and Tahoe. Outside of work, I spend my time flying airplanes, lifting weights, reading, and skiing.
selected publications
- Neural Dynamical Systems: Balancing Structure and Flexibility in Physical Prediction. In IEEE Conference on Decision and Control, 2021
- Representational aspects of depth and conditioning in normalizing flows. In International Conference on Machine Learning, 2021
- An Experimental Design Perspective on Model-Based Reinforcement Learning. In International Conference on Learning Representations, 2022
- Near-optimal Policy Identification in Active Reinforcement Learning. In International Conference on Learning Representations (oral, top 5% of accepted papers), 2023
- Exploration via Planning for Information about the Optimal Trajectory. In Advances in Neural Information Processing Systems, 2022
- Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration. arXiv preprint arXiv:2312.00267, 2023