Viraj Mehta

Research

I recently spent 4 years doing a PhD at the Robotics Institute at Carnegie Mellon advised by Jeff Schneider. My thesis was titled "Sample-Efficient Reinforcement Learning with Applications in Nuclear Fusion". Some of my favorite questions I've worked on are:

For fusion power, we need to be able to safely turn off a tokamak. Can we optimize the controls by a process of (efficient) trial and error on the real machine? Yes.
In reinforcement learning, if you could select a single new datapoint to observe from the problem dynamics, what would be the best one to observe? One effective heuristic is to collect the datapoint that maximizes information gain about the optimal trajectory. Later, we built a planning algorithm for exploration from this insight.
A special case of the previous question: If you are post-training a large language model using reinforcement learning from human feedback and you could collect a single additional preference label over completions, which one would you collect? Here, we choose the prompt which maximizes uncertainty over the state-value function, building off our earlier work giving a model-free approach to the general problem statement above.
Normalizing flows are neat generative models that allow you to both sample from and compute likelihoods of probability distributions learned from samples. However, early variants were notoriously hard to train. Why is that? We were able to answer part of that question by proving theorems about the relationship between their depth and the "steepness" of the functions being fit, while also showing that certain other design choices shouldn't matter.

I've been involved in AI research in some capacity since 2016, when a couple friends and I built an extremely basic text classification project attempting to help the administrators at Stanford figure out their inscrutable general education requirements. Thankfully, things got better from there.

Further along in my undergrad career, I joined the Stanford vision lab under the supervision of Silvio Savarese and Animesh Garg and began working on ideas in 3D vision and robot learning. Despite spending months unsuccessfully trying to differentiate through robot fingers, I realized that AI research is just about the most fun and compelling thing I could imagine working on. That's why I got a PhD.

You can find all my published work listed here.

About Me

I grew up in Austin, TX but before that spent 12 years running around the woods in Wheeling, WV.
I was a mediocre-to-decent high school football player at Westlake High School (notable only because that program is insanely successful and prolific despite my involvement).
At one point, I competed on college Jeopardy! and managed to place third, win $25,000, and accidentally flip off Alex Trebek while discussing the relationship between Gauss's Theorema Egregium and the folding of a slice of pizza so that it doesn't flop down while you eat it.
I acquired a private pilot license in 2020 and have flown small planes in places like Catalina Island, Texas, and Lake Tahoe.

Contact

You can find me on Twitter or email me at my twitter handle @gmail.com.
