Research Manager
2024-present
Wayve AI
Head of AI
Embodied Foundation Models
2022-2024
Toyota Research Institute
Research Manager
Interaction, Natural Language Processing, Computer Vision and Machine Learning.
2018-2022
Toyota Research Institute
Senior Scientist
Human-Robot Interaction, Speech, Perception and Machine Learning for Home Robotics.
2017-2018
Amazon Inc.
Senior Scientist
Machine Learning and Natural Language Processing for Amazon Echo.
2015-2017
Amazon Inc.
Research Scientist
Machine Learning and Natural Language Processing for Amazon Echo
2013-2015
Apple Inc.
Research Scientist
Machine Learning and Natural Language Processing for Siri
2011-2013
Carnegie Mellon University
Postdoctoral Fellow
Grounded Language Understanding, Natural Language Processing, Machine Learning, Robotics, Artificial Intelligence
2004-2011
Massachusetts Institute of Technology
Graduate Student
Grounded Language Understanding, Natural Language Processing, Machine Learning, Robotics, Artificial Intelligence
2000-2004
University of Rochester
B.S. Computer Science; B.A. Mathematics
I am currently a research manager in the machine learning group at the Toyota Research Institute. Before that, I was a research scientist in Alexa Machine Learning, working on natural language understanding. Earlier, I was a Research Scientist at Apple Inc., developing natural language processing and machine learning algorithms to improve the accuracy and performance of Siri.
Before industry, I was a Postdoctoral Fellow at Carnegie Mellon University, working at the intersection of perception, robotics, machine learning and natural language processing. I hold a Ph.D. and M.S. in Computer Science, both from MIT, a B.S. in Computer Science, with highest honors, from the University of Rochester, and a B.A. in Mathematics from the University of Rochester. I was general chair of the HRI Pioneers Workshop at the 6th ACM/IEEE International Conference on Human-Robot Interaction. As an undergraduate, I developed an hors d'oeuvre-serving robot to compete in the AAAI robotics competition. I am a member of IEEE, AAAI, and Sigma Xi.
Nicholas Roy, Stefanie Tellex, Felix Duvallet, Matt Walter, Sachi Hemachandra, Manuela Veloso, Jayant Krishnamurthy, Grant Strimel, Deb Roy, Mehdi Samadi, Daniele Nardi, Tony Stentz, Alvaro Soto, Seth Teller, Emma Brunskill, Emma Strubell, Vittorio Perera, Tagyoung Chung, Adrien Gaidon, Jeannette Bohg, Chelsea Finn, Percy Liang, Siddharth Karamcheti, Suraj Nair, Dorsa Sadigh, Ken Goldberg
DataComp-LM: In search of the next generation of language model training sets
Ludwig Schmidt, Vaishaal Shankar, Achal Dave, Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, ... Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Thomas Kollar ...
NeurIPS Datasets and Benchmarks Track, 2024 | arxiv
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Pannag Sanketi, Quan Vuong, Siyuan Feng, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn
CORL, 2024 | website
Linearizing Large Language Models
Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar
CoLM 2024 | arxiv
A Critical Evaluation of AI Feedback for Aligning Large Language Models
Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar
submitted to NeurIPS, 2024 | arxiv
How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation
Joseph A. Vincent, Haruki Nishimura, Masha Itkina, Paarth Shah, Mac Schwager, Thomas Kollar
submitted, 2024 | arxiv
DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset
Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti ..., Thomas Kollar, Sergey Levine, Chelsea Finn
RSS 2024 | arxiv
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Dorsa Sadigh, Thomas Kollar
ICML 2024 | arxiv
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Alexandros G. Dimakis, Gabriel Ilharco, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
submitted to NeurIPS 2024 | arxiv
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration, Thomas Kollar
ICRA 2024 | arxiv
Best Paper
HANDLOOM: Learned Tracing of One-Dimensional Objects for Inspection and Manipulation
Vainavi Viswanath, Kaushik Shivakumar, Mallika Parulekar, Jainil Ajmera, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, Ken Goldberg
CORL 2023 | arxiv
Bagging by Learning to Singulate Layers Using Interactive Perception
Lawrence Yunliang Chen, Baiyu Shi, Roy Lin, Daniel Seita, Ayah Ahmad, Richard Cheng, Thomas Kollar, David Held, Ken Goldberg
IROS 2023 | arxiv
Best Industrial Robotics Research for Applications Paper Nominee
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti, Suraj Nair, Annie Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
RSS 2023 | arxiv
Best Paper Nominee
TUSK: Tracing to Untangle Semi-Planar Knots
Vainavi Viswanath, Kaushik Shivakumar, Jainil Ajmera, Mallika Parulekar, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, Ken Goldberg
RSS 2023 | arxiv
NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes
Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus
ICCV 2023 | arxiv
CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhinav Valada, Thomas Kollar
CVPR 2023 | arxiv
AutoBag: Learning to Open Plastic Bags and Insert Objects
Lawrence Chen, Baiyu Shi, Daniel Seita, Richard Cheng, Thomas Kollar, David Held, Ken Goldberg
ICRA 2023 | arxiv
SGTM 2.0: Autonomously Untangling Long Cables using Interactive Perception
Kaushik Shivakumar, Vainavi Viswanath, Anrui Gu, Yahav Avigal, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, Ken Goldberg
ICRA 2023 | arxiv
ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization
Zubair Irshad, Sergey Zakharov, Rares Ambrus, Thomas Kollar, Zsolt Kira, Adrien Gaidon
ECCV 2022 | arxiv
What Makes Representation Learning from Videos Hard for Control?
Tony Z. Zhao, Siddharth Karamcheti, Thomas Kollar, Chelsea Finn, Percy Liang
RSS Workshop, 2022 | pdf | arxiv
Efficiently Learning Single-Arm Fling Motions to Smooth Garments
Lawrence Yunliang Chen, Huang Huang, Ellen Novoseller, Daniel Seita, Jeffrey Ichnowski, Michael Laskey, Richard Cheng, Thomas Kollar, Ken Goldberg
ISER 2022 | pdf | arxiv
CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation
Muhammad Zubair Irshad, Thomas Kollar, Michael Laskey, Kevin Stone, Zsolt Kira
ICRA 2022 | pdf | arxiv
SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo
Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan, Mark Tjersland
CORL 2021 | pdf | arxiv
A Mobile Manipulation System for One-Shot Teaching of Complex Tasks in Homes
Max Bajracharya, James Borders, Dan Helmick, Thomas Kollar, Michael Laskey, John Leichty, Jeremy Ma, Umashankar Nagarajan, Akiyoshi Ochiai, Josh Petersen, Krishna Shankar, Kevin Stone, Yutaka Takaoka
ICRA 2020 | pdf | arxiv
Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands
Thomas Kollar, Stefanie Tellex, Matthew R. Walter, Albert Huang, Abraham Bachrach, Sachi Hemachandra, Emma Brunskill, Ashis Banerjee, Deb Roy, Seth Teller, Nicholas Roy
Journal to arXiv 2018 | pdf | arxiv
Multi-task learning for parsing the Alexa Meaning Representation Language
Vittorio Perera, Tagyoung Chung, Thomas Kollar, Emma Strubell
AAAI 2018 | pdf
Learning Task Knowledge from Dialog and Web Access
Vittorio Perera, Robin Soetens, Thomas Kollar, Mehdi Samadi, Yichao Sun, Daniele Nardi, Rene van de Molengraft, Manuela Veloso
Robotics, 2015 | pdf
Grounding Verbs of Motion in Natural Language Commands to Robots
Thomas Kollar, Stefanie Tellex, Deb Roy, Nicholas Roy
Experimental Robotics, 79:31-47, 2014 | pdf
Toward Interactive Grounded Language Acquisition
Thomas Kollar, Jayant Krishnamurthy and Grant Strimel
In Proceedings, Robotics: Science and Systems, 2013 | pdf
Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World
Jayant Krishnamurthy and Thomas Kollar
Transactions of the Association for Computational Linguistics, 2013 | pdf
Imitation Learning for Natural Language Direction Following through Unknown Environments
Felix Duvallet, Thomas Kollar, Anthony Stentz
In Proceedings, International Conference on Robotics and Automation (ICRA) 2013 | pdf
Best Cognitive Robotics Paper Nominee
Clarifying Commands with Information-Theoretic Human-Robot Dialog
R. Deits, S. Tellex, P. Thaker, D. Simeonov, T. Kollar and N. Roy.
Journal of Human-Robot Interaction, 2(2):58-79, 2013 | pdf
Indoor scene recognition by a mobile robot through adaptive object detection
P. Espinace, T. Kollar, N. Roy, A. Soto
Robotics and Autonomous Systems, 61(9):932-947, 2013 | pdf
Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation
Thomas Kollar, Stefanie Tellex, Steven Dickerson, Matthew R. Walter, Ashis Gopal Banerjee, Seth Teller, Nicholas Roy
Proceedings of the National Conference on Artificial Intelligence (AAAI), 2011 | pdf
(25% acceptance)
Approaching the Symbol Grounding Problem with Probabilistic Graphical Models
S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy
AI Magazine. 32(4):64-76, 2011 | pdf
Following and Interpreting Narrated Guided Tours
S. Hemachandra, T. Kollar, N. Roy and S. Teller
International Conference on Robotics and Automation (ICRA), 2011 | pdf
Grounding Spatial Language for Video Search
Stefanie Tellex, Thomas Kollar, George Shaw, Nicholas Roy and Deb Roy
Proceedings of the Twelfth International Conference on Multimodal Interfaces (ICMI), 2010 | pdf
Best Student Paper (44% acceptance; 25% for oral)
Toward Understanding Natural Language Directions
Thomas Kollar, Stefanie Tellex, Deb Roy, and Nick Roy
Proceedings of Human Robot Interaction Conference, 2010 | pdf
(21% acceptance)
Utilizing object-object and object-scene context when planning to find things
Thomas Kollar and Nick Roy
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2009 | pdf
Trajectory Optimization using Reinforcement Learning for Map Exploration
Thomas Kollar and Nick Roy
International Journal of Robotics Research, 2008 | pdf
Motion guidance and natural language commands based robotic systems
Thomas Kollar
2023 | link
Learning to understand spatial language for robotic navigation and mobile manipulation
Thomas Kollar
Ph.D. thesis, Massachusetts Institute of Technology, 2011
(email)
General: tkollar(_.at._)gmail.com
(social)
LinkedIn
Twitter
Google Scholar
(cv)
pdf