Learn to Grasp with Less Supervision: A Data-Efficient Maximum Likelihood Grasp Sampling Loss

Abstract

Robotic grasping for a diverse set of objects is essential in many robot manipulation tasks. One promising approach is to learn deep grasping models from large training datasets of object images and grasp labels. However, empirical grasping datasets are typically sparsely labeled (i.e., each image contains only a small number of successful grasp labels, where a label marks a position in the image at which a robotic grasp succeeds). This data sparsity leads to insufficient supervision and false-negative labels, and thus to poor learning outcomes. This paper proposes a Maximum Likelihood Grasp Sampling Loss (MLGSL) to tackle the data sparsity issue. The proposed method assumes that successful grasps are stochastically sampled from the predicted grasp distribution and maximizes the likelihood of observing the labeled grasps. MLGSL is used to train a fully convolutional network that generates thousands of grasps simultaneously. Training results suggest that models based on MLGSL can learn to grasp from datasets with only 2 labels per image. Compared to previous works, which require training datasets of 16 labels per image, MLGSL is 8× more data-efficient. Meanwhile, physical robot experiments demonstrate equivalent performance, with a 90.7% grasp success rate on household objects. Code and videos are available at [1].
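
The core idea described above, treating labeled grasps as samples from the network's predicted grasp distribution and maximizing their likelihood, can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the paper's implementation: it assumes the fully convolutional network outputs a per-pixel grasp logit map and that labels are given as pixel coordinates of successful grasps; names such as `mlgsl_loss` are hypothetical.

```python
# Minimal sketch of a maximum-likelihood grasp sampling loss (not the paper's code).
# Assumptions: the network outputs a (B, H, W) per-pixel grasp logit map, and each
# image has K pixel coordinates of successful grasp labels.
import torch
import torch.nn.functional as F


def mlgsl_loss(grasp_logits: torch.Tensor, label_pixels: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of labeled grasps under the predicted distribution.

    grasp_logits: (B, H, W) unnormalized per-pixel grasp scores.
    label_pixels: (B, K, 2) integer (row, col) coordinates of successful grasps.
    """
    b, h, w = grasp_logits.shape
    # Interpret the logit map as a categorical distribution over all pixels.
    log_probs = F.log_softmax(grasp_logits.reshape(b, -1), dim=-1)  # (B, H*W)
    # Flatten (row, col) labels into indices of that distribution.
    flat_idx = label_pixels[..., 0] * w + label_pixels[..., 1]      # (B, K)
    # Maximizing the likelihood of the observed labels = minimizing their NLL.
    label_log_probs = torch.gather(log_probs, 1, flat_idx)          # (B, K)
    return -label_log_probs.mean()


if __name__ == "__main__":
    logits = torch.randn(4, 224, 224, requires_grad=True)
    labels = torch.randint(0, 224, (4, 2, 2))  # sparse supervision: 2 labels per image
    loss = mlgsl_loss(logits, labels)
    loss.backward()
    print(loss.item())
```

Because the loss only scores the labeled pixels under a normalized distribution, unlabeled pixels are not explicitly treated as negatives, which is one way a sampling-based likelihood objective can cope with sparse, possibly false-negative labels.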

Publication
In IEEE International Conference on Robotics and Automation (ICRA), 2022