Several recent works show impressive results in mapping language-based human commands and image scene observations directly to executable robot policies (e.g., pick and place poses). However, these approaches do not account for the uncertainty of the trained policy: they always execute the action that the current policy considers most probable. This makes them vulnerable to domain shift and inefficient in the number of demonstrations they require.
We extend previous works and present the PARTNR algorithm, which detects ambiguities in the trained policy by analyzing multi-modality in the predicted pick and place poses using topological analysis. PARTNR employs an adaptive, sensitivity-based gating function that decides whether additional user demonstrations are required. User demonstrations are aggregated into the dataset and used for subsequent training. In this way, the policy can adapt promptly to domain shift while minimizing the number of demonstrations required to obtain a well-trained policy. The adaptive threshold ensures that the policy is executed autonomously only when the ambiguity is at a level the user finds acceptable, which in turn increases the trustworthiness of our system.
We demonstrate the performance of PARTNR in a table-top pick and place task.
PARTNR is an interactive imitation learning algorithm that asks the human to take over control when it considers the situation ambiguous. A situation is ambiguous when the learned policy does not provide a single dominant solution, i.e., when there are multiple local maxima with similar values in the action space. User demonstrations are aggregated into the dataset 𝒟 and used for subsequent training. At each execution step, the robot observes a human-provided natural language command and the state of the environment (e.g., a top-view image of the table). Based on this observation, the policy outputs a heatmap representing the value of each action. The heatmap is then analyzed to detect multiple local maxima (in TopAnalysis). In this work, we rely on computational topology for finding local maxima; specifically, we use a persistent homology method. Then, in AmbiguityMeasure, the values T corresponding to the local maxima are normalized using the softmax function, and the maximum normalized value is used to decide whether the situation is ambiguous: if AmbiguityMeasure(T) is smaller than a threshold value, the situation is ambiguous. In that case, the robot does not execute the policy but instead queries the human teacher. The threshold is updated continuously, at every step, by the function UpdateThreshold to satisfy a user-defined sensitivity value. Whenever the teacher provides input, the data is aggregated and the policy is updated using the function Train.
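The gating logic can be sketched as follows. This is an illustrative Python fragment, not the released implementation: a simple neighborhood-based peak detector stands in for the persistent-homology step of TopAnalysis, the proportional threshold update is an assumed realization of UpdateThreshold, and the synthetic heatmap merely imitates a policy output with two competing pick locations.

```python
import numpy as np
from scipy.ndimage import maximum_filter


def top_analysis(heatmap, neighborhood=5):
    """Return the values of the local maxima of the heatmap.

    Simple stand-in for the persistent-homology step (TopAnalysis):
    a cell counts as a local maximum if it equals the maximum of its
    neighborhood.
    """
    peaks = heatmap == maximum_filter(heatmap, size=neighborhood)
    return heatmap[peaks]


def ambiguity_measure(peak_values):
    """Softmax-normalize the local-maxima values T and return the largest.

    A value close to 1 means a single mode dominates (unambiguous);
    a value close to 1 / len(peak_values) means several modes compete.
    """
    exp = np.exp(peak_values - np.max(peak_values))
    return float(np.max(exp / exp.sum()))


def update_threshold(threshold, queried, sensitivity, step=0.01):
    """Adapt the gating threshold so the query rate tracks the user-defined
    sensitivity (assumed simple proportional update, not the paper's rule)."""
    return threshold + step * (sensitivity - float(queried))


# Synthetic action-value heatmap with two competing pick locations.
xx, yy = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
heatmap = (np.exp(-((xx - 0.4) ** 2 + yy ** 2) / 0.02)
           + 0.9 * np.exp(-((xx + 0.4) ** 2 + yy ** 2) / 0.02))

threshold = 0.6  # assumed initial gating threshold
peak_values = top_analysis(heatmap)
ambiguous = ambiguity_measure(peak_values) < threshold

if ambiguous:
    print("Ambiguous: query the teacher, aggregate the demo into D, retrain.")
else:
    print("Unambiguous: execute the dominant pick and place pose.")

threshold = update_threshold(threshold, ambiguous, sensitivity=0.2)
```

In this sketch the two peaks have similar values, so the softmax-normalized maximum falls below the threshold and the robot would query the teacher rather than act autonomously.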
This work was supported by the European Union’s H2020 project Open Deep Learning Toolkit for Robotics (OpenDR) under grant agreement #871449 and by the ERC Starting Grant TERI, project reference #804907.
@inproceedings{Luijkx2022partnr,
author = {Luijkx, Jelle and Ajanovi{\'c}, Zlatan and Ferranti, Laura and Kober, Jens},
title = {{PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning}},
booktitle = {{5th NeurIPS Robot Learning Workshop: Trustworthy Robotics}},
year = {2022},
}