Figure 2.4: Simon, at Georgia Tech, is one example of a robot designed with both learning and social interaction in mind. Techniques for making use of scaffolding, attention direction, transparency, and question asking are central to the development of this system.
These experiments quantifying question usage are closely related to HRI goals, and techniques integrating some of these principles into LfD will be discussed in Chapter 6.
2.4 IMPLICATIONS FOR THE DESIGN OF ROBOT LEARNERS
The human learning process serves as an inspiration in the design of social learning robots. By studying human learning we gain insights into the design of advanced learning systems. Furthermore, because learning from demonstration inherently requires interaction between the robot and the user, designing the interaction to conform to the user’s expectations leads to a more natural and effective learning process. The extent to which social elements need to be integrated into LfD often depends on the application. In some circumstances, the robot may benefit from the full range of social interactions, taking into account social cues such as gestures, gaze, direction of attention, and possibly even affect. In other applications, minimal or no social understanding may be required, with the interaction instead limited to a human-computer interface. In all cases, the designers of the robot strive for the most natural, flexible, and efficient learning system for the given task. The following are some of the design elements that should be considered when building robots that learn from demonstration.
• Social interaction. Should the robot leverage the social aspect of the interaction? Would learning be aided if the robot understood the social cues of the user? Would learning be aided if the robot could exhibit social cues? Which social cues, whether from the robot or the teacher, are most informative for task learning, and which are most preferred by users?
• Motivation for learning. Does the robot require intrinsic motivation for learning, or will all learning be initiated and directed by the human user?
• Transparency. To be effective, a teacher must be able to maintain as accurate a mental model of the learner’s knowledge as possible. How can the robot externalize what it has learned and make elements of the internal model transparent to the user? What techniques for communicating the learner’s knowledge should be used to aid the learning process? Is it necessary that the communication techniques mimic the way humans communicate, or is it equally (or more) effective to leverage interfaces that are not part of natural human communication, such as screen-based devices?
• Question asking. Asking questions is a critical part of the human learning process. How does the robot effectively communicate the limits of its knowledge or pose a question? How can the user frame the answer in a way that the robot can understand, and how should the gained information be used to improve the underlying model? Many different types of questions can be designed, such as “what should I do now?” or “what is the intended goal?” Given multiple possible questions, how can the robot determine which questions to ask?
• Scaffolding. Just as for humans, complex tasks can be easier for machines to learn if they are broken down into simpler components. Organization of knowledge or skills into simpler parts also often allows for greater efficiency through reuse. How can the robot leverage scaffolding in its learning and interaction with the user? How can previously learned policies be built upon and reused in new settings? Note that in addition to simply saving learned policies, this could involve parameterizing the action space of the robot, allowing a previously learned skill (e.g., pick up box) to generalize to new objects or scenarios.
• Directing attention. Humans use a number of techniques to control the direction and scope of attention within a conversation. In the context of learning, both in the role of a teacher and a student asking a question, this skill is often used to focus learning, akin to feature selection in machine learning. How can control of attention be leveraged to simplify learning in complex domains? How can the robot direct the attention of the user, and vice versa? How does the learning algorithm respond to shifts in attention?
• Online vs. batch learning. The majority of traditional machine learning techniques make use of a batch learning process, examining all the training data at once and producing a model. Learning from demonstration can be cast as a batch learning process that occurs at the end of a training session, or once enough new demonstrations are acquired. However, it can also be viewed as an online learning process in which training data is acquired incrementally, similar to active learning. The choice between online and batch learning is important in the design of an interactive learning system as it will determine the flow of interaction and how new training data is acquired and integrated into the model.
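To make the batch/online distinction above concrete, the following sketch contrasts the two data-acquisition styles using a deliberately simple nearest-neighbor policy. This example is illustrative only and is not drawn from any particular LfD system; all names (`DemoPolicy`, `fit_batch`, `update_online`) are hypothetical.

```python
# Illustrative sketch: batch vs. online acquisition of LfD training data.
# The policy maps a query state to the action of the closest demonstrated
# state (1-nearest-neighbor); real systems would use far richer models.

class DemoPolicy:
    def __init__(self):
        self.dataset = []  # list of (state, action) pairs

    def fit_batch(self, demonstrations):
        # Batch learning: rebuild the model from an entire session's
        # demonstrations at once, after teaching is finished.
        self.dataset = list(demonstrations)

    def update_online(self, state, action):
        # Online learning: integrate each new pair as it arrives, so the
        # policy can be queried (and refined) mid-session.
        self.dataset.append((state, action))

    def predict(self, state):
        if not self.dataset:
            raise ValueError("no demonstrations yet")
        # Squared Euclidean distance to each demonstrated state.
        def dist(pair):
            s, _ = pair
            return sum((a - b) ** 2 for a, b in zip(s, state))
        return min(self.dataset, key=dist)[1]


demos = [((0.0, 0.0), "reach"), ((1.0, 0.0), "grasp")]

# Batch: one model built after the teaching session ends.
batch = DemoPolicy()
batch.fit_batch(demos)

# Online: the same data, integrated one demonstration at a time;
# the policy is usable after every update.
online = DemoPolicy()
for s, a in demos:
    online.update_online(s, a)

assert batch.predict((0.9, 0.1)) == online.predict((0.9, 0.1)) == "grasp"
```

For this simple memory-based model the two styles converge to the same policy; the design consequence lies in the interaction flow, since only the online learner can answer queries, expose its current knowledge, or ask questions while the teaching session is still in progress.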
As can be seen from this discussion, social learning mechanisms have the potential to play an important role in every part of the LfD process. In the next chapter, and the ones that follow, we switch to looking at LfD from a computational perspective, studying the Machine Learning techniques that can be applied to this problem. However, human involvement remains a critical factor in the discussed methods, and we return to this topic in Chapter 6, where we consider interactive techniques for policy refinement.
CHAPTER 3
Modes of Interaction with a Teacher
With insights from human social learning in mind, in this chapter we turn to a central design choice for every Learning from Demonstration (LfD) system: how to solicit demonstrations from the human teacher. As highlighted in Figure 3.1, this chapter forms the introduction to the technical portion of the book, laying the foundation for the discussion of both high-level and low-level learning methods. We do not entirely ignore the issues of usability and social interaction; after all, the choice of interaction method will impact not only the type of data available for policy learning, but also many of the topics discussed in the previous chapter (e.g., transparency, question asking, directing attention). However, these topics will remain in the background until Chapters 6 and 7, in which we discuss policy refinement and user study evaluation, respectively.
Figure 3.1: In this chapter, we discuss a wide range of techniques for collecting demonstration input for LfD algorithms.
In this chapter, we first introduce readers to the correspondence problem, which arises from differences in capabilities and physical embodiment between the robot and the user. We then characterize demonstration techniques under three general modes of interaction, which enable a robot to learn through doing, through observation, and from critique.
3.1 THE CORRESPONDENCE PROBLEM
An LfD dataset is typically composed of state-action pairs recorded during teacher executions of the desired behavior, sometimes supplemented with additional information. Exactly how demonstrations are recorded, and what the teacher uses as a platform for the execution, varies greatly across approaches. Examples range from sensors on the robot learner recording its own actions as it is passively teleoperated by the teacher, to a camera recording a human teacher as she executes the behavior with her own body. Some techniques have also examined the use of robotic teachers,