This edition first published 2021
© 2021 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Arup Kumar Sadhu and Amit Konar to be identified as the authors of this work has been asserted in accordance with law.
Registered Office John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office 111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Sadhu, Arup Kumar, author. | Konar, Amit, author.
Title: Multi-agent coordination : a reinforcement learning approach / Arup Kumar Sadhu, Amit Konar.
Description: Hoboken, New Jersey : Wiley-IEEE, [2021] | Includes bibliographical references and index.
Identifiers: LCCN 2020024706 (print) | LCCN 2020024707 (ebook) | ISBN 9781119699033 (cloth) | ISBN 9781119698999 (adobe pdf) | ISBN 9781119699026 (epub)
Subjects: LCSH: Reinforcement learning. | Multiagent systems.
Classification: LCC Q325.6 .S23 2021 (print) | LCC Q325.6 (ebook) | DDC 006.3/1--dc23
LC record available at https://lccn.loc.gov/2020024706
LC ebook record available at https://lccn.loc.gov/2020024707
Cover design: Wiley
Cover image: © Color4260/Shutterstock
Preface
Coordination is a fundamental trait of lower-level organisms, which use their collective effort to serve their goals. Nature offers hundreds of interesting examples of coordination. For example, a single ant can carry only a small food item, but ants collectively carry quite voluminous food to their nest. The trajectory traced by an ant following the pheromone deposited by its predecessor is equally striking. In a beehive, forager bees direct their nestmates toward food sources through dance patterns and gestures. These natural phenomena remind us of the scope for coordination among agents, which can utilize their collective intelligence and activities to serve complex goals.
Coordination and planning are closely related terms in the domain of multi‐robot systems. Planning refers to the collection of feasible steps required to reach a predefined goal from a given position. Coordination, in contrast, indicates the skillful interaction among the agents to generate a feasible planning step. Coordination is therefore an important issue when multi‐robot systems are used to address complex real‐world problems. Coordination is usually of three types: cooperation, competition, and mixed. As the names suggest, cooperation refers to agents working together to serve complex goals that would otherwise be very hard for an individual agent, because of the agents' restricted hardware/software resources or the deadline/energy limits of the tasks. Unlike cooperation, competition refers to two agents (or two teams of agents) serving conflicting goals. For example, in robot soccer, the two teams compete to win the game. Here, each team plans both offensively and defensively to score goals and thus acts competitively. Mixed coordination indicates a mixture of cooperation and competition. In the soccer example, inter‐team competition combined with intra‐team cooperation constitutes mixed coordination. The most common use of coordination in robotics is the cooperation of agents to serve a common goal. This book deals with the cooperation of robots/robotic agents to efficiently complete a complex task.
In recent times, researchers have taken a keen interest in employing machine learning for multi‐agent cooperation. The primary advantage of machine learning is that it generates the action plans in sequence from the available sensory readings of the robots. For a single robot, learning the action plans from the sensory readings is straightforward. In the multi‐robot setting, however, the positional changes of the other robots act as additional inputs for the learner robot, and thus learning is relatively difficult. Several machine learning and evolutionary algorithms have been adopted over the last two decades to handle such situations. The simplest of all is the supervised learning technique, which requires an exhaustive list of sensory instances and the corresponding action plans of the robots. Usually, a human experimenter provides these data from his/her long acquaintance with such problems or by direct measurement of the sensory instances and decisions. Because the set of training instances is very large, compiling it places a heavy burden on the engineer, who must take care not to miss a single instance that carries a valuable mapping from sensory instance to action plan.
Because of the difficulty of generating training instances and the excessive computational overhead of learning those instances, coupled with the need to handle dynamic situations, researchers felt the importance of reinforcement learning (RL). In RL, we need not provide any training instances; instead, we employ a critic that provides feedback to the learning algorithm about the possible reward/penalty of the actions taken by the agent. On receiving this approximate measure of penalty/reward, the agents understand which particular sensory‐motor instances they need to learn for future planning applications. The dynamic nature of the environment can thus be learned readily by RL. In the multi‐agent scenario, RL needs to take care of learning in the joint state/action space of the agents. Here, each agent learns the sensory‐motor instances in the joint state/action space with the ultimate motive of learning the best actions for itself to optimize its rewards.
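The idea of a critic and a joint state/action space can be illustrated with a minimal sketch. It assumes tabular Q‐learning, which the preface has not named at this point, and a purely cooperative setting in which two agents share one reward and one Q‐table. The corridor environment, the reward values, and all names (`step`, `critic`, `train`, `greedy_joint_action`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from itertools import product

ACTIONS = ("left", "right")                       # hypothetical per-agent actions
JOINT_ACTIONS = list(product(ACTIONS, repeat=2))  # joint action space of two agents
GOAL = (3, 3)                                     # both agents at the last cell


def step(state, joint_action):
    """Hypothetical environment: each agent moves one cell along a 4-cell corridor."""
    return tuple(
        min(3, max(0, pos + (1 if act == "right" else -1)))
        for pos, act in zip(state, joint_action)
    )


def critic(state):
    """Critic feedback: a reward at the joint goal, a small penalty elsewhere."""
    return 10.0 if state == GOAL else -1.0


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (joint state, joint action) pairs."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen pairs default to 0.0
    for _episode in range(episodes):
        state = (0, 0)
        for _t in range(50):  # cap the episode length
            if state == GOAL:
                break
            # Epsilon-greedy choice over the joint action space.
            if rng.random() < epsilon:
                action = rng.choice(JOINT_ACTIONS)
            else:
                action = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])
            nxt = step(state, action)
            reward = critic(nxt)
            # The goal is terminal, so nothing is bootstrapped from it.
            best_next = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in JOINT_ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy_joint_action(q, state):
    """Best joint action in a state according to the learned Q-table."""
    return max(JOINT_ACTIONS, key=lambda a: q[(state, a)])
```

No labelled training instances are supplied here: the only supervision is the scalar feedback returned by `critic`. Calling `greedy_joint_action(train(), (0, 0))` returns the joint action the learned table prefers at the start state. The table has one entry per joint state and joint action, so it grows multiplicatively with the number of agents.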
The superiority of evolutionary algorithms (EAs) in optimizing diverse objective functions is subject to the No Free Lunch Theorem (NFLT).