Maintaining Mission Critical Systems in a 24/7 Environment
Third Edition
Peter M. Curtis
Copyright © 2021 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data:
Names: Curtis, Peter M., author. | John Wiley & Sons, Inc., publisher.
Title: Maintaining mission critical systems in a 24/7 environment / Peter M. Curtis.
Other titles: IEEE Press series on power engineering.
Description: Third edition. | Hoboken, New Jersey : Wiley‐IEEE Press, [2021] | Series: IEEE Press series on power engineering | Includes bibliographical references and index.
Identifiers: LCCN 2020038739 (print) | LCCN 2020038740 (ebook) | ISBN 9781119506119 (cloth) | ISBN 9781119506126 (adobe pdf) | ISBN 9781119506140 (epub)
Subjects: LCSH: Reliability (Engineering).
Classification: LCC TA169 .C87 2021 (print) | LCC TA169 (ebook) | DDC 620/.00452–dc23
LC record available at https://lccn.loc.gov/2020038739
LC ebook record available at https://lccn.loc.gov/2020038740
Cover Design: Wiley
Cover Images: © Sam Robinson/Getty Images, courtesy of Peter M. Curtis
Foreword
Our lives, livelihoods, and way of life are increasingly dependent on computers and data communication, and this dependence increasingly relies on data centers, where servers, mainframes, storage devices, and communication gear are brought together. In short, we are becoming a datacentric or data center society.
We are all witnessing the extraordinary expansion of the Internet, from social media to search engines, games, content distribution, and e‐commerce. The advent of cloud computing, artificial intelligence and machine learning, virtual and augmented reality, blockchain, and the Internet of Things will further amplify the importance of the data center as a hub for the new wired world. As we enter the Fourth Industrial Revolution, every aspect of civilization, in every region of the globe, will see an acceleration of our society's Digital Transformation. The COVID‐19 pandemic of 2020 has further reinforced the extent to which our world relies on technology. Consequently, there is an ever‐increasing demand on our information infrastructure, especially our data centers, which is changing the way we design, build, use, and maintain these facilities. However, industry experts have been slow to document and communicate the vital processes, tools, and techniques needed to do this.
Not only is ours a dynamic environment, but it is also complex, requiring an understanding of electrical, mechanical, fire protection, and security systems, reliability concepts, operating processes, and much more. I realized the great benefit Peter Curtis' book will bring to our mission critical community soon after I started reviewing the manuscript. I believe this is the first attempt to provide a comprehensive overview of all the interrelated systems, components, and processes that define the data center space, and the results are remarkable.
Data center facilities are shaped by a paradox: critical infrastructure support systems and the facilities housing them are designed to last 15 years or more, whereas the IT equipment typically has a life of about three years. Thus, every few years, we are faced with major IT changes that dramatically alter the computer technology and invariably impact the demand for power, heat dissipation, and the physical characteristics of the facility design and operation. In addition, the last few years have seen a growing focus on energy efficiency and sustainability, reflecting society's effort to reduce its carbon footprint and reverse global warming. Data centers are particularly targeted by these efforts because they are such huge users of power and because they are scrutinized by the public more than most other sectors.
It is no secret that one of the most difficult challenges facing our industry is our ability to objectively assess risk and critical facility robustness. In general, we lack the metrics needed to quantify reliability and availability, as well as the ability to identify the function or business mission of each building and align it with its performance expectation.
Other industries, particularly aircraft maintenance and nuclear power plants, have spent years developing analytical tools to assess systems resiliency, and the work has yielded substantial performance improvements. In addition, the concept of reliability is sometimes misunderstood by professionals serving the data center industry. Curtis' efforts to define and explain reliability concepts will help improve performance in the mission critical space.
Further, the process of integrating all of the interrelated components (programming space allocation, design, redundancy level planning, engineered systems quality, construction, commissioning, operation, documentation, contingency planning, personnel training, and so on) to achieve reliability objectives is clear and well‐reasoned. The book plainly demonstrates how and why each element must be addressed to achieve reliability