Rank-Based Methods for Shrinkage and Selection
With Application to Machine Learning
A. K. Md. Ehsanes Saleh, Carleton University, Ottawa, Canada
Mohammad Arashi, Ferdowsi University of Mashhad, Mashhad, Iran
Resve A. Saleh, University of British Columbia, Vancouver, Canada
Mina Norouzirad, Center for Mathematics and Application of NOVA University Lisbon, Lisbon, Portugal
This edition first published 2022
© 2022 John Wiley and Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of A.K. Md. Ehsanes Saleh, Mohammad Arashi, Mina Norouzirad, and Resve A. Saleh to be identified as the authors of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. 
Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
ISBN 9781119625391
Set in 9.5/12.5pt STIXTwoText by Integra Software Services Pvt. Ltd, Pondicherry, India
We dedicate this book to
Shahidara Saleh
Reihaneh Soleimani, Elena Arashi
Lynn Hilchie Saleh
Abbas Ali Norouzirad, Fereshteh Arefian
Contents
Cover
Foreword
Preface
1 Introduction to Rank-based Regression
  1.1 Introduction
  1.2 Robustness of the Median
    1.2.1 Mean vs. Median
    1.2.2 Breakdown Point
    1.2.3 Order and Rank Statistics
  1.3 Simple Linear Regression
    1.3.1 Least Squares Estimator (LSE)
    1.3.2 Theil’s Estimator
    1.3.3 Belgium Telephone Data Set
    1.3.4 Estimation and Standard Error Comparison
  1.4 Outliers and their Detection
    1.4.1 Outlier Detection
  1.5 Motivation for Rank-based Methods
    1.5.1 Effect of a Single Outlier
    1.5.2 Using Rank for the Location Model
    1.5.3 Using Rank for the Slope
  1.6 The Rank Dispersion Function
    1.6.1 Ranking and Scoring Details
    1.6.2 Detailed Procedure for R-estimation
  1.7 Shrinkage Estimation and Subset Selection
    1.7.1 Multiple Linear Regression using Rank
    1.7.2 Penalty Functions
    1.7.3 Shrinkage Estimation
    1.7.4 Subset Selection
    1.7.5 Blended Approaches
  1.8 Summary
  1.9 Problems
2 Characteristics of Rank-based Penalty Estimators
  2.1 Introduction
  2.2 Motivation for Penalty Estimators
  2.3 Multivariate Linear Regression
    2.3.1 Multivariate Least Squares Estimation
    2.3.2 Multivariate R-estimation
    2.3.3 Multicollinearity
  2.4 Ridge Regression
    2.4.1 Ridge Applied to Least Squares Estimation
    2.4.2 Ridge Applied to Rank Estimation
  2.5 Example: Swiss Fertility Data Set
    2.5.1 Estimation and Standard Errors
    2.5.2 Parameter Variance using Bootstrap
    2.5.3 Reducing Variance using Ridge
    2.5.4 Ridge Traces
  2.6 Selection of Ridge Parameter λ2
    2.6.1 Quadratic Risk
    2.6.2 K-fold Cross-validation Scheme
  2.7 LASSO and aLASSO
    2.7.1 Subset Selection
    2.7.2 Least Squares