Excel Files with Invalid SAS Variable Names
Introduction
If you are reading this book, you are probably familiar with various statistical techniques but might not have used SAS to analyze data. The primary purpose of this book is to show you how to use SAS to perform a variety of statistical tasks. To that end, this book provides examples of many of the commonly used statistical techniques. Following each example is a discussion of the output. Although this is not a book about SAS programming, many of the examples require some data manipulation tasks, which will be described. If you need to gain more SAS programming skills, see Learning SAS by Example: A Programmer’s Guide, also by this author and published by SAS Press.
This book is divided into five sections: An Introduction to SAS, Descriptive Statistics, Inferential Statistics, Power/Sample size calculations, and Selecting Random Samples.
All of the programs and data files in this book are available from SAS Press. To download these programs and files, go to http://support.sas.com/authors.
If you already have some familiarity with SAS data sets and how to run SAS programs, you can skip this chapter and start right in with Chapter 2.
The remainder of this chapter describes what SAS is, the basic structure of SAS programs, how to access some simple data sets, and how to run a SAS program on a Windows platform.
What is SAS
SAS (pronounced sass) is a collection of programs that are used to read data from a variety of sources (text files, Excel workbooks, various databases, etc.), to manipulate data with a very powerful programming language, and to perform various reporting and data analysis tasks. To run all of the examples in this book, you will need access to Base SAS, SAS/STAT, and SAS/GRAPH software.
SAS runs on many different computing platforms: PCs, UNIX and Linux operating systems, and mainframe computers. You can run the examples in this book on any of these platforms, although there are some small differences in how the data are accessed on the various systems. The examples in this book were all run on a PC, because this is by far the most popular platform on which SAS is run.
Statistical Tasks Performed by SAS
SAS, like many other statistical packages, has built-in procedures for analyzing data. These procedures (called PROCs for short) enable you to perform statistical tests and analyses. For example, if you want to perform a Student’s t-test, you use PROC TTEST, to run a regression model, you use PROC REG. The list of statistical PROCs is quite extensive, and not all of them are covered in this book. Even so, this book is a good place to get started, and many of the most popular statistical tasks are covered. The complete guide to all SAS/STAT procedures is available from SAS Institute. The documentation is available in more than five volumes (taking up about two feet of shelf space) or for free in HTML or PDF form on the SAS Web site: http://support.sas.com/documentation. Select SAS Press books are also available on iPad, Kindle, Google eBooks, Mobipocket, Books24x7, netLibrary, and Safari Books online.
The Structure of SAS Programs
SAS programs are divided into DATA and PROC steps. With DATA steps you can read text data from files; create new variables from existing variables; perform logical operations on your data; and merge, concatenate, and subset your data files. PROC steps give you the ability to perform pre-defined tasks such as creating frequency distributions or performing a t-test. SAS stores data in data sets that are unique to SAS. Your data might already be in a SAS data set, in which case you might not need to write a DATA step at all. However, even if you are starting with data already in a SAS data set, you might want to write a DATA step to perform some manipulation of the data, such as performing a transformation, grouping values, subsetting your data, or combining data from several data sets.
SAS Data Sets
SAS data sets consist of two parts: a descriptor portion, or metadata (information about your data set, such as the variable names and data types), and the data values themselves. SAS can create SAS data sets from almost any source. If you have raw data in a text file, you can use a DATA step to read the file and create your SAS data set. If you have an Excel workbook, or any one of several popular database formats, you can either use a SAS procedure to convert the data into a SAS data set or use the Import Wizard (part of SAS/ACCESS and available from SAS Institute) to point-and-click your way through the conversion.
SAS Display Manager
This section shows you how to run SAS programs on a Windows platform. If you are using UNIX or a mainframe to run your programs, your screens will not look like the images shown in this section. If you choose to use SAS Enterprise Guide to run your SAS programs on a Windows platform, you will also be using a different editor.
When you open SAS on a PC platform, you enter SAS Display Manager. This facility contains three major windows: the Program Editor, where you write, edit, and submit your SAS programs; a Log window, where you see error messages and information about the program you have submitted; and the Output window, where SAS displays your output. If you need the output for a Web page or you want to use the output in a Word or Word Perfect program, the Output Delivery System (ODS) can send your output to these destinations as HTML, RTF (rich text format), or PDF.
The following display shows SAS Display Manager with all three windows open:
The top section is the Output window, the middle section is the Log window, and the bottom section is the Program Editor window. You can resize, move, or expand each of these windows.
Excel Workbooks
Because Excel workbooks (or comma-separated values [CSV] files) are so popular on PCs, let’s examine how to use the SAS Import Wizard to convert these files into SAS data sets.
The following display shows an Excel workbook called SAMPLE.XLS:
Each of the columns in this workbook contains information about each of our subjects (ID, Age, and Gender). The first row of the workbook contains variable names. In Excel, these names can be anything. In SAS, variable names must conform to a stricter naming convention. The maximum length of a SAS variable name is 32 characters; the first character of a variable name must be a letter or an underscore, and the remaining characters of the variable name can contain letters (uppercase or lowercase), numbers, and the underscore character. For example, Age, Income2010, and Home_Runs are valid SAS variable names; 1year, Year Income, and Cost% are not valid SAS variable names. Later you will see what happens if the first row of your workbook contains variable names that are not valid SAS variable names.
By the way, SAS variable names are not case sensitive. However, if you use uppercase, lowercase, or mixed case, SAS remembers the case of the variable name from the first time you used it and displays the names in SAS reports based on the previous value.
Each of the rows of the workbook, with the exception of the first row, contains data about an individual. SAS calls these rows observations. So, whereas an Excel workbook has columns and rows, SAS data sets have variables and observations.
To convert this Excel file into a SAS data set, you can use the Import Wizard:
1. Click File.