Because you used an ID statement, this portion of the output includes the Subj variable. The column labeled Obs is the observation number (which is not very useful because adding observations or sorting the data set will change the observation number). If you want to see more than the five lowest and five highest values, you can supply a procedure option NEXTROBS=n (number of extreme observations) to ask PROC UNIVARIATE to list any number of extreme observations.
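As a minimal sketch of the NEXTROBS= option, the step below asks for the ten lowest and ten highest values. It assumes the same data set (example.Blood_Pressure), ID variable (Subj), and analysis variable (SBP) used elsewhere in this chapter; the value 10 is chosen only for illustration.

```sas
title "Listing the Ten Lowest and Ten Highest Values of SBP";
proc univariate data=example.Blood_Pressure nextrobs=10;
   id Subj;   *Subj labels each extreme value in the output;
   var SBP;
run;
```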
The HISTOGRAM and PROBPLOT statements both produce high-quality SAS/GRAPH output. Depending on your system, these plots are either displayed immediately in your output window, or you need to click on the task bar at the bottom of your screen to see them. The following graph is the result of the HISTOGRAM statement:
The x-axis shows ranges of SBP. The numbers that are displayed are the midpoints of the SBP ranges. The y-axis displays the percentage of values that fall within these ranges. In the next section, you will learn how to change these data ranges, but the values that SAS chooses for you are usually fine for a quick idea of what your distribution looks like. In this example, the SBP values look similar to those from a normal distribution.
The PROBPLOT statement produced the next graph:
If your values came from a normal distribution, they would fall close to the diagonal line on the plot. In this example, the actual data points do not deviate much from this theoretical line, showing that the values of SBP come from a distribution that is close to normal. This outcome is also consistent with the values for skewness and kurtosis that you saw earlier.
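One way to request a probability plot with a reference line is sketched here. It assumes that the mean and standard deviation of the reference distribution are estimated from the data (MU=EST and SIGMA=EST); your original program may have used different settings.

```sas
title "Normal Probability Plot for SBP";
proc univariate data=example.Blood_Pressure;
   var SBP;
   *NORMAL draws the reference line, with both parameters
    estimated from the sample (an assumption in this sketch);
   probplot SBP / normal(mu=est sigma=est);
run;
```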
Changing the Midpoint Values on the Histogram
If you want to change the midpoint values displayed on the histogram, you can supply a MIDPOINTS option on the HISTOGRAM statement. For example, if you want midpoints to go from 100 to 170 with each bin representing 5 points, you would write:
histogram / midpoints=100 to 170 by 5;
The following histogram used the MIDPOINTS option set to 100 to 170 by 5:
Finally, you could also see a theoretical normal curve superimposed on your histogram by including the NORMAL option on the HISTOGRAM statement like this:
histogram / midpoints=100 to 170 by 5 normal;
The output now shows a normal curve superimposed on your histogram:
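For reference, here is how that HISTOGRAM statement might sit inside a complete PROC UNIVARIATE step. This is a sketch that assumes the same data set and variable used throughout this chapter; the title text is only illustrative.

```sas
title "Histogram of SBP with a Normal Curve";
proc univariate data=example.Blood_Pressure;
   var SBP;
   *Bins are centered at 100, 105, ..., 170;
   histogram / midpoints=100 to 170 by 5 normal;
run;
```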
Generating a Variety of Graphical Displays of Your Data
SAS 9.2 introduced several important and useful statistical graphics procedures. Among the more useful of these are SGPLOT and SGSCATTER. You can use SGPLOT to produce histograms, box plots, scatter plots, and much more. SGSCATTER displays several plots on a single page (including a scatter plot matrix that is particularly useful). The SG procedures come with a number of built-in styles. You can select different styles for your output without having to do any programming.
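To give a rough idea of what a scatter plot matrix request looks like, here is a minimal sketch using the MATRIX statement of PROC SGSCATTER. The variable DBP is an assumption made for illustration; substitute any numeric variables that exist in your own data set.

```sas
title "Scatter Plot Matrix (Illustrative Sketch)";
proc sgscatter data=example.Blood_Pressure;
   *DBP is assumed to exist in the data set;
   *DIAGONAL= places a histogram of each variable on the diagonal;
   matrix SBP DBP / diagonal=(histogram);
run;
```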
Let’s see how to produce a histogram and a box plot using SGPLOT.
Program 2.6: Using PROC SGPLOT to Produce a Histogram
title "Using SGPLOT to Produce a Histogram";
proc sgplot data=example.Blood_Pressure;
   histogram SBP;
run;
This HISTOGRAM statement produces a histogram similar in appearance to the one you obtained with the HISTOGRAM statement in PROC UNIVARIATE. As you will learn later, you can change the appearance of the output by selecting an alternate output destination, such as HTML, PDF, or RTF (rich text format), together with one of the built-in styles.
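As a brief preview of that idea, the sketch below sends the histogram to a PDF file using one of the built-in styles. The file name histogram.pdf and the choice of the JOURNAL style are assumptions made only for this illustration.

```sas
*Open the PDF destination; the file name is hypothetical;
ods pdf file="histogram.pdf" style=journal;

title "Using SGPLOT to Produce a Histogram";
proc sgplot data=example.Blood_Pressure;
   histogram SBP;
run;

*Close the destination so that the file is written;
ods pdf close;
```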
First, let’s see how to display the plot. Then you will learn a few of the more popular options that control the appearance of the output.
Output from the SG procedures does not usually open automatically after you run the procedure. One way to examine the output is to go to the Results window in SAS Display Manager:
You see the output from SGPLOT with a plus sign (+) to the left of it. Click the plus sign to expand the list:
Now double-click the SGPlot Procedure icon to display the histogram. You can use this sequence of steps to display any of the graphs produced by the SG procedures or to display the plots produced by ODS Statistical Graphics that you will see later in this book.
Finally, after all this clicking, you will see your histogram:
To produce a box plot of the same data, use the HBOX statement (horizontal box plot) in place of the HISTOGRAM statement:
Program 2.7: Using SGPLOT to Produce a Horizontal Box Plot
title "Using SGPLOT to Produce a Box Plot";
proc sgplot data=example.Blood_Pressure;
   hbox SBP;
run;
Click your way through the Results window to see the following display:
The left and right sides of the box represent the 1st and 3rd quartiles (sometimes abbreviated Q1 and Q3). The vertical bar inside the box is the median, and the diamond represents the mean. The lines extending from the left and right sides of the box (called whiskers) reach to the most extreme data values that lie within 1.5 times the interquartile range of Q1 and Q3. Values beyond that distance are treated as outliers and plotted individually. If you prefer to see a vertical box plot, use the keyword VBOX instead of HBOX.
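If you want to see the numbers behind the box plot, you can compute the quartiles and the 1.5 times interquartile range limits yourself. The following is a sketch, not part of the original programs; the data set name Fences and the variable names Lower_Fence and Upper_Fence are made up for this illustration.

```sas
*Compute Q1, Q3, and the interquartile range of SBP;
proc means data=example.Blood_Pressure noprint;
   var SBP;
   output out=Fences q1=Q1 q3=Q3 qrange=IQR;
run;

*Values outside these limits would be plotted as outliers;
data Fences;
   set Fences;
   Lower_Fence = Q1 - 1.5*IQR;
   Upper_Fence = Q3 + 1.5*IQR;
run;

title "Quartiles and Outlier Limits for SBP";
proc print data=Fences noobs;
   var Q1 Q3 IQR Lower_Fence Upper_Fence;
run;
```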
To see the effect of outliers on a box plot, let’s modify two SBP values for subjects 5 and 55 to be 200 and 180, respectively. This modified data set is called Blood_Pressure_Out and is stored in the Work library (making it a temporary SAS data set). You can see the program to create this data set, as well as the request for the box plot, in Program 2.8:
Program 2.8: Displaying Outliers in a Box Plot
*Program to make a temporary SAS data set Blood_Pressure_Out
 that contains two outliers, one for Subj 5, one for Subj 55;
data Blood_Pressure_Out;
   set example.Blood_Pressure(keep=Subj SBP);
   if Subj = 5 then SBP = 200;
   else if Subj = 55 then SBP = 180;
run;

title "Demonstrating How Outliers are Displayed with a Box Plot";
proc sgplot data=Blood_Pressure_Out;
   hbox SBP;
run;
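Once Program 2.8 has run, a quick optional check such as the following sketch can confirm that the two values were changed as intended. It is not part of Program 2.8 and assumes only the data set and variables that program creates.

```sas
title "Checking the Two Modified SBP Values";
proc print data=Blood_Pressure_Out;
   where Subj in (5, 55);   *list only the two modified subjects;
   id Subj;
   var SBP;
run;
```

Returning to Program 2.8 itself: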
The SET statement is an instruction to read each of the observations from data set example.Blood_Pressure. In parentheses following the data set name is a KEEP= data set option. This option tells the program that you want only two of the variables