d. N = 5
26. Select the INFILE option that specifies that the raw data use a comma as the delimiter.
a. DLM = ','
b. DLM = ,
c. DLM = COMMA
d. DLM = COMMA.
Short Answer
27. Discuss the advantages of using LIBNAME statements versus direct referencing for creating permanent SAS data sets.
28. Suppose that you inherit a program that reads data from the raw data file called NationalParks.dat into a permanent SAS data set called NATIONALPARKS. Would this cause SAS to overwrite the original raw data file?
29. Explain the reasons that you might choose to use internal versus external raw data.
30. Explain the difference between using a LIBNAME statement and using an INFILE statement.
31. List five examples of data values that cannot be read with list input.
32. Write an INPUT statement for the following raw data with variables named Year, City, Name1, and Name2.
----+----1----+----2----+----3----+----4
18 San Diego Rebecca Marian
19 San Francisco Kathy Ginger
20 Long Beach Scott Sally
21 Las Vegas Cynthia MaryAnne
22 San Jose Ethan Frank
33. In the preceding raw data, some of the values for the variable City are longer than 8 characters. Explain why using a LENGTH statement with list input is not sufficient to read City correctly for these data.
34. Describe one advantage of using formatted input over column input.
35. Write an INPUT statement for the following raw data with variables named Brand, Qty, and Amount.
----+----1----+----2----+----3
Pampers 42 $44.99
Huggies 7 $34.99
Seventh Generation 7 $39.99
Nature Babycare 4 $41.99
36. Explain why it would be a good idea to use an informat when reading data using the & modifier.
37. When reading raw data files, by default, the colon modifier cannot read character data with embedded blanks. Explain why and suggest a type of raw data file that would allow SAS to read embedded blanks using a colon modifier.
38. Examine the following raw data that contain the genus, species, and quantity of plants at a local nursery. Would a line pointer work to read this data file into SAS? Explain why or why not.
----+----1----+----2----+----3
Rosa
multiflora 49
canina 38
Narcissus
papyraceus 15
Dendrobium
kingianum 8
nobile 5
phalaenopsis 12
39. Examine the following raw data, which contain a patient ID and group designation (A, B, or C) with multiple observations per line. Write the SAS statements that will read the data into variables named ID and Group using a line-hold specifier, and then keep only those patients in groups A and C.
----+----1----+----2----+----3
4165 A 2255 B 3312 C 5689 C
1287 A 5454 A 6672 C 8521 B
8936 C 5764 B
40. Suppose that you have a raw data file from a national bank that contains millions of transactions from branches across the country. Reading in the entire data set takes too much processing time, and you are only interested in the records that correspond to your branch. Discuss how you can modify the following DATA step to decrease the processing time while reading this raw data file.
DATA transaction;
  INFILE 'c:\MyRawData\BankTrans.csv' DLM = ',';
  INPUT Branch_Name Branch_ID Trans_ID Account
        Date MMDDYY8. Start_Time TIME8.
        End_Time TIME8. Amount Balance;
RUN;
41. Explain the difference between the TRUNCOVER and MISSOVER options for the INFILE statement.
42. Suppose that you have a raw data file that contains data values with embedded commas and uses tabs as a delimiter. Explain why it would or would not be necessary to enclose the data values in quotes and use the DSD option.
43. Write an INFILE statement that tells SAS to read the raw data file c:\MyRawData\Records.csv, in which data values are separated by commas, and that allows for missing data at the end of a record.
Programming Exercises
44. Annual attendance for the top 10 amusement parks in North America is listed in the raw data file ParkAttendance.dat. For each park, the data include the ranking, park name, location, and four years of attendance.
a. Open the raw data file ParkAttendance.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.
b. Use the IMPORT procedure to read the raw data file into SAS. View the log to verify that your data set has the same number of variables and observations as you stated in part a).
c. Print the data set.
45. The file CancerRates.dat contains data on the top 10 cancer sites in the United States from the Centers for Disease Control and Prevention (CDC) website. These statistics are condensed across genders and races. The variables are ranking, cancer site, and incidence rate per 100,000 people.
a. Open the raw data file CancerRates.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.
b. Read the raw data file into SAS. View the log to verify that your data set has the same number of variables and observations as you stated in part a).
c. Print the data set.
d. Copy the raw data file CancerRates.dat to a different location, such as your desktop or a flash drive, and read it into SAS a second time from that new location.
46. The American Kennel Club (AKC) reports rankings of dog breeds by year based on the number of registrations. These data are found in the raw data file AKCbreeds.dat. For each breed, the data include the name of the breed and its ranking for each of four years. Breeds with missing ranks were not recognized by the AKC during that year.
a. Open the raw data file AKCbreeds.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.
b. Read the raw data file into SAS. View the log to verify that your data set has the same number of variables and observations as you stated in part a).
c. Print the data set.
47. The World Health Organization (WHO) monitors vaccine recommendations in countries around the world. The raw data file Vaccines.dat contains the recommended vaccines for a sample of 13 countries. The variables in this file are vaccine name, mode of disease transmission, worldwide incidence, worldwide deaths, and recommendations (stored in 13 individual columns for the respective countries of Chile, Cuba, United States, United Kingdom, Finland, Germany, Saudi Arabia, Ethiopia, Botswana, India, Australia, China, and Japan).
a. Open the raw data file Vaccines.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.
b. Read the raw data file into SAS. View the log to verify that your