Practical Data Analysis with JMP, Third Edition. Robert Carver.

Author: Robert Carver
Publisher: Ingram
Genre: Software
ISBN: 9781642956122
with data organized into tables. The columns of the tables contain variables (for example, year, name, price) and the rows of the tables represent the individual items in the sample.

      One of the organizing principles that you will notice in JMP is the differentiation among data types and modeling types. Most columns that you will work with in this book are either numeric or character data types, much as spreadsheet cells contain either numbers or labels. JMP has two other major data types, Row States and Expressions, which will be discussed later.

      In your statistics course, you might be learning about the distinctions among different types of quantitative and qualitative (or categorical) data. Before we analyze any data, we will want to understand clearly whether a column is quantitative or categorical. JMP helps us keep these distinctions straight by using different modeling types. In the first several chapters, we will work with three modeling types:

      ● Continuous columns are inherently quantitative. They are numeric, and you can meaningfully compute sums, averages, and so on. Continuous variables can, in principle, assume an infinite number of values. Most measurements and financial figures are continuous data. Estimated average life expectancies (in years) are continuous.

      ● Ordinal columns reflect attributes that are sequential in nature or have some implicit ordering (for example, small, medium, large). Ordinal columns can be either numeric or character data.

      ● Nominal columns simply identify individuals or groups within the data. For example, if we are analyzing health data from different countries, we might want to label the nations and/or compare figures by continent. With our Life Expectancy 2017 data, both the names of countries and their continental regions are nominal columns. Nominal variables can also be numeric or character data. Names are nominal, as are postal codes or telephone numbers.
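      To make the three modeling types concrete, here is a minimal Python sketch in which the modeling type decides which summary is appropriate: counts for nominal data, lowest and highest level for ordinal data, and an average for continuous data. It is an illustration of the idea only, not how JMP works internally; the `ModelingType` enum, the `summarize` function, and its `levels` parameter are hypothetical names, and the example values are invented.

```python
from collections import Counter
from enum import Enum
from statistics import mean


class ModelingType(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    CONTINUOUS = "continuous"


def summarize(values, modeling_type, levels=None):
    """Return a summary suited to the column's modeling type.

    levels: for ordinal character data, the categories listed from lowest
    to highest (hypothetical parameter; numeric ordinal data can omit it).
    """
    if modeling_type is ModelingType.NOMINAL:
        # Nominal values only identify groups, so counting is the natural summary.
        return Counter(values)
    if modeling_type is ModelingType.ORDINAL:
        # Order matters but arithmetic does not: report lowest and highest level.
        key = levels.index if levels else None
        return min(values, key=key), max(values, key=key)
    # Continuous values support arithmetic such as an average.
    return mean(values)


# Invented example values, not taken from the book's data.
print(summarize(["Asia", "Europe", "Asia"], ModelingType.NOMINAL))  # Counter({'Asia': 2, 'Europe': 1})
print(summarize(["medium", "small", "large"], ModelingType.ORDINAL,
                levels=["small", "medium", "large"]))              # ('small', 'large')
print(summarize([70.0, 80.0], ModelingType.CONTINUOUS))            # 75.0
```

      Note that character ordinal data such as small, medium, large needs an explicit level order; sorting those words alphabetically would put "large" first.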

      As we will soon see, understanding the differences among these modeling types clarifies how JMP treats our data and presents us with choices.

      Figure 1.1: The JMP Opening Screen

Figure 1.1 Some JMP Help Options

      JMP provides an extensive set of tutorials for users that illustrate many of the features of the software. Readers are encouraged to investigate the tutorials on their own. Find the full list of tutorials in the Help menu.

      You will also see the JMP Starter window, which is an annotated menu of major functions. It is worth your time to explore the JMP Starter window by navigating through its various choices to get a feel for the wide scope of capabilities that the software offers. As a new user, though, you might find the range of choices to be overwhelming.

      In this book, we will usually close the JMP Starter window and make selections from the menu bar at the top of the screen. Finally, look at the JMP Home Window. The Home Window is divided into two panes that help you keep track of recently used files and currently open windows. You can customize this view, but this book shows the standard two-pane layout.

      In this book, we will most often work with data that has already been entered and stored in a file, much like you would type and store a paper in a word-processing file or data in a spreadsheet file. In Chapter 2, you will see how to create a data table on your own.

      We will start with the U.N. life expectancy data mentioned earlier. Within the Home Window, do this:

      1. Click File ► Open.

      2. Navigate to the folder where the data tables for this book are stored.

      3. Select the file called Life Expectancy 2017 and click Open.

      The data table appears in Figure 1.2. Notice that this window has four regions: three vertically arranged panels on the left and the data grid on the right.

      Figure 1.2: The Life Expectancy 2017 Data Table

      The three panels provide metadata (descriptive information about the data in the table), which was created when the data table was saved and can be altered. At this early stage, it is helpful to understand the purpose of each panel.

      Beginning at the top left, we find the Table panel, which displays the name of the data table file as well as optional information provided by the creator of the table. You will see a small red triangle pointing downward next to the table name.

      Red triangles indicate a context-sensitive menu, and they are an important element in JMP. We will discuss them more in later chapters, but you should expect to make frequent use of these little red triangles.

      Just below the red triangle, there is a note describing the data and identifying its source. You can open that note (called a table variable) by double-clicking the word “Credit,” the first line within the Table panel. Figure 1.3 shows the note for this table. A table variable contains metadata about the entire table.
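      One way to picture the difference between a table variable and the data itself is the small Python sketch below: the observations live in the columns, while table variables are a separate set of notes attached to the table as a whole and can be edited after the table is saved. The `DataTable` class and its methods are hypothetical, and the note text is a placeholder rather than the actual “Credit” note.

```python
from dataclasses import dataclass, field


@dataclass
class DataTable:
    """Hypothetical stand-in for a data table plus its table-level metadata."""
    name: str
    columns: dict                                        # column name -> list of values
    table_variables: dict = field(default_factory=dict)  # notes about the whole table

    def set_table_variable(self, key, text):
        # Table variables describe the entire table and can be edited later.
        self.table_variables[key] = text

    def n_rows(self):
        # Each column holds one value per row, so any column's length is the row count.
        return len(next(iter(self.columns.values()), []))


# Invented placeholder values; the real note text is what the Credit dialog displays.
table = DataTable("Life Expectancy 2017",
                  {"Country Name": ["Country A", "Country B"], "life_exp": [70.0, 80.0]})
table.set_table_variable("Credit", "Placeholder note describing the data source")
print(table.name, table.n_rows(), table.table_variables["Credit"])
```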

      Figure 1.3: Table Variable Dialog Box

      The second and third lines of the Table panel include a green arrow. Green arrows indicate that there is a script embedded in the data table. In this case, the script lists the steps to extract this set of data from a much larger data table called WDI, and one can reproduce the subsetting process by running the script. We will use the full WDI data table in future chapters.
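      The embedded script itself is written in JMP's own scripting language and its contents are not shown here. As a rough analogy only, the Python sketch below performs the same kind of subsetting: it keeps the rows for one year and only the columns of interest, and running it again on the same input always produces the same subset. The `subset` function, the `other` column, and all row values are invented for illustration and do not come from the WDI table.

```python
def subset(rows, year, keep):
    """Keep only rows from `year`, and only the columns named in `keep`.

    rows: list of dicts, one per observation, standing in for a larger table.
    """
    return [{col: row[col] for col in keep} for row in rows if row["Year"] == year]


# Invented rows standing in for the much larger WDI table.
wdi = [
    {"Country Name": "Country A", "Region": "Region 1", "Year": 2016, "life_exp": 70.0, "other": 1},
    {"Country Name": "Country A", "Region": "Region 1", "Year": 2017, "life_exp": 70.5, "other": 2},
    {"Country Name": "Country B", "Region": "Region 2", "Year": 2017, "life_exp": 80.0, "other": 3},
]
life_2017 = subset(wdi, 2017, ["Country Name", "Region", "Year", "life_exp"])
print(life_2017)
```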

      Below the Table panel is the Columns panel, shown in Figure 1.4, which lists the column names, JMP modeling types, and other information about the columns.

      Figure 1.4: The Columns Panel

      There are several important things to notice in the Columns panel. The notation (5/0) at the top of the panel tells us that this data table has five columns and that none of them is currently selected. In a JMP data table, we can select one or more columns or rows for special treatment. Columns can also carry properties; here, the label property has been assigned to the second, third, and fourth columns so that country names, regions, and the year can be displayed within graphs. There is much more to learn about selection and column properties, and we will return to these ideas later in this chapter.
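      The (total/selected) notation is simple bookkeeping, sketched below in Python. The `Column` record and `panel_header` function are hypothetical; the five column names, their modeling types, and which columns carry the label property follow the description in this section, with the columns assumed to appear in the order Country Code, Country Name, Region, Year, life_exp.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical record for one entry in the Columns panel."""
    name: str
    modeling_type: str      # "nominal", "ordinal", or "continuous"
    is_label: bool = False  # stands in for the label column property
    selected: bool = False


def panel_header(columns):
    """Format the (total/selected) count shown at the top of the Columns panel."""
    n_selected = sum(1 for col in columns if col.selected)
    return f"({len(columns)}/{n_selected})"


columns = [
    Column("Country Code", "nominal"),
    Column("Country Name", "nominal", is_label=True),
    Column("Region", "nominal", is_label=True),
    Column("Year", "ordinal", is_label=True),
    Column("life_exp", "continuous"),
]
print(panel_header(columns))  # (5/0)
columns[4].selected = True    # select life_exp
print(panel_header(columns))  # (5/1)
```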

      The panel lists the columns by name. To the left of the names are icons indicating the modeling type. In this example, the first three red icons (these look like bar charts) identify Country Code, Country Name, and Region as nominal data. The “price tag” icons indicate that these variables can act as labels to specifically identify observations that are displayed in a graph.

      The green ascending bar icon next to Year indicates that Year is to be analyzed as an ordinal variable. In this data table, all observations are from the same year, 2017, but the original data set contains annual observations from 1990 through 2018. Because years fall in a natural sequence, Year is treated as an ordinal variable.

      Finally, the blue triangle next to life_exp identifies the column as continuous data. Remember, it makes sense to perform calculations with continuous data.

      At