A second difficulty in GIS mapping relates to the variability that occurs in time and space. Most data are collected as a snapshot in time. We have a more difficult task obtaining data over a span of time to reliably map changes or trends in the data. Furthermore, because many of the things that we may map—especially individual people—will move over time, there is an added dimension of analysis to consider. Do we locate a survey respondent based on her home address or her place of employment, or perhaps based on where the individual is most likely to be at a particular time of the day or week? This decision would be most significantly influenced by the question under study; there are no set answers.
Using computer animation, you can change map data from static to dynamic. However, this type of mapping is still limited by the difficulty and expense of collecting data at a high frequency (temporal scale) as well as by software limitations for incorporating data instantly as it is collected. Fortunately, only a few social science applications necessitate true real-time analysis. Your primary goal as a researcher considering GIS as an analysis tool is to make such decisions before collecting the data.
You also need to consider the spatial representation of your data. Often, in research, privacy is of the utmost concern. Researchers typically lump data to mask individual data points representing individual respondents. Lumping, or degrading, data in this fashion results in a serious trade-off: the true, raw data may be permanently lost and no longer available for future research. As a result, researchers may collect an enormous amount of redundant data when the simple recategorization of existing data in different but equally valuable combinations would have allowed them to explore different questions.
For example, say you are looking at the populations of 426 incorporated cities in California. The cities range in size significantly, from the city of Vernon (population 80) to Los Angeles (population 3.4 million). In examining these cities for research purposes, you should consider numerous methods of categorization. As an illustration, consider an example using five categories for city size, as in figure 2.2.
Figure 2.2 An example of categorical classifications for the size of cities. Size classes can be defined a variety of ways, depending on the objectives and preferences of the map author.
How you choose to organize your data into these categories will have a direct effect on the outcome of analysis. Optimally, you will have access to the actual numbers so that you have a choice in the matter. If not, the metadata should define how cities were assigned to each of these categories.
Typically, your GIS software will have default settings for categorizing and representing these data, as in figure 2.3, which shows portions of Los Angeles and Orange Counties in southern California. The data categorization is based on the defaults used in ArcGIS (a popular GIS software package produced by Esri)—five categories based on the natural breaks within the dataset.
Figure 2.3 A map of city populations symbolized using default settings in ArcGIS. This map illustrates the various population sizes of some southern California cities. Map by Steven Steinberg. Data from US Census and State of California.
Although the defaults in your software may produce a nice map, they may not be appropriate to your study data and objectives. Therefore, it is important to understand and define data categories that make sense for your needs. Perhaps there are legal or regulatory definitions for the sizes of cities you should consider. Or there may be statistical justification for how you examine your data. Changing the categories, of course, changes the map and the analysis results.
The map in figure 2.4 retains the five categories from very small to very large but uses a geometric interval as the basis for the categorization. Notice how the distribution of city sizes appears different on the map.
Figure 2.4 A map with city sizes symbolized using geometric categorization in ArcGIS. Map by Steven Steinberg. Data from US Census and State of California.
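To make the idea of a geometric interval concrete, the following is a minimal sketch in which each class limit is a constant multiple of the one before it. It is a simplified illustration, not the algorithm ArcGIS uses. The function name geometric_breaks and the population values are hypothetical, apart from the smallest and largest figures, which are the ones quoted for Vernon and Los Angeles.

```python
def geometric_breaks(values, n_classes):
    """Return upper class limits that grow by a constant ratio.

    Simplified sketch: requires strictly positive values and is not
    the geometric interval routine implemented in ArcGIS.
    """
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("this simplified version needs strictly positive values")
    # Constant multiplier that carries the minimum to the maximum in n_classes steps.
    ratio = (hi / lo) ** (1 / n_classes)
    return [lo * ratio ** k for k in range(1, n_classes + 1)]


# Hypothetical city populations; only 80 and 3,400,000 come from the text.
populations = [80, 1_200, 9_500, 48_000, 110_000, 460_000, 3_400_000]

breaks = geometric_breaks(populations, 5)
for upper in breaks:
    print(f"class upper limit: {upper:,.0f}")
```

Because the limits grow multiplicatively, the lower classes are narrow and the upper classes are wide, which suits strongly skewed data such as city populations.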
And finally, the map in figure 2.5 uses five quantiles, again changing the appearance and categorization of city sizes. Quantile classification divides the data into a specified number of categories that each contain an equal number of features (here, cities), rather than categories that span equal intervals of value.
Figure 2.5 A map of city sizes symbolized using five quantiles in ArcGIS. Map by Steven Steinberg. Data from US Census and State of California.
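The difference between quantile classes and equal-interval classes can be sketched in a few lines. This is an illustration under simple assumptions, not a reproduction of the ArcGIS routines. The function names and the ten population values are hypothetical (only 80 and 3,400,000 come from the text), and the quantile function assumes the number of values is a multiple of the number of classes.

```python
from bisect import bisect_left


def equal_interval_breaks(values, n_classes):
    """Upper class limits that each span the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(1, n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Upper class limits chosen so each class holds the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # The upper limit of class k is the last value in the k-th equal-sized slice.
    return [ordered[(n * k) // n_classes - 1] for k in range(1, n_classes + 1)]


def class_counts(values, breaks):
    """Count how many values fall at or below each upper class limit."""
    counts = [0] * len(breaks)
    for v in values:
        # Clamp to the last class to guard against floating-point rounding.
        idx = min(bisect_left(breaks, v), len(breaks) - 1)
        counts[idx] += 1
    return counts


# Hypothetical city populations, strongly skewed toward small cities.
populations = [80, 900, 2_500, 7_000, 15_000, 32_000,
               60_000, 120_000, 450_000, 3_400_000]

print(class_counts(populations, quantile_breaks(populations, 5)))
# -> [2, 2, 2, 2, 2]
print(class_counts(populations, equal_interval_breaks(populations, 5)))
# -> [9, 0, 0, 0, 1]
```

With skewed data, equal intervals place nearly every city in the lowest class, while quantiles spread the cities evenly across classes at the cost of grouping very different population sizes together.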
Although all of these examples are drawn from exactly the same dataset, each represents the data differently. If you receive data that have been categorized in advance, you may find that the data are difficult or impossible to use in a study with a different set of questions. For example, what may be a medium-sized town to the person creating the original dataset may be a small town in your study. Another simple example of data degradation is the grouping of income levels into categories, which is a common practice in survey research. Categorical information, such as <$15,000 and $15,001–35,000, provides no means for a later study to distinguish individuals with incomes between $20,001 and $30,000. In a mapping context, it could be useful to link people or ideas to specific locations, but more commonly, data are collected by larger geographic units, such as census blocks or political boundaries. A census block, however, does not show how the data are distributed within it (e.g., are the households equally distributed across the area, or is clustering of the households hidden in the simplified data?).
Where data are provided in categorical form using category definitions that do not meet your requirements, you may need to locate an alternative data source or even collect your own primary data. Data that are degraded can no longer be recategorized to explore new or different questions. Of course, these are not simple issues to address because anonymity is an essential component of many social science questions; however, to the extent possible, when data are maintained in near-original, detailed form, the possibilities for analysis both within and outside the GIS are much greater.
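The income example can be sketched in code to show why detailed data keep more options open. The incomes, category limits, and labels below are hypothetical and chosen only for illustration. The same raw values are grouped first with one survey's categories and then with a later study's categories. Only the raw values make the second grouping possible.

```python
from bisect import bisect_left
from collections import Counter


def bin_label(value, upper_limits, labels):
    """Return the label of the category whose upper limit first covers value."""
    return labels[bisect_left(upper_limits, value)]


# Hypothetical raw incomes as originally reported by respondents.
raw_incomes = [12_000, 18_500, 22_000, 27_500, 31_000, 34_000, 52_000]

# Categories used by the original survey (hypothetical limits).
survey_limits = [15_000, 35_000]
survey_labels = ["up to 15,000", "15,001-35,000", "over 35,000"]

# Categories needed by a later study (hypothetical limits).
study_limits = [20_000, 30_000]
study_labels = ["up to 20,000", "20,001-30,000", "over 30,000"]

survey_counts = Counter(bin_label(x, survey_limits, survey_labels) for x in raw_incomes)
study_counts = Counter(bin_label(x, study_limits, study_labels) for x in raw_incomes)

print(dict(survey_counts))
print(dict(study_counts))
```

If only the survey counts had been kept, the five respondents in the 15,001-35,000 category could not be divided among the later study's categories, because the category label no longer records where within that range each income fell.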
Expanding the G
Mapping attitudes, ideas, social networks (connections that exist between people), and countless other human constructs should be viewed as just as valid as mapping the latitude and longitude of a data point on the ground. Numerous opportunities, limited only by the creativity of the researcher, will allow GIS to extend into realms not envisioned by the traditional geographies originally programmed into the software. The question that remains to be addressed is how one can develop an appropriate mapping context to represent concepts such as social interaction, desirability of a community, or social ties. One means of visualizing such data in a mapping context is to develop an index value or relationship between data points that can be used in place of physical distance as traditionally mapped. Visualizing data means being able to portray the data in a visual format.
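As one possible illustration of a relationship that stands in for physical distance, the sketch below measures how many social ties separate two people in a small network. This is only one of many indexes a researcher might choose and is not a method prescribed here. The names, the ties, and the function hop_distances are hypothetical.

```python
from collections import deque


def hop_distances(network, start):
    """Breadth-first search: number of ties separating start from each person."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for contact in network.get(person, ()):
            if contact not in distances:
                distances[contact] = distances[person] + 1
                queue.append(contact)
    return distances


# Hypothetical social network: each person is listed with their direct ties.
network = {
    "Ana": ["Ben", "Cho"],
    "Ben": ["Ana", "Dee"],
    "Cho": ["Ana"],
    "Dee": ["Ben", "Eli"],
    "Eli": ["Dee"],
}

from_ana = hop_distances(network, "Ana")
print(from_ana)
# -> {'Ana': 0, 'Ben': 1, 'Cho': 1, 'Dee': 2, 'Eli': 3}
```

In this sketch, two people who live next door to each other could still be three ties apart socially, which is the kind of distance a conventional map would not show.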
The I in GIS
The