• Relationships in object-relational databases
This chapter covers data modeling, the process of designing a dataset’s structure by adopting a set of abstractions representing the real world. A dataset is a collection of facts organized around entities. An entity is a group of similar things, each of which may be referred to as an instance or a member. For example, Road could be an entity representing all roads, with State Route 50, Interstate 10, Main Street, and Simpson Highway as members of that entity. You cannot store the real-world entity in the dataset, so you store a set of descriptive attributes that allow you to identify each member and understand its characteristics. Attributes can be composed of text, numbers, geometry, images, and other forms of data. If Road is your entity, then facility ID, route number, street name, length, jurisdiction, and pavement condition could be useful attributes. When an attribute involves location, it is considered to be spatial in nature, and spatial data is what GIS works with. Attributes, not entities, determine whether a dataset is spatial.
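As a minimal sketch of the entity, member, and attribute vocabulary above, the Road entity can be written as a Python class whose fields are its attributes. The field names, the kilometer unit, and all sample values are illustrative assumptions, not taken from any real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Road:
    """One member (instance) of the Road entity, described by its attributes."""
    facility_id: int
    route_number: Optional[str]
    street_name: str
    length_km: float          # unit is an assumption for this sketch
    jurisdiction: str
    pavement_condition: str
    # Optional spatial attribute: a centerline as (x, y) coordinate pairs.
    centerline: Optional[List[Tuple[float, float]]] = None


def is_spatial(road: Road) -> bool:
    """A record is spatial only if it carries a location attribute."""
    return road.centerline is not None


# Made-up sample members of the Road entity.
main_street = Road(1, None, "Main Street", 2.4, "City", "Fair")
route_50 = Road(2, "50", "State Route 50", 118.0, "State", "Good",
                centerline=[(0.0, 0.0), (10.0, 2.5), (25.0, 3.0)])

print(is_spatial(main_street), is_spatial(route_50))
```

Both records belong to the same entity, but only the second includes a location attribute, which is what makes it spatial.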
A database is a dataset stored in an electronic medium. A geodatabase is a database that includes spatial data; more specifically, it is a collection of geographic datasets. A user acts upon such a dataset through a database management system, which may also provide various security and data integrity services. The database management systems used for large workgroup and enterprise geodatabases are relational, which means they operate according to a set of rules, called relational algebra, that describe how to read and write information stored in the database. The language shared by relational database management system (RDBMS) products is SQL, which once stood for Structured Query Language. The RDBMS converts SQL statements entered by the user (or generated by a computer application) into relational algebra to perform operations on the data. You do not need to know about RDBMS products, relational algebra, or SQL to do data modeling. What you do need to know is included in this chapter.
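To make the idea of SQL statements acting on a relational table concrete, here is a small sketch using SQLite through Python's standard sqlite3 module. SQLite stands in for an RDBMS purely for illustration; it is not one of the enterprise products discussed here, and the table name, columns, and rows are assumptions.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a nonspatial Road table (hypothetical columns).
conn.execute(
    "CREATE TABLE road ("
    " facility_id INTEGER PRIMARY KEY,"
    " street_name TEXT NOT NULL,"
    " jurisdiction TEXT)"
)

# Write rows with a parameterized SQL statement.
conn.executemany(
    "INSERT INTO road VALUES (?, ?, ?)",
    [
        (1, "Main Street", "City"),
        (2, "Interstate 10", "State"),
        (3, "State Route 50", "State"),
    ],
)

# Read rows back: a selection (WHERE) followed by a projection (street_name).
cursor = conn.execute(
    "SELECT street_name FROM road WHERE jurisdiction = ? ORDER BY street_name",
    ("State",),
)
state_roads = [row[0] for row in cursor]
print(state_roads)  # ['Interstate 10', 'State Route 50']

conn.close()
```

The application only states what it wants; the database engine decides how to carry out the selection and projection.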
Every database is a data model because a model is simply an abstract representation of the real world. A primary concern of data modeling is deciding which abstraction to use. For example, a spatial database may represent a linear transportation facility with a centerline, but the real-world facility is actually an area with one very long axis. We commonly use a centerline because it conveys the primary aspect of the facility: it has length and traverses a space. That centerline can be part of a geometric network for determining the best path between two points, or it can simply be a reference for locating other features on a map.
The information you need about the facility is determined by how the data will be used. The network pathfinding application will need information about connectivity, cost of traveling on a segment, and restrictions to travel. Any geometric representation you created for the network may be highly abstract, perhaps just a straight line between two points. In contrast, a mapping application needs just a line geometry representation, with perhaps some information for symbolizing the line. The scale of display will determine the degree of abstraction allowed for the geometry. Large-scale maps may need detailed road edgelines, while small-scale maps may need only a generalized centerline.
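The network view described above can be sketched with nothing but connectivity, a cost per segment, and a travel restriction flag; no detailed geometry is involved. The example below uses Dijkstra's algorithm as one common pathfinding technique. The node names, costs, and the `open_to_travel` flag are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# node -> list of (neighbor, travel_cost, open_to_travel)
Network = Dict[str, List[Tuple[str, float, bool]]]


def best_path_cost(network: Network, start: str, goal: str) -> Optional[float]:
    """Return the lowest total cost from start to goal, or None if unreachable."""
    queue: List[Tuple[float, str]] = [(0.0, start)]
    best = {start: 0.0}
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, segment_cost, open_to_travel in network.get(node, []):
            if not open_to_travel:
                continue  # restriction: this segment cannot be traversed
            new_cost = cost + segment_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None


network: Network = {
    "A": [("B", 4.0, True), ("C", 1.0, True)],
    "B": [("D", 1.0, True)],
    "C": [("B", 2.0, True), ("D", 7.0, True)],
    "D": [],
}

unrestricted_cost = best_path_cost(network, "A", "D")   # A -> C -> B -> D

# Close the C -> B segment and search again.
network["C"][0] = ("B", 2.0, False)
restricted_cost = best_path_cost(network, "A", "D")     # A -> B -> D

print(unrestricted_cost, restricted_cost)  # 4.0 5.0
```

A mapping application would need none of this; it would need the line geometry and symbology instead, which is why the intended use drives the model.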
You can alternatively represent the linear facility as a surface, such as a digital elevation model (DEM) or a triangulated irregular network (TIN), both of which are ways to represent a surface in 3D, or as a set of pixels in a raster image. You might also store the linear facility as a set of address points. You can even store the facility as a set of nonspatial attributes, employing no geometry at all. Each of these abstractions has a place within a transport agency and its variety of spatial-data applications. However, this book will concentrate on vector data forms where lines represent linear facilities, as this is the most common abstraction. Several design proposals show how to accommodate multiple geometric representations for a single entity.
Your choice of which form of abstraction to use is determined by the data’s application. Since larger transportation organizations need many applications, it is likely they will need multiple abstractions. For example, a bridge might be a point feature to some, a linear feature to others, and a polygon feature to yet another group.
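One simple way to picture a single entity carrying several abstractions is a record that keeps one geometry per representation, as sketched below. This is an illustrative assumption only, not one of the book's design proposals; the class name, keys, and coordinates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]


@dataclass
class Bridge:
    """One bridge entity with several geometric representations."""
    bridge_id: int
    name: str
    # Representation kind ("point", "line", "polygon") -> list of coordinates.
    geometries: Dict[str, List[Coordinate]] = field(default_factory=dict)

    def geometry_for(self, kind: str) -> List[Coordinate]:
        """Return the geometry a given application asks for."""
        return self.geometries[kind]


bridge = Bridge(
    bridge_id=101,
    name="Example Bridge",
    geometries={
        "point": [(500.0, 200.0)],                       # inventory location
        "line": [(480.0, 200.0), (520.0, 200.0)],        # centerline
        "polygon": [(480.0, 195.0), (520.0, 195.0),
                    (520.0, 205.0), (480.0, 205.0)],     # deck footprint
    },
)

print(len(bridge.geometry_for("point")), len(bridge.geometry_for("polygon")))
```

Each user group reads the same nonspatial attributes (ID, name) but requests the geometry that suits its application.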
Data modeling is the structured process by which you examine the needs of your application and determine the most appropriate abstraction to use. It begins by understanding the application’s requirements for data, which will determine the appropriate level of abstraction, the structure to use in organizing the data, the entities to be created, and the attributes assigned to each entity. In the geodatabase, an entity eventually becomes a class, which is a discrete table or feature that you will define in terms of its properties, behaviors, and attributes. A geodatabase combines data with software in an object-oriented form that takes over much of the workload needed to use and manage the data. A geodatabase is an active part of the ArcGIS platform, not a passive holder of data.
Much more about these concepts is discussed later in this chapter. What you need to know for now is that data modeling, as presented here, is founded on the capabilities and constraints of the geodatabase. However, if you are like most transportation-data users, you already have data in a variety of nongeodatabase forms, so this chapter covers other fundamental data structures along with the basic concepts of database design and data models.
Data types
When starting a data-modeling project, you first must understand the data you intend to place into your new geodatabase. In addition to the geometry you use to abstractly represent a real-world entity cartographically, you have the traditional forms of data that have always been part of transport databases.
Figure 2.1 Data types. The ArcGIS geodatabase supports several data types for user-supplied class attributes. The primary ones you will likely use are character string, short integer, long integer, single-precision floating-point (float), double-precision floating-point (double), and date.
One of the most common kinds of data is text, which consists of a string of alphanumeric characters, such as letters, numbers, and punctuation. Anything you can type on a keyboard can go into a text field. Most text fields are defined by the maximum number of characters they allow. For example, you might see a reference like “String (30)” to define a text field with a maximum length of 30 characters.
An equally common form of data is a number. There are many different types of number data, but to the user they all consist of a series of digits. Where they differ is how they are stored in the database. In the geodatabase, a short integer is stored using 2 bytes; a long integer requires 4 bytes. A single-precision floating-point number (float) is also stored using 4 bytes, while a double-precision floating-point number (double) is stored using 8 bytes. The actual numeric range that each of these forms represents varies according to the database management system you use.
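The byte sizes above can be checked with Python's standard struct module, which is used here only as a convenient stand-in for fixed-width storage. The ranges computed below are the generic limits of signed two's-complement integers of those widths; as noted, the range a particular database management system actually accepts may differ.

```python
import struct
from typing import Tuple

# "<" selects standard sizes, independent of the machine running the code.
sizes = {
    "short integer": struct.calcsize("<h"),
    "long integer": struct.calcsize("<i"),
    "float": struct.calcsize("<f"),
    "double": struct.calcsize("<d"),
}
print(sizes)


def signed_range(num_bytes: int) -> Tuple[int, int]:
    """Generic two's-complement range for a signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print(signed_range(sizes["short integer"]))  # (-32768, 32767)
print(signed_range(sizes["long integer"]))   # (-2147483648, 2147483647)

# A 4-byte float cannot hold 0.1 as exactly as an 8-byte double does,
# so a round trip through single precision changes the value slightly.
as_float = struct.unpack("<f", struct.pack("<f", 0.1))[0]
as_double = struct.unpack("<d", struct.pack("<d", 0.1))[0]
print(as_float == 0.1, as_double == 0.1)  # False True
```

The last two lines show why the choice between float and double matters for measured values such as lengths: the narrower type keeps fewer significant digits.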
The number type you select works in concert with the way you specify the field in an ArcSDE geodatabase. A number field in such a database has two characteristics that go with its type. The first is precision, which specifies the maximum number of digits that can be stored. The second is scale, which tells the database how many of those digits will fall after the decimal point.
Number type, precision, and scale interact in various ways. For example, the database will ignore scale if you specify a number type of integer, because integers consist only of whole numbers. In that case, the data type overrides the scale specification.
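The interaction of precision, scale, and number type can be sketched as a simple check on whether a value fits a field. The function name and its `integer_type` flag are hypothetical, and the strict accept-or-reject behavior is an assumption for illustration: a real database management system may round or truncate extra decimal digits instead of rejecting them.

```python
from decimal import Decimal


def fits_field(value: str, precision: int, scale: int,
               integer_type: bool = False) -> bool:
    """Check whether value fits a number field with this precision and scale."""
    if integer_type:
        scale = 0  # integers hold whole numbers only, so scale is ignored
    number = Decimal(value)
    if not number.is_finite():
        return False
    _, digits, exponent = number.as_tuple()
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    # Digits after the decimal point are limited by scale; the remaining
    # precision - scale digits are available before the decimal point.
    return fraction_digits <= scale and integer_digits <= precision - scale


print(fits_field("123.45", precision=5, scale=2))   # True: 3 + 2 digits
print(fits_field("1234.5", precision=5, scale=2))   # False: 4 integer digits
print(fits_field("42", precision=5, scale=2, integer_type=True))    # True
print(fits_field("4.2", precision=5, scale=2, integer_type=True))   # False
```

With precision 5 and scale 2, only three digits remain for the whole-number part, which is why 1234.5 does not fit even though it has just five digits in total.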