Most database management systems also support date and time data types. Although stored in vendor-specific ways, ArcGIS provides a consistent representation to the user in which date and time are combined into one data type, called ‘Date’. The date portion is provided as a two-digit month, a two-digit day, and a four-digit year, with the three components separated by a forward slash character. The time portion is presented as a two-digit number representing a 24-hour clock (00-23), a two-digit minute portion (00-59), and a second component with a precision of 5 and a scale of 3. The three time components are separated using colons.
Files
Relational databases were not the first kind of electronic data structure. The oldest form of database storage is the file, which consists of a block of data organized into logical groups called fields. Each position in the field is called a column. Files look like a table with their records (rows) that separate content using a special character to signify the end of a logical group of data. Everything is text. There is no inherent requirement for all the records to have the same structure. For example, the first record, often called a header, could state the number of body records or describe the fields in those records. All the intelligence needed to understand the file’s content is in the application that reads and writes records.
Figure 2.2 Files A fixed-length file uses column position to identify specific data content forming attributes. A variable-length file uses the sequential order of fields separated by a predefined special character—one that cannot appear in the data. In both cases, the application using the data must know the specific location of each piece of information.
Files come in two basic forms: fixed-length and variable-length. A fixed-length file uses the position of each character in a record to interpret its meaning. Any leftover space not needed to store the data for that record is filled with spaces, either before or after the actual data in the field. Fields in each record are identified by position. For example, a file specification may declare that record characters (columns) 1 through 47 contain an employee’s name right-justified with leading spaces.
A variable-length file uses the position of a field within the record to identify its content. Variable-length records avoid space filling by using special characters to say where one field stops and another begins. You may have come across this structure when using comma-and tab-delimited text files. The commas or tab characters are the things that separate the records into fields. Usually, there is also a special end-of-file character.
The most common ArcGIS file-based data structure is the shapefile. A shapefile is a kind of spatial database structure consisting of several files. There are more than a hundred recognized shapefile component types, each with its own file extension (the three characters after the dot in a typical file name). To copy a shapefile, you must copy all the component files. The minimum components are the geometry (.shp), the nonspatial attribute data (.dbf), and the spatial index (.sbx). The structure of each component file is optimized for the information it contains. For example, the geometry file (.shp) contains a 100-byte fixed-length file header followed by variable-length records. The variable-length record is composed of an 8-byte, fixed-length record header followed by variable-length record contents. Each record defines a single geometry, with the length of the variable portion being determined by the number of vertices and whether measure (m) and elevation (z) coordinate values are included. The fixed-length record header portion provides a record number and the length of the variable portion.
Coverages, which were the original ESRI data structure, are also based on a database structure consisting of multiple files. Designed to reduce the size of a spatial database, software manipulating coverage data must manage a number of composition relationships inherent in the file structure. A special data-exchange file type was developed to be able to distribute coverages via a single file.
File data structures remain useful today and will continue to be part of GIS datasets long into the future. This book, however, will restrict itself to modeling geodatabases. What you put into and take out of a geodatabase may be a file, but the database to be modeled is a geodatabase.
Tables
The next step along the evolutionary line of database design is the table, which is a fundamental data organization unit of a relational database. A table looks very much like a file in the way it is presented to the user by the RDBMS in which it exists: a set of rows (records) and columns (fields). Rows represent members and columns represent attributes.1 However, you cannot simply copy a table as you can a file, because each table is tightly bound to the RDBMS.
Figure 2.3 Tables An application seeking to use the data stored in a relational table needs to know the name of the table and the name of each attribute it seeks, but not the physical manner of data storage. That job is performed by the RDBMS. The primary key uniquely identifies each row. One or more foreign keys can be established to provide connections to other tables. In this example, the Position Number attribute serves as a foreign key to a table storing position descriptions, where Position Number is the primary key. Foreign keys express association relationships. Cardinality is the ratio of rows for two tables. The number that comprises each half of the ratio is the table’s multiplicity. The cardinality of this one-to-many (1:m) association relationship says that a position number must be entered for each employee and that some position numbers may not be applicable to any employee.
Dr. Edgar Codd invented relational databases in the early 1970s at IBM, although it was several years later before a working product could be devised. Such a database management system is based on relational algebra, a kind of math that controls what can happen to data in such a storage structure. Relational algebra supports seven functions2:
• Retrieve (read) row
• Update (write) row
• Define virtual relations (table views)
• Create a snapshot relation
• Define and implement security rules
• Establish and meet stability requirements
• Operate under integrity rules
Relational tables are not actually stored in the row-and-column form we typically use to visualize them, but everything you need to take from this book can be accommodated with the rows-and-columns metaphor. Oracle, SQL Server, Sybase, and Informix are commonly used RDBMS platforms. Products like Microsoft Access have much of the functionality of an RDBMS but are actually database management systems that employ files.
Relationships in relational databases
The big advance offered by the relational database is its ability to represent and manage relationships between tables. Where files normally use a record’s position in the file to uniquely identify each member, an RDBMS cannot impose any ordering on its member records. Thus, an RDBMS requires that at least one column be an instance identifier, called a primary key.
The relationship that relational databases are most concerned about is the association of one table to another. An association is established by placing the same column or a set of columns in both tables. This connection is called a foreign key. For example, a foreign key may link a central table storing general roadway information with other tables containing information about speed limit, traffic volume, maintenance jurisdiction, and pavement condition.