String Objects
Although strings are a more basic data type than lists in Python, string objects share some features with list objects. A string is a sequence of characters, and we can access each character individually using an index, just as we can access each object in a list. A character is the smallest possible component of a text, and characters vary depending on the language and context used (Python Software Foundation, 2019, “Unicode HOWTO”). For example, if a variable str_message had the value “Hey, Taxi!” it would comprise 10 characters (the comma, the space, and the exclamation point each count as one character), and you could reference each character using the index values 0 through 9. In this example, str_message[2] would be the third character, which is “y.”
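The indexing described above can be tried with a short sketch such as the following. The names message_length, third_char, and last_char are illustrative and are not part of the chapter's figures.

```python
# Index individual characters of a string, just as we would index a list.
str_message = "Hey, Taxi!"

message_length = len(str_message)   # counts every character, including the space
third_char = str_message[2]         # index values start at 0, so index 2 is the third character
last_char = str_message[9]          # the highest valid index is the length minus 1

print(message_length)   # 10
print(third_char)       # y
print(last_char)        # !
```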
String Methods
Like lists, strings have many methods. Table 3.2 describes some commonly used string methods. A complete list of Python string methods is available in the Python documentation (Python Software Foundation, 2019, “String Methods”).
Table 3.2
Unlike lists, strings are immutable, meaning that their values cannot be modified in place. The methods listed in Table 3.2 all return values, which in some cases are modified copies of the string to which we apply the method; the original string is left unchanged. Note that even though we cannot modify a string, we can replace it with another string. Therefore, we can assign the copy returned by a string method back to the same variable, replacing the original string.
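A minimal sketch of this behavior follows. It uses the .upper() method and, as an assumption about which methods are relevant here, the standard .replace() method; the variable name shouted is illustrative.

```python
# String methods return new strings; the original string is left unchanged.
str_message = "Hey, Taxi!"

shouted = str_message.upper()
print(shouted)        # HEY, TAXI!
print(str_message)    # Hey, Taxi!

# Trying to change a character in place raises a TypeError.
try:
    str_message[0] = "h"
except TypeError as error:
    print("Cannot modify a string in place:", error)

# We can still replace the whole string by assigning the returned copy
# back to the same variable.
str_message = str_message.replace("Taxi", "Cab")
print(str_message)    # Hey, Cab!
```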
String Operations
In addition to string methods, there are several operations that we can perform on or with strings. One such operation is slicing, which accesses a portion of a string by referencing a starting index value and an ending index value. Slicing is important to learn because the text we are interested in is often embedded within a larger amount of text. We also use slicing with lists to reference a specific portion of the list. The syntax of the slicing operation is object[start_index:end_index]. Note that the slice returns the elements from the start_index position through the (end_index - 1) position; the element at the end_index position is not included. Using our str_message variable from above, which has the value “Hey, Taxi!” the slice of that string from the second character to the seventh character is str_message[1:7], which is “ey, Ta.”
Another string operation is concatenation, which combines multiple strings into one string. We perform string concatenation using the “+” operator, so we can concatenate the two strings “Hey,” and “Taxi!” and assign the result to a variable using the statement: variable_name = “Hey,” + “Taxi!” We illustrate both examples in Figure 3.7, and the resulting output is in Figure 3.8.
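The slicing and concatenation operations described above might be sketched as follows. This is not the code from Figure 3.7; the names middle, greeting, cost_fields, and middle_fields are illustrative.

```python
# Slicing: characters from start_index through end_index - 1.
str_message = "Hey, Taxi!"
middle = str_message[1:7]
print(middle)           # ey, Ta

# Concatenation: the + operator joins strings exactly as given,
# so no space is added between them.
greeting = "Hey," + "Taxi!"
print(greeting)         # Hey,Taxi!

# The same slicing syntax works on lists.
cost_fields = ["fare", "tips", "extras", "trip_total"]
middle_fields = cost_fields[1:3]
print(middle_fields)    # ['tips', 'extras']
```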
Figure 3.7 Python String Operations
Figure 3.8 Output from Execution of Python String Operations
Note that the result of the concatenation in line 8 did not have a space between the “Hey,” and “Taxi!” substrings. One way to insert a blank in the middle of the string is to concatenate the two parts with a space in between. Line 11 in Figure 3.7 accomplishes this by taking the first part and the last part of the string that did not have a blank and concatenating them with a blank between them. The slice operation in the last part of line 11 does not specify an ending index value, which results in the slice going all the way to the end of the string. Because string objects are immutable, they have no .insert() method (or any other method or operation that would change the string in place) like the one list objects have.
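One way this could look in code is sketched below. It is a reconstruction under the assumption that the string is split at index 4, not a copy of line 11 in Figure 3.7; the names joined and with_space are illustrative.

```python
# Build a new string with a blank in the middle by slicing and concatenating.
joined = "Hey," + "Taxi!"     # "Hey,Taxi!" has no blank

first_part = joined[0:4]      # "Hey,"
last_part = joined[4:]        # no ending index: slice runs to the end of the string

with_space = first_part + " " + last_part
print(with_space)             # Hey, Taxi!
print(joined)                 # Hey,Taxi!  (the original string is unchanged)
```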
As shown in Table 1.2 and Table 1.3 in Chapter 1, several fields in the Taxi Trips data set hold taxi trip cost-related information (fare, tips, extras, and trip_total). These fields have the data type “Money” in the SODA API, but Python has no basic money data type. If we try to assign the value $4.75 to a variable, an error will occur because the Python interpreter does not recognize that usage of the dollar sign symbol. To address this using built-in functions, we assign the trip-cost-related information to different variables as string values, and then we remove the dollar sign symbol from each string. We then convert the string values to float data types using the built-in function float.
Figure 3.9 Python Code to Add Up Trip Costs
Figure 3.9 incorporates several features that we have discussed in this chapter. As mentioned earlier, the trip cost components have a dollar sign as part of their values, so we cannot assign them to basic numeric Python data-type objects. Lines 3, 4, and 5 assign the values as strings to variables. Next, we use the slicing operation in lines 8, 9, and 10 to remove the first character of each of the strings. Note that because strings are immutable, we cannot change them directly; in these operations, we assign the sliced string back to the original variable, replacing it. Line 13 uses the built-in Python function float to convert each of the trip cost components to a float value, adds the converted values together, and assigns the sum to the trip_total variable. The trip_total variable is a float data-type object, and in line 14 we convert it to a string (using the built-in Python function str) so that we can concatenate it with a dollar sign to report back to the user. Figure 3.10 illustrates the output from executing this Python code.
Figure 3.10 Output from Execution of Python Code to Add Up Trip Costs
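The steps described for Figure 3.9 might be sketched as follows. This is not a copy of the figure: the line numbering differs, and the values assigned to tips and extras are hypothetical (only $4.75 appears in the text above).

```python
# Trip cost components stored as strings because of the dollar sign.
fare = "$4.75"
tips = "$1.00"      # hypothetical value
extras = "$0.50"    # hypothetical value

# Remove the first character (the dollar sign) by slicing from index 1 to the
# end, and assign each result back to the same variable.
fare = fare[1:]
tips = tips[1:]
extras = extras[1:]

# Convert each string to a float and add the values together.
trip_total = float(fare) + float(tips) + float(extras)

# Convert the total back to a string so it can be concatenated with "$".
total_message = "Trip total: $" + str(trip_total)
print(total_message)    # Trip total: $6.25
```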
SCU 3.3 String Operations
Download the file “SCU 3_3.py” from the companion website and save it either on your computer or on a removable storage device. Open the file in the Python IDLE editor and add a line of code that uses the .upper() method to convert the string my_string to uppercase. Execute the modified program after the change to verify that the revised code runs and produces the correct result.
Lessons learned: In this section, we learned about Python string operations, which are very useful when working with portions of strings. We learned that strings are immutable, so we are not able to modify portions of strings in place, but we are able to replace strings by assigning portions of a string or combinations of string portions using concatenation.