Report a Bug: Takes you to a page where you can report Colab errors.
Ask a Question on Stack Overflow: Opens a new browser tab, where you can ask questions of other users. You see a login screen if you haven’t already logged in to Stack Overflow.
Send Feedback: Displays a dialog box with links for locations where you can obtain additional information. If you really do want to send feedback, then you click the Continue Anyway link at the bottom of the dialog box.
Chapter 4
Performing Essential Data Manipulations Using Python
IN THIS CHAPTER
Using matrixes and vectors to perform calculations
Obtaining the correct combinations
Employing recursive techniques to obtain specific results
Considering ways to speed calculations
You’ve probably used online tutorials or other methods to learn the basics of the Python language, the arcane symbols you use to communicate with your computer. (If not, you can find good basic tutorials at https://www.w3schools.com/python/ and https://www.tutorialspoint.com/python/index.htm.) However, simply knowing how to control a language by using its constructs to perform tasks isn’t enough to create a useful application. The goal of mathematical algorithms is to turn one kind of data into another kind of data. Manipulating data means taking raw input and doing something with it to achieve a desired result. (This is a topic covered in Python for Data Science For Dummies, by John Paul Mueller and Luca Massaron [Wiley].) For example, until you do something with traffic data, you can’t see the patterns that emerge that tell you where to spend additional money in improvements. The traffic data in its raw form does nothing to inform you; you must manipulate it to see the pattern in a useful manner.
In times past, people performed the various manipulations to make data useful by hand, which required advanced math knowledge. Fortunately, you can find Python packages to perform most of these manipulations using a little code. You don’t have to memorize arcane manipulations anymore — just know which Python features to use. That’s what this chapter helps you achieve. You discover the means to perform various kinds of data manipulations using easily accessed Python packages designed especially for the purpose. (Chapter 5 takes the next step and shows you how to create your own library of hand-coded algorithms.) This chapter begins with vector and matrix manipulations. Later sections discuss techniques such as recursion that can make the tasks even simpler, plus perform some tasks that are nearly impossible using other means. You also discover how to speed up the calculations so that you spend less time manipulating the data and more time doing something really interesting with it.
You don’t have to type the source code for this chapter manually. In fact, using the downloadable source is a lot easier. You can find the source for this chapter in the \A4D2E\A4D2E; 04; Basic Vectors and Matrixes.ipynb, \A4D2E\A4D2E; 04; Binary Search.ipynb, and \A4D2E\A4D2E; 04; Recursion.ipynb files of the downloadable source. See the Introduction for details on how to find these source files.
Performing Calculations Using Vectors and Matrixes
To perform useful work with Python, you often need to work with larger amounts of data that come in specific forms. These forms have odd-sounding names, but the names are quite important. The three terms you need to know for this chapter are as follows:
Scalar: A single base data item. For example, the number 2 shown by itself is a scalar.
Vector: A one-dimensional array (essentially a list) of data items. For example, an array containing the numbers 2, 3, 4, and 5 would be a vector.
Matrix: A two-or-more-dimensional array (essentially a table) of data items. For example, an array containing the numbers 2, 3, 4, and 5 in the first row and 6, 7, 8, and 9 in the second row is a matrix.
Python provides an interesting assortment of features on its own, but you'd still need to do a lot of work to perform some tasks. To reduce the amount of work you do, you can rely on code written by other people and found in packages. The following sections describe how to use the NumPy package (https://numpy.org/) to perform various tasks on scalars, vectors, and matrixes. This chapter provides an overview of NumPy by emphasizing the features you use later (see https://www.w3schools.com/python/numpy/default.asp for more details).
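The three terms defined earlier in this section map directly onto NumPy objects. The following is a minimal sketch that reuses the numbers from the definitions; the variable names scalar, vector, and matrix are chosen here for illustration and don't come from the downloadable source.

```python
import numpy as np

# A scalar: a single base data item.
scalar = np.int64(2)

# A vector: a one-dimensional array of data items.
vector = np.array([2, 3, 4, 5])

# A matrix: a table with two rows of four items each.
matrix = np.array([[2, 3, 4, 5],
                   [6, 7, 8, 9]])

# np.ndim() reports the number of dimensions: 0, 1, and 2 here.
print(np.ndim(scalar), np.ndim(vector), np.ndim(matrix))

# The shape shows how many items appear along each dimension.
print(vector.shape, matrix.shape)
```

The number of dimensions is what separates the three forms: a scalar has none, a vector has one, and the matrix in this example has two.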
Understanding scalar and vector operations
The NumPy package provides essential functionality for scientific computing in Python. To use numpy, you import it using a command such as import numpy as np. Now you can access numpy using the common two-letter abbreviation np.
Python provides access to just one data type in any particular category. For example, if you need to create a variable that represents a number without a decimal portion, you use the integer data type. Using a generic designation like this is useful because it simplifies code and gives the developer a lot less to worry about. However, in scientific calculations, you often need better control over how data appears in memory, which means having more data types, something that numpy provides for you. For example, you might need to define a particular scalar as a short (a value that is 16 bits long). Using numpy, you could define it as myShort = np.short(15). The NumPy package provides access to an assortment of data types (https://numpy.org/doc/stable/reference/arrays.scalars.html).
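To see what the short data type gives you, you can inspect the scalar after creating it. This is a small sketch rather than part of the downloadable source. It assumes a platform on which the C short type is 16 bits long, which is the usual case but isn't guaranteed everywhere.

```python
import numpy as np

# Create a scalar using the short data type from the text.
myShort = np.short(15)

# The type and the size in bytes (2 bytes on platforms
# where a short is 16 bits long).
print(type(myShort))
print(myShort.itemsize)

# np.iinfo() reports the smallest and largest values the type holds.
limits = np.iinfo(np.short)
print(limits.min, limits.max)
```

Knowing the limits matters because a value outside the reported range won't fit in the data type.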
Use the numpy array() function to create a vector. For example, myVect = np.array([1, 2, 3, 4]) creates a vector with four elements. In this case, the vector contains integers of the default NumPy integer type (64 bits on most platforms) rather than standard Python integers. You can also use the arange() function to produce vectors, such as myVect = np.arange(1, 10, 2), which fills myVect with [1, 3, 5, 7, 9]. The first input tells the starting point, the second the stopping point (which isn't included in the output), and the third the step between each number. A fourth argument lets you define the data type for the vector.
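The two ways of creating a vector look like this in practice. The sketch uses the myVect examples from the text; the names steppedVect and floatVect are added here for illustration, and the float32 data type is just one possible choice.

```python
import numpy as np

# Create a vector from a Python list.
myVect = np.array([1, 2, 3, 4])
print(myVect, myVect.dtype)

# Create a vector from a start, a stop, and a step.
# The stop value (10) is not part of the result.
steppedVect = np.arange(1, 10, 2)
print(steppedVect)

# Supply a data type as well, passed by keyword here for clarity.
floatVect = np.arange(1, 10, 2, dtype=np.float32)
print(floatVect, floatVect.dtype)
```

Passing the data type by keyword makes the call easier to read than relying on argument position.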
You can also create a vector with a specific data type. All you need to do is specify the data type like this: myVect = np.int16([1, 2, 3, 4]) to fill myVect with a vector containing 16-bit integer values. To verify this for yourself, you can use print(type(myVect[0])), which outputs <class 'numpy.int16'>.
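Here is the 16-bit vector as a runnable sketch. The second form, which passes dtype to array(), is an alternative added here for comparison (the name altVect is hypothetical); it should produce an equivalent vector.

```python
import numpy as np

# Create a vector of 16-bit integers, as described in the text.
myVect = np.int16([1, 2, 3, 4])

# The container is a NumPy array, and each element is a numpy.int16.
print(type(myVect))
print(myVect.dtype)
print(type(myVect[0]))

# An alternative spelling that states the data type explicitly.
altVect = np.array([1, 2, 3, 4], dtype=np.int16)
print(np.array_equal(myVect, altVect))
```

Checking the dtype attribute tells you the data type of the whole vector at once, without having to inspect individual elements.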
You can perform basic math functions