Each chapter contains the following features:
Introduction
Key terms bolded when first defined
Tables
Figures with code examples to point out usage
Stop, Code, and Understand! exercises after topics are covered
Chapter summary
Glossary with key terms from the chapter
End-of-chapter exercises
We also provide an appendix at the back of the book with solutions to all Stop, Code, and Understand! exercises and resources for further reference.
Approach
This book is uniquely suited to people in the fields of business and social science who are learning programming for data analysis applications. Business and social science students learning programming need to have meaningful examples that are relevant to their field, in which they can see the value in the software applications developed. Several essential components are (1) the extraction of data from a database and/or the web, (2) the statistical analysis and visualization of the data to support decision making, and (3) the development of a graphical user interface that both makes applications more inviting for users and limits possible errors. Through our careful presentation and explanation of these components, students will be more motivated to learn Python and inspired to delve deeper into additional details that we are not presenting in depth.
One of our primary goals is that students using this textbook will develop skills in using a variety of modules and packages. Students will see the tremendous appeal of Python through working with Python modules in packages (including matplotlib, NumPy, Pandas, scikit-learn, SciPy, seaborn, and tkinter), learning the benefits of an integrated development environment (IDLE), and using a package manager (pip). We begin by developing a simple module and using it in Python code in Chapter 2. After covering basic Python features in Chapters 3 and 4, we progress to using Python built-in modules and then modules that are available in installed packages. A table of modules and packages used in the book (and the corresponding textbook figure in which they first appear) immediately follows this preface. After working with the variety of modules and packages covered in this book, readers will be able to use Python modules created for other purposes, both those that exist today and those developed in the future.
The primary market for this book is any social science or business undergraduate-level or graduate-level introductory course in Python programming. This book is for courses that focus on the development of applications using Python, particularly business and social science applications. This textbook assumes no prerequisite knowledge or coursework in computer programming or statistics. The intended course is the first technical course in a data science certificate or MBA-level program. We use data from two very large real-world data sets (the General Social Survey data set and the City of Chicago’s Taxi Trips data set) systematically throughout the book. By the end of a course using this textbook, students will be able to work with large data sets to build statistical models and visualize results. Novice learners following our approach will find it easy to build their technical knowledge and motivation. Our focus on the use of Python modules and packages facilitates students’ learning and prepares students to leverage Python for future purposes.
After taking a course using this textbook, students will be prepared for research or for more advanced courses that require data analysis and statistics. They will be prepared to conduct analysis on large data sets using Python, learning from our mix of explanation and examples. Finally, students will have a solid foundation on which to continue building the technical abilities they developed from this book.
Data Sets
This textbook develops examples and applications using two data sets that are publicly available and represent real-world data science problems. The first data set, the General Social Survey (GSS), is appealing to those with an interest in social sciences. The National Data Program for the Social Sciences has run the GSS since 1972 (http://www.gss.norc.org/About-The-GSS). You can explore the data online using a data explorer or download the complete data sets (http://www.gss.norc.org/Get-The-Data). We downloaded the data sets in SPSS format and used SPSS to import the data and export them to a CSV file. The full data set has over 5,800 variables covering a wide range of survey questions asked, with more than 62,000 responses from over four decades. We will not explore every variable or response but will investigate patterns and trends in the data.
The second data set, Chicago Taxi Trips, is appealing to those interested in business applications. The Taxi Trips data set is publicly available through the City of Chicago. The data set has more than 100 million records of taxi trips, with 26 fields (variables) per record, including duration, fare, tips, and the GPS coordinates of pickups and drop-offs. More information on this data set is available at https://digital.cityofchicago.org/index.php/chicago-taxi-data-released/. We present examples analyzing these data in many ways, including predictions of trip fares based on miles traveled and length in minutes. We also access data directly from the City of Chicago’s application programming interface (API) in Chapter 7.
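To give a flavor of the fare predictions mentioned above, the following is a minimal sketch, not the method used in the book's chapters. It assumes a simple linear relationship and fits it with NumPy's least-squares routine. The trip values, the pricing rule used to generate the fares, and the function name predict_fare are all invented for illustration and are not drawn from the Taxi Trips data set.

```python
import numpy as np

# Hypothetical trips as (miles, minutes); invented values, not real taxi data.
trips = np.array([
    [1.0, 5.0],
    [2.5, 12.0],
    [4.0, 15.0],
    [7.5, 30.0],
    [10.0, 28.0],
])

# Made-up fares generated from an assumed rule:
# 3.25 base charge + 2.25 per mile + 0.25 per minute.
fares = 3.25 + 2.25 * trips[:, 0] + 0.25 * trips[:, 1]

# Design matrix: a column of ones (intercept) plus the two predictors.
X = np.column_stack([np.ones(len(trips)), trips])

# Ordinary least squares: find coefficients that minimize squared error.
coef, *_ = np.linalg.lstsq(X, fares, rcond=None)
intercept, per_mile, per_minute = coef


def predict_fare(miles, minutes):
    """Predict a fare from the fitted linear model (hypothetical helper)."""
    return intercept + per_mile * miles + per_minute * minutes


print(round(predict_fare(3.0, 10.0), 2))
```

Because the made-up fares follow the assumed rule exactly, the fit recovers that rule; real trip data are noisy, so fitted coefficients would only approximate any underlying pattern.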
We use samples from each of the two data sets in examples and exercises throughout this book. Chapter 1 introduces both data sets, and Chapter 2 presents introductory code examples that use data similar to those in the taxi trips data set. In Chapter 3, GSS data illustrate tuples and the use of dictionaries to look up the value corresponding to a key. In Chapter 4, examples from both data sets illustrate control logic, including list comprehension. In Chapter 5, a CSV file with GSS data illustrates working with specific columns in a CSV file. Also in Chapter 5, a Microsoft Access database based on the taxi trips data illustrates working with data in a relational database file using Structured Query Language (SQL) with the pyodbc package. In Chapter 6, we use taxi trips data to illustrate features of the NumPy and Pandas packages, and a data set from the GSS to illustrate data cleaning and preparation using the Pandas package. In Chapter 7, we use the BeautifulSoup package to show that obtaining data by scraping a web page (here, the GSS website) is not always as easy as one might expect. In Chapter 7, we also use REST API queries to obtain data from the taxi trips data set directly from the Chicago Data Portal website. In Chapter 8, variables from both the GSS and taxi trips data sets illustrate statistical analysis. In Chapter 9, data from both data sets demonstrate how the matplotlib package visualizes data. In Chapter 10, both the GSS data set and the taxi trips data set illustrate different machine learning classification techniques. In Chapter 11, we develop a graphical user interface using the tkinter package with data from the taxi trips data set. Two tables immediately following this preface detail, by data set, the examples presented throughout the textbook.
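As a small taste of the dictionary lookups and list comprehensions described for Chapters 3 and 4, consider the following minimal sketch. The coding scheme, the respondent IDs, and the variable names are hypothetical, chosen to resemble survey-style data rather than copied from the GSS.

```python
# Hypothetical coding scheme: numeric survey codes mapped to text labels.
marital_labels = {
    1: "Married",
    2: "Widowed",
    3: "Divorced",
    4: "Separated",
    5: "Never married",
}

# Invented responses stored as (respondent ID, code) tuples.
responses = [(101, 1), (102, 5), (103, 3), (104, 1), (105, 9)]

# Dictionary lookup inside a list comprehension; .get() supplies a
# default label when a code (such as 9) is not a key in the dictionary.
labels = [marital_labels.get(code, "Unknown") for _, code in responses]

# List comprehension with a condition: IDs of respondents coded as married.
married_ids = [rid for rid, code in responses if code == 1]

print(labels)
print(married_ids)
```

Using .get() with a default rather than square-bracket indexing is one way to keep an unexpected code from raising a KeyError.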
Digital Resources