Figures and Tables in the Text Related to the GSS Data Set
Figures and Tables in the Text Related to the Taxi Trips Data Set
Python Modules and Packages
Acknowledgments
Thank you to Leah Fargotstein, Acquisitions Editor—Research Methods, Statistics, and Evaluation at SAGE Publishing, for all your assistance throughout the development of this book.
Thank you to undergraduate and master’s-level business students at Loyola University Chicago for your feedback on earlier drafts of this book.
Thank you to the following reviewers for all your suggestions that helped improve this book.
Jean Mark Gawron, San Diego State University
Charles J. Gomez, City University of New York, Queens College
David Han, The University of Texas at San Antonio
Lenwood S. Heath, Virginia Tech
Gabe Ignatow, University of North Texas
Hakan Islamoglu, Recep Tayyip Erdogan University
Patrick Christian Kaminski, Indiana University Bloomington
Jacqueline Masloff, Bentley University
Neba Nfonsang, University of Denver
James O’Brien, Pennsylvania State University
D. Dwayne Paschall, University of Dallas
Benjamin Soltoff, University of Chicago
Ryan Sougstad, University of Minnesota
Damian Trilling, University of Amsterdam
Giovanni Vincenti, University of Baltimore
Wei Wang, Graduate Center, the City University of New York
Chong Ho Yu, Azusa Pacific University
About the Authors
Frederick Kaefer is Associate Professor of Information Systems at the Loyola University Chicago Quinlan School of Business. After completing a bachelor’s degree in Mathematics and Computer Science, he worked as a mainframe programmer for several years before earning an MBA with concentrations in Finance and Information Systems and a PhD in Management Information Systems. Professor Kaefer has taught computer programming and other information systems courses to business students for over 25 years. In addition to his interest in the Python programming language, Professor Kaefer has taught courses including Data Structures Using C and VBA Programming in MS Office.
Paul Kaefer works as Senior Analytics Engineer at Carrot Health and has instructed two data analytics and visualization bootcamps through Trilogy Education Services. He previously worked for UnitedHealthcare as a data scientist. After earning a bachelor’s degree in Computer Engineering, he earned a master’s degree in Computational Sciences while leading the Data Analysis Team for the GasDay project, a research lab at Marquette University that works with energy utilities around the United States to forecast natural gas demand. In addition to his interest in the Python programming language, Paul has certifications in the SAS programming and R programming languages and is building experience using Tableau.
1 Introduction to Python
Learning Objectives
Explain Python’s background and important features
Describe free, open-source software (FOSS)
Summarize Python’s user community and available resources
Install Python’s platform-independent interpreter
Execute Python code in an Integrated Development Environment (IDE)
Describe the two data sets used throughout the book
Introduction
This chapter gives a brief background of Python and then illustrates Python programming using an Integrated Development Environment (IDE). Python is an interpreted computer programming language in which you can enter code instructions one at a time or as part of a larger program comprising many instructions. Throughout this book, illustrations of entering and executing Python code provide hands-on experience and familiarity with programming in Python. The Python code examples begin in this chapter with writing and running a sample instruction of Python code that prints a simple message to the screen. At the end of the chapter, we introduce the two real-world, large-scale data sets that we will use throughout the book. These data sets embody many different types of data and are well suited for the data analysis and visualization covered in later chapters.
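As a minimal sketch of the kind of single instruction described above, the following code stores a message and prints it to the screen. The variable name and the message text are assumptions chosen for illustration; they are not the sample used later in the chapter.

```python
# Store a short message in a variable (hypothetical text, for illustration only).
message = "Hello, Python!"

# A single instruction that prints the message to the screen.
print(message)
```

Because Python is interpreted, you could type these two lines one at a time at an interactive prompt or save them together in a file and run them as a small program.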
Brief Introduction to Python and Programming
Guido van Rossum first conceived the Python programming language in December 1989 with the idea that it should be easy to read and that it should let users create their own packages of special-purpose coding modules that others could use (Anonymous, 2018). Python’s first release was in 1991, and it combines simple syntax, abundant online resources, and a rich ecosystem of scientifically focused toolkits with a heavy emphasis on community (Perkel, 2015). Syntax is a set of rules that dictate how to specify instructions in a programming language. Packages are libraries of code modules that other programming code can access and use. As of January 2020, there were more than 212,000 projects with packages available for download from the Python Package Index (PyPI), a repository of packages for the Python programming language (Python Software Foundation, 2020). In addition, Python was the most popular introductory language at American universities in 2014, but the teaching of it is generally limited to those studying science, technology, engineering, and mathematics (Anonymous, 2018). The intended audience for this textbook is students and researchers in business and the social sciences. There is prolific use of Python today in both business and the social sciences to develop applications for data analytics. In addition to statistical analysis, we can use Python for web scraping, text mining, machine learning, and developing applications with graphical user interfaces (all of which we cover in this book). Although we can accomplish each of these individually with other programming languages (such as R) and software packages (such as SAS and Tableau), learning Python enables you to do all these things and much more.
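To make the idea of reusable code modules concrete, here is a minimal sketch that accesses a module and calls two of its functions. It uses the statistics module from Python’s standard library, so nothing needs to be installed. The list of ages is hypothetical sample data, and the variable names are assumptions chosen for illustration.

```python
# Import a module that ships with Python (part of the standard library).
import statistics

# Hypothetical sample data, for illustration only.
ages = [23, 35, 41, 29, 52]

# Call functions defined in the module rather than writing them ourselves.
mean_age = statistics.mean(ages)      # arithmetic mean of the values
median_age = statistics.median(ages)  # middle value of the sorted data

print("Mean age:", mean_age)
print("Median age:", median_age)
```

Packages downloaded from PyPI are used the same way once installed, typically with the pip tool: an import statement makes the package’s code available to your program.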
Python’s Use in Education, Research, and the Corporate World
The development of the Internet has both made large amounts of information available to users and enabled users to create large amounts of information and make it available to the rest of the world. Both business and the social sciences use computer programming to manipulate and process data, gaining insights that would be too difficult to obtain otherwise. The Python programming language has been the most popular introductory programming language taught at American universities for good reasons. The explosive