A Data Scientist Should Excel in Communication and Visualization Skills
Like it or not, analytics is a technical exercise. At this moment, there is a huge gap between the analytical models and the business users. To bridge this gap, communication and visualization facilities are key! Hence, a data scientist should know how to represent analytical models and their accompanying statistics and reports in user-friendly ways by using, for example, traffic light approaches, OLAP (online analytical processing) facilities, or if-then business rules, among others. A data scientist should be capable of communicating the right amount of information without getting lost in complex (e.g., statistical) details, since such details would inhibit a model's successful deployment. As a result, business users will better understand the characteristics and behavior in their (big) data, which will improve their attitude toward and acceptance of the resulting analytical models. Educational institutions must learn to strike a balance between theory and practice, since many academic degrees produce graduates who are skewed toward either too much analytical or too much practical knowledge.
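As an illustration of the traffic light approach mentioned above, the following minimal Python sketch maps a model score to a color label using simple if-then rules. The function name `traffic_light`, the thresholds 0.3 and 0.7, and the example churn scores are all hypothetical and chosen only for illustration; in practice the cutoffs would be set together with the business users.

```python
def traffic_light(score: float, amber_from: float = 0.3, red_from: float = 0.7) -> str:
    """Map a model score in [0, 1] to a traffic-light label.

    The default thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # If-then business rules: the highest risk band is checked first.
    if score >= red_from:
        return "red"
    if score >= amber_from:
        return "amber"
    return "green"


# Hypothetical churn probabilities produced by some analytical model.
scores = {"customer_a": 0.12, "customer_b": 0.45, "customer_c": 0.91}

# A report that a business user can read without any statistical background.
report = {name: traffic_light(score) for name, score in scores.items()}
```

A report of this kind hides the underlying statistical detail while still conveying which cases deserve attention first.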
A Data Scientist Should Have a Solid Business Understanding
While this might seem obvious, we have witnessed (too) many data science projects that failed because the data scientist involved did not understand the business problem at hand. By business, we mean the application area in question. Several examples of such application areas have been introduced in Table 1.5. Each of those fields has its own particularities that are important for a data scientist to know and understand in order to be able to design and implement a customized solution. The better the solution is aligned with its environment, the better its performance will be, as evaluated according to each of the dimensions or criteria discussed in Table 1.7.
A Data Scientist Should Be Creative!
A data scientist needs creativity on at least two levels. First, on a technical level, it is important to be creative with regard to feature selection, data transformation, and cleaning. These steps of the standard analytics process have to be adapted to each particular application, and often the right guess can make a big difference. Second, big data and analytics is a fast-evolving field. New problems, technologies, and corresponding challenges pop up on an ongoing basis. Therefore, it is crucial that a data scientist keeps up with these new developments and technologies and has enough creativity to see how they can create new opportunities. Figure 1.2 summarizes the key characteristics and strengths constituting the ideal data scientist profile.
Figure 1.2 Profile of a data scientist.
CONCLUSION
Profit-driven business analytics is about analyzing data to make optimized operational business decisions. In this first chapter, we discussed how adopting a business perspective toward analytics diverges from a purely technical or statistical perspective. Adopting such a business perspective leads to a real need for approaches that allow data scientists to take into account the specificities of the business context. The objective of this book, therefore, is to provide an in-depth overview of selected sets of such approaches, which may serve a wide and diverse range of business purposes. The book adopts a practitioner's perspective in detailing how to apply and implement these approaches in practice, with example datasets, code, and implementations provided on the book's companion website, www.profit-analytics.com.
REVIEW QUESTIONS
Multiple Choice Questions
Question 1
Which is not a possible evaluation criterion for assessing an analytical model?
a. Interpretability
b. Economical cost
c. Operational efficiency
d. All of the above are possible evaluation criteria.
Question 2
Which statement is false?
a. Clustering is a type of predictive analytics.
b. Forecasting in essence concerns regression as a function of time.
c. Association analysis is a type of descriptive analytics.
d. Survival analysis in essence concerns predicting the timing of an event.
Question 3
Which statement is true?
a. Customer lifetime value estimation is an example of classification.
b. Demand estimation is an example of classification.
c. Customer churn prediction concerns regression.
d. Detecting fraudulent credit-card transactions concerns classification.
Question 4
Which is not a characteristic of a good data scientist? A good data scientist:
a. Has a solid business understanding.
b. Is creative.
c. Has thorough knowledge of the legal aspects of applying analytics.
d. Excels in communication and visualization of results.
Question 5
Which statement is true?
a. All analytical models are profit-driven when applied in a business setting.
b. Only predictive analytics are profit-driven, whereas descriptive analytics are not.
c. There is a difference between analyzing data for the purpose of explaining or predicting.
d. Descriptive analytics aims to explain what is observed, whereas predictive analytics aims to predict as accurately as possible.
Open Questions
Question 1
Discuss the difference between a statistical perspective and a business perspective toward analytics.
Question 2
Discuss the difference between modeling to explain and to predict.
Question 3
List and discuss the key characteristics of an analytical model.
Question 4
List and discuss the ideal characteristics and skills of a data scientist.
Question 5
Draw the analytics process model and briefly discuss the subsequent steps.