Machine Learning Approach for Cloud Data Analytics in IoT. Group of authors.

Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Programs
ISBN: 9781119785859
a model tries to predict sales, then the error rate will be the difference between the predicted sales and the actual sales.
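As a minimal sketch of this idea, the error for each observation can be computed as the predicted value minus the actual value. The sales figures below are made up for illustration and do not come from the dataset; `prediction_errors` is a hypothetical helper name.

```python
def prediction_errors(predicted, actual):
    """Return the signed error (predicted - actual) for each observation."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [p - a for p, a in zip(predicted, actual)]


# Hypothetical sales figures, for illustration only.
predicted_sales = [120.0, 80.0, 200.0]
actual_sales = [100.0, 90.0, 200.0]

errors = prediction_errors(predicted_sales, actual_sales)
print(errors)  # [20.0, -10.0, 0.0]
```

A positive error means the model over-predicted sales; a negative error means it under-predicted.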

Schematic illustration of the general framework of proposed model for predictive data analytics.

      Random forest regression may also be employed for prediction problems, since random forests support both classification and regression. Random forest regression uses splitting criteria to partition the data, and the quality of each split is then measured using mean squared error or mean absolute error. It averages the predictions of its individual trees to improve the accuracy of prediction.
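How a split can be scored with mean squared error is sketched below for a single numeric feature. This is a simplified illustration, not the authors' implementation: the function names and the discount and profit values are assumptions made for the example, and a real random forest would also sample features and rows for each tree.

```python
from statistics import mean


def mse(values):
    """Mean squared error of values around their own mean (0.0 if empty)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def split_quality(xs, ys, threshold):
    """Weighted MSE of the two groups produced by splitting on x <= threshold.

    Lower is better: a good split leaves similar targets on each side.
    """
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return (len(left) * mse(left) + len(right) * mse(right)) / len(ys)


def best_split(xs, ys):
    """Try every distinct x value except the largest as a threshold.

    Raises ValueError if xs holds fewer than two distinct values.
    """
    candidates = sorted(set(xs))[:-1]
    return min(candidates, key=lambda t: split_quality(xs, ys, t))


# Hypothetical data (discount level vs. profit), for illustration only.
discount = [0.0, 0.1, 0.2, 0.5, 0.6, 0.7]
profit = [50.0, 52.0, 48.0, -10.0, -12.0, -8.0]

threshold = best_split(discount, profit)
print(threshold)  # 0.2
```

Replacing the squared deviation in `mse` with an absolute deviation around the median would give the mean absolute error variant mentioned in the text.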

      The authors of this chapter propose using the bootstrap aggregating ML algorithm, also referred to as bagging. Bagging aims to improve the efficiency and accuracy of ML algorithms by reducing their variance, and its use is intended to yield an efficient and accurate predictive model. The accuracy of the proposed model increases rapidly over time.
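The mechanics of bagging can be sketched in a few lines: draw bootstrap samples with replacement, fit one base learner per sample, and average their predictions. The chapter does not specify the base learner, so the 1-nearest-neighbour regressor used here is an assumption chosen because it is a simple high-variance model; the data points and function names are likewise hypothetical.

```python
import random
from statistics import mean


def fit_nearest_neighbour(sample):
    """Assumed base learner: predict the target of the closest training point."""
    def predict(x):
        _, y = min(sample, key=lambda pair: abs(pair[0] - x))
        return y
    return predict


def fit_bagged(data, n_models, rng):
    """Fit one base learner per bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(fit_nearest_neighbour(bootstrap))
    return models


def predict_bagged(models, x):
    """Aggregate by averaging the predictions of all base learners."""
    return mean(model(x) for model in models)


# Hypothetical (month index, sales) pairs, for illustration only.
data = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 15.0), (5, 14.0), (6, 18.0)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
models = fit_bagged(data, n_models=25, rng=rng)
prediction = predict_bagged(models, 3.5)
print(prediction)
```

Because each learner sees a slightly different resample of the data, their individual errors partly cancel when averaged, which is the variance reduction the text describes.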

      3.4.1 Case Study

      To illustrate the implementation of AI in the retail industry, the authors of this chapter consider a case study based on a dataset pertaining to a retail store. The dataset comprises observations over a period of 4 years, from 2011 to 2015, and was taken from Kaggle (https://www.kaggle.com/jr2ngb/superstore-data, username: jr2ngb). It has 16 variables: 10 categorical features, 5 numerical features, and 1 date feature, as follows.

#    Feature Name     Non-Null  Dtype
---  ---------------  --------  --------------
0    Order Date       51290     datetime64[ns]
1    Customer_Name    51290     object
2    Segment          51290     object
3    City             51290     object
4    State            51290     object
5    Country          51290     object
6    Category         51290     object
7    Sub-Category     51290     object
8    Product Name     51290     object
9    Sales            51290     float64
10   Quantity         51290     int64
11   Discount         51290     float64
12   Profit           51290     float64
13   year             51290     int64
14   month            51290     int64
15   Day              51290     object

      The considered dataset contains 51,290 observations. The retail store deals broadly in three types of products: office supplies, technology, and furniture.

Schematic illustration of Pearson’s correlation among various attributes of dataset.

Histogram plot for the frequency of customers in country level (India).
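For reference, the Pearson correlation named in the figure caption above can be computed between any two numeric attributes as sketched below. The attribute values are hypothetical and chosen to give perfectly linear relationships; they are not taken from the dataset, and `pearson` is an illustrative helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical attribute values, for illustration only.
quantity = [1, 2, 3, 4, 5]
sales = [10.0, 20.0, 30.0, 40.0, 50.0]
discount = [0.5, 0.4, 0.3, 0.2, 0.1]

r_positive = pearson(quantity, sales)     # close to +1: rise together
r_negative = pearson(quantity, discount)  # close to -1: move in opposite directions
print(r_positive, r_negative)
```

Values near +1 or -1 indicate a strong linear relationship between two attributes, while values near 0 indicate little linear relationship.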