We now turn to its experimental evaluation and results to assess its effectiveness.
3.4.3.1 Evaluation Setting
Few datasets with multiple-criteria ratings are available for experiments. For this research, the authors use two popular real-world datasets: the TripAdvisor dataset and the Yahoo! Movies dataset. They used 80% of the rated movies or hotels for training and the remaining 20% for testing. They evaluated and compared the algorithms on the rating prediction task: each algorithm predicts the overall rating that a user gives to each item in the test set, and its accuracy is measured by the widely used mean absolute error (MAE) [19].
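The evaluation protocol above can be illustrated with a minimal sketch. The function names, the toy rating records, and the global-mean predictor are assumptions made for illustration only; they stand in for the actual datasets and models used in [19].

```python
import random

def split_ratings(ratings, train_fraction=0.8, seed=0):
    """Shuffle the rating records and split them into train and test lists."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy (user, item, overall_rating) records; the values are made up.
ratings = [("u1", "h1", 4), ("u1", "h2", 3), ("u2", "h1", 5),
           ("u2", "h3", 2), ("u3", "h2", 4), ("u3", "h3", 1),
           ("u4", "h1", 3), ("u4", "h2", 5), ("u5", "h3", 4),
           ("u5", "h1", 2)]
train, test = split_ratings(ratings)  # 80% train, 20% test

# Placeholder predictor (an assumption): the mean training rating
# is predicted for every test record.
global_mean = sum(r for _, _, r in train) / len(train)
mae = mean_absolute_error([r for _, _, r in test],
                          [global_mean] * len(test))
print(f"MAE on the held-out 20%: {mae:.3f}")
```

A lower MAE means that the predicted ratings lie closer to the ratings the users actually gave.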
3.4.3.2 Experimental Result
The results are shown in Figure 3.4. The data labels above each bar give the rate of improvement achieved by HCM relative to the other methods. Among the algorithms, "biased MF" denotes the results produced by the biased MF algorithm. "Agg" denotes the results of the aggregation-based approach, which takes advantage of the multiple-criteria ratings; the aggregation is a hybrid model that merges user-specific aggregation models with item-specific aggregation models. The models proposed in the cited paper are FCM, PCM, and HCM. In PCM, the authors select the most influential criteria as contexts using information gain. They tried many selections and combinations and report the best ones [19].
First, biased MF uses no additional information such as multi-criteria ratings or contexts, which explains why it is the worst model here. FCM performs worse than the Agg method on the TripAdvisor dataset, so using criteria preferences as contexts is not always a good choice. By choosing only the most influential criteria, PCM outperforms Agg on both datasets. Finally, the authors observed that HCM is the best predictive model, with the lowest mean absolute error. Its improvements over the other models are significant according to a statistical paired t-test. More specifically, HCM achieves 4.7% and 8.7% improvements over the aggregation model, and 6.7% and 6.9% improvements over FCM, on the TripAdvisor and Yahoo! Movies datasets, respectively. The experiment also shows that HCM performs better than PCM [19].
Figure 3.4 Result comparison.
3.4.4 Utility-Based Multi-Criteria Recommender Systems by Zheng
This work introduced a utility-based multi-criteria recommendation algorithm. The algorithm learns customer expectations through different learning-to-rank approaches. Experimental results on real-world datasets demonstrate the usefulness of these approaches [3].
3.4.4.1 Experimental Dataset
The experiments used two real-world datasets in which ratings are on a scale from 1 to 5. The first is the TripAdvisor dataset, which contains more than 22,000 ratings given by more than 1,500 users on around 14,000 hotels. Every user provided at least 10 ratings. Each rating comes with multi-criteria ratings on seven criteria: cost-effectiveness, convenience, quality of rooms, check-in, cleanliness of the hotel, general standard of facilities, and specific business facilities. The second is the Yahoo! Movies dataset, which contains more than 62,000 ratings given by more than 2,000 users on around 3,100 movies. Every user provided at least 10 ratings. Each rating comes with multiple-criteria ratings on four criteria: acting, direction, story, and visual effects. The utility-based models were compared with several baseline approaches: MF, the linear aggregation model (LAM), the hybrid context model (HCM), and the criteria chain model (CCM) [3].
The performance of the recommender was evaluated on the top 10 recommendations using precision and NDCG. Three measures were tried for calculating the utility scores: Pearson correlation, cosine similarity, and Euclidean distance. Pearson correlation gave slightly better results than cosine similarity, while Euclidean distance was the worst choice. The reported results are therefore the best ones, obtained with Pearson correlation [3].
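The three measures can be sketched as follows. This is a minimal illustration that assumes the utility score compares a user's expectation vector with a vector of multi-criteria ratings; the vector names, the toy values, and the conversion of Euclidean distance into a similarity are assumptions, not details taken from [3].

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine_similarity(centred_x, centred_y)

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def euclidean_similarity(x, y):
    """Map a distance to (0, 1]; one common convention (an assumption)."""
    return 1.0 / (1.0 + euclidean_distance(x, y))

# Hypothetical four-criteria vectors (acting, direction, story, visuals).
expectation = [5, 4, 4, 3]   # what the user expects
observed = [4, 3, 4, 2]      # multi-criteria ratings of one item

for name, measure in [("Pearson", pearson_correlation),
                      ("cosine", cosine_similarity),
                      ("Euclidean", euclidean_similarity)]:
    print(f"{name}: {measure(expectation, observed):.4f}")
```

Pearson correlation and cosine similarity depend only on the shape of the two vectors, whereas the Euclidean measure is also sensitive to a constant offset between them.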
3.4.4.2 Experimental Result
Figure 3.5 presents the results of the experiment. FMM is the best-performing baseline method on the TripAdvisor data, while LAM and CCM beat MF by 1%. HCM performs even worse than the MF approach. Among the utility-based models (UBM), the one trained with listwise ranking outperforms FMM, whereas the UBM variants that use pointwise and pairwise ranking optimization fail to beat FMM. On the Yahoo! Movies dataset, all methods outperform MF, which does not consider multi-criteria ratings. In detail, compared with FMM, the UBM using listwise ranking improves NDCG and precision by 6.3% and 5.4% on the TripAdvisor data, and by 4.1% and 8% on the Yahoo! Movies data [3].
Figure 3.5 Experimental result.
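The two ranking metrics used in this comparison can be computed for a single top-10 list as sketched below. This is a minimal version that assumes binary relevance (an item is either relevant to the user or not); the function names and the toy item identifiers are illustrative, and the exact variant used in [3] may differ.

```python
import math

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k slots filled with relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k])
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Hypothetical ranked list and set of relevant items for one user.
recommended = ["m7", "m2", "m9", "m4", "m1", "m5", "m3", "m8", "m6", "m0"]
relevant = {"m7", "m9", "m3"}

print(f"precision@10 = {precision_at_k(recommended, relevant):.3f}")
print(f"NDCG@10      = {ndcg_at_k(recommended, relevant):.3f}")
```

Precision counts only how many relevant items appear in the list, while NDCG also rewards placing them near the top.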
3.4.5 Multi-Criteria Clustering Approach by Wasid and Ali
In this work, the authors proposed a clustering method for incorporating multiple-criteria ratings into a conventional recommender system. To generate more accurate recommendations, they evaluate intra-cluster user similarities using the Mahalanobis distance. They then compared their method with conventional CF [2].
We now look at their experimental evaluation and results.
3.4.5.1 Experimental Evaluation
To implement the proposed approach, the authors used the Yahoo! Movies dataset. This dataset consists of 62,156 ratings provided by 6,078 users on 976 movies. To keep things simple, they extracted the users who have rated at least 20 movies. This condition is satisfied by 484 users and 945 movies, with 19,050 ratings in total. Each user's ratings are then split randomly into a training set and a testing set: 70% of the data is used for training and the remaining 30% for testing. Next, they calculate the distance between users and select the 30 most similar users to form the neighborhood set. To evaluate the proposed approach, they used the Mean Absolute Error (MAE) performance metric. As we have seen before, MAE is widely used because of its simplicity, and it matches the goal of the experiment: it measures the deviation between actual and predicted user ratings [2].
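The neighborhood formation step can be sketched as follows. This is a minimal illustration, assuming each user is summarized by a short vector of average criteria ratings and that the covariance matrix is estimated from the users in the same cluster; the profile representation, the helper names, and the toy values are assumptions rather than details from [2].

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def solve(matrix, vector):
    """Solve matrix * z = vector by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^-1 (x - y)), computed without an explicit inverse."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)
    return math.sqrt(max(0.0, sum(d * zi for d, zi in zip(diff, z))))

def nearest_neighbors(target, profiles, cov, n=30):
    """Return the n users closest to the target user."""
    others = [u for u in profiles if u != target]
    others.sort(key=lambda u: mahalanobis(profiles[target], profiles[u], cov))
    return others[:n]

# Hypothetical two-criteria profiles for the users of one cluster.
profiles = {"u1": [4.0, 3.0], "u2": [5.0, 4.0], "u3": [2.0, 4.5],
            "u4": [3.5, 2.0], "u5": [1.0, 5.0]}
cov = covariance_matrix(list(profiles.values()))
neighbors = nearest_neighbors("u1", profiles, cov, n=2)  # the paper uses 30
print(neighbors)
```

Unlike the Euclidean distance, the Mahalanobis distance rescales each direction by the variance of the data and accounts for correlation between criteria; with an identity covariance matrix the two coincide.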
3.4.5.2 Result and Analysis
The dataset they used contains both single-criterion and multiple-criteria user-provided ratings. Table