1.7 Results and Discussions
The Ontology-based Semantic Indexing (OnSI) model builds a domain Ontology for each set of product/service review documents using the features selected by the CFSLDA model. Protégé software is used to build and query the Ontology model. The top five terms selected by the LDA model for each topic of the dataset are shown in Table 1.4. Each topic is manually labeled according to the first term in its list. It is difficult to carry out human annotation for all the terms grouped under a topic.
Table 1.4 List of top five terms by LDA model.
Topics/features of (DS1) | Feature terms by FSLDA model (DS1) |
Topic 1—Cost | cost, test, money, charge, day |
Topic 2—Medicare | doctor, nurse, team, treatment, bill |
Topic 3—Staff | staff, patient, child, problem, face |
Topic 4—Infrastructure | hospital, people, room, experience, surgery |
Topic 5—Time | time, operation, hour, service, check |
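The top-five selection behind Table 1.4 can be sketched as a simple ranking of terms by their term-topic probability. This is a minimal illustration, not the authors' implementation: the function name `top_terms` is hypothetical, and the probabilities for "test," "money," "charge," and "day" are invented for the example (the values for "cost," "patient," "room," and "bill" are the cost-topic probabilities reported in Table 1.5).

```python
def top_terms(term_probs, k=5):
    """Return the k terms with the highest term-topic probability."""
    ranked = sorted(term_probs.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Term-topic probabilities for the topic "cost".
# Values marked "assumed" are invented for illustration only.
phi_cost = {
    "cost": 0.0923,
    "test": 0.0600,    # assumed
    "money": 0.0500,   # assumed
    "charge": 0.0400,  # assumed
    "day": 0.0300,     # assumed
    "patient": 0.0177,
    "room": 0.0134,
    "bill": 0.0004,
}

print(top_terms(phi_cost))
```

Ranking by probability alone leaves "bill" and "room" outside the list for "cost," which is the limitation that the correlation analysis of CFSLDA is meant to address.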
For example, the term “bill” is one of the top words under the topic “medicare.” However, it is more closely related to the topics “cost,” “infrastructure,” and “time.” Similarly, the term “appointment” does not appear in the top 5 or top 10 terms of any topic, although it is appropriate to the topics “time” and “medicare.” To alleviate this problem, the CFSLDA model uses correlation analysis to select the representative terms of each topic with reference to the first term in the list (the term with the highest term-topic probability in that topic). In the previous example, the term “bill” is not correlated with the term “doctor,” but it is highly correlated with the terms “cost,” “hospital,” and “time.” The correlation values of these terms are shown in Table 1.5. For example, the term-topic probability “Φtw” of “room” under the topic “cost” is 0.0134, and its correlated value “c” with “cost” is 0.0222. Likewise, the term “appointment” is highly correlated with the terms “doctor” and “time,” so it is grouped under the topics “medicare” and “time,” as shown in Table 1.5. As another example, the term “disease” is related to “doctor” and “hospital” but not to the terms “cost,” “time,” and “staff,” as shown in Table 1.5.
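One way to picture the correlation analysis is as a Pearson correlation between the per-review occurrence counts of a candidate term and those of a topic's first term. The sketch below rests on assumptions: this section does not state which correlation measure CFSLDA uses, and the counts are toy data invented for illustration, not taken from DS1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length count vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical counts of each term in six reviews (toy data).
counts = {
    "doctor":      [2, 0, 1, 0, 3, 0],
    "time":        [0, 2, 1, 0, 0, 3],
    "cost":        [0, 1, 0, 3, 0, 0],
    "appointment": [1, 1, 1, 0, 2, 2],
}

for head in ("cost", "doctor", "time"):
    c = pearson(counts["appointment"], counts[head])
    print(f"c(appointment, {head}) = {c:.4f}")
```

In this toy data, "appointment" co-occurs with "doctor" and "time" but not with "cost," so its correlation is positive with the first two and negative with the third. A term correlated with itself gives c = 1, which is the value listed for each topic's first term in Table 1.5.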
Table 1.5 Sample correlated terms selected by CFSLDA.
Features | Cost | Medicare | Staff | Infrastructure | Time |
High probable terms | cost | doctor | staff | hospital | time |
Term-topic probability (Φtw) | 0.0923 | 0.2132 | 0.2488 | 0.3152 | 0.1247 |
Correlated value (c) | 1 | 1 | 1 | 1 | 1 |
Sample terms modeled by CFSLDA |
Room (Φtw) | 0.0134 | 0.0004 | 0.0005 | 0.0471 | 0.0134 |
Room (c) | 0.0222 | 0.1378 | 0.2392 | 0.0402 | 0.0222 |
Disease (Φtw) | 0.0004 | 0.0222 | 0.0005 | 0.0004 | 0.0004 |
Disease (c) | -0.0347 | 0.0547 | -0.0462 | 0.0948 | +0.0408 |
Appointment (Φtw) | 0.0134 | 0.0092 | 0.0005 | 0.0004 | 0.0004 |
Appointment (c) | -0.0343 | 0.1802 | -0.0462 | -0.0414 | 0.0477 |
Patient (Φtw) | 0.0177 | 0.0004 | 0.1247 | 0.0004 | 0.0005 |
Patient (c) | 0.0042 | 0.1415 | 0.1429 | 0.1502 | 0.2238 |
Bill (Φtw) | 0.0004 | 0.0265 | 0.0001 | 0.0121 | 0.0005 |
Bill (c) | 0.0468 | -0.0015 | 0.1614 | 0.1176 | 0.2111 |
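The grouping described for "appointment" and "bill" can be reproduced from the correlated values in Table 1.5 with a simple cutoff. This is a sketch under a stated assumption: the chapter does not give the selection threshold here, so the cutoff of 0 (any positive correlation) and the function name `related_topics` are hypothetical.

```python
# Correlated values (c) of two sample terms with each topic's first term,
# copied from Table 1.5.
corr = {
    "appointment": {
        "cost": -0.0343,
        "medicare": 0.1802,
        "staff": -0.0462,
        "infrastructure": -0.0414,
        "time": 0.0477,
    },
    "bill": {
        "cost": 0.0468,
        "medicare": -0.0015,
        "staff": 0.1614,
        "infrastructure": 0.1176,
        "time": 0.2111,
    },
}


def related_topics(c_by_topic, threshold=0.0):
    """Topics whose first term correlates with the term above the threshold.

    The threshold of 0.0 is an assumption made for this illustration.
    """
    return [topic for topic, c in c_by_topic.items() if c > threshold]


for term, c_by_topic in corr.items():
    print(term, "->", related_topics(c_by_topic))
```

With this cutoff, "appointment" falls under "medicare" and "time," and "bill" is excluded from "medicare," in line with the discussion above. A stricter threshold would narrow the groups further.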
Table 1.6 lists the feature terms selected by the CFSLDA model. Among the pre-processed, PoS-tagged nouns, 68 terms are selected for the topic “cost,” 110 for “medicare,” 112 for “staff,” 101 for “infrastructure,” and 73 for “time.”
Table 1.6 List of correlated feature terms selected by CFSLDA model.
Features of DS1 | Number of terms selected by CFSLDA | Correlated feature terms by CFSLDA model (DS1) |
Cost | 68 | cost, test, money, charge, day, case, department, patient, room, pay, bill, ... |
Medicare | 110 | doctor, discharge, medicine, treatment, appointment, admission, disease, option, pain, reply, |