This calls for a model that can predict an individual’s health status from daily life activities.
2.2 Background
It is important for everyone to understand their health status, as it helps to avoid future diseases. As mentioned previously, some of the parameters of health status are sleep status, smoke status, drink status, and disease status. Directly or indirectly, they depend on the individual’s daily life activities and physical measures. In healthcare data management, a huge amount of structured and unstructured patient data is generated from diagnostic reports, doctors’ prescriptions, and wearable devices. In recent years, healthcare data analysis and the estimation of future health status have become major areas of focus in healthcare. Disease prediction has a major impact on healthcare analytics, as it predicts outbreaks of epidemics, helps avoid preventable diseases, and improves the quality of life. Recent works have proposed a variety of models to predict the health status of a person from various factors. Sahoo, Mohapatra, and Wu [10] proposed a cloud-based probabilistic data acquisition method and designed an approach to predict the impending health state of a person based on the current health status. Hirshkowitz et al. [5] proposed a method to evaluate and recommend sleep durations for individuals based on their age categories. The authors of [9] proposed a new approach to disease risk prediction, covering both CNN-based unimodal disease risk prediction and CNN-based multimodal disease risk prediction, where CNN denotes a Convolutional Neural Network. Weng, Huang, and Han [2] discussed different types of artificial neural network (ANN) techniques for disease prediction and evaluated all the methods using statistical tests. The authors of [7] proposed a system that collects health data through questionnaires and analyzes it using deep learning architectures.
Tayeb et al. [12] proposed a method based on the popular machine learning algorithm KNN to predict heart disease and chronic kidney failure. The authors of [6] proposed an automated system for the prediction of stroke based on Electronic Medical Claims (EMCs) and compared a Deep Neural Network (DNN) with gradient boosting decision tree (GBDT), logistic regression (LR), and support vector machine (SVM) approaches. The authors of [8] proposed a cloud-based smart clothing system for sustainable monitoring of human health and discussed the underlying technologies and implementation methodologies. Schmidt, Tittlbach, Bös, and Woll [11] analyzed different types of physical activity, fitness, and health over an 18-year study period and identified interesting insights. In a recent work on analyzing university fitness center data [14], users’ fitness activity data is collected to predict the crowd at the fitness center. However, fitness activity data can be used to predict more than that.
A lot of research has been done on measuring health parameters numerically, and there are many works on calculating some health parameters from others. The Harris–Benedict Equation [4] calculates the Basal Metabolic Rate from an individual’s physical measures; it is used to estimate the number of calories an individual needs to maintain good health. Our work incorporates the effect of daily life activities on health status, since this data can be used to personalize health predictions and suggestions. This motivated the design of a model that predicts health status from the daily life activities of individuals.
2.3 Problem Statement
Let At be the set of daily life activities done by an individual t days back. Thus, A0 is the set of activities done by an individual today, A1 is the set of activities done yesterday, and so on. Let A be the collection of an individual’s activities over many days, M the set of physical measures of the individual, and H the health status matrix.
Definition 1: Health Status Matrix: A health status matrix H describes the outcome of various parameters of health status. Each row of the matrix is a vector of possible outcomes of the respective health status parameter. Examples of health status parameters are sleep status, smoke status, and drink status.
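As an illustration of Definition 1, the matrix can be sketched with one row per health status parameter. The parameter names come from the text; the outcome labels, the one-hot encoding of each row, and the helper names `health_status_matrix` and `decode` are assumptions made only for this example, since the definition does not enumerate the possible outcomes or fix an encoding.

```python
# Hypothetical outcome labels per health parameter; the text names the
# parameters but does not list their possible outcomes.
OUTCOMES = {
    "sleep status": ["poor", "adequate", "good"],
    "smoke status": ["non-smoker", "light", "heavy"],
    "drink status": ["non-drinker", "moderate", "heavy"],
}


def health_status_matrix(observed):
    """Build H: one row per parameter, one-hot over its possible outcomes."""
    matrix = []
    for parameter, labels in OUTCOMES.items():
        row = [1 if label == observed[parameter] else 0 for label in labels]
        matrix.append(row)
    return matrix


def decode(matrix):
    """Read the outcome of each parameter back from its row."""
    decoded = {}
    for (parameter, labels), row in zip(OUTCOMES.items(), matrix):
        decoded[parameter] = labels[row.index(1)]
    return decoded


H = health_status_matrix({
    "sleep status": "adequate",
    "smoke status": "non-smoker",
    "drink status": "moderate",
})
print(H)          # [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(decode(H))
```

A row could equally hold outcome probabilities rather than a one-hot vector; the one-hot form is used here only because it is the simplest to read.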
We are given the daily life activities and physical measures of a set of users over a few days, together with their health status. Users whose health status is already defined are known as labeled users UL, whereas users whose health status is not defined are known as unlabeled users UV. The aim of the proposed model is to learn a function that uses the information of the labeled users UL to find the health status of the unlabeled users UV.
Given a series of activities from the last t days, the objective is to learn a function F,
F : (M, A0, A1, ..., At) → H,
where M is the set of physical measures of a user, At is the set of activities of the user t days back, and H is the health status matrix.
2.4 Proposed Architecture
Figure 2.1 describes the architecture of the proposed model. The daily life activities and physical measures of an individual are collected from the user and fed into a pre-processing phase, which reduces the number of features and performs the required data pre-processing operations.
2.4.1 Pre-Processing
The daily life activities that are mainly considered are screen time, sleep time, physical activity, number of cigarettes smoked, and units of alcohol consumed. The measures that are mainly considered are age, gender, height, weight, and calorie intake. Thus, ten features are collected from each individual. In the pre-processing step, the number of features is reduced by removing the activities and measures that do not have a direct effect on health status. This is achieved using the Harris–Benedict Equation [4].
The Harris–Benedict Equation [4] is a method used to estimate an individual’s basal metabolic rate (BMR). The form used here, whose coefficients correspond to the Mifflin-St Jeor revision of the equation, is:
For Men | BMR = (10 × Weight in kg) + (6.25 × Height in cm) − (5 × Age in years) + 5 |
For Women | BMR = (10 × Weight in kg) + (6.25 × Height in cm) − (5 × Age in years) − 161 |
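The two formulas above can be sketched as a single function. The function name `bmr`, the parameter names, and the "male"/"female" string encoding of gender are assumptions made for this example; the coefficients are the ones given in the text.

```python
def bmr(weight_kg, height_cm, age_years, gender):
    """Estimate BMR (kcal/day) with the coefficients given in the text."""
    # Terms shared by both formulas.
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    if gender == "male":
        return base + 5
    if gender == "female":
        return base - 161
    raise ValueError("gender must be 'male' or 'female'")


print(bmr(70, 175, 30, "male"))    # 1648.75
print(bmr(60, 165, 30, "female"))  # 1320.25
```

For a 30-year-old man weighing 70 kg at 175 cm, the terms are 700 + 1093.75 − 150 + 5 = 1648.75.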
Figure 2.1 Architecture of the model.
As per the Harris–Benedict Equation [4], the calories to be consumed depend on the BMR value and the level of physical activity:
Calories to be consumed = BMR × Physical Activity
Calories Difference = (Calories Consumed) − (Calories to be consumed)
In the proposed method, the number of features is thus reduced to seven: age, gender, sleep time, screen time, number of cigarettes, units of alcohol consumed, and calorie intake.
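A minimal sketch of this reduction step follows, under two stated assumptions. First, physical activity is taken to be a multiplier applied to the BMR (the key `activity_factor` is hypothetical). Second, height, weight, physical activity, and the raw calorie intake are assumed to be folded into the calorie difference, which then fills the calorie-related slot among the seven features; the text lists that slot as calorie intake and does not spell out the mapping. The dictionary keys and the function name `reduce_features` are likewise illustrative.

```python
def reduce_features(raw):
    """Collapse the ten collected features into seven model inputs."""
    # BMR with the coefficients given in the text.
    base = 10 * raw["weight_kg"] + 6.25 * raw["height_cm"] - 5 * raw["age"]
    bmr = base + 5 if raw["gender"] == "male" else base - 161

    # Assumption: physical activity is given as a multiplier on BMR.
    calories_needed = bmr * raw["activity_factor"]
    calorie_difference = raw["calorie_intake"] - calories_needed

    return {
        "age": raw["age"],
        "gender": raw["gender"],
        "sleep_time": raw["sleep_time"],
        "screen_time": raw["screen_time"],
        "cigarettes": raw["cigarettes"],
        "alcohol_units": raw["alcohol_units"],
        # Assumption: the calorie-related feature holds the difference.
        "calorie_difference": calorie_difference,
    }


sample = {
    "age": 30, "gender": "male", "height_cm": 175, "weight_kg": 70,
    "calorie_intake": 2500, "activity_factor": 1.2,
    "sleep_time": 7, "screen_time": 5, "cigarettes": 0, "alcohol_units": 2,
}
reduced = reduce_features(sample)
print(len(reduced), round(reduced["calorie_difference"], 2))
```

A positive calorie difference indicates intake above the estimated requirement, and a negative one indicates intake below it.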
2.4.2 Phase-I
Phase-I of the model processes the data received from both the data sources and the user. In this phase, a decision tree classifier is used to estimate the health parameter of the user. Initially, the model is trained with the dataset received from the data sources. The Phase-I of the model