Daily estimates of clinical severity of symptoms in bipolar disorder from smartphone-based self-assessments

Daily estimates of clinical severity of symptoms in bipolar disorder from smartphone-based self-assessments

Play all audios:

Loading...

ABSTRACT Currently, the golden standard for assessing the severity of depressive and manic symptoms in patients with bipolar disorder (BD) is clinical evaluations using validated rating


scales such as the Hamilton Depression Rating Scale 17-items (HDRS) and the Young Mania Rating Scale (YMRS). Frequent automatic estimation of symptom severity could potentially help support


monitoring of illness activity and allow for early treatment intervention between outpatient visits. The present study aimed (1) to assess the feasibility of producing daily estimates of


clinical rating scores based on smartphone-based self-assessments of symptoms collected from a group of patients with BD; (2) to demonstrate how these estimates can be utilized to compute


individual daily risk of relapse scores. Based on a total of 280 clinical ratings collected from 84 patients with BD along with daily smartphone-based self-assessments, we applied a


hierarchical Bayesian modelling approach capable of providing individual estimates while learning characteristics of the patient population. The proposed method was compared to common


baseline methods. The model concerning depression severity achieved a mean predicted _R_2 of 0.57 (SD = 0.10) and RMSE of 3.85 (SD = 0.47) on the HDRS, while the model concerning mania


severity achieved a mean predicted _R_2 of 0.16 (SD = 0.25) and RMSE of 3.68 (SD = 0.54) on the YMRS. In both cases, smartphone-based self-reported mood was the most important predictor


variable. The present study shows that daily smartphone-based self-assessments can be utilized to automatically estimate clinical ratings of severity of depression and mania in patients with


BD and assist in identifying individuals with high risk of relapse. SIMILAR CONTENT BEING VIEWED BY OTHERS A SMARTPHONE- AND WEARABLE-BASED BIOMARKER FOR THE ESTIMATION OF UNIPOLAR


DEPRESSION SEVERITY Article Open access 01 November 2023 CLASSIFYING AND CLUSTERING MOOD DISORDER PATIENTS USING SMARTPHONE DATA FROM A FEASIBILITY STUDY Article Open access 21 December 2023


SMARTPHONE ACCELEROMETER DATA AS A PROXY FOR CLINICAL DATA IN MODELING OF BIPOLAR DISORDER SYMPTOM TRAJECTORY Article Open access 14 December 2022 INTRODUCTION Bipolar disorder (BD) is a


common and complex illness with an estimated prevalence of 1–2% and is regarded as one of the most important causes of disability worldwide1,2. BD is characterized by recurrent episodes of


depression, (hypo)mania and mixed episodes intervened by periods of euthymia3 and with a high degree of comorbidity and functional impairment4. BD is associated with an elevated risk of


mortality due to suicide and medical comorbidities such as cardiovascular disease and diabetes5,6,7, and among people with BD, life expectancy is decreased 8–12 years8,9. In clinical


practice, there are major challenges in diagnosing and treating BD10. Patients with BD are often misdiagnosed, and the correct diagnosis can be delayed for several years after illness


onset11,12,13. Currently, due to the lack of objective tests, the diagnostic process and the clinical assessment of the severity of depressive and manic symptoms relies on subjective


information, clinical evaluation and rating scales14. Periodic clinical evaluations using clinical rating scales such as the Hamilton Depression Rating Scale (HDRS)15 and the Young Mania


Rating Scale (YMRS)16 are currently used as the golden standard for assessing the severity of depressive and manic symptoms in patients with BD. Each rating scale consists of a series of


items reflecting various symptoms of depression and mania, and these items are finally added up to produce a total score summarizing the current severity of depressive (HDRS) or manic (YMRS)


state of the patient. However, the use of clinical rating scales involves a risk of potential patient recall bias, other recall distortions, decreased illness insight (mainly during


affective episodes) and individual clinician observer bias17,18,19,20,21. In addition, the clinical evaluations are time consuming and require a specialist who is trained and experienced in


using the rating scales to produce consistent, valid and reliable results. As part of treatment, patients may be asked to perform daily self-assessments to track changes in symptoms between


clinical evaluations. Modern smartphones provide a unique platform for fine-grained real-time symptom monitoring and management, and a convenient means of self-assessment that have


traditionally been carried out on paper22,23,24. A smartphone-based monitoring system enables users to ubiquitously record and review their own data, receive reminders, and even share data


with carers and clinicians. From the perspective of health care providers, it offers efficient, online monitoring of a group of patients and enables intervention in case any deterioration is


observed. Electronic self-monitoring has the additional benefit of making data available for immediate and automatic analysis that can help support monitoring and treatment tasks between


outpatient visits. Correlations between smartphone-based self-reported mood scores and clinical ratings of depressive and manic symptoms measured using the HDRS and the YMRS in patients with


BD have already been demonstrated by previous work25,26,27, but to our knowledge this is the first study to predict scores of clinical ratings directly from combinations of smartphone-based


self-assessed data in patients with BD. In related work, detection of daily self-reported mood from smartphone sensor and usage data is well studied23,28,29,30, but remains a difficult


problem due to noisy data. In ref. 31, Grünerbl et al. classified affective states and state changes derived from clinical ratings and phone interviews of patients with BD from a combination


of smartphone sensor modalities and argued that detecting deviations from the euthymic state is more important than the recognition of a particular affective state in practical


applications. Several studies in the field of affective computing have highlighted the need for personalized models to account for individual differences in order to achieve good predictive


performance29,30,32,33. However, a separate analysis is not feasible until sufficient data about each individual is available. Hierarchical Bayesian modelling is a well-suited approach for


providing individual models while borrowing statistical power from the population, which is especially useful when the individual datasets are too small to be analysed separately34. The main


of objective of this study was to examine the feasibility of producing daily estimates of clinical ratings of depression and mania based on smartphone self-assessments of symptoms collected


from a group of patients with BD, who were followed as part of a randomized controlled trial (RCT)35. Additionally, we aimed to demonstrate how uncertainty in the estimated quantities could


be used to compute individual, daily risk of relapse, useful for identifying high-risk individuals who need urgent assistance. Our assumption was that daily, automatic estimates of clinical


ratings augmented with individual relapse risk scores are more interpretable and actionable results than observing the smartphone-based self-assessments directly and can be a valuable tool


in continuous monitoring of illness activity and treatment of patients with BD. MATERIALS AND METHODS PATIENTS AND STUDY DESIGN Data analysed in this study was collected between September


2014 and January 2018 during the MONARCA II RCT, investigating the effect of smartphone-based monitoring in patients with BD35. All patients with a diagnosis of BD who had previously been


treated at the Copenhagen Clinic for Affective Disorder, Denmark, in the period from 2004 to January 2016 and who at the time of recruitment were being treated at community psychiatric


centres, private psychiatrists and general practitioners were invited to participate in the trial. The clinic is a specialized outpatient clinic with a catchment area consisting of the


Capital Region in Denmark corresponding to 1.4 million people. Patients with a newly diagnosis of BD or with treatment-resistant BD were referred to the clinic. The staff consists of


specialists in psychiatry, psychologists, nurses, and a social worker, all with specific experience and knowledge regarding BD. Treatment at the clinic comprises a two-year program including


combined evidence-based psychopharmacological treatment and supporting therapy, including group psychoeducation36. Patients were included in the study for a nine-month follow-up period if


they had a BD diagnosis according to ICD-10 using the Schedules for Clinical Assessments in Neuropsychiatry (SCAN)37 and previously were treated at the Copenhagen Clinic for Affective


Disorder. Patients with schizophrenia, schizotypal or delusional disorders, previous use of the MONARCA system, pregnancy and lack of Danish language skills were excluded. Patients with


other comorbid psychiatric disorders and substance use were eligible for the trial. As part of the MONARCA II trial, patients were randomized to either using a smartphone-based monitoring


system (the Monsenso system) for daily self-monitoring (the intervention group) or to treatment as usual (the control group). Patients from the intervention group who successfully provided


smartphone-based self-monitoring data were included in the analyses in the present study. DATA DESCRIPTION CLINICAL ASSESSMENTS The dataset consists of 280 clinical ratings collected from 84


patients with BD. Each clinical rating includes ratings for severity of depression and mania using the HDRS15 and the YMRS16, respectively. Each participant was evaluated by a clinician up


to 5 times during the study period (at baseline, after 4 weeks, 3 months, 6 months and 9 months). All clinical assessments were conducted by a researcher (MFJ), who was blinded to all


smartphone-based data. Thus, data on the severity of depressive and manic symptoms were collected rater-blinded. On both rating scales, the first item indicates mood and low severity ratings


indicate low levels of either depressive or manic symptoms while high severity ratings indicate severe symptoms. A score of 13 or more on either rating scale was classified as a depressive


or manic episode, respectively, while a high score on both scales at the same time constituted a mixed episode. The cut-off on the HDRS and the YMRS of 13, in contrast to a lower cut-off,


was chosen á priori to increase the validity of a current affective depressive or manic/mixed state (the more severe, the higher the validity). A euthymic state was defined as HDRS and YMRS


less than 13 thereby also including affective states with partial remission. Clinical ratings with the HDRS and the YMRS were considered to be valid on the day of the assessment as well as


the 3 previous days, thus each rating is attributed a total of 4 days in the present dataset. SMARTPHONE-BASED SELF-ASSESSMENTS In addition to periodic clinical ratings, patients were


instructed to carry out daily self-assessments via a smartphone application (the Monsenso system) configured for the present study. The smartphone application was developed using an


iterative, user-centred design process involving patients, IT researchers, clinicians and clinical researchers, and the items chosen for the self-assessments were designed to capture


clinically important symptoms of bipolar disorder23. The self-assessment included the following items: activity level (scored from −3 to +3); alcohol consumption (number of units from 0 to


10+); anxiety level (scored from 0 to 2); irritability level (scored from 0 to 2); cognitive problems (scored from 0 to 2); medicine adherence (not taken/taken/taken with changes); mixed


mood (yes/no); mood (scored from −3 to +3 including −0.5 and +0.5); sleep duration (in hours); and stress level (scored from 0 to 2). The activity, medicine, mood and sleep items were


mandatory items, which the patients evaluated daily. Additionally, the smartphone application enabled users to configure reminders and users were allowed to provide self-assessments


retrospectively for up to 2 days in case they forgot the daily entry. The entered self-assessed data collected over time was visually presented to the users on their smartphone. STATISTICAL


ANALYSIS DATA PREPROCESSING Three smartphone-based self-assessment variables, _mood_, _sleep_ and _medicine_, required preprocessing prior to analysis. We split the mood variable into a


negative and positive component, _mood negative_ and _mood positive_, allowing for non-linear relationships with the clinical ratings as we expected negative mood to be associated mainly


with severity of depression (reflected by scores on the HDRS) and positive mood to be associated mainly with severity of mania (reflected by scores on the YMRS). Additionally, we expected


the relationship between sleep duration and symptom severity to be non-linear as increased or decreased sleep duration can both represent signs of deterioration during depression and mania.


To encode this, we subtracted the individual-level mean of the sleep duration variable and split the result into positive and negative components, _sleep negative_ and _sleep positive_. When


testing the out-of-sample predictive performance of statistical models, the individual mean sleep duration was computed on the training set and applied to generate features in the training


set and test set. The medicine adherence variable was categorical by design with categories: _medicine not taken_, _medicine taken as prescribed_, _medicine taken with changes_. To prepare


the data for analysis, the three possible answers were encoded with two exclusive binary variables indicating if medicine was not taken, _medicine omitted_, or if medicine was taken with


changes, _medicine changed_. The expected most common answer, _medicine taken as prescribed_, was not encoded to avoid collinearity in the regression models (a.k.a. “the dummy variable


trap”). Finally, all variables were normalized by their allowed minimum and maximum values to allow for easier selection of model hyperparameters and interpretation of the inferred model


weights. It was a common problem for patients to occasionally forget to fill in their daily self-assessment, resulting in missing values in the dataset. In most cases, self-assessments were


either complete for all items or missing, but in a few instances, they were only partially answered. To avoid discarding observations with only a few missing values, we experimented with


filling in values from the previous day, which is a common method for dealing with missing values in time series data38. However, it resulted in very few additional complete observations and


we therefore decided to leave this step out. MODELLING APPROACH When analysing several related sets of measurements, such as data from individuals of a population, the two extreme


approaches are to either pool the datasets in a one-size-fits-all solution or to analyse the datasets separately, the latter only being possible when sufficient data is available (also known


as the cold start problem). A hierarchical Bayesian approach provides an intermediate solution that enables personalized models while learning the characteristics of the population39. In a


hierarchical Bayesian regression model, individuals have their own set of regression intercept and weights, _α__j_,_β__j_, sampled from a common population distribution parameterized by


population-level means _μ_ and variances _τ_ determining the amount of pooling: $$\begin{array}{l}\alpha _j,\beta _j\left. \sim \right.{\mathrm{Normal}}\left( {\mu ,\tau } \right)\\


y_{ji}\left. \sim \right.{\mathrm{Normal}}\left( {\alpha _j + \beta _j^T{\boldsymbol{x}}_{ji},\sigma } \right),\end{array}$$ where _y__ji_ is the _i_th observation of the target variable for


individual _j_, _x__ji_ are the corresponding predictor variables and _σ_ is the standard error. This hierarchical tying together of parameters means that data from the population helps


regularize the individual-level weights. An additional benefit of the Bayesian approach is that it expresses uncertainty in all the model parameters and predictions by their posterior


distributions, which is important for interpretability of the model. For further details, a complete description of the hierarchical Bayesian model is provided in the Supplementary


Information (SI). In the present study, we used Stan40 to specify and perform inference in the Bayesian models and then compared the predictive results with pooled and separate naïve mean


baselines and common machine learning methods: Ridge Regression from the scikit-learn machine learning library41 and XGBoost regression from the XGBoost Python package42. Details of the Stan


setup is also included in the SI. To estimate the predictive performance of the models we designed a cross-validation experiment where in each iteration we held out one randomly sampled


clinical evaluation (consisting of up to 4 days of data) from each individual and used the remaining data to fit the models. This procedure was repeated _K_ times and the predicted


coefficient of determination (_R_2) and root mean square error (RMSE) was computed on the held-out data in each iteration. We evaluated the models on the HDRS and the YMRS total scores as


well as item 1 of each rating scale, since these items reflect mood only. Additionally, we evaluated the models using all smartphone-based self-assessment items, the mandatory


self-assessment items (activity, medicine, mood and sleep) and using only the mood self-assessment item, respectively. Estimating scores on the HDRS and the YMRS with separate models enables


prediction of high values of the HDRS and the YMRS at the same time, indicating a mixed episode. COMPUTING RISK OF RELAPSE In some practical applications, it may be more relevant to


accurately identify high-risk individuals than to estimate the exact value of the severity score. Applying a Bayesian approach does not only provide a point estimate of the outcome of


interest but provides a probability distribution of unobserved (future) outcomes given previously observed data, i.e. the posterior predictive distribution, which can be utilized to reason


about uncertainty in the predictions. Specifically, samples from the posterior predictive distribution can be used to compute the probability that an unobserved outcome, \(\tilde y_{ji}\),


exceeds a predefined threshold, _T_: $${\mathrm{Pr}}\left( {\tilde y_{ji} \ge T} \right).$$ When estimating scores of clinical ratings, by applying a threshold _T_ = 13 we can interpret this


probability as the risk that an individual is experiencing severe symptoms and utilize it as a personal score indicating the risk of relapse. ETHICAL CONSIDERATIONS The MONARCA II RCT was


approved by the Regional Ethics Committee in the Capital Region of Denmark (H-2-2014-059) and the Danish Data protection agency (2013-41-1710). The law on handling of personal data was


respected. All potential participants were given both written and oral information about the study before informed consent was obtained. Prior to commencement the trial was registered at


ClinicalTrials.gov (NCT02221336). Electronic data collected from the smartphones were stored at a secure server at Concern IT, Capital Region, Denmark (I-suite number RHP-292 2011-03). The


trial complied with the Helsinki Declaration of 1975, as revised in 2008. RESULTS DESCRIPTIVE STATISTICS The MONARCA II dataset consists of 280 clinical evaluations, with a mean number of


clinical evaluations per patients during the study of 3.33 (SD = 1.14), and a total of 15975 daily smartphone-based self-assessments with a mean number of smartphone-based self-assessments


during the study of 190.18 (SD = 70.97) from 84 patients with BD assigned to the intervention group of the RCT. The age ranged from 21 to 71 years (mean = 43.1, SD = 12.4) and 61.9% (_N_ = 


52) were women. During the study period, most patients presented with rather low severity of depressive and manic symptoms resulting in low HDRS and YMRS scores. The mean HDRS total score


was 7.56 (SD = 6.29) and 20.4% of scores were greater than or equal to 13. The mean YMRS total score was 2.85 (SD = 4.17) and 5.0% of scores were greater than or equal to 13. The mean HDRS


item 1 score was 0.69 (SD = 0.85) and the mean YMRS item 1 score was 0.24 (SD = 0.53). Similarly, the majority of the smartphone-based self-reported mood scores were close to zero with a


mean of −0.14 (SD = 0.48), indicating neutral mood (euthymia). After filling back the clinical severity ratings 4 days (since the clinical rating scales reflect this time period) there were


764 observations with associated smartphone-based self-assessments. Figure 1 shows the association between the clinical ratings and the smartphone-based self-reported mood scores. Overall, a


high score on the HDRS corresponded to neutral or depressed smartphone-based self-assessed mood (_r_ = −0.40, _P_ < 0.01) while a high score on the YMRS corresponded to neutral or


elevated smartphone-based self-assessed mood (_r_ = 0.22, _P_ < 0.001). Only in a few instances were the HDRS and the YMRS rated high at the same time, indicating a mixed episode (_r_ = 


0.13, _P_ = 0.02). MODEL ESTIMATES The hierarchical Bayesian regression model was evaluated on the entire dataset of clinical ratings combined with all self-assessed items of the completed


smartphone-based self-assessments for all participants with at least two data points (_N_ = 433). The model predicting total scores on the HDRS achieved an _R_2 of 0.84, indicating that the


model accounted for 84% of the variance in the data, and a residual RMSE of 2.41. The model predicting total scores on the YMRS achieved an _R_2 of 0.81 and a residual RMSE of 2.07. The


model predicting the HDRS item 1 score achieved an _R_2 of 0.89 and a residual RMSE of 0.30, and the model predicting the YMRS item 1 score achieved an _R_2 of 0.86 and a residual RMSE of


0.22. The distributions of inferred population-level mean, _μ_, and variance, _τ_, parameters in the hierarchical Bayesian regression HDRS total and YMRS total models are summarized in Table


1. The absolute _t_-statistic of the mean parameters, computed as the mean scaled by the standard error of the parameter: \(t_\mu = \bar \mu /{\it{SE}}(\mu )\), is included as a measure of


variable importance, following the intuition that larger absolute weights and lower variance implies importance43. This shows that negative mood was the most important predictor variable in


the HDRS model while positive mood was the most important predictor and in the YMRS model. A visual presentation of the population-level parameters and a weight matrix summarising the


individual parameters are included in the SI. A figure showing the effect size of each self-assessment item is also included in the SI. CROSS-VALIDATION RESULTS The predictive performance of


the hierarchical Bayesian model was evaluated in _K_ = 100 cross-validation experiments on all data where participants had complete observations of clinical ratings and smartphone-based


self-assessments from at least three different clinical evaluations (_N_ = 329). In each iteration, data from one randomly sampled clinical evaluation from each patient was held out and the


remaining data was used to fit the models. Models were fitted to predict HDRS total, YMRS total, HDRS item 1 and YMRS item 1, from (1) all; (2) mandatory and (3) mood self-assessment items,


respectively. The hierarchical Bayesian model was compared to naïve pooled and separate mean models along with pooled and separate ridge regression and XGBoost regression models. Table 2


presents the cross-validation results of predicting HDRS total and YMRS total. Because of low variance in the data, the naïve mean models performed relatively well. Still the hierarchical


Bayesian regression model achieved the best overall performance in every case and was significantly better than the separate mean model in both the HDRS and YMRS case according to


independent _t_-tests (_P_ < 0.001). Overall, the separate models performed better than their pooled counterparts. Table 3 presents the cross-validation results of predicting HDRS item 1


and YMRS item 1, indicating mood. The pooled XGBoost achieved the best result at predicting HDRS item 1 using all self-assessment items. When reducing the feature set to the mandatory or


mood self-assessment items, the hierarchical Bayesian model was best. It was not possible to predict YMRS item 1 significantly better than the naïve mean baselines. PREDICTED RISK OF RELAPSE


SCORES The results from cross-validation experiments predicting the HDRS total score and the YMRS total score using all self-assessment items presented in the previous section were used to


compute risk of relapse scores \({\mathrm{Pr}}\left( {\tilde y_{{\it{ji}}} \ge {\it{T}} = 13} \right)\). The ability of the model to correctly assign high risk to instances with high ratings


can be evaluated as a binary classification problem with severity ratings equal to or greater than the threshold _T_ constituting the positive class. Figure 2 presents receiver operating


characteristic (ROC) curves of the HDRS total and the YMRS total models illustrating the trade-off between true positive rate (TPR) and false positive rate (FPR), comparing the hierarchical


Bayesian regression model to the naïve pooled and separate mean models. The pooled mean model corresponds to a model that either classifies all instances as low risk or high risk, achieving


an area under the curve (AUC) of 0.50 in both the HDRS and YMRS case. The separate mean model independently classifies each individual as either high or low risk based on observed values of


the ratings and achieved an AUC of 0.67 in the HDRS case and AUC of 0.49 in the YMRS case. The hierarchical Bayesian regression model was able to account for information in the


smartphone-based self-assessments as well as individual differences and achieved the highest AUC of 0.89 in the HDRS case and 0.84 in the YMRS case. DISCUSSION In the present study, we


analysed clinical ratings of depression reflected by the HDRS and mania reflected by the YMRS along with daily smartphone-based self-assessments including self-reported mood in a population


of 84 patients with BD. As hypothesized, there was a negative correlation between the HDRS and self-reported mood and a positive correlation between the YMRS and mood. This confirms previous


work25,26,27, and suggests that smartphone-based self-reported mood is a valid indicator of symptom severity in patients with BD and thereby a clinically relevant feature for monitoring and


analysis. Interestingly and as hypothesized, the proposed approach of applying hierarchical Bayesian regression models was able to fit the data distributions of the HDRS total score and the


YMRS total score and all smartphone-based self-assessment items and accounted for more than 80% of the variance in the data according to _R_2. Using the absolute _t_-statistic of the


population-level regression weights as a measure of variable importance, decreased and increased smartphone-based self-reported mood were the most important variables for predicting the


severity of depression (HDRS) and mania (YMRS). This is not surprising since sampling of self-reported mood from the patients was designed to collect indicators on the patient’s affective


state and thus should reflect the clinically rated symptoms. Other important variables in the HDRS total model were decreased sleep and feelings of mixed mood and anxiety, while in the YMRS


total model only mood ranked important (see Table 1). To assess the predictive performance of the hierarchical Bayesian model compared to pooled and separate baseline models, we performed


cross-validation experiments of estimating the HDRS total score, the YMRS total score, the HDRS item 1 score and the YMRS item 1 score using all smartphone-based self-assessment items, the


four mandatory items and mood self-assessment item alone, respectively. Thus, we were able to estimate the total clinical rating scores using regression models based on smartphone-based


self-assessments. The hierarchical Bayesian model achieved the best performance in predicting the HDRS total and was significantly better than a naïve model using the observed individual


(separate) mean as a prediction (_P_ < 0.001). Similarly, the hierarchical Bayesian model was best at predicting the YMRS total score and was significantly better than the naïve separate


mean model. Additionally, we tested models for predicting the first item of the HDRS and the YMRS, indicating mood. The pooled XGBoost model achieved the best result in predicting the HDRS


item 1 score, while estimating the YMRS item 1 score could not be improved over the naïve baseline. In all the presented experiments, we found that models based only on self-assessed mood


were able to retain most of the predictive performance of models based on all self-assessment items. This further shows that mood is the most important self-reported predictor variable for


estimating scores of the HDRS and the YMRS. Overall, the YMRS models did not account for much of the variance in the data, indicated by the low _R_2 scores. This could be mainly due to low


variation in the observed YMRS data. In clinical settings of monitoring illness activity in patients with bipolar disorder, detecting individuals with a high risk of relapse is highly


important in order to enable intervention. Therefore, a sensitive indication if a symptom severity rating is above a critical threshold might be more useful than estimating the exact value


of the severity rating itself. Thus, we demonstrated how uncertainty in the estimated total severity scores can be utilized to compute individual daily risk of relapse scores by considering


samples from the posterior predictive distribution of the hierarchical Bayesian model. In the case of both the HDRS and the YMRS, using hierarchical Bayesian approach achieved substantial


improvements over naïve models using pooled and separate means of observed data as predictions. Hence, including self-assessments in a regression model provided additional useful information


for estimating the level of the clinical severity ratings and hence the relapse risk scores, which is a promising and clinically relevant result. The findings that a combination of


fine-grained daily smartphone-based self-assessment items can be used to estimate and predict clinical ratings are interesting and innovative. Daily longitudinal self-monitoring of mood


symptoms gives valuable information of mood fluctuation experienced by patients with BD between clinical outpatient visits. Long-term monitoring of symptoms has been an essential part of the


monitoring and treatment of BD for decades44 and rapidly evolving smartphone technologies have made it possible to monitor symptoms more continuously, fine-grained and in real-time. This


can be clinically relevant for detection of symptoms before the first or recurrent depressive or manic episodes45, and allow for early intervention on prodromal symptoms. In the latest


version of the Diagnostic and Statistical Manual of Mental Disorders (DSM-V), increased activity level or energy is acknowledged as a core feature of hypomania and mania together with mood


changes46. Several studies using factor analysis have described activation and not mood state as the primary symptom in manic episodes47,48. However, in the present study we found mood to be


the most important predictor variable for estimating the HDRS and the YMRS severity ratings while activity presented with low importance in both models. Furthermore, sleep disturbances and


anxiety has been identified as early symptoms of depression and mania49,50, which is in line with our findings in the HDRS model while sleep and anxiety were less important in the YMRS


model. ADVANTAGES The patients included in the present study were clinically well characterized and were receiving treatment or had received treatment at the Copenhagen Clinic for Affective


Disorders, Denmark. The clinical evaluations were conducted multiple times during follow-up by experienced researchers with a specific knowledge within BD. The smartphone-based


self-assessment system used in the present studies (the Monsenso system) was developed by the authors and has been shown easy to use with a high usability, usefulness, ease of learning to


use and interface quality—also when compared with other smartphone-based self-assessment systems22,51. The use of smartphones for fine-grained real-time monitoring reduced the risk of recall


bias. The proposed hierarchical Bayesian modelling approach is well suited for analysis of small related datasets, especially when the individual datasets are too small to analyse


separately. Additionally, the linear regression method and ability to express uncertainty in all estimated quantities makes the model easy to interpret, which is essential in a clinical


setting. Overall, the findings from the present study are found to be innovative and generalizable to patients with BD not presenting with an acute affective episode and who are willing to


use a monitoring tool during prolonged time periods. LIMITATIONS The dataset used in this study primarily contained clinical ratings of low severity of affective symptoms indicating most


participants did not experience severe symptoms of depression or mania during the study period. Similarly, a large proportion of the self-reported mood scores were close to zero (indicating


euthymia) and had low variance. Consequently, the naïve mean baseline models could fit the data well and achieved good performance in the prediction task. However, the best regression model


was still significantly better than the naïve mean models, showing that it is possible to utilize smartphone-based self-reported data to produce more accurate estimates of the clinical


ratings of symptom severity. Although we saw significant correlations between self-reported mood and the HDRS and the YMRS, respectively, the correlations were weaker than what has been


reported in some other studies45. Furthermore, the absence of high ratings makes it difficult to reason about the performance of the models in detecting extreme cases, which are the most


critical in a monitoring and intervention application. Our analysis does not explore the distribution of missing data and thus assumes data is missing at random. However, it is reasonable to


believe that individuals who are experiencing severe depression or mania have difficulties coping with self-assessment while euthymic individuals find it less relevant. Thus, analysing the


missing data distribution might hold valuable information regarding symptom severity which can be explored further. Lastly, our analysis did not include any temporal information in the


models, but rather used smartphone self-assessment data from a given day to estimate clinical ratings on the same day and treated each day independently from other days. Thus, the analysis


made no assumptions regarding temporal patterns of mood but relied entirely on relationship between data collected on the same day. PERSPECTIVES AND FUTURE IMPLICATIONS Smartphones have


become a ubiquitous technology in modern society and can be utilized to provide improved and personalized illness management and monitoring in psychiatry. Smartphone-based self-assessment


makes data available for immediate analysis and can enable new tools for improved illness monitoring. In particular, accurate, daily estimates of symptom severity could help identify


critical cases and enable timely and individualized intervention. Additionally, advances in sensor technology and algorithms is making it possible to extract a growing range of increasingly


accurate behavioural features directly from sensor data. Utilizing these automatically generated features to infer symptom severity scores could be used to eliminate the need for frequent,


intrusive self-assessments and improve the user experience of illness monitoring systems in psychiatry going forward. In this paper, we have explored the relationship between


smartphone-based self-assessments and clinical ratings observed on the same day with the purpose of identifying current high-risk individuals. A related objective with possible great


clinical potential would be to predict individual risk of relapse ahead of time. We see this as an important topic for future studies. CONCLUSIONS In the present study, clinical ratings of


the severity of depression and mania were estimated from smartphone-based self-assessments collected from patients with BD. We found that our approach of applying a hierarchical Bayesian


model could estimate severity of depression and mania with low error compared to commonly used baseline methods and within 4 points of RMSE on the HDRS and the YMRS rating scales.


Furthermore, we showed how uncertainty in the estimates can be utilized to compute personal relapse risk scores suited for identifying critical cases of patients experiencing severe symptoms


and that our approach achieved substantial improvements over naïve pooled and separate mean models. The results presented in this work show that it is feasible to compute daily estimates of


clinical severity ratings of depression and mania from smartphone-based self-assessments, which can be used to improve and automate continuous disease monitoring and treatment of BD.


REFERENCES * Pini, S. et al. Prevalence and burden of bipolar disorders in European countries. _Eur. Neuropsychopharmacol._ 15, 425–434 (2005). Article  CAS  Google Scholar  * Vos, T. et al.


Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. _Lancet_ 380, 2163–2196


(2012). Article  Google Scholar  * Goodwin, F. K. & Jamison, K. R. _Manic-depressive illness_. (Oxford University Press, New York, 1990). * Sanchez-Moreno, J. et al. Neurocognitive


dysfunctions in euthymic bipolar patients with and without prior history of alcohol use. _J. Clin. Psychiatry_ 70, 1120–1127 (2009). Article  Google Scholar  * Angst, F., Stassen, H. H.,


Clayton, P. J. & Angst, J. Mortality of patients with mood disorders: follow-up over 34-38 years. _J. Affect Disord._ 68, 167–181 (2002). Article  CAS  Google Scholar  * Tondo, L.,


Isacsson, G. & Baldessarini, R. Suicidal behaviour in bipolar disorder: risk and prevention. _CNS Drugs_ 17, 491–511 (2003). Article  CAS  Google Scholar  * Hayes, J. F., Miles, J.,


Walters, K., King, M. & Osborn, D. P. J. A systematic review and meta-analysis of premature mortality in bipolar affective disorder. _Acta Psychiatr. Scand._ 131, 417–425 (2015). Article


  CAS  Google Scholar  * Kessing, L. V., Vradi, E. & Andersen, P. K. Life expectancy in bipolar disorder. _Bipolar Disord._ 17, 543–548 (2015). Article  Google Scholar  * Kessing, L. V.,


Vradi, E., McIntyre, R. S. & Andersen, P. K. Causes of decreased life expectancy over the life span in bipolar disorder. _J. Affect Disord._ 180, 142–147 (2015). Article  Google Scholar


  * Kupfer, D. J., Frank, E. & Ritchey, F. C. Staging bipolar disorder: what data and what models are needed? _Lancet Psychiatry_ 2, 564–570 (2015). Article  Google Scholar  * Kessing,


L. V. Diagnostic stability in bipolar disorder in clinical practise as according to ICD-10. _J. Affect Disord._ 85, 293–299 (2005). Article  Google Scholar  * Agius, M., Murphy, C. L. &


Zaman, R. Under-diagnosis of bipolar affective disorder in A bedford CMHT. _Psychiatr. Danub._ 22(Suppl. 1), S36–S37 (2010). PubMed  Google Scholar  * Knežević, V. & Nedić, A. Influence


of misdiagnosis on the course of bipolar disorder. _Eur. Rev. Med Pharm. Sci._ 17, 1542–1545 (2013). Google Scholar  * Phillips, M. L. & Kupfer, D. J. Bipolar disorder diagnosis:


challenges and future directions. _Lancet_ 381, 1663–1671 (2013). Article  Google Scholar  * Hamilton, M. Development of a rating scale for primary depressive illness. _Br. J. Soc. Clin.


Psychol._ 6, 278–296 (1967). Article  CAS  Google Scholar  * Young, R. C., Biggs, J. T., Ziegler, V. E. & Meyer, D. A. A rating scale for mania: reliability, validity and sensitivity.


_Br. J. Psychiatry_ 133, 429–435 (1978). Article  CAS  Google Scholar  * Peralta, V. & Cuesta, M. J. Lack of insight in mood disorders. _J. Affect Disord._ 49, 55–58 (1998). Article  CAS


  Google Scholar  * Cassidy, F. Insight in bipolar disorder: relationship to episode subtypes and symptom dimensions. _Neuropsychiatr. Dis. Treat._ 6, 627–631 (2010). Article  Google Scholar


  * Látalová, K. Insight in bipolar disorder. _Psychiatr. Q._ 83, 293–310 (2012). Article  Google Scholar  * de Assis da Silva, R. et al. Insight across the different mood states of bipolar


disorder. _Psychiatr. Q_ 86, 395–405 (2015). Article  Google Scholar  * de Assis da Silva, R., Mograbi, D. C., Bifano, J., Santana, C. M. T. & Cheniaux, E. Insight in bipolar mania:


evaluation of its heterogeneity and correlation with clinical symptoms. _J. Affect Disord._ 199, 95–98 (2016). Article  Google Scholar  * Bardram, J. E. et al. Designing Mobile Health


Technology for Bipolar Disorder: A Field Trial of the Monarca System. in _Proc. SIGCHI Conference on Human Factors in Computing Systems. CHI ’13_, 2627–2636 (ACM, New York, 2013). * Frost,


M., Doryab, A., Faurholt-Jepsen, M., Kessing, L. V. & Bardram, J. E. Supporting Disease Insight Through Data Analysis: Refinements of the Monarca Self-assessment System. in _Proc. 2013


ACM International Joint Conference on Pervasive and Ubiquitous Computing. UbiComp ’13_, 133–142 (ACM, New York, 2013). * Bardram, J. E. & Frost, M. The personal health technology design


space. _IEEE Pervasive Comput._ 15, 70–78 (2016). Article  Google Scholar  * Faurholt-Jepsen, M. et al. Behavioral activities collected through smartphones and the association with illness


activity in bipolar disorder. _Int J. Methods Psychiatr. Res._ 25, 309–323 (2016). Article  Google Scholar  * Faurholt-Jepsen, M. et al. Smartphone data as an electronic biomarker of illness


activity in bipolar disorder. _Bipolar Disord._ 17, 715–728 (2015). Article  Google Scholar  * Faurholt-Jepsen, M. et al. Smartphone data as objective measures of bipolar disorder symptoms.


_Psychiatry Res._ 217, 124–127 (2014). Article  Google Scholar  * Ma, Y., Xu, B., Bai, Y., Sun, G. & Zhu, R. Daily Mood Assessment Based on Mobile Phone Sensing. in _Proc. 2012 Ninth


International Conference on Wearable and Implantable Body Sensor Networks_, 142–147 (IEEE, 2012). * LiKamWa, R., Liu, Y., Lane, N. D. & Zhong, L. MoodScope: Building a Mood Sensor from


Smartphone Usage Patterns. in _Proc. 11th Annual International Conference on Mobile Systems, Applications, and Services. MobiSys ’13_, 389–402 (ACM, New York, 2013). * Canzian, L. &


Musolesi, M. Trajectories of Depression: Unobtrusive Monitoring of Depressive States by Means of Smartphone Mobility Traces Analysis. in _Proc. 2015 ACM International Joint Conference on


Pervasive and Ubiquitous Computing. UbiComp ’15_, 1293–1304 (ACM, New York, 2015). * Grünerbl, A. et al. Smartphone-based recognition of states and state changes in bipolar disorder


patients. _IEEE J. Biomed. Health Inf._ 19, 140–148 (2015). Article  Google Scholar  * Abdullah, S. et al. Automatic detection of social rhythms in bipolar disorder. _J. Am. Med. Inform.


Assoc._ 23, 538–543 (2016). Article  Google Scholar  * Taylor, S. A., Jaques, N., Nosakhare, E., Sano, A. & Picard, R. Personalized multitask learning for predicting tomorrow's


mood, stress, and health. _IEEE Transac. Affect. Comput._ 11, 1 (2018). Google Scholar  * Gelman, A. et al. Bayesian Data Analysis, 3rd edn. in Chapman & Hall/CRC Texts in Statistical


Science. (Taylor & Francis, 2013). * Faurholt-Jepsen, M. et al. Daily electronic monitoring of subjective and objective measures of illness activity in bipolar disorder using


smartphones—the MONARCA II trial protocol: a randomized controlled single-blind parallelgroup trial. _BMC Psychiatry_ 14, 309 (2014). Article  Google Scholar  * Kessing, L. V. et al.


Treatment in a specialised out-patient mood disorder clinic v. standard out-patient treatment in the early course of bipolar disorder: randomised clinical trial. _Br. J. Psychiatry_ 202,


212–219 (2013). Article  Google Scholar  * Wing, J. K. et al. SCAN. Schedules for clinical assessment in neuropsychiatry. _Arch. Gen. Psychiatry_ 47, 589–593 (1990). Article  CAS  Google


Scholar  * Hyndman, R. & Athanasopoulos, G. _Forecasting: Principles and Practice_, 2nd edn. (OTexts, Melbourne, 2018). * Murphy, K. P. _Machine Learning: A Probabilistic Perspective_.


(The MIT Press, 2012). * Carpenter, B. et al. Stan: a probabilistic programming language. _J. Stat. Softw., Artic._ 76, 1–32 (2017). Google Scholar  * Pedregosa, F. et al. Scikit-learn:


machine learning in Python. _J. Mach. Learn. Res._ 12, 2825–2830 (2011). Google Scholar  * Chen, T., & Guestrin, C. XGBoost: A Scalable Tree Boosting System. in _Proc. 22nd ACM SIGKDD


International Conference on Knowledge Discovery and Data Mining. KDD ’16_, 785–794 (ACM, New York, 2016). * Molnar, C. Interpretable machine learning. A Guide for Making Black Box Models


Explainable. https://christophm.github.io/interpretable-ml-book/. (2019). * Schärer, L. O., Krienke, U. J., Graf, S. M., Meltzer, K. & Langosch, J. M. Validation of life-charts


documented with the personal life-chart app - a self-monitoring tool for bipolar disorder. _BMC Psychiatry_ 15, 49 (2015). Article  Google Scholar  * Faurholt-Jepsen, M., Munkholm, K.,


Frost, M., Bardram, J. E. & Kessing, L. V. Electronic self-monitoring of mood using IT platforms in adult patients with bipolar disorder: a systematic review of the validity and


evidence. _BMC Psychiatry_ 16, 7 (2016). Article  Google Scholar  * Diagnostic and Statistical Manual of Mental Disorders (DSM–5). American Psychiatric Association.


(http://www.webcitation.org/78BxWU0gk). https://www.psychiatry.org/psychiatrists/practice/dsm. (2019). * Bauer, M. S. et al. Independent assessment of manic and depressive symptoms by


selfrating. Scale characteristics and implications for the study of mania. _Arch. Gen. Psychiatry_ 48, 807–812 (1991). Article  CAS  Google Scholar  * Scott, J. et al. Activation in bipolar


disorders: a systematic review. _JAMA Psychiatry_ 74, 189–196 (2017). Article  Google Scholar  * Jackson, A., Cavanagh, J. & Scott, J. A systematic review of manic and depressive


prodromes. _J. Affect Disord._ 74, 209–217 (2003). Article  Google Scholar  * Pavlova, B., Perlis, R. H., Alda, M. & Uher, R. Lifetime prevalence of anxiety disorders in people with


bipolar disorder: a systematic review and metaanalysis. _Lancet Psychiatry_ 2, 710–717 (2015). Article  Google Scholar  * Faurholt-Jepsen, M. et al. Smartphone-based self-monitoring in


bipolar disorder: evaluation of usability and feasibility of two systems. _Int J. Bipolar Disord._ 7, 1 (2019). Article  Google Scholar  Download references ACKNOWLEDGEMENTS We would like to


thank the participants of the MONARCA II RCT as well as the clinical staff at the Psychiatric Center Copenhagen who helped facilitate the trial and assemble the dataset. The study was


funded by the Innovation Fund Denmark through the RADMIS project and the Copenhagen Center for Health Technology (CACHET). AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Department of Applied


Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark Jonas Busk & Ole Winther * Department of Health Technology, Technical University of Denmark, Lyngby,


Denmark Jonas Busk & Jakob E. Bardram * Copenhagen Affective Disorder Research Center (CADIC), Psychiatric Center Copenhagen, Rigshospitalet, Copenhagen, Denmark Maria Faurholt-Jepsen 


& Lars Vedel Kessing * Monsenso ApS, Copenhagen, Denmark Mads Frost * Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark Lars Vedel Kessing * Center


for Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark Ole Winther * Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen,


Denmark Ole Winther Authors * Jonas Busk View author publications You can also search for this author inPubMed Google Scholar * Maria Faurholt-Jepsen View author publications You can also


search for this author inPubMed Google Scholar * Mads Frost View author publications You can also search for this author inPubMed Google Scholar * Jakob E. Bardram View author publications


You can also search for this author inPubMed Google Scholar * Lars Vedel Kessing View author publications You can also search for this author inPubMed Google Scholar * Ole Winther View


author publications You can also search for this author inPubMed Google Scholar CORRESPONDING AUTHOR Correspondence to Jonas Busk. ETHICS DECLARATIONS CONFLICT OF INTEREST J.B., M.F.J., and


O.W. have no conflicts of interest. M.F. and J.E.B. are founders and shareholders of Monsenso. L.V.K. has during recent three years been a consultant for Lundbeck. ADDITIONAL INFORMATION


PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. SUPPLEMENTAL INFORMATION SUPPLEMENTAL INFORMATION


RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and


reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes


were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If


material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain


permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS


ARTICLE Busk, J., Faurholt-Jepsen, M., Frost, M. _et al._ Daily estimates of clinical severity of symptoms in bipolar disorder from smartphone-based self-assessments. _Transl Psychiatry_ 10,


194 (2020). https://doi.org/10.1038/s41398-020-00867-6 Download citation * Received: 04 November 2019 * Revised: 18 April 2020 * Accepted: 29 April 2020 * Published: 18 June 2020 * DOI:


https://doi.org/10.1038/s41398-020-00867-6 SHARE THIS ARTICLE Anyone you share the following link with will be able to read this content: Get shareable link Sorry, a shareable link is not


currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative