This Delphi procedure resulted in professional consensus on the 10 most important predictors of persistent shoulder pain 3 months after initial consultation in primary care. The predictors selected by the experts differed from those in a statistically derived model; however, both models confirmed the importance of symptom duration, baseline level of disability and multisite pain. Panel members additionally selected age, baseline pain intensity and psychological factors as important predictors. In terms of predictive performance, the statistically derived model performed slightly better than the expert-based prognostic model.
Since clinical expertise is expected to complement statistically derived prognostic models, this study aimed to reach clinical consensus on which predictors are most important for predicting persistent shoulder pain. The consensus-based selection of key predictors by health care professionals reflected most of the statistically selected predictors, but also included additional predictors that were not identified by statistical selection. During the inventory of potential predictors (i.e., the first Delphi round), health care professionals even identified predictors that have not previously been directly associated with shoulder pain, as well as predictors that have been associated with poor outcome in other musculoskeletal pain conditions (e.g., earlier experiences with shoulder treatment, smoking, diabetes mellitus, alcohol intake, ethnicity, level of training discipline, perceived versus actual work activity, social support, distress). None of these predictors was included in the final selection of most important predictors.
The consensus-based selection of the key predictors of persistent shoulder pain was derived using a Delphi procedure. Although this technique is often applied in consensus-based research[5, 30], its validity and reliability are sometimes debated. Since consensus findings may vary depending on the panel, the guidelines for consensus methods by Fink et al. were followed where possible. With a minimum of 31 panel members participating in any single Delphi round, our expert panel was large enough to obtain reliable results. As multidisciplinary panels may select a wider range of predictors than single-discipline panels[33, 34], our panel consisted of health care professionals and researchers from different disciplines and geographical areas in the United Kingdom and the Netherlands. Furthermore, the Delphi procedure was completely anonymous: panel members never met, nor did they know each other's identities. This eliminated the influence of negative group interactions and dominant opinions. To assist our panel members in selecting prognostic factors, we provided them with a resource, i.e., a list of potential predictors based on a previous systematic review. Although this is not an uncommon practice in consensus-based research[8, 9], one might argue that providing such a list could hinder the discovery of new potential predictors. Therefore, throughout the Delphi process, all panel members were encouraged to suggest additional potential predictors. Since part of our panel was also involved in shoulder-related clinical research, they were considered to be informed about the latest developments in the literature. This, together with the option of providing additional information, led us to believe that all predictors of persistent shoulder pain in primary care patients were identified by our panel.
How can the observed differences between expert-selected and statistically selected prognostic factors be explained? Taking the above considerations into account, it is unlikely that these differences were caused by methodological limitations of the Delphi procedure. Because our panel of health care professionals was trained in the clinical management of individual patients, they may have had difficulty identifying prognostic factors for the general population of shoulder pain patients. This could have complicated the identification of universal prognostic factors for shoulder pain patients. Another explanation for the observed differences in selected predictors may lie in the methodological limitations of predictor selection in statistically derived models. In the applied methodology, predictors were selected by an automated selection procedure. As shown by Austin and Tu, statistical predictor selection can give biased results. Automated backward elimination or forward selection may result in the omission of important predictors or the chance selection of less important ones. As a result, statistically derived models may be unstable, which was previously demonstrated for our statistically derived model. Differences between expert-based and statistical selection of predictors might therefore be largely influenced by the chosen method of statistical predictor selection. However, the optimal approach to variable selection is still debated.
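The instability of automated selection can be made concrete with a bootstrap experiment: run the same selection procedure on many resamples of one dataset and record how often each predictor is retained. The sketch below is illustrative only and does not reproduce the selection procedure used for our statistical model. It assumes ordinary least squares with AIC-based backward elimination as a simple stand-in for a logistic model, and the effect sizes in `BETAS`, the sample size of 150 and the helper names (`ols_aic`, `backward_eliminate`, `selection_frequencies`) are all hypothetical.

```python
import math
import random

# Hypothetical true effects: one strong, one moderate, two weak and two pure-noise predictors.
BETAS = [0.6, 0.3, 0.15, 0.15, 0.0, 0.0]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_aic(rows, y, cols):
    """AIC of an OLS fit of y on an intercept plus the predictors listed in `cols`."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2 for d, yi in zip(design, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * k


def backward_eliminate(rows, y):
    """AIC-based backward elimination: drop predictors while doing so lowers the AIC."""
    selected = list(range(len(rows[0])))
    current = ols_aic(rows, y, selected)
    while selected:
        best_aic, drop = min(
            (ols_aic(rows, y, [c for c in selected if c != j]), j) for j in selected
        )
        if best_aic >= current:
            break
        selected.remove(drop)
        current = best_aic
    return selected


def selection_frequencies(rows, y, n_boot, rng):
    """Fraction of bootstrap resamples in which each predictor survives elimination."""
    n, p = len(y), len(rows[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        for j in backward_eliminate([rows[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    return [c / n_boot for c in counts]


rng = random.Random(2024)
rows = [[rng.gauss(0, 1) for _ in BETAS] for _ in range(150)]
y = [sum(b * v for b, v in zip(BETAS, row)) + rng.gauss(0, 1) for row in rows]
freqs = selection_frequencies(rows, y, n_boot=100, rng=rng)
for j, (beta, f) in enumerate(zip(BETAS, freqs)):
    print(f"x{j} (true beta {beta}): retained in {f:.0%} of bootstrap samples")
```

In such a setting the strong predictor is typically retained in nearly every resample, whereas weak and noise predictors tend to appear in only a fraction of them, so the "selected model" depends partly on which sample happened to be drawn.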
One of the strengths of the current study was that, in addition to establishing consensus on key predictors, the predictive performance of these predictors was empirically tested. Results showed that neither expert-based model performed as well as the statistically derived prognostic model. This is a notable result, since clinical knowledge is expected to complement statistical modelling and the derivation of our statistical model has some known limitations in predictor selection. However, these findings should be interpreted with caution: they do not imply that statistically based scoring systems are superior to clinical prognosis. Although we asked our panel for suggestions on how to formulate and score each predictor, a weakness of this study was that we had to use an existing dataset that did not include exactly the same variables as those proposed by the expert panel. Another weakness was that a floor effect associated with low baseline pain ratings could have occurred in our outcome measure. Although approximately 19% of the subjects in our database had a baseline pain score of ≤2, all baseline pain categories (i.e., 1 to 10) showed a fairly constant percentage of subjects identified with persistent shoulder pain, of approximately 40 to 60%. Thus, apart from subjects with a baseline pain rating of 0, we reasoned that our analyses were not affected by a floor effect. Furthermore, although we derived an optimal model using continuous scales, the expert-based model had to compete with a statistical model that was derived in the same dataset and was therefore expected to show better predictive performance. Hence, concluding that statistical prognosis is superior to clinical prognosis would be premature. Another aspect that can be regarded as a weakness of our study is the dichotomization of key predictors in one of the expert-based prognostic models.
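The floor-effect check described above amounts to tabulating, for each baseline pain score, the proportion of subjects classified as having persistent pain, and looking for categories that deviate from the rest. The sketch below assumes each subject is represented as a hypothetical `(baseline_pain, persistent)` pair; the helper names and the 40 to 60% band used as a reference range are illustrative assumptions, not part of the study's analysis code.

```python
from collections import defaultdict


def persistence_by_baseline(records):
    """Proportion of subjects with persistent pain for each baseline pain score.

    `records` is an iterable of (baseline_pain, persistent) pairs, where
    baseline_pain is an integer 0-10 rating and persistent is a boolean.
    """
    totals = defaultdict(int)
    persistent = defaultdict(int)
    for baseline, outcome in records:
        totals[baseline] += 1
        persistent[baseline] += bool(outcome)  # True counts as 1, False as 0
    return {score: persistent[score] / totals[score] for score in sorted(totals)}


def flag_deviating_categories(proportions, low=0.4, high=0.6):
    """Baseline scores whose persistence rate falls outside the [low, high] band."""
    return [score for score, p in proportions.items() if not low <= p <= high]


# Hypothetical toy records, for illustration only.
example = [(0, False), (0, False), (1, True), (1, False), (2, True), (2, False), (7, True), (7, False)]
props = persistence_by_baseline(example)
print(props)                             # {0: 0.0, 1: 0.5, 2: 0.5, 7: 0.5}
print(flag_deviating_categories(props))  # [0]
```

A category such as baseline score 0 standing out in this way is the pattern that would point to a floor effect, whereas roughly equal rates across the other categories argue against one.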
Dichotomization of predictors is often criticized in the literature because it leads to a loss of information and thus reduced predictive performance. Although we expected our panel members to be familiar with this undesirable effect, most of them said they preferred a prognostic model consisting of simple (i.e., dichotomous) predictors. This illustrates the discrepancy between prognostic research and clinical practice: in prognostic research, model performance is paramount, whereas in clinical practice models also need to be easy to use. Unfortunately, simplicity comes at the cost of predictive performance, as shown by the effect of dichotomizing predictors using their median values as cut-offs[21, 22].
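The information loss from a median split can be illustrated with a small simulation, assuming discrimination is measured by the c-statistic (area under the ROC curve). For a single predictor, the c-statistic of a univariable logistic model with a positive coefficient equals that of the raw predictor, so the sketch scores subjects directly on the predictor. The data-generating model (one standard-normal predictor with a logistic slope of 1, n = 5000) is a hypothetical assumption, not our dataset.

```python
import math
import random
import statistics


def c_statistic(scores, outcomes):
    """C-statistic (ROC AUC) via the Mann-Whitney rank formula; tied scores share
    their average rank, so ties count as half-concordant."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 1-based average rank of the tie block
        start = end + 1
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(7)
# Hypothetical data: a standard-normal predictor with a logistic effect (slope 1) on the outcome.
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [1 if rng.random() < 1 / (1 + math.exp(-xi)) else 0 for xi in x]

cut = statistics.median(x)
x_dichotomized = [1 if xi > cut else 0 for xi in x]  # median split, as in a "simple" model

auc_continuous = c_statistic(x, y)
auc_dichotomized = c_statistic(x_dichotomized, y)
print(f"continuous: {auc_continuous:.3f}, dichotomized at median: {auc_dichotomized:.3f}")
```

Because the median split collapses every subject on each side of the cut-off into a single score, many case-control pairs that the continuous predictor ranks correctly become ties, which lowers the c-statistic.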
Given these considerations, it remains unclear whether prognostic estimates by health care professionals are superior to those obtained by scoring systems. Previous studies have shown that either clinical prognosis or scoring systems can be the superior approach[36–38]. It is even conceivable that prognostic superiority is case-dependent (e.g., depending on the type of musculoskeletal condition or health care profession). Therefore, clinical prognosis and scoring systems for the prognosis of non-recovery from shoulder pain will be compared in a future study.