Stratified primary care versus non-stratified care for musculoskeletal pain: findings from the STarT MSK feasibility and pilot cluster randomized controlled trial

Background Musculoskeletal (MSK) pain from the five most common presentations to primary care (back, neck, shoulder, knee or multi-site pain), where the majority of patients are managed, is a costly global health challenge. At present, first-line decision-making is based on clinical reasoning, and stratified models of care have only been tested in patients with low back pain. We therefore examined the feasibility of: a) a future definitive cluster randomised controlled trial (RCT), and b) General Practitioners (GPs) providing stratified care at the point-of-consultation for these five most common MSK pain presentations. Methods The design was a pragmatic pilot, two parallel-arm (stratified versus non-stratified care), cluster RCT and the setting was 8 UK GP practices (4 intervention, 4 control) with randomisation (stratified by practice size) and blinding of trial statistician and outcome data-collectors. Participants were adult consulters with MSK pain without indicators of serious pathologies, urgent medical needs, or vulnerabilities. Potential participant records were tagged and individuals sent postal invitations using a GP point-of-consultation electronic medical record (EMR) template. The intervention was supported by the EMR template housing the Keele STarT MSK Tool (to stratify into low, medium and high-risk prognostic subgroups of persistent pain and disability) and recommended matched treatment options. Feasibility outcomes included exploration of recruitment and follow-up rates, selection bias, and GP intervention fidelity. To capture recommended outcomes including pain and function, participants completed an initial questionnaire, brief monthly questionnaire (postal or SMS), and 6-month follow-up questionnaire. An anonymised EMR audit described GP decision-making.
Results GPs screened 3063 patients (intervention = 1591, control = 1472), completed the EMR template with 1237 eligible patients (intervention = 513, control = 724) and 524 participants (42%) consented to data collection (intervention = 231, control = 293). Recruitment took 28 weeks (target 12 weeks) with > 90% follow-up retention (target > 75%). We detected no selection bias of concern and no harms were identified. GP stratification tool fidelity failed to achieve a priori success criteria, whilst fidelity to the matched treatments achieved "complete success". Conclusions A future definitive cluster RCT of stratified care for MSK pain is feasible and is underway, following key amendments including a clinician-completed version of the stratification tool and refinements to recommended matched treatments. Trial registration Name of the registry: ISRCTN. Trial registration number: 15366334. Date of registration: 06/04/2016.


Background
Musculoskeletal (MSK) pain from common conditions such as back pain and osteoarthritis is a costly global health challenge, particularly for primary care where the majority of patients are managed. For example, in the UK, common MSK problems such as back, shoulder, knee and multi-site pain account for 14% of General Practitioner (GP) consultations [1] and estimates from the most recent global burden of disease studies suggest they are the leading cause of disability adjusted life years (DALYs) [2,3]. Given the ageing population and the increasingly complex and multi-morbid clinical presentations of patients, clinical decision-making is becoming more challenging [4][5][6].
In addition, consultation rates for MSK pain are increasing; for example, in the UK, GP consultations for MSK pain have increased by 19% (from 310 to 370 million per year) over a five-year period [7,8].
Randomised controlled trials (RCTs) show that non-pharmacological interventions such as physiotherapist-led supervised exercise and cognitive behavioural approaches are more effective than minimal usual care [9][10][11][12], yet most guidelines [13][14][15] lack clarity about which patients should be offered these additional interventions [16][17][18]. At present, primary care decision-making for MSK pain is mostly based on ruling out serious pathology and using clinical reasoning without formal stratification tools to decide on treatment. Assessing the severity, impact and prognosis of individual patients can be difficult in short primary care consultations and patient access to other treatments is often variable [19][20][21][22]. Offering everyone consulting in primary care with MSK pain further treatments is both unnecessary and impractical [16,17]. Therefore, finding ways to better identify which patients to de-medicalise by limiting care primarily to reassurance and self-management, whilst conversely identifying which patients should be offered more intensive and expensive healthcare treatments, is an international priority [14,17,23].
We have previously demonstrated the clinical- and cost-effectiveness of a stratified primary care approach to support clinical decision-making for patients with low back pain in the UK [24][25][26]. This approach combines prognostic stratification (using the STarT Back tool that classifies individuals into either a low, medium or high risk subgroup for persistent low back pain-related disability) with recommended matched treatments for each subgroup [27][28][29]. This approach to stratified care for low back pain has since been recommended in several international clinical guidelines [30][31][32]. Whilst low back pain is the most common MSK pain presentation in primary care, it accounts for only 26% of the MSK caseload [1], and it is unknown whether a similar prognostic approach to stratified care would benefit the large volume of patients with MSK pain in other body sites/locations (e.g. knee or shoulder pain).
Given the results of several systematic reviews showing consistent prognostic factors across MSK pain conditions [33][34][35][36][37], we developed and validated a single prognostic stratification tool, the Keele STarT MSK tool, for use among patients with the five most common MSK pain presentations in primary care (back, neck, shoulder, knee, and multi-site pain) [1]. The Keele STarT MSK Tool has shown good predictive and discriminative ability in development and validation samples [38], identifying patients at low, medium or high risk of persistent MSK pain over 6 months. Using systematic review and consensus methods, we also agreed on evidence-based recommended matched treatment options for each of the risk subgroups [39,40].
The STarT MSK stratified primary care intervention has two components: use of the tool to identify risk subgroups, followed by matched treatment options. A definitive trial is needed to test whether this approach is better for patients' outcomes and the healthcare system, compared to usual non-stratified care. Prior to conducting the main RCT, we examined the feasibility of a) a future definitive cluster RCT, and b) GPs using stratified care at the point-of-consultation. Specific objectives were to:
1) Estimate participant recruitment and follow-up rates in a pilot cluster RCT.
2) Examine evidence of selection bias between trial arms and between participants and non-participants.
3) Assess GP fidelity to the stratified care intervention (use of the stratification tool and matched treatments) at the point-of-consultation.
4) Conduct secondary descriptive analyses of GP decision-making and patient self-reported outcomes.

Trial design
The study design was a pragmatic, feasibility and pilot, two-parallel arm (1:1 ratio), cluster RCT in 8 general practices, with a nested qualitative study reported separately [41]. A cluster RCT was chosen over an individual patient randomisation design as stratified MSK care involves GPs using a slightly different consultation approach following specific training, as well as the use of a bespoke electronic medical record (EMR) template, which was only possible to implement at a practice level without causing a high probability of intervention contamination across arms [42]. The units of randomisation were the general practices and units of observation were adults consulting with MSK pain. The International Standard Randomised Controlled Trials Number is ISRCTN15366334.

Participant eligibility criteria and identification
Patients were eligible if, during their visit to a participating GP practice, the trial's purpose-built participant identification screen, embedded within the EMR, was completed at the point-of-consultation, including GP confirmation of patient eligibility. Inclusion criteria were: aged over 18 years, registered at that general practice, and consulting for MSK pain in the back, neck, shoulder, knee or multi-site pain. The trial identification template activated automatically for all new or returning episode cases when GPs (intervention and control) entered one of over 200 pre-identified MSK Read-codes (i.e. symptom/diagnostic codes) into the patient's EMR. Exclusions were: clinical indicators of (suspected) serious 'red flag' pathology requiring urgent medical intervention or a known systemic inflammatory condition, those unable to communicate in English (both in reading and speaking), and vulnerable patients, including those on the 'severe and enduring mental health register', those with a diagnosis of dementia or terminal illness, and those with recent trauma or bereavement. To reduce patient/clinician burden, the participant identification screen only activated once per patient (providing it was completed or an exclusion was entered). A further eligibility criterion, administered by the research centre, specified that initial questionnaire responses were completed within 4 weeks of invitation mailing date (using self-reported date-of-completion on the questionnaire).

General practices
The UK West Midlands National Institute for Health Research (NIHR) Clinical Research Network (CRN) facilitated recruitment of eight general practices who used the EMIS Web EMR system and collectively served a target population of > 40,000 adults. GP practice eligibility criteria included willingness to be randomised to either stratified care or usual care, to engage in intervention training (if allocated to stratified care) and to facilitate an anonymised EMR audit after 6-months in the trial. Practices were also required to remove any existing MSK stratification tools (e.g. STarT Back) if they were randomised as a control practice. Consent to these criteria was sought through a written agreement with a representative from each participating practice, prior to randomisation. We aimed for practices that varied in size, location (urban, semi-urban and rural) and population socio-demographics.

Patients
Patient identification, invitation and recruitment were facilitated by CRN staff, or practice staff (if preferred), through a weekly download into a secure mailing database of eligible patients identified from the trial's IT identification template. Eligible patients were sent a study invitation letter and information leaflet, an initial questionnaire and a consent form with a stamped addressed envelope to return. A study administrator (blind to GP practice allocation) was available for telephone support if required. Signed consent to provide questionnaire outcome data was obtained from all participants and NHS ethical approval was gained (Reference: 16/EM/0257). Participant recruitment lasted 8 months (October 2016 to May 2017).

Randomisation and blinding
Randomisation used stratified block randomisation based on GP practice list size to allocate the 8 practices in a ratio of 1:1 (4 intervention, 4 control). Keele Clinical Trials Unit (CTU) computer-generated the random sequence and ensured concealment by providing each practice with an anonymised code. Allocation (at cluster and individual level) was shared with the study team (except for the trial statistician and outcome data collectors who were blinded until the analysis was finalised). Blinding of participating GPs was not possible; however, patients were unaware of the RCT and the differences between consultations in intervention and control practices, and instead were informed about, and consented to, providing questionnaire data for a study investigating the Treatment of Aches and Pains (TAPs). These processes follow recommendations for cluster RCTs [42].
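As an illustration only, stratified block randomisation of 8 practices by list size can be sketched as below. This is not the Keele CTU procedure: the two-stratum split at the midpoint, the single balanced block per stratum, the seed, and all practice codes and list sizes are hypothetical assumptions made for the example.

```python
import random


def stratified_block_randomise(practices, seed=0):
    """Allocate practices 1:1 within list-size strata.

    practices: dict mapping a practice code to its adult list size.
    Returns a dict mapping each practice code to "intervention" or "control".
    Assumption: two strata (smaller/larger half), one balanced block each.
    """
    rng = random.Random(seed)
    ordered = sorted(practices, key=practices.get)  # smallest list size first
    half = len(ordered) // 2
    strata = [ordered[:half], ordered[half:]]

    allocation = {}
    for members in strata:
        n = len(members)
        # A block with equal numbers of each arm, shuffled at random
        block = ["intervention"] * (n // 2) + ["control"] * (n - n // 2)
        rng.shuffle(block)
        allocation.update(zip(members, block))
    return allocation


# Hypothetical practice codes and list sizes (not trial data)
example_practices = {
    "P1": 4000, "P2": 5200, "P3": 6100, "P4": 6900,
    "P5": 7400, "P6": 8300, "P7": 9800, "P8": 13000,
}
allocation = stratified_block_randomise(example_practices, seed=2016)
```

With 8 practices this yields 2 intervention and 2 control practices in each stratum, so the arms are balanced on practice size as well as in number.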

Usual care
Patients consulting at the four usual care general practices received clinical care as usual for MSK pain. Usual primary care is known to be variable [43][44][45]; for example, some patients may receive advice, prescriptions for medications and nothing more, some may be asked to return to the GP for follow-up assessment or treatment, whereas others may be referred to other services, including for tests and investigations, or treatment services such as physiotherapy, orthopaedics or pain clinics. As part of the trial's participant identification screen, GPs in control (and intervention) practices recorded each patient's average MSK pain intensity (see outcomes section) and primary MSK pain site at the point-of-consultation on the study EMR identification template.

Stratified care intervention
The intervention development was based on the Medical Research Council's (MRC) framework for the design and evaluation of complex interventions [46]. To support GPs in intervention practices to deliver stratified care, we extended the trial point-of-consultation identification EMR template to also contain the prognostic stratification tool (a development version of the Keele STarT MSK tool; see Fig. 1) and recommended matched treatment options. The tool was developed and validated in UK General Practice to predict persistent pain and disability and allocate individuals into low, medium or high risk subgroups and is published elsewhere [38]. The recommended matched treatment options for each subgroup are provided in Fig. 2 and were developed through a systematic review and expert consensus process, described in detail elsewhere [39,40]. In brief, for patients at low risk the treatment options were restricted to supporting self-management and over-the-counter medication, discouraging unnecessary investigations or referral. For those at medium risk, they included referral to conservative non-pharmacological treatments (e.g. those offered by physiotherapists) and workplace assessment and advice, and for those at high risk, they included referral for corticosteroid injections, specialist clinical services (including rheumatology, orthopaedics and pain clinics), and opioids. GP training (3-4 h) within intervention practices was facilitated by an experienced GP trainer (VC) and the lead author (JH) and included: the rationale for stratified care, how it differs from usual care, familiarisation with the EMR template and its fit within the consultation, as well as addressing any questions or concerns. GPs also received a training-update half-way through their recruitment period at which feedback data were shared about individual GP intervention fidelity, with peer-to-peer comparisons and discussion.

Outcomes measures and analyses
The pre-specified measures and success criteria for each pilot trial objective are set out below; none were changed once the pilot commenced:

Objective 1
To examine the recruitment and retention rates of general practices we examined the numbers of expressions of interest, face-to-face introductory meetings and signed agreements to participate. To examine the recruitment and retention rates for individual participants we examined the numbers of: participant identification screen activations in the EMR (these were potentially eligible patients screened by the GP at the point-of-consultation) and completions (confirmed eligibility and therefore invited by post to participate), as well as the initial questionnaires returned with written consent to participate in data collection, and monthly and 6-month questionnaires returned. Questionnaire items were examined to identify missing items and any floor-or-ceiling effects. Means and/or medians and standard deviations were reported for all the participant self-reported measures. The pre-specified success criteria for this objective were that the trial participant identification screen would be activated in approximately 2000 consultations, leading to a minimum of 500 participants participating in data collection within an expected 3-month recruitment period, with a follow-up rate of > 75% and less than 5% missing items in participant questionnaires.

Objective 2
To examine evidence of recruitment selection bias we descriptively analysed (means and standard deviations (SD)) the characteristics of intervention and control arm participants, and characteristics of trial participants and non-participants, using information from the EMR participant identification screen at the point-of-consultation (i.e. MSK pain location, pain intensity, age, sex and deprivation score) and within the participant self-reported initial questionnaire (demographic and clinical characteristics, as listed in Additional file 1). The pre-specified success criterion for this objective was to find little evidence of recruitment selection bias, either between intervention and control participants or between study participants and non-participants.

Objective 3
To assess GP fidelity to the stratified care intervention at the point-of-consultation we examined the proportion of eligible cases in which GPs used the stratification tool and chose at least one of the recommended matched treatments. Per protocol matched treatments for each subgroup were defined as follows:
- Low risk: must only have low risk treatment options reported in the EMR.
- Medium risk: must have at least one medium risk treatment option and none of the high risk options reported in the EMR.
- High risk: must have, reported within the EMR, at least one high risk treatment option, or a referral to an MSK service providing a medium risk treatment option (e.g. physiotherapy or psychological intervention) with tool subgroup information within the referral, so that services were aware that an onward referral to a high risk treatment option might be required.
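The per protocol definitions above can be expressed as a small rule, sketched here under stated assumptions. The function name, the representation of treatment options as risk-level labels, and the `referral_includes_subgroup` flag are hypothetical; treating a consultation with no treatment option recorded as not matched is also an assumption of this sketch.

```python
def is_per_protocol(risk, option_levels, referral_includes_subgroup=False):
    """Return True if the recorded treatments match the risk subgroup.

    risk: "low", "medium" or "high" (the tool subgroup).
    option_levels: iterable of the risk levels ("low", "medium", "high")
        of the treatment options recorded in the EMR.
    referral_includes_subgroup: True if a referral to an MSK service
        carried the tool subgroup information (relevant to high risk only).
    """
    levels = set(option_levels)
    if risk == "low":
        # Only low risk options; an empty record is treated as not matched
        return levels == {"low"}
    if risk == "medium":
        # At least one medium risk option and no high risk option
        return "medium" in levels and "high" not in levels
    if risk == "high":
        # A high risk option, or a medium risk referral flagged with the subgroup
        return "high" in levels or (
            "medium" in levels and referral_includes_subgroup
        )
    raise ValueError(f"unknown risk subgroup: {risk!r}")
```

For example, `is_per_protocol("medium", ["low", "medium"])` is true, whereas a high risk patient given only a medium risk referral without subgroup information is not matched.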
The pre-specified success criteria for this objective were that, within relevant MSK pain consultations, intervention GPs would:
1. Complete the prognostic stratification tool in:
- > 50% of cases: "Complete success" (proceed to main trial without amendments)
- 40-50% of cases: "Partial success" (proceed to main trial with amendments)
- < 40% of cases: "Unsuccessful" (consider whether or not to proceed to main trial)
2. Adhere to per protocol matched treatment options in:
- > 65% of cases: "Complete success" (proceed to main trial without amendments)
- 50-65% of cases: "Partial success" (proceed to main trial with amendments)
- < 50% of cases: "Unsuccessful" (consider whether or not to proceed to main trial)
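The thresholds above amount to a three-band grading rule, sketched below. The function names are hypothetical, and the example inputs are the pilot figures reported under Objective 3 in the Results (tool used in 513 of 1591 eligible patients; 81% of low risk patients matched).

```python
def grade(percent, complete_above, partial_from):
    """Grade a fidelity percentage against the pre-specified bands."""
    if percent > complete_above:
        return "Complete success"
    if percent >= partial_from:
        return "Partial success"
    return "Unsuccessful"


def grade_tool_completion(percent):
    # > 50% complete, 40-50% partial, < 40% unsuccessful
    return grade(percent, complete_above=50, partial_from=40)


def grade_matched_treatment(percent):
    # > 65% complete, 50-65% partial, < 50% unsuccessful
    return grade(percent, complete_above=65, partial_from=50)


tool_percent = 100 * 513 / 1591          # about 32%
tool_grade = grade_tool_completion(tool_percent)
matched_grade = grade_matched_treatment(81)
```

Here `tool_grade` is "Unsuccessful" and `matched_grade` is "Complete success", in line with the judgements reported in the Results.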

Objective 4
To examine differences in GP decision-making and patient self-reported outcomes between the intervention and control arms, we conducted secondary descriptive statistical analyses using the anonymised 6-month EMR audit and follow-up questionnaire data. As this was a feasibility and pilot trial, the objective was not hypothesis testing of process/health outcomes; there were no pre-specified success criteria and only complete cases were analysed. There were four sources of data: 1. The GP EMR participant identification screen collected identical point-of-consultation data in all 8 GP practices, including the primary MSK pain site/location and average pain intensity (intended primary outcome for the main trial) by asking: How intense was your pain, on average, over the last 2 weeks? [Responses on a 0-10 scale, where 0 is "no pain" and 10 is "worst pain ever"].
Pain intensity was chosen as the potential primary outcome for the future main trial as it had the strongest face validity with patients during a pre-pilot Patient and Public Involvement and Engagement (PPIE) workshop and is also a recommended outcome for trials testing treatments for MSK pain [47,48]. In the intervention practices the EMR participant identification screen was extended to embed the stratified care intervention and collect additional data relating to stratification tool item responses and the matched treatment options chosen at the point-of-consultation. All template responses were date stamped and linked to an individual GP and patient. It was also possible from the EMR screen to collect automated data on the MSK consulter's age, sex and English Index of Multiple Deprivation (IMD) 2015 [49], with non-participants' data anonymised first.

Baseline and 6-month postal questionnaires
These questionnaires included self-reported measures for average pain intensity over the last 2 weeks (identical wording and responses to the trial identification template) and physical function measures for each of the MSK pain sites (filtered according to GP designation), including the back specific Roland-Morris Disability Questionnaire (RMDQ) [50], the Neck Disability Index (NDI) [51,52], the Shoulder Pain And Disability Index (SPADI) [53], the Knee Injury and Osteoarthritis Outcome Score Physical Function Short-form (KOOS-PS) [54] and, for multi-site pain, the Short Form 12 (v2) Physical Component Scale [55]. Other outcomes were MSK risk status using the development version of the Keele STarT MSK tool [38], overall MSK health status using the Musculoskeletal Health Questionnaire [56], fear avoidance beliefs using the 11-item Tampa Scale of Kinesiophobia [57], patient perceived reassurance (from their GP) using the Effective Consultation and Reassurance Questionnaire (ECRQ) [58] (which has four subscales: information gathering, relationship building, generic reassurance and cognitive reassurance), health-related quality of life using the EuroQol five-dimension, five-level version (EQ-5D-5L) [59], single items each capturing satisfaction with care received, whether participants had received written education material from their GP about their MSK problem (yes/no), and overall rating of global change (− 5 to + 5 numerical response scale) since their index GP visit (the one in which the trial EMR screen was activated and they were invited to participate in the study data collection) [60], whether they were in paid employment and had taken any work absence due to their MSK pain, and an item asking how their productivity at work is affected (0-10 NRS). Patient population descriptors (captured at baseline alone) included the Single Item Health Literacy Screener (SILS) [61] and pain episode duration, captured by asking "how long is it since you had a whole month without [insert pain site e.g. back] pain". Additional file 1 provides a summary of the self-reported measures collected.

Monthly follow-up
Three items were collected at monthly follow-up via Short Message Service (SMS) text or one-page postal questionnaire (depending on participant preference): average pain intensity (same wording as GP EMR screen), distress due to pain, and pain self-efficacy. The latter two items were worded as follows:
- How much distress have you been experiencing because of your pain, on average, over the last 2 weeks? [Responses from 0 = no distress to 10 = extreme distress]
- How confident have you felt about managing your pain by yourself e.g. medication, changing lifestyle? [Responses from 0 = not at all confident to 10 = extremely confident]

Anonymised GP medical record audit
An anonymised audit of medical record data was conducted in all 8 GP practices for patients in whom the trial EMR participant identification screen had been completed. The audit covered:
i) prescriptions (categorised into simple analgesics, non-steroidal anti-inflammatories (NSAIDs), neuromodulators, muscle relaxants, corticosteroid injections and opioids)
ii) referrals (categorised into physiotherapy/MSK interface services, secondary care specialist services including orthopaedics, pain clinics, and rheumatology)
iii) imaging (categorised into x-rays/MRI scans, MSK ultrasound scans and bone density scans)
iv) sick certifications or 'fit-notes' (categorised into number per patient and mean length in days)
v) repeat MSK general practice consultations.

Sample size
Whilst sample size calculations for pilot cluster trials are known to be difficult [62], the initial plan was to carry out an internal pilot trial with a 3-month recruitment phase, that mirrored the methods of the main cluster trial but was limited to assessing feasibility within 8 GP practices (4 intervention and 4 control) prior to involvement of a further 22 GP practices (30 in total). If the internal pilot had achieved its success criteria, we had planned that these 8 randomised practices would continue to recruit patients for a full 6-month period, and their data included in the main trial. Hence, we anticipated recruiting 500 patients from the 8 practices over the first 3-months in the internal pilot trial, with a further 500 participants to be recruited from those practices (and in addition 2750 from a further 22 practices for the main trial phase).

Objective 1: general practice and participant recruitment and retention rates
There were 32 general practices who expressed an initial interest in participating in the pilot trial from the West Midlands region of England, of which 16 agreed to a face-to-face introductory meeting with the research team, and 8 were recruited (with written agreements) and randomised (4 intervention, 4 control). The reasons given for declining participation included the practice lacking capacity in terms of resource at that particular time (n = 2), unwillingness to participate in the training session (n = 2), unwilling to use the EMR participant identification screen (n = 2), being already involved with another MSK pain research study (n = 1), and a perception that the practice's patient population would struggle to respond to the self-report questionnaires (n = 1). The 8 participating practices had a total adult practice population size of 58,307 (25,697 intervention, 32,610 control). The smallest practice had 3 GPs and a registered adult population of 3992; the largest had 9 GPs and 13,359 adult patients. In total 59 GPs identified patients for the trial (39 in control practices and 20 in intervention practices).
Patient recruitment and follow-up through the trial are described in Fig. 3. Recruitment started on 11/10/2016 and the last practice template was deactivated on 24/05/2017, with the last invite reminder sent on 21/06/2017 and the last patient providing consent to data collection on 21/07/2017. There were 3063 potentially eligible patients screened by GPs at the point-of-consultation; the EMR participant identification screen was completed in 1281 with confirmed eligibility, of whom 1237 were actually invited by postal letter to participate in data collection; 567 returned initial questionnaires with written consent to participate in data collection, and 524 responses were received within the 4-week eligibility time-period (231 intervention and 293 controls). To recruit 500 patients took 28 weeks, more than twice as long as the original estimate (12 weeks). Recruitment varied substantially between the 8 practices (range n = 11-127), suggesting the need to account for this variation within the main trial sample size calculation. Once 500 participants were recruited, the EMR participant identification screen in practices was switched off; however, we recruited a further 24 participants (n = 524 in total) over the following month (33 weeks in total) due to the time lag in sending invitations and receiving patient consent to data collection (via the post).
The overall participant 6-month follow-up rate for the intended future RCT primary outcome of pain intensity was 477/524 (91.0%); usual care 209/231 (90.4%), intervention 268/293 (91.4%). The response rate for monthly pain intensity scores at 5 or more time-points (maximum possible was 6) was 82.6%, with data for 3 time-points available in 91.8%. Fifteen patients withdrew over the 6-month follow-up period: 5 from intervention practices (2 due to illness/surgery/poor health, 1 due to moving house, and 2 did not want further contact about the study), and 10 from control practices (5 due to illness/surgery/poor health, 1 had died (unrelated), 2 withdrew because they felt recovered, and 2 did not want further contact). There were no related, unexpected serious adverse events or harms reported. At 6-month follow-up patients reported 11 hospital admissions (5 intervention, 6 control) related to their MSK pain (e.g. knee replacement or shoulder surgery). Missing data items in the questionnaires remained less than 5%. Anonymised medical record data were available for 1281 patients (529 from intervention practices and 752 from control practices).
Performance against the success criteria for this objective (the template activated in approximately 2000 consultations, leading to a minimum of 500 participants providing consent within an expected 3-month recruitment period, with a follow-up rate of > 75% and less than 5% missing items in patient questionnaires) was judged only "partially successful": although patient recruitment and retention were "successful", recruiting 500 patients took 28 rather than 12 weeks.
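The comparison of the pilot figures against the Objective 1 criteria can be laid out as a simple check, sketched below. The dictionary keys and function name are hypothetical; reading "approximately 2000" as "at least 2000" and "3-month" as 12 weeks are assumptions of the sketch, and missing items are held as a flag because the text reports them only as "less than 5%".

```python
# Pilot figures as reported in the Results text
pilot = {
    "screen_activations": 3063,
    "consented_participants": 524,
    "weeks_to_500": 28,
    "followed_up_6m": 477,
    "missing_items_below_5pct": True,  # reported only as "less than 5%"
}


def check_objective_1(p, target_weeks=12):
    """Return a dict of criterion name -> whether the pilot met it."""
    followup_pct = 100 * p["followed_up_6m"] / p["consented_participants"]
    return {
        # Assumption: "approximately 2000" is read as "at least 2000"
        "activations_about_2000": p["screen_activations"] >= 2000,
        "at_least_500_participants": p["consented_participants"] >= 500,
        "recruited_within_target": p["weeks_to_500"] <= target_weeks,
        "followup_above_75pct": followup_pct > 75,
        "missing_items_below_5pct": p["missing_items_below_5pct"],
    }


results = check_objective_1(pilot)
unmet = [name for name, met in results.items() if not met]
```

Only the recruitment timeline criterion is unmet, which is the basis for the "partially successful" judgement.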
The learning/change needed ahead of the main trial included reducing the main trial sample size (following discussion with the independent Trial Steering Committee and funder) by removing the pre-specified subgroup analysis (at the risk-subgroup level) and instead powering the trial for the overall comparison between intervention and control arms. In addition, the main trial sample size was re-calculated based on the following. Firstly, the pilot recruitment rate showed that the template was completed in just under 40% of cases, and from the subsequent letter of invitation 40% returned their initial questionnaire and provided consent to participation in the data collection (on average, 60 patients per practice). A conservative estimate (50 patients per practice) was therefore used for the main trial. Secondly, the proportions expected within each of the three risk subgroups, as determined from the self-complete questionnaires, were revised based on the pilot trial findings, to: 32% low risk, 55% medium risk, 13% high risk. This was important as the trial was powered to detect superiority of stratified care in the medium and high risk subgroups, with an expected effect size of 0.20.
Thirdly, for GP cluster parameterisation, we made the following estimates, based primarily on previous guidelines, as pilot trial figures need to be viewed cautiously given the possible lack of precision [62]. For the main trial primary outcome (pain intensity) we have conservatively allowed for an intracluster correlation coefficient (ICC) of 0.01 based on a guideline from previous primary care trials [63] and the pilot trial ICC being considerably lower (0.004). Our main trial estimated coefficient of variation in recruitment per practice is also based on a guideline estimate of 0.65 [64] as well as the pilot being similar at 0.66. Our expected loss to follow-up across all time-points is conservatively estimated at 25%, which in the pilot was around 5%. Lastly, our repeated measures correlation is estimated using a guideline figure of 0.7 [65], which is conservative based on our pilot trial figure of 0.65. These factors combine to give a sample size inflation factor of × 2.3 (based on an average cluster size of about 50 participants per practice in 6 months). Correlation of data within 6 repeated measurements and correlation of follow-up scores with baseline score are typically 0.7 and 0.5, respectively, which combine to give a sample size deflation factor of × 0.5. The product of the inflation and deflation effects results in a magnification of 1.15 compared to a conventional, individual-patient, single follow-up comparison, whereby the sample size requirement would be 525 per treatment arm (or 1050 in total). The adjusted sample size target for the main trial is therefore 600 patients per arm (1200 in total) from approx. 24 general practices (approx. 12 per arm).
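The text does not state which formulas were used, so the following is only a sketch of one way to arrive at figures close to those quoted. It assumes a design effect for variable cluster sizes of 1 + ((1 + cv^2) x m - 1) x ICC, divided by the expected retention, for the inflation factor, and a repeated-measures variance factor with baseline adjustment of (1 + (r - 1) x rho) / r - rho_b^2 for the deflation factor. Both formulas and all function names are assumptions of this example, not the authors' stated method; the inputs are the values given in the paragraph above.

```python
def inflation_factor(mean_cluster_size, cv, icc, loss_to_followup):
    """Cluster design effect (variable cluster size) inflated for attrition."""
    design_effect = 1 + ((1 + cv ** 2) * mean_cluster_size - 1) * icc
    return design_effect / (1 - loss_to_followup)


def deflation_factor(n_repeats, rho_repeats, rho_baseline):
    """Variance factor for the mean of repeated measures, adjusted for baseline."""
    mean_of_repeats = (1 + (n_repeats - 1) * rho_repeats) / n_repeats
    return mean_of_repeats - rho_baseline ** 2


inflation = inflation_factor(50, cv=0.65, icc=0.01, loss_to_followup=0.25)
deflation = deflation_factor(6, rho_repeats=0.7, rho_baseline=0.5)

# The text rounds the inflation factor to 2.3 before multiplying
magnification = round(inflation, 1) * deflation

# 525 per arm is the unadjusted requirement quoted in the text
per_arm = 525 * magnification
```

Under these assumptions the inflation factor is about 2.27 (2.3 when rounded), the deflation factor is 0.5, the magnification is 1.15, and the per-arm requirement is about 604, consistent with the rounded target of 600 patients per arm.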
Objective 2: to examine evidence of selection bias
Table 1 shows a descriptive evaluation of individual participant demographics and characteristics, split by trial arm and by participants versus non-participants. Whilst most characteristics (e.g. sex) were similar between intervention and control arms, suggesting minimal selection bias, there were a few differences between participants and non-participants (e.g. overall, participants were slightly older and from more deprived areas). Mean pain intensity (0-10 Numerical Response Scale (NRS)) at the point-of-consultation was similar between participants (6.33, SD 2.05) and non-participants (6.35, SD 2.10). Pain scores were 0.5 points higher in participants in the intervention arm than in the control arm, although this difference had disappeared by the time of the initial patient questionnaire (typically 1-3 weeks later).
Overall, there were few differences across other characteristics, and the pre-specified success criterion for this objective (finding little evidence of selection bias) was judged "successful". There were, therefore, no changes required to recruitment procedures for the main trial.

Objective 3: assessing GP fidelity to the stratified care intervention
GPs from intervention practices used the stratification tool within the EMR in 513/1591 (32%) of eligible patients, which was "unsuccessful" according to our prespecified success criteria. GP fidelity to choosing recommended matched treatment options (shown in Table 2) achieved "complete success" with 81% of patients at low risk, 89% of patients at medium risk and 87% of patients at high risk being correctly matched to a recommended treatment.

Table 2 (excerpt): patients not matched to a recommended treatment, n (%)
Low risk, given High treatments: 3 (2%)
Medium risk, given Low treatments: 0 (0%)
Medium risk, given High treatment: 5 (2%)
High risk, given Low treatments only: 3 (7%)
High risk, given Medium treatments: 0 (0%)
Low risk, only tool used (no treatments selected): 25 (16%)
Medium risk, only tool used (no treatments selected): 19 (8%)
High risk, only tool used (no treatments selected): 3 (7%)
Grand total: 430

Through the nested qualitative research (reported separately, [41]) and feedback discussions with the participating GPs about the reasons for the low rate of completion of the tool, we gathered a number of insights to inform the main trial. Firstly, GPs perceived that using the whole EMR template increased their consultation workload and asked for the treatment options to be simplified. They also reported that the stratified care intervention was only appropriate for consultations where MSK pain was the primary reason for the consultation, where they could focus on the MSK pain problem. GPs also admitted that patients had frequently left the consultation room before they used the EMR and that they did not use the tool when their clinics were very busy. We therefore agreed in the future main trial to lower the expected proportion of MSK-related consultations in which the tool would be used at the point-of-consultation from 50 to 25%. We also identified that some GPs rarely coded MSK pain consultations and that others tended to use 'Synonym' codes, which are a set of diagnostic codes that needed to be removed from the list of codes used to activate the EMR participant identification screen, as they caused it to activate in error for a range of non-MSK pain problems (e.g. chest pain). It was agreed that for the main trial the GP training needed to include ways to mitigate these issues. GPs also recommended reducing the 4 h of intervention training to 2 h and providing a dedicated NHS physiotherapy pathway for patients in the main trial to overcome GPs' concerns about over-loading physiotherapy services with patients with MSK pain. Finally, GPs reported feeling uncomfortable with the self-report style wording of the development version of the Keele STarT MSK tool.
For example, they felt certain items could be modified to be less 'clunky and awkward' to ask (e.g. item 4: "Do you have any other important health problems?", which confused or unsettled patients when asked by their own family doctor, whom they expected to know their health problems well). We therefore developed a clinician-completed version of the Keele STarT MSK tool for use in the main trial, to overcome these wording problems while keeping the item constructs as similar as possible. A license to obtain both the original self-report and clinician-completed versions of the tool is available on request at www.keele.ac.uk/startmsk.

Objective 4: describing GP decision-making and patient outcomes in both arms
The results from the EMR audit of GP decision-making in MSK consultations are shown in Table 3 (split by intervention and control). GPs in intervention practices prescribed fewer opioids and more over-the-counter medication and anti-inflammatories than GPs in control practices. In addition, they gave more written self-management information to patients, used less MSK-related imaging and referred patients to physiotherapy earlier than GPs in control practices. Numbers of corticosteroid injections, sick certifications, and repeat MSK pain-related general practice consultations over 6 months were similar in intervention and control practices.
Descriptive data on patients' clinical outcomes over 6-months follow-up are presented in Table 4. Mean (SD) 6-month pain intensity was 3.93 (2.98) in participants in intervention practices and 4.18 (2.88) in control. Most other 6-month outcomes were similar although there was less MSK-related time-off-work in participants from intervention (17.4%) than control practices (25.4%). We did not statistically compare these outcomes in this pilot trial.

Discussion
This feasibility and pilot trial examined the feasibility of a future definitive cluster RCT with respect to recruitment and retention rates, potential selection bias and GP intervention fidelity to stratified care at the point-of-consultation for adults with MSK pain.
Our original plan was for this study to be an internal feasibility and pilot trial. Our findings showed that participant retention rates were high, that GPs matched patients to recommended treatment options well (> 80% of cases), and that there was little evidence of selection bias; the cluster trial design was therefore deemed suitable for the future main trial. However, recruitment took over twice as long as expected (28 rather than 12 weeks), and GPs completed the Keele STarT MSK Tool in fewer patient cases than we had hoped for (they used it in 32% of patient cases when the target was > 50%). The nested qualitative study findings [41] and feedback discussions with participating GPs explored the reasons why only two of the four prespecified pilot trial success criteria were met. These identified the particular challenge of using the EMR template and stratified care intervention when MSK pain was not the primary reason for the consultation.
GPs also suggested a number of positive changes to make prior to the future definitive RCT and thus this study became an external pilot trial. These changes included simplifying the recommended treatment options and developing a clinician-completed version of the Keele STarT MSK Tool. Furthermore, we agreed to lower the expected proportion of MSK consultations in which the tool would be used from 50 to 25%, as we were unable to stop the EMR template from firing in consultations where MSK pain was part of multimorbidity and not the main focus of the consultation. We also agreed to give GPs training specifically about the issue with 'Synonym' codes that failed to activate the EMR participant identification screen, and to reduce the intervention GP training from 4 h to 2 h. Lastly, we organised for NHS physiotherapy services receiving patients from participating intervention practices to provide a dedicated pathway for patients in the main trial. This pathway was put in place to overcome GPs' concerns about their referrals over-loading NHS physiotherapy services with patients with MSK pain, and we specified that it was strictly not allowed to increase the speed of access to physiotherapy treatment for intervention participants.

Table 3 Comparison of GP decision-making between intervention and control practices. †STarT MSK scored 0-3, low risk; 4-7 medium risk; 8-9 high risk. The colours represent the effects of the intervention on GP behaviours in comparison to controls: Reduced (> 0.04), Same, Increased (> 0.04), Provided earlier. It should be noted that the numbers of patients referred for an x-ray or MRI are combined, as in both the intervention and control GP practices MRI was used less than 5 times in total, which meant there were too few numbers for any meaningful comparison of MRI alone.
The main STarT MSK trial is currently ongoing (ISRCTN15366334).

Conclusions
This feasibility and pilot trial has successfully demonstrated the feasibility of the cluster RCT design with high retention rates over 6 months (> 90%) and little evidence of selection bias, although changes to the main trial sample size were required due to a slower than expected recruitment rate. GP point-of-consultation fidelity to the stratified care intervention was mixed, with GPs using the tool less often than expected (only when they coded consultations, when they had time and when MSK pain was the primary reason for the visit). However, there was high fidelity to choosing recommended matched treatment options (> 80% of cases). The learning from this feasibility and pilot RCT has led to a number of important changes prior to the main STarT MSK trial testing the clinical and cost-effectiveness of stratified primary care for patients with MSK pain.

Table 4 Clinical outcome measures at 6-month follow-up by intervention arm