Endoscopy 2019; 51(11): 1017-1026
DOI: 10.1055/a-0991-0044
Original article
© Georg Thieme Verlag KG Stuttgart · New York

ERCP assessment tool: evidence of validity and competency development during training

Keith Siau
1   Joint Advisory Group on Gastrointestinal Endoscopy, Royal College of Physicians, London, United Kingdom
2   College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
3   Endoscopy Unit, Dudley Group NHS Foundation Trust, Dudley, United Kingdom
,
Paul Dunckley
1   Joint Advisory Group on Gastrointestinal Endoscopy, Royal College of Physicians, London, United Kingdom
4   Department of Gastroenterology, Gloucestershire Hospitals NHS Foundation Trust, Gloucester, United Kingdom
,
Mark Feeney
1   Joint Advisory Group on Gastrointestinal Endoscopy, Royal College of Physicians, London, United Kingdom
5   Department of Gastroenterology, Torbay and South Devon NHS Foundation Trust, Torquay, United Kingdom
,
Gavin Johnson
1   Joint Advisory Group on Gastrointestinal Endoscopy, Royal College of Physicians, London, United Kingdom
6   Department of Gastroenterology, University College London Hospitals NHS Foundation Trust, London, United Kingdom
,
on behalf of the Joint Advisory Group on Gastrointestinal Endoscopy

Corresponding author

Keith Siau, MBChB MRCP
Endoscopy Unit
Dudley Group NHS Foundation Trust
Pensnett Road
Dudley DY1 2HQ
United Kingdom   
Fax: +44-1384-244262   

Publication History

submitted 14 March 2019

accepted after revision 08 July 2019

Publication Date:
10 September 2019 (online)

 

Abstract

Background The endoscopic retrograde cholangiopancreatography (ERCP) direct observation of procedural skills (DOPS) is a 27-item competency assessment tool that was developed to support UK ERCP training. We evaluated the validity of the ERCP DOPS and competency development during training.

Methods This prospective study analyzed ERCP DOPS performed in the UK between July 2016 and October 2018. Reliability was measured using Cronbach’s alpha, and DOPS scores were benchmarked using the contrasting groups method. The percentage of competent scores was averaged for each item, domain, and overall rating, and stratified by lifetime procedure count to evaluate learning curves. Multivariable analyses were performed to identify predictors of DOPS competence.

Results 818 DOPS (109 trainees, 80 UK centers) were analyzed. Overall Cronbach’s alpha was 0.961. Attaining competency in 87 % of assessed DOPS items provided the optimal competency benchmark. This was achieved in the domain sequence of: pre-procedure, post-procedure management, endoscopic non-technical skills, cannulation & imaging, and execution of selected therapy, and across all items after 200 – 249 procedures (89 %). After 300 procedures, the benchmark was reached for selective cannulation (89 %), but not for stenting (plastic 73 %; metal 70 %), sphincterotomy (80 %), and sphincteroplasty (56 %). On multivariable analysis, lifetime procedure count (P = 0.002), easier case difficulty (P < 0.001), trainee grade (P = 0.03), and higher lifetime DOPS count (P = 0.01) were predictors of DOPS competence.

Conclusion This study provides novel validity, reliability, and learning curve data for ERCP DOPS. Trainees should have performed a minimum of 300 hands-on ERCP procedures before undertaking summative assessment for independent practice.



Introduction

Over the past four decades, endoscopic retrograde cholangiopancreatography (ERCP) has become a crucial therapeutic intervention for pancreaticobiliary disease. Of the widely performed therapeutic endoscopic procedures, ERCP carries the greatest risk of serious complications. To ensure safe and effective practice, trainees require significant training time and experience of a large number of cases, often afforded only within dedicated training programs. High quality training, backed by valid competency assessment tools, is required to track in-training progression, direct performance-enhancing feedback, and inform readiness for independent practice [1]. These aspects are key to the quality assurance of training for patients, employers, and trainees.

In the UK, quality assurance of endoscopy training is overseen by the Joint Advisory Group on Gastrointestinal Endoscopy (JAG) [2]. The formative direct observation of procedural skills (DOPS) for ERCP (see Fig. 1s in the online-only supplementary material) was introduced by JAG in 2016 to aid training and record competency progression [3]. The assessment instrument was developed with multidisciplinary consensus and follows the standard DOPS format of task deconstruction into individual competencies (items), accompanied by descriptors detailing examples of competent practice for each item. The DOPS contains 27 assessable items covering six performance domains. These domains comprise pre-procedure planning, three technical procedural domains (intubation, cannulation and imaging, execution of selected therapy), post-procedure planning, and generic endoscopic non-technical skills (ENTS). In line with other DOPS, assessors rate items on a four-point supervision-based scale and provide an overall rating independent of individual item scores [4] [5].

Formative assessment tools can be used to supplement key performance indicators when evaluating competency development within a national training cohort [1]. Despite its implementation into UK training, the ERCP DOPS has not been validated. Valid and reliable assessment tools are required by training programs to set competency benchmarks and monitor competency progression, by trainers to set training goals, and by trainees to compare their performance with that of their peers.

In this national study of ERCP DOPS assessments, we aimed to present validity evidence for ERCP DOPS. Specifically, the following were evaluated:

  1. Internal structure validity: whether the DOPS is capable of measuring specific and overall competencies in ERCP.

  2. Consequential validity: determining optimal competency thresholds for DOPS with regard to overall competence to facilitate benchmarking.

  3. Discriminative validity: measuring DOPS performance by lifetime procedure count to provide insights into the learning curves of specific ERCP competencies.



Methods

Study design

This was a prospective, observational, pan-UK evaluation of formative (in-training) ERCP DOPS assessments submitted onto the JETS e-portfolio between July 2016 and October 2018. The JETS e-portfolio is mandated for use by all UK endoscopy trainees to evidence endoscopy experience in the form of self-reported procedure entries, which permit the calculation of lifetime ERCP procedure counts, and objective assessments (DOPS). Formative DOPS may be undertaken at the discretion of the trainee at any stage of training, but under JAG guidance, are pre-specified prior to commencing the observed procedure to minimize case-selection bias. All DOPS data were retrieved from the JETS e-portfolio, and included: individual item scores, case difficulty, assessor’s overall rating, trainee and assessor unique identifier, trainee grade (ranging from specialty trainee year 3 [ST3: first year of endoscopy training] to consultant), and the lifetime procedure count preceding the DOPS assessment.



Validity framework

Validity of the ERCP DOPS was appraised using the American Psychological Association Standards framework, which proposes five sources of validity evidence in assessment tools [6] [7]:

  • content (relevance)

  • response process (relationship between intended construct and thought processes of assessors)

  • internal structure (associations between test measures, e. g. reliability)

  • relations with other variables (discriminative ability)

  • consequences (implications of interpreting test scores).



Outcomes

The primary outcome was the assessor’s overall competency rating, analyzed on a four-point scale ranging from requiring maximal supervision (score 1), through significant supervision (score 2) and minimal supervision (score 3), to competent without supervision (score 4). The overall rating was also dichotomized as competent (score 4) vs. not competent (scores 1 – 3, i. e. requiring any degree of supervision). The percentage of items rated competent was analyzed per DOPS, across each competency item and domain. All DOPS received an overall competency rating, but individual items could be rated not assessable; items rated not assessable were excluded from analyses.
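As a minimal sketch of this outcome coding (item scores below are hypothetical; `None` stands in for a not assessable rating):

```python
# Percentage of assessed items rated competent in one DOPS.
# Only a score of 4 (competent without supervision) counts as competent;
# not assessable items (None) are excluded, as in the analysis above.
def percent_competent(item_scores):
    assessed = [s for s in item_scores if s is not None]
    return 100 * sum(s == 4 for s in assessed) / len(assessed)

dops = [4, 4, 3, 4, None, 2, 4]  # one hypothetical DOPS
print(f"{percent_competent(dops):.0f}% of assessed items rated competent")
```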



Statistical analyses

Internal structure

Evidence of internal structure validity may be provided by demonstrating dimensionality and reliability [8]. Item-global correlation analyses were conducted using a bifactor model (Spearman’s rank method) to demonstrate dimensionality (i. e. whether differences exist in the correlations between individual item scores and the overall competency rating for each DOPS). A Spearman’s rho coefficient of > 0.70 was used to denote a strongly positive correlation [9]. The internal consistency (reliability) of the DOPS scale was studied across DOPS and for each domain using Cronbach’s alpha. Internal consistency estimates the reliability of test components and indicates how well a set of items within an assessment measures a particular characteristic within the test (i. e. competency), with coefficients of > 0.90 indicating high reliability [10]. Given case-to-case variation in ERCP, it is neither necessary nor possible for assessors to rate all items within each DOPS, so sensitivity analyses were performed to exclude items that were frequently (i. e. > 50 %) rated not assessable.
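For illustration, the two internal structure measures can be sketched in a few lines (the study itself used SPSS; the scores below are hypothetical and numpy/scipy are chosen purely for exposition):

```python
# Hedged sketch: Cronbach's alpha over DOPS item scores and an
# item-global Spearman correlation. All data are hypothetical.
import numpy as np
from scipy import stats

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: rows = DOPS assessments, columns = items (1-4 scale)."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Six hypothetical assessments of three items
scores = np.array([[4, 4, 3],
                   [3, 3, 3],
                   [4, 4, 4],
                   [2, 3, 2],
                   [4, 3, 4],
                   [1, 2, 1]])
overall = np.array([4, 3, 4, 2, 3, 1])  # assessor's overall rating

alpha = cronbach_alpha(scores)
rho, p = stats.spearmanr(scores[:, 0], overall)  # item-global correlation
print(f"alpha = {alpha:.3f}, item-global rho = {rho:.3f}")
```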



Consequential validity

The contrasting groups method ([Fig. 1]) was used to calculate the optimal competency threshold for ERCP DOPS [11]. Two distributions (competent vs. not competent) were plotted according to the overall assessor rating, with the percentage of competent items per DOPS on the x-axis and the frequency on the y-axis. The optimum competency threshold (benchmark) was indicated by the intercept of the two distributions. Consequence analysis was then performed by determining the theoretical false-positive and false-negative rates of applying this competency threshold.
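A minimal sketch of this computation (all scores hypothetical; each group's distribution is approximated as normal here, which is one common way to locate the intercept):

```python
# Contrasting groups sketch: fit a normal curve to each group's
# "% items competent" scores and find where the two curves cross.
import numpy as np
from scipy import stats
from scipy.optimize import brentq

not_competent = np.array([45, 55, 60, 70, 75, 80, 85])  # overall rating 1-3
competent = np.array([80, 85, 90, 92, 95, 98, 100])     # overall rating 4

f_not = stats.norm(not_competent.mean(), not_competent.std(ddof=1)).pdf
f_yes = stats.norm(competent.mean(), competent.std(ddof=1)).pdf

# The benchmark is the intercept of the two densities between the means
threshold = brentq(lambda x: f_not(x) - f_yes(x),
                   not_competent.mean(), competent.mean())
print(f"competency threshold = {threshold:.0f}% of items rated competent")
```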

Fig. 1 Contrasting groups plots to determine competency thresholds for endoscopic retrograde cholangiopancreatography direct observation of procedural skills (DOPS). The percentage of items rated competent per DOPS (x-axis) are presented for the group with overall assessor ratings of competent (orange curve) and not competent (blue curve), with the frequency on the y-axis. The optimum competency threshold is indicated by the intercept of the two curves. False positives and false negatives are indicated by the green and red areas, respectively.


Discriminative validity (learning curves)

First, the frequency of competent DOPS scores was calculated at item level, domain level, and for the global DOPS scores (mean item DOPS score and assessor’s overall rating), and stratified by lifetime procedure count to estimate learning curves. For item competencies, 95 % confidence intervals (CIs) were calculated using the equal-tailed Jeffreys prior intervals approach. Next, to account for the variable number of DOPS performed by each trainee, a multivariable binary logistic regression analysis was performed using generalized estimating equations (autoregressive structure) to identify whether the effect of lifetime procedure count on overall DOPS competence remained significant after adjusting for potential confounders. This analysis adjusted for the following covariates: trainee specialty, trainee grade, existing JAG gastroscopy certification, formative DOPS count, and case difficulty.
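The equal-tailed Jeffreys interval used for the item-level CIs can be sketched as follows (counts are hypothetical; the endpoints are quantiles of a Beta(k + ½, n − k + ½) posterior, with scipy shown for exposition):

```python
# Jeffreys interval sketch for a binomial proportion (hypothetical counts)
from scipy.stats import beta

def jeffreys_interval(k: int, n: int, level: float = 0.95):
    """Equal-tailed CI for k competent ratings out of n assessed."""
    tail = (1 - level) / 2
    posterior = beta(k + 0.5, n - k + 0.5)
    return posterior.ppf(tail), posterior.ppf(1 - tail)

low, high = jeffreys_interval(89, 100)  # e.g. 89/100 items rated competent
print(f"95% CI: {low:.0%} - {high:.0%}")
```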

Statistical analyses were performed by SPSS version 25 (IBM Corp, Armonk, New York, USA), with P < 0.05 indicative of significance.



Results

Study participants

In total, 818 DOPS submitted from 80 UK training centers were included for analysis. DOPS were completed by 123 trainers (median 3 DOPS per trainer, interquartile range [IQR] 1 – 6) for 109 trainees (95 gastroenterologists, 14 gastrointestinal surgeons; median 4 DOPS per trainee, IQR 1 – 9). A total of 716 DOPS were assessed by 111 gastroenterologists, 94 DOPS by 10 gastrointestinal surgeons, and 8 DOPS by 2 radiologists. Trainee grades comprised: ST3 – 5 (n = 28, 25.7 %), ST6 (n = 24, 22.0 %), ST7 – 8 (n = 23, 21.1 %), clinical research fellow (n = 8, 7.3 %), and consultant/associate specialist (n = 26, 23.9 %). Lifetime procedure counts are shown in [Table 1]; trainees undertaking DOPS had a preceding median lifetime procedure count of 70 (IQR 28 – 151, range 0 – 562).

Table 1

Baseline characteristics of the endoscopic retrograde cholangiopancreatography training cohort.

Lifetime procedure count[1]   No. of trainees, n (%)   No. of DOPS, n (%)
< 25                           25 (22.9)                 55 (6.7)
25 – 49                        22 (20.2)                 84 (10.3)
50 – 99                        18 (16.5)                135 (16.5)
100 – 149                      15 (13.8)                129 (15.8)
150 – 199                       9 (8.3)                 133 (16.3)
200 – 249                       9 (8.3)                  74 (9.0)
250 – 299                       3 (2.8)                  90 (11.0)
300 +                           8 (7.3)                 118 (14.4)
Total                         109                       818

ERCP, endoscopic retrograde cholangiopancreatography; DOPS, direct observation of procedural skills.

1 The maximum lifetime ERCP procedure count preceding DOPSs for each unique trainee.




Internal structure validity

Item-global analyses

Item-global analyses were performed using the Spearman’s rank method in order to test internal structure validity by demonstrating whether differences exist in the scoring of individual DOPS items and the overall competency rating ([Table 2]). DOPS items corresponding to technical competencies were most likely to correlate with global competency. The highest correlations were observed for selective cannulation (rho 0.752), sphincterotomy (rho 0.768), sphincteroplasty (rho 0.814), and stent deployment (plastic rho 0.705; metal rho 0.768). These were followed by the ENTS domain items (rho 0.519 – 0.636) and the post-procedure items of report writing (rho 0.474) and management plan (rho 0.470). Correlations were weakest for preprocedural items (rho 0.230 – 0.273). All correlations were statistically significant (P < 0.001).

Table 2

Internal structure evidence for endoscopic retrograde cholangiopancreatography direct observation of procedural skills, as evaluated using item-global correlations (Spearman’s rho) and domain-specific internal consistency estimates (Cronbach’s alpha statistic). Strong correlations (rho > 0.70) are highlighted.

DOPS domain and item                               N     Spearman’s rho[1]   Cronbach’s alpha
Pre-procedure                                                                0.930
  • Indication                                     786   0.273
  • Risk                                           786   0.251
  • Preparation                                    790   0.263
  • Equipment check                                812   0.254
  • Consent                                        769   0.230
  • Sedation and monitoring                        787   0.255
Intubation and positioning                                                   0.760
  • Intubation                                     816   0.435
  • Visualization and position relative to ampulla 807   0.522
  • Patient comfort                                789   0.281
Cannulation and imaging                                                      0.861
  • Selective cannulation                          775   0.752
  • Wire management                                771   0.685
  • Radiological aspects                           758   0.687
Execution of selected therapy                                                0.981
  • Decision about appropriate therapy             762   0.625
  • Sphincterotomy                                 536   0.768
  • Sphincteroplasty                               127   0.814
  • Stone therapy                                  594   0.675
  • Tissue sampling                                166   0.645
  • Stenting – plastic                             215   0.705
  • Stenting – metal                               154   0.768
  • Actions to minimize pancreatitis               634   0.480
  • Complications                                  194   0.565
Post-procedure                                                               0.893
  • Report writing                                 662   0.474
  • Management plan                                735   0.470
Endoscopic non-technical skills                                              0.942
  • Communication and teamwork                     808   0.519
  • Situation awareness                            798   0.567
  • Leadership                                     783   0.586
  • Judgment and decision making                   792   0.636

DOPS, direct observation of procedural skills.

1 P values for all correlations < 0.001.




Reliability

Across the 27 DOPS items, only 5.4 % of DOPS were rated on every eligible item. Internal consistency analysis yielded a Cronbach’s alpha of 0.961. Five items (sphincteroplasty, tissue sampling, plastic stenting, metal stenting, and complications) were rated in fewer than 50 % of DOPS and were therefore excluded from the sensitivity analysis. A total of 333 DOPS (40.7 %) had all of the remaining items assessed, giving a Cronbach’s alpha of 0.95 and indicating a high degree of internal consistency (reliability). At domain level ([Table 2]), Cronbach’s alpha ranged from 0.760 (intubation and positioning) to 0.981 (execution of selected therapy).



Consequential validity

Contrasting groups analyses ([Fig. 1]) demonstrated that the attainment of competence in 87 % of assessed items per DOPS provided the optimal competency benchmark (pass – fail threshold) in this cohort of trainees. Applying this cutoff score resulted in a theoretical false-positive rate of 10.4 % and a false-negative rate of 1.9 %. Based on these estimates, and assuming independence of assessments, trainees achieving the 87 % competence threshold in two consecutive DOPS would carry a theoretical false-positive rate of 1.1 % (i. e. 10.4 % × 10.4 %), and in three consecutive DOPS a rate of 0.11 %.
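The compounding arithmetic behind these consequence estimates is straightforward, assuming independent assessments:

```python
# False-positive rate compounds multiplicatively over consecutive DOPS
fp = 0.104                                   # single-DOPS false-positive rate
print(f"2 consecutive DOPS: {fp ** 2:.1%}")  # 10.4% x 10.4%
print(f"3 consecutive DOPS: {fp ** 3:.2%}")
```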



Learning curves

Competency development during ERCP training

Across the cohort, competency development by lifetime procedure count is presented at item level ([Table 3]) and domain/global level ([Fig. 2]). Competency was acquired first in the preprocedural domain, followed by post-procedural management, ENTS, cannulation and imaging, and finally execution of selected therapy. Trainees achieved the 87 % competency threshold after 200 – 249 procedures (mean 89 %; 95 %CI 87 % – 93 %). Competency in selective cannulation (i. e. deep biliary cannulation) was achieved after 300 procedures (mean 89 %, 95 %CI 80 % – 95 %). Even after 300 + procedures, this competency threshold was not reached for the items of sphincteroplasty (56 %), stenting (plastic 73 %; metal 70 %), sphincterotomy (80 %), decision for appropriate therapy (80 %), and judgment and decision making (86 %).

Table 3

Endoscopic retrograde cholangiopancreatography direct observation of procedural skills item competency by lifetime procedure count. Procedure numbers required to achieve the competency threshold of 87 % + are highlighted.

Percentage item competence by lifetime procedure count, % (95 %CI)

Item                                             ≤ 25         25 – 49      50 – 99      100 – 149    150 – 199    200 – 249    250 – 299    300 +
Indication                                       78 (69–85)   83 (76–88)   91 (85–94)   95 (89–98)   94 (86–98)   95 (89–98)   97 (89–99)   91 (82–96)
Risk                                             78 (69–85)   85 (78–90)   90 (85–94)   96 (92–99)   94 (86–98)   98 (92–100)  98 (92–100)  92 (84–97)
Preparation                                      79 (71–86)   85 (79–91)   92 (87–95)   96 (90–98)   97 (91–99)   99 (94–100)  98 (92–100)  100
Equipment check                                  79 (71–85)   88 (81–92)   94 (90–97)   96 (90–98)   97 (91–99)   100          98 (92–100)  100
Consent                                          83 (75–89)   90 (84–94)   94 (90–97)   97 (93–99)   95 (88–99)   100          98 (92–100)  97 (91–99)
Sedation and monitoring                          80 (72–87)   84 (77–89)   90 (85–94)   96 (91–99)   97 (91–99)   100          100          99 (93–100)
Intubation                                       51 (42–60)   73 (66–80)   85 (79–90)   93 (87–97)   94 (87–98)   100          95 (87–99)   99 (93–100)
Visualization and position relative to ampulla   30 (23–39)   51 (43–59)   76 (69–82)   78 (69–85)   91 (83–96)   95 (88–98)   93 (84–98)   92 (84–97)
Patient comfort                                  70 (61–78)   74 (67–81)   89 (84–93)   89 (82–94)   96 (89–99)   99 (94–100)  98 (92–100)  96 (88–99)
Selective cannulation                            20 (13–28)   32 (25–41)   42 (35–50)   63 (54–72)   81 (70–89)   65 (54–75)   68 (56–79)   89 (80–95)
Wire management                                  17 (10–25)   32 (24–40)   52 (44–60)   62 (52–70)   69 (58–79)   78 (68–86)   75 (63–85)   91 (82–96)
Radiological aspects                             17 (11–26)   27 (20–35)   46 (38–54)   60 (50–69)   66 (54–77)   79 (69–87)   84 (73–92)   88 (78–94)
Decision about appropriate therapy               25 (17–34)   34 (26–42)   51 (43–59)   64 (55–73)   70 (59–80)   81 (72–89)   75 (63–85)   80 (70–89)
Sphincterotomy                                   21 (12–33)   20 (13–30)   30 (23–39)   63 (52–73)   63 (49–76)   66 (53–77)   69 (55–81)   80 (67–90)
Sphincteroplasty                                 13 (3–34)     7 (2–21)    30 (16–48)   69 (44–87)   13 (1–45)    30 (9–61)    60 (30–85)   56 (25–83)
Stone therapy                                    21 (13–32)   30 (22–39)   46 (37–54)   64 (53–73)   64 (51–76)   79 (67–88)   79 (65–89)   90 (80–96)
Tissue sampling                                  19 (7–39)     6 (1–17)    37 (23–54)   67 (49–81)   63 (30–88)   67 (39–88)   73 (48–90)   90 (62–99)
Stenting – plastic                               22 (10–40)   14 (5–27)    34 (21–50)   62 (44–78)   68 (47–85)   81 (63–92)   62 (41–80)   73 (48–90)
Stenting – metal                                 12 (3–33)     6 (1–19)    26 (13–43)   63 (44–79)   40 (9–79)    60 (35–81)   47 (25–70)   70 (39–91)
Actions to minimize pancreatitis                 57 (45–68)   60 (51–70)   67 (58–75)   84 (76–90)   89 (79–95)   97 (92–99)   89 (79–95)   100
Complications                                    44 (25–64)   47 (32–63)   46 (29–62)   79 (60–92)   89 (59–99)   90 (62–99)   71 (46–90)   91 (80–97)
Report writing                                   44 (33–55)   63 (54–71)   71 (62–78)   85 (76–91)   93 (83–97)   92 (85–97)   94 (85–98)   97 (90–99)
Management plan                                  49 (39–59)   66 (58–74)   75 (68–82)   81 (73–88)   88 (79–95)   95 (89–98)   93 (84–98)   94 (86–98)
Communication and teamwork                       39 (30–48)   52 (44–60)   61 (53–68)   78 (70–85)   85 (76–92)   98 (92–100)  97 (89–99)   100
Situation awareness                              32 (24–41)   45 (37–54)   57 (49–64)   74 (66–82)   87 (77–93)   95 (89–98)   93 (84–98)   94 (86–98)
Leadership                                       28 (20–37)   37 (29–45)   55 (47–63)   69 (60–77)   80 (70–89)   89 (81–94)   90 (80–96)   96 (88–99)
Judgment and decision making                     23 (16–32)   28 (21–35)   50 (42–58)   65 (56–73)   68 (56–79)   84 (75–91)   88 (77–94)   86 (77–93)
All DOPS items                                   47 (42–52)   55 (50–59)   68 (64–72)   78 (73–83)   84 (80–88)   89 (87–93)   89 (84–94)   93 (89–97)
CI, confidence interval; DOPS, direct observation of procedural skills.

Fig. 2 Direct observation of procedural skills performance (percentage of items rated competent where assessed) by lifetime endoscopic retrograde cholangiopancreatography procedure count.


Predictors of DOPS competency

Multivariable analysis with generalized estimating equation models was performed to identify trainee-level predictors of overall DOPS competence ([Table 4]). This showed that lifetime procedure count (P = 0.002), trainee grade (P = 0.03), easier case difficulty (P < 0.001), and lifetime DOPS count ≥ 10 (P = 0.01) were associated with overall procedural competence, whereas trainee specialty (P = 0.46), trainer specialty (P = 0.99), and prior gastroscopy certification (P = 0.97) were not.

Table 4

Multivariable analysis of factors associated with direct observation of procedural skills (DOPS) competence (overall assessor score of 4) in formative ERCP DOPS.

Factor                                DOPS, n (%)   Multivariable OR   95 %CI           P value[1]
Trainee specialty
  • Gastroenterology                  702 (85.8)    REF
  • GI surgeon                        116 (14.2)     0.648             0.20 – 2.07      0.46
Grade                                                                                   0.029
  • ST3 – 6                           367 (44.9)    REF
  • ST7 – 8                           132 (16.1)     0.43              0.14 – 1.32      0.14
  • Research fellow                    35 (4.3)      4.17              0.66 – 26.44     0.13
  • Consultant/associate specialist   284 (34.7)     2.82              0.95 – 8.34      0.06
Assessor specialty
  • Gastroenterology                  716 (87.5)    REF
  • Nongastroenterology               102 (12.5)     1.01              0.34 – 3.01      0.99
Lifetime procedure count                                                                0.002
  • < 25                              124 (15.2)    REF
  • 25 – 49                           145 (17.7)     4.21              0.97 – 18.21     0.06
  • 50 – 99                           166 (20.3)     5.22              1.21 – 22.60     0.03
  • 100 – 149                         111 (13.6)    15.86              3.08 – 81.65     0.001
  • 150 – 199                          68 (8.3)     12.79              2.51 – 65.09     0.002
  • 200 – 249                          81 (9.9)     10.66              1.66 – 68.38     0.01
  • 250 – 299                          57 (7.0)     19.73              3.09 – 126.21    0.002
  • 300 +                              66 (8.1)     80.91             10.21 – 641.34    < 0.001
Case difficulty                                                                         < 0.001
  • Complicated                       166 (20.3)    REF
  • Moderate                          436 (53.3)     2.30              1.28 – 4.13      0.01
  • Easy                              216 (26.4)     4.82              2.28 – 10.17     < 0.001
EGD certification
  • No                                171 (20.9)    REF
  • Yes                               647 (79.1)     1.02              0.29 – 3.58      0.97
Lifetime DOPS count
  • < 10                              404 (49.4)    REF
  • 10 +                              414 (50.6)     2.62              1.31 – 5.25      0.01

OR, odds ratio; CI, confidence interval; GI, gastrointestinal; EGD, esophagogastroduodenoscopy.

1 P values are significant at < 0.05.




Discussion

In the era of competency-based medical education, it is incumbent on training programs to ensure that there is a robust system in place to define and assess competence, and to monitor and support competency development during endoscopy training. In this UK-wide study involving 109 ERCP trainees, the largest ERCP training cohort presented to date, we evaluated novel validity data collected over a 2-year period to support the formative ERCP DOPS as an in-training competency assessment tool. Specifically, we have presented internal structure validity (item-global correlations and scale reliability), consequential validity (contrasting groups analyses), and discriminative validity (competency development profiles and multivariable analyses). We have therefore demonstrated that DOPS can be used to assess, benchmark, and monitor the progression of specific ERCP competencies during training.

The ERCP DOPS measures 27 competencies relevant to preprocedural, procedural, and post-procedural aspects of ERCP, in addition to ENTS. On interrogating its internal structure, item-global correlations were found to be lowest for competencies within the preprocedural domain and highest within the “execution of selected therapy” domain. These results mirror the learning curve data, which show the therapeutic competencies to be the last to develop; this is perhaps unsurprising given the therapeutic objective of ERCP and the technically demanding nature of these skills. The DOPS showed excellent reliability (Cronbach’s alpha 0.961), which persisted after sensitivity analyses. Domain-specific Cronbach’s alpha values exceeded 0.8 for all domains except intubation and positioning (0.760), which may be explained by the heterogeneity of the three included items (intubation, visualization and positioning, and patient comfort) and could be considered for refinement in future iterations of DOPS. These results objectively demonstrate that the DOPS instrument is capable of measuring different skills, which converge in measuring the overall outcome of competence in ERCP.

In the context of ERCP training, objective and standardized competency assessment tools are lacking. The EUS and ERCP Skills Assessment Tool (TEESAT), advocated by the American Society for Gastrointestinal Endoscopy, is the only other assessment tool with robust validity data [12] [13]. TEESAT measures 19 technical items and 5 non-technical items. In contrast to DOPS, TEESAT records procedural indications, previous interventions, application of specific pancreaticobiliary therapies, and procedural complications, but does not assess pre- or post-procedural competencies. The Rotterdam Assessment Form for ERCP (RAF-E) is a self-assessment tool that measures technical competencies on a visual analog scale [14] [15]. TEESAT data from 22 trainees (median 350 ERCP procedures) found that overall technical competence was achieved in 60 % (overall cannulation 68.4 %; stone clearance 85.7 %) and overall cognitive competence in 100 % [12]. Of note, these percentages were based on a less stringent definition of competence, in which trainees were allowed verbal assistance, whereas competency in DOPS was defined as no verbal or physical assistance. Analyses of the RAF-E found that unassisted biliary cannulation rates reached 85 % after 200 procedures, and 68 % after 180 procedures with a virgin papilla [15]. To our knowledge, the Ekkelenkamp study was the only other to stratify competency outcomes by lifetime procedure count, but it was confined to 15 trainees from a single expert center, each with < 200 procedures [15], whereas the studies by Wani et al. present competency rates for an entire cohort of dedicated fellows with variable procedure counts [12] [13].

Our learning curve data provide unique insights into the development of each of the DOPS competencies, stratified by lifetime procedure count, up to and beyond the 300 procedure mark. Trainees met the DOPS benchmark score of 87 % after 200 – 249 procedures, at which point 79 % of DOPS were rated competent in stone therapy, thereby reaching the British Society of Gastroenterology (BSG) minimum standard of 75 % + [16]. Despite this, important technical end points such as selective cannulation (65 %), sphincterotomy (66 %), and sphincteroplasty (30 %), in addition to the non-technical competency of judgment and decision making (84 %), had not been achieved. The 85 % + BSG standard for competent biliary cannulation was achieved after 300 procedures (89 %), but thresholds for sphincteroplasty, stenting, and therapeutic decision making had still not been met. These results contrast with the learning curve data from Ekkelenkamp et al. [15], which show that competencies in stenting and sphincterotomy develop earlier than selective cannulation. We believe this arises from assessor interpretation of DOPS (i. e. response bias), whereby assessors may rate trainees as not fully competent for a specific therapy if they were unable to complete the critical preceding step of biliary cannulation. In the interests of time, trainers may partially undertake some of the subsequent therapeutic elements before handing the scope back to the trainee. Another observation was that competency in therapeutic decision making occurred later than selective cannulation. It is likely that trainers continue to provide advice even when the scope is in the hands of an experienced and technically competent trainee (so-called “backseat driving”). Although the decision on a specific therapy (e. g. stent or stone extraction method) may be one that could be made by the trainee, this decision is often taken by the trainer on the trainee’s behalf.
These issues should be addressed in future ERCP Train-the-Trainers courses.

Collectively, our findings complement single-operator data from the USA, which reported 80 % deep biliary cannulation rates after 350 – 400 supervised procedures [17], and Japanese data, which associated lifetime ERCP counts of ≤ 300 with cannulation times of > 15 minutes (odds ratio [OR] 2.08; P < 0.001), a known risk factor for post-ERCP pancreatitis [18]. Worldwide, there is variation in the minimum procedural requirements before competency can be assessed, ranging from 200 supervised procedures (USA) and 200 unassisted ERCPs (Australia and Canada) to 300 supervised procedures (UK) [19]. Our results support the 300 procedure minimum threshold recommended by the UK BSG [16], but inevitably reignite the debate on training requirements for independent practice. However, there are several caveats to our UK data that may affect generalizability to international training settings. Our results were inclusive of all UK trainees and not limited to those within dedicated fellowships or centers of excellence with high ERCP volume. Not all trainers were gastroenterologists, although DOPS competency rates did not vary by assessor specialty on multivariable analysis, which attests to the tool’s reliability. Furthermore, most UK procedures are performed under conscious sedation without propofol/anesthetic support, which is likely to affect procedural difficulty and the need for earlier trainer intervention.

In the quality assurance of endoscopy training, thresholds for minimum procedure numbers should be considered in conjunction with objective, robust, and validated competence assessment tools, and other performance measures (e. g. key performance indicators). There have been calls by societies to shift emphasis away from minimum procedure counts in favor of competence assessment tools [20]. However, the outcome of competence is dependent on its definition, which may explain the heterogeneity in the ERCP training literature [21]. In our cohort, 20.3 % of assessed procedures were deemed complicated; these were significantly more likely to require trainer supervision than procedures of moderate (OR 2.30; P = 0.01) or easy (OR 4.82; P < 0.001) difficulty. As competency requires consistency of practice across a variety of cases, contexts, and difficulties [1], minimum procedure numbers are likely to remain a competency safeguard. Within the confines of UK training, trainees are increasingly expected to enroll in dedicated ERCP fellowship programs and/or continue mentored ERCP practice after completion of specialist training [22]. This was reflected in our study, in which 23.9 % of “trainees” were consultants or associate specialists. Training programs should, in principle, agree on the definition of competence, which usually pertains to a minimum standard regarded as acceptable to patients and for unsupervised practice, with the expectation that further development would be required to achieve high quality performance standards, subject to performance monitoring during the newly independent period [1] [17] [23].

Several additional limitations should be discussed. First, DOPS assessments were performed by trainers rather than independent (neutral) assessors, which could introduce bias. Second, there were limitations related to the DOPS instrument; for example, papilla status was not recorded, and this item has since been introduced. There were also a considerable number of items rated as not assessable. This was not due to missing data, but due to the varied nature of ERCP, where the performance of certain therapeutic interventions was not required and so could not be assessed. This affected Cronbach’s alpha estimates, which treat not assessable responses as invalid, hence the rationale for sensitivity analyses. Third, learning curves were plotted against lifetime procedure counts, which relied on self-entered data on the JETS e-portfolio and may be subject to entry bias. JETS e-portfolio procedure counts have previously been validated against those within endoscopy reporting system databases [24], and are subject to local verification by training leads. Most UK trainees will have engaged with JETS prior to ERCP training (e. g. for gastroscopy and/or colonoscopy certification), and as such, the lifetime procedure counts derived for ERCP are likely to be more accurate than for other procedures. Fourth, analyses were performed at the level of each DOPS, rather than with a trainee-level cumulative summation approach, owing to variable procedure counts between DOPS. Finally, our analyses were centered on DOPS assessments and not on objectively verified procedural level data and key performance indicators. This has limitations compared with TEESAT, which measures specific indications, completion end points, and complication rates. Although these aspects are recorded in the self-reported JETS e-portfolio entries for ERCP procedures (Fig. 2s), these are not yet automatically linked to DOPS.
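The complete-case handling that motivates the sensitivity analyses can be sketched as follows. This is a minimal illustration, assuming item scores are held as one row per DOPS with `None` marking not assessable items; it is not the study's actual implementation:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a set of per-procedure item scores.

    `scores` is a list of rows (one per DOPS), each a list of item
    scores; rows containing None ("not assessable") are dropped,
    mirroring a complete-case sensitivity analysis.
    """
    complete = [row for row in scores if None not in row]
    k = len(complete[0])                  # number of items
    items = list(zip(*complete))          # column-wise (per-item) view
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in complete])
    return (k / (k - 1)) * (1 - item_var / total_var)
```

Dropping rows with not assessable items shrinks the usable sample, which is why alpha estimates are sensitive to how such responses are handled.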
The UK National Endoscopy Database achieved roll-out in 2018 and is envisaged to support ERCP functionality in the future [25]. This would allow ERCP training procedures to be uploaded in real time, facilitating cross-linkage of procedural data with DOPS and addressing the potential issue of entry bias.

Globally, no national credentialing/certification process exists for ERCP [26]. Instead, the privilege for independent practice remains at the discretion of local trainers and employers. With apprenticeship models being increasingly phased out in favor of competency-based curricula, there are growing calls by national societies to endorse a move towards certification [22] [26], in line with other modalities such as gastroscopy and colonoscopy [27]. Our results may inform efforts by national training bodies, such as JAG, to support the development of competency frameworks to quality assure training in ERCP.

Conclusion

In this study involving 109 ERCP trainees from 80 UK training centers, we have provided evidence of internal structure, consequential, and discriminative validity to support the ERCP DOPS as a novel competency assessment tool. The study has also shed light on competency acquisition in ERCP, with ≥ 300 procedures required for ≥ 85 % of the cohort to demonstrate competence in biliary cannulation, supporting a minimum procedure requirement of 300 ERCPs as an eligibility criterion for triggering high-stakes assessment for independent practice.

Competing interests

All authors are associated with the Joint Advisory Group on Gastrointestinal Endoscopy Quality Assurance of Training Working Group.

Acknowledgments

The authors would like to acknowledge James Hodson from the Institute of Translational Medicine, Birmingham, UK, for statistical assistance and Professor Sauid Ishaq, Russell’s Hall Hospital, Dudley, UK, for critically reviewing the manuscript.

Figs. 1s, 2s


Fig. 1 Contrasting groups plots to determine competency thresholds for endoscopic retrograde cholangiopancreatography direct observation of procedural skills (DOPS). The percentage of items rated competent per DOPS (x-axis) is presented for the group with overall assessor ratings of competent (orange curve) and not competent (blue curve), with the frequency on the y-axis. The optimum competency threshold is indicated by the intercept of the two curves. False positives and false negatives are indicated by the green and red areas, respectively.
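The intercept described in this caption can be located numerically. The sketch below is a minimal illustration, assuming each group's score distribution is summarized by a fitted normal curve; the means and standard deviations used in the example are arbitrary, not the study's values:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def contrasting_groups_threshold(competent, not_competent, step=0.01):
    """Scan between the two group means for the score at which the
    fitted curves intersect, i.e. the optimum pass mark.

    Each group is a (mean, sd) tuple fitted to the percentage of
    items rated competent per DOPS in that group.
    """
    lo = min(competent[0], not_competent[0])
    hi = max(competent[0], not_competent[0])
    best_x, best_gap = lo, float("inf")
    x = lo
    while x <= hi:
        gap = abs(normal_pdf(x, *competent) - normal_pdf(x, *not_competent))
        if gap < best_gap:
            best_x, best_gap = x, gap
        x += step
    return best_x
```

Scores above the crossing point are more probable under the competent curve than the not competent curve, which is why the intercept serves as the pass mark; the green and red areas in the figure correspond to the tails on the wrong side of it.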
Zoom Image
Fig. 2 Direct observation of procedural skills performance (percentage of items rated competent where assessed) by lifetime endoscopic retrograde cholangiopancreatography procedure count.