DOI: 10.1055/a-0991-0044
ERCP assessment tool: evidence of validity and competency development during training
Publication History
submitted 14 March 2019
accepted after revision 08 July 2019
Publication Date:
10 September 2019 (online)
Abstract
Background The endoscopic retrograde cholangiopancreatography (ERCP) direct observation of procedural skills (DOPS) is a 27-item competency assessment tool developed to support UK ERCP training. We evaluated the validity of the ERCP DOPS and competency development during training.
Methods This prospective study analyzed ERCP DOPS performed in the UK between July 2016 and October 2018. Reliability was measured using Cronbach’s alpha, and DOPS scores were benchmarked using the contrasting groups method. The percentage of competent scores was averaged for each item, domain, and overall rating, and stratified by lifetime procedure count to evaluate learning curves. Multivariable analyses were performed to identify predictors of DOPS competence.
Results 818 DOPS (109 trainees, 80 UK centers) were analyzed. Overall Cronbach’s alpha was 0.961. Attaining competency in 87 % of assessed DOPS items provided the optimal competency benchmark. This benchmark was achieved sequentially across domains, in the order: pre-procedure, post-procedure management, endoscopic non-technical skills, cannulation and imaging, and execution of selected therapy, and across all items after 200 – 249 procedures (89 %). After 300 procedures, the benchmark was reached for selective cannulation (89 %), but not for stenting (plastic 73 %; metal 70 %), sphincterotomy (80 %), or sphincteroplasty (56 %). On multivariable analysis, lifetime procedure count (P = 0.002), easier case difficulty (P < 0.001), trainee grade (P = 0.03), and higher lifetime DOPS count (P = 0.01) were predictors of DOPS competence.
Conclusion This study provides novel validity, reliability, and learning curve data for the ERCP DOPS. Trainees should have performed a minimum of 300 hands-on ERCP procedures before undertaking summative assessment for independent practice.
Introduction
Over the past four decades, endoscopic retrograde cholangiopancreatography (ERCP) has become a crucial therapeutic intervention for pancreaticobiliary disease. Of the widely performed therapeutic endoscopic procedures, ERCP carries the greatest risk of serious complications. Safe and effective practice therefore requires significant training time and exposure to a large volume of cases, often afforded only within dedicated training programs. High quality training, backed by valid competency assessment tools, is required to track in-training progression, direct performance-enhancing feedback, and inform readiness for independent practice [1]. These aspects are key to the quality assurance of training for patients, employers, and trainees.
In the UK, quality assurance of endoscopy training is overseen by the Joint Advisory Group on Gastrointestinal Endoscopy (JAG) [2]. The formative direct observation of procedural skills (DOPS) for ERCP (see Fig. 1s in the online-only supplementary material) was introduced by JAG in 2016 to aid training and record competency progression [3]. The assessment instrument was developed with multidisciplinary consensus and follows the standard DOPS format of task deconstruction into individual competencies (items), accompanied by descriptors detailing examples of competent practice for each item. The DOPS contains 27 assessable items covering six performance domains. These domains comprise pre-procedure planning, three technical procedural domains (intubation, cannulation and imaging, execution of selected therapy), post-procedure planning, and generic endoscopic non-technical skills (ENTS). In line with other DOPS, assessors rate items on a four-point supervision-based scale and provide an overall rating independent of individual item scores [4] [5].
Formative assessment tools can be used to supplement key performance indicators in evaluating competency development within a national training cohort [1]. Despite its implementation into UK training, the ERCP DOPS has not been validated. Valid and reliable assessment tools are required by training programs to set competency benchmarks and monitor competency progression, by trainers to set training goals, and by trainees to compare their performance with that of their peers.
In this national study of ERCP DOPS assessments, we aimed to present validity evidence for ERCP DOPS. Specifically, the following were evaluated:
- Internal structure validity: whether the DOPS is capable of measuring specific and overall competencies in ERCP.
- Consequential validity: determining optimal competency thresholds for DOPS with regard to overall competence to facilitate benchmarking.
- Discriminative validity: measuring DOPS performance by lifetime procedure count to provide insights into the learning curves of specific ERCP competencies.
Methods
Study design
This was a prospective, observational, pan-UK evaluation of formative (in-training) ERCP DOPS assessments submitted to the JETS e-portfolio between July 2016 and October 2018. The JETS e-portfolio is mandated for use by all UK endoscopy trainees to evidence endoscopy experience in the form of self-reported procedure entries (which permit calculation of lifetime ERCP procedure counts) and objective assessments (DOPS). Formative DOPS may be undertaken at the discretion of the trainee at any stage of training but, under JAG guidance, are pre-specified before commencing the observed procedure to minimize case-selection bias. All DOPS data were retrieved from the JETS e-portfolio and included: individual item scores, case difficulty, assessor’s overall rating, trainee and assessor unique identifiers, trainee grade (ranging from specialty trainee year 3 [ST3: first year of endoscopy training] to consultant), and the lifetime procedure count preceding the DOPS assessment.
Validity framework
Validity of the ERCP DOPS was appraised using the American Psychological Association Standards framework, which proposes five sources of validity evidence in assessment tools [6] [7]:
- content (relevance)
- response process (relationship between intended construct and thought processes of assessors)
- internal structure (associations between test measures, e. g. reliability)
- relations with other variables (discriminative ability)
- consequences (implications of interpreting test scores).
Outcomes
The primary outcome was the assessor’s overall competency rating, analyzed on a four-point scale ranging from requiring maximal supervision (Score 1), through significant supervision (Score 2) and minimal supervision (Score 3), to competent without supervision (Score 4). Outcomes were also studied as a binary outcome: competent (Score 4) vs. not competent (Scores 1 – 3, i. e. requiring any degree of supervision). The percentage of items rated competent was analyzed per DOPS, across each competency item and domain. All DOPS received an overall competency rating, but individual items could be rated not assessable; items rated not assessable were excluded from analyses.
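A minimal sketch of this scoring logic, using pandas and hypothetical column names (the JETS e-portfolio data structure is not published, so these names are assumptions for illustration):

```python
import pandas as pd

# One row per item rating within a DOPS, on the 4-point supervision scale.
# NaN stands in for an item rated "not assessable".
ratings = pd.DataFrame({
    "dops_id":    [1, 1, 1, 2, 2, 2],
    "item_score": [4, 3, 4, 4, 4, float("nan")],
})

# Items rated not assessable are excluded from analyses.
assessed = ratings.dropna(subset=["item_score"]).copy()

# Binary outcome: competent (Score 4) vs. requiring any supervision (Scores 1-3).
assessed["competent"] = assessed["item_score"] == 4

# Percentage of assessed items rated competent, per DOPS.
print(assessed.groupby("dops_id")["competent"].mean() * 100)
```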
Statistical analyses
Internal structure
Evidence of internal structure validity may be provided by demonstrating dimensionality and reliability [8]. Item-global correlation analyses (Spearman’s rank method) were conducted to assess dimensionality (i. e. whether differences exist in the correlations between individual item scores and the overall competency rating for each DOPS). A Spearman’s rho coefficient of > 0.70 was used to denote a strongly positive correlation [9]. The internal consistency (reliability) of the DOPS scale was studied across DOPS and for each domain using Cronbach’s alpha. Internal consistency estimates the reliability of test components and indicates how well a set of items within an assessment measures a particular characteristic (i. e. competency), with coefficients of > 0.90 indicating high reliability [10]. Given the case-to-case variation in ERCP, it is neither necessary nor always possible for assessors to rate all items within each DOPS; sensitivity analyses were therefore performed to exclude items that were frequently (i. e. > 50 %) rated not assessable.
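For illustration only, a minimal Python sketch of both analyses on simulated item scores, assuming the standard variance-based formula for Cronbach’s alpha and scipy for Spearman’s rho (the study itself used SPSS):

```python
import numpy as np
from scipy.stats import spearmanr

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (assessments x items) matrix of item scores."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Simulated scores: a latent competence trait drives 5 items and the
# assessor's overall rating, each on the 1-4 scale.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = np.clip(np.round(2.5 + latent + rng.normal(0, 0.6, (200, 5))), 1, 4)
overall = np.clip(np.round(2.5 + latent[:, 0] + rng.normal(0, 0.4, 200)), 1, 4)

print(f"Cronbach's alpha: {cronbach_alpha(items):.3f}")

# Item-global correlations: Spearman's rho between each item and the overall
# rating; rho > 0.70 denotes a strongly positive correlation.
for j in range(items.shape[1]):
    rho, p = spearmanr(items[:, j], overall)
    print(f"item {j + 1}: rho = {rho:.3f} (P = {p:.2g})")
```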
Consequential validity
The contrasting groups method ([Fig. 1]) was used to calculate the optimal competency threshold for ERCP DOPS [11]. Two distributions (competent vs. not competent) were plotted according to the overall assessor rating, with the percentage of competent items per DOPS on the x-axis and the frequency on the y-axis. The optimum competency threshold (benchmark) was indicated by the intercept of the two distributions. Consequence analysis was then performed by determining the theoretical false-positive and false-negative rates of applying this competency threshold.
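A sketch of the contrasting groups computation, assuming normal densities are fitted to the two groups and the benchmark is taken at their intersection; the data and distribution parameters below are simulated purely for illustration:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Simulated per-DOPS scores: % of assessed items rated competent, grouped by
# the assessor's overall rating (the two "contrasting groups").
rng = np.random.default_rng(1)
not_competent = np.clip(rng.normal(65, 15, 300), 0, 100)
competent = np.clip(rng.normal(95, 6, 500), 0, 100)

mu0, sd0 = not_competent.mean(), not_competent.std(ddof=1)
mu1, sd1 = competent.mean(), competent.std(ddof=1)

# Benchmark = intercept of the two fitted normal densities, between the means.
threshold = brentq(lambda x: norm.pdf(x, mu1, sd1) - norm.pdf(x, mu0, sd0),
                   mu0, mu1)

# Consequence analysis: theoretical error rates of applying this threshold.
false_positive = norm.sf(threshold, mu0, sd0)   # not-competent DOPS passing
false_negative = norm.cdf(threshold, mu1, sd1)  # competent DOPS failing
print(f"threshold = {threshold:.1f} %, FP = {false_positive:.1%}, "
      f"FN = {false_negative:.1%}")
```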
Discriminative validity (learning curves)
First, the frequency of competent DOPS scores was calculated at item level, domain level, and for the global DOPS scores (mean item DOPS score and assessor’s overall rating), and stratified by lifetime procedure count to estimate learning curves. For item competencies, 95 % confidence intervals (CIs) were calculated using the equal-tailed Jeffreys prior intervals approach. Next, to account for the variable number of DOPS performed by each trainee, a multivariable binary logistic regression analysis was performed using generalized estimating equations (autoregressive structure) to identify whether the effect of lifetime procedure count on overall DOPS competence remained significant after adjusting for potential confounders. This analysis adjusted for the following covariates: trainee specialty, trainee grade, existing JAG gastroscopy certification, formative DOPS count, and case difficulty.
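As an illustrative sketch (simulated data, hypothetical variable names): the equal-tailed Jeffreys interval follows from the quantiles of a Beta(x + 0.5, n − x + 0.5) posterior, and a GEE logistic model clustered by trainee can be specified as below. The study used SPSS, so this statsmodels formulation is an assumed equivalent, not the authors’ code:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import beta

def jeffreys_ci(x: int, n: int, level: float = 0.95):
    """Equal-tailed Jeffreys interval: quantiles of Beta(x + 0.5, n - x + 0.5)."""
    a = (1 - level) / 2
    return beta.ppf(a, x + 0.5, n - x + 0.5), beta.ppf(1 - a, x + 0.5, n - x + 0.5)

print(jeffreys_ci(66, 74))  # e.g. 66 of 74 DOPS rated competent in a band

# GEE logistic regression: overall DOPS competence clustered by trainee with
# an autoregressive working correlation. Data and column names are simulated;
# grid=True assumes consecutive DOPS per trainee (recent statsmodels versions).
rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "trainee": rng.integers(0, 60, n),
    "lifetime_count": rng.integers(0, 400, n),
    "difficulty": rng.integers(1, 4, n),  # 1 = easy ... 3 = complicated
})
logit = -2 + 0.01 * df["lifetime_count"] - 0.5 * df["difficulty"]
df["competent"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
df = df.sort_values(["trainee", "lifetime_count"])

model = smf.gee("competent ~ lifetime_count + C(difficulty)",
                groups="trainee", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Autoregressive(grid=True))
print(model.fit().summary())
```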
Statistical analyses were performed using SPSS version 25 (IBM Corp., Armonk, New York, USA), with P < 0.05 considered statistically significant.
Results
Study participants
In total, 818 DOPS submitted from 80 UK training centers were included for analysis. DOPS were completed by 123 trainers (median 3 DOPS per trainer, interquartile range [IQR] 1 – 6) for 109 trainees (95 gastroenterologists, 14 gastrointestinal surgeons; median 4 DOPS per trainee, IQR 1 – 9). A total of 716 DOPS were assessed by 111 gastroenterologists, 94 DOPS by 10 gastrointestinal surgeons, and 8 DOPS by 2 radiologists. Trainee grades comprised: ST3 – 5 (n = 28, 25.7 %), ST6 (n = 24, 22.0 %), ST7 – 8 (n = 23, 21.1 %), clinical research fellow (n = 8, 7.3 %), and consultant/associate specialist (n = 26, 23.9 %). Lifetime procedure counts are shown in [Table 1]; trainees undertaking DOPS had a preceding median lifetime procedure count of 70 (IQR 28 – 151, range 0 – 562).
Table 1

| Lifetime procedure count[1] | No. of trainees, n (%) | No. of DOPS, n (%) |
|---|---|---|
| < 25 | 25 (22.9) | 55 (6.7) |
| 25 – 49 | 22 (20.2) | 84 (10.3) |
| 50 – 99 | 18 (16.5) | 135 (16.5) |
| 100 – 149 | 15 (13.8) | 129 (15.8) |
| 150 – 199 | 9 (8.3) | 133 (16.3) |
| 200 – 249 | 9 (8.3) | 74 (9.0) |
| 250 – 299 | 3 (2.8) | 90 (11.0) |
| 300 + | 8 (7.3) | 118 (14.4) |
| Total | 109 | 818 |

ERCP, endoscopic retrograde cholangiopancreatography; DOPS, direct observation of procedural skills.
1 The maximum lifetime ERCP procedure count preceding DOPS for each unique trainee.
Internal structure validity
Item-global analyses
Item-global analyses were performed using Spearman’s rank method to test internal structure validity by demonstrating whether differences exist in the scoring of individual DOPS items and the overall competency rating ([Table 2]). DOPS items corresponding to technical competencies were most likely to correlate with global competency. The highest correlations were observed for selective cannulation (rho 0.752), sphincterotomy (rho 0.768), sphincteroplasty (rho 0.814), and stent deployment (plastic stent rho 0.705; metal stent rho 0.768). These were followed by the ENTS domain items (rho 0.519 – 0.636) and the post-procedure items of report writing (rho 0.474) and management plan (rho 0.470). Correlations were weakest for pre-procedure items (rho 0.230 – 0.273). All correlations were statistically significant (P < 0.001).
Table 2

| DOPS domain and item | N | Spearman’s rho[1] | Cronbach’s alpha |
|---|---|---|---|
| Pre-procedure | | | 0.930 |
| | 786 | 0.273 | |
| | 786 | 0.251 | |
| | 790 | 0.263 | |
| | 812 | 0.254 | |
| | 769 | 0.230 | |
| | 787 | 0.255 | |
| Intubation and positioning | | | 0.760 |
| Intubation | 816 | 0.435 | |
| Visualization and positioning | 807 | 0.522 | |
| Patient comfort | 789 | 0.281 | |
| Cannulation and imaging | | | 0.861 |
| Selective cannulation | 775 | 0.752 | |
| | 771 | 0.685 | |
| | 758 | 0.687 | |
| Execution of selected therapy | | | 0.981 |
| | 762 | 0.625 | |
| Sphincterotomy | 536 | 0.768 | |
| Sphincteroplasty | 127 | 0.814 | |
| | 594 | 0.675 | |
| | 166 | 0.645 | |
| Plastic stent deployment | 215 | 0.705 | |
| Metal stent deployment | 154 | 0.768 | |
| | 634 | 0.480 | |
| | 194 | 0.565 | |
| Post-procedure | | | 0.893 |
| Report writing | 662 | 0.474 | |
| Management plan | 735 | 0.470 | |
| Endoscopic non-technical skills | | | 0.942 |
| | 808 | 0.519 | |
| | 798 | 0.567 | |
| | 783 | 0.586 | |
| | 792 | 0.636 | |

DOPS, direct observation of procedural skills.
1 P values for all correlations < 0.001.
Reliability
Across the 27 DOPS items, only 5.4 % of DOPS had all eligible items assessed. Internal consistency analysis yielded a Cronbach’s alpha of 0.961. Five items (sphincteroplasty, tissue sampling, plastic stenting, metal stenting, and complications) were rated not assessable in > 50 % of DOPS and were therefore excluded from the sensitivity analysis. A total of 333 DOPS (40.7 %) had all remaining items assessed, giving a Cronbach’s alpha of 0.95 and indicating a high degree of internal consistency (reliability). At domain level ([Table 2]), Cronbach’s alpha statistics ranged from 0.760 (intubation and positioning) to 0.981 (execution of selected therapy).
Consequential validity
Contrasting groups analyses ([Fig. 1]) demonstrated that attainment of competence in 87 % of assessed items per DOPS provided the optimal competency benchmark (pass – fail threshold) in this cohort of trainees. Applying this cutoff score resulted in a theoretical false-positive rate of 10.4 % and a false-negative rate of 1.9 %. Based on these estimates, and assuming independence of assessments, trainees who achieved the 87 % competence threshold in two consecutive DOPS would carry a theoretical false-positive rate of 1.1 % (i. e. 10.4 % × 10.4 %), falling to 0.11 % (P = 0.001) for three consecutive DOPS.
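The compounding arithmetic can be verified directly; a minimal sketch, assuming independence of assessments as stated above:

```python
# Theoretical false-positive rate at the 87 % item-competency threshold.
fp_single = 0.104
for k in (1, 2, 3):
    print(f"{k} consecutive DOPS: false-positive rate = {fp_single ** k:.4f}")
# 1 -> 0.1040, 2 -> 0.0108 (~1.1 %), 3 -> 0.0011 (~0.11 %)
```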
Learning curves
Competency development during ERCP training
Across the cohort, competency development by lifetime procedure count is presented at item level ([Table 3]) and domain/global level ([Fig. 2]). Competency was acquired first in the pre-procedure domain, followed by post-procedure management, ENTS, cannulation and imaging, and finally execution of selected therapy. Trainees achieved the 87 % competency threshold after 200 – 249 procedures (mean 89 %; 95 %CI 87 % – 93 %). Competency in selective cannulation (i. e. deep biliary cannulation) was achieved after 300 procedures (mean 89 %, 95 %CI 80 % – 95 %). Despite 300 + procedures, this competency threshold was not reached for sphincteroplasty (56 %), stenting (plastic 73 %; metal 70 %), sphincterotomy (80 %), decision for appropriate therapy (80 %), and judgment and decision making (86 %).
Predictors of DOPS competency
Multivariable analysis with generalized estimating equation models was performed to identify trainee-level predictors of overall DOPS competence ([Table 4]). This showed that lifetime procedure count (P = 0.002), trainee grade (P = 0.03), easier case difficulty (P < 0.001), and lifetime DOPS count ≥ 10 (P = 0.01) were associated with overall procedural competence, whereas trainee specialty (P = 0.46), assessor specialty (P = 0.99), and prior gastroscopy certification (P = 0.97) were not.
Table 4

| Factor | DOPS, n (%) | Multivariable OR | 95 %CI | P value[1] |
|---|---|---|---|---|
| Trainee specialty | | | | |
| Gastroenterology | 702 (85.8) | REF | | |
| GI surgery | 116 (14.2) | 0.648 | 0.20 – 2.07 | 0.46 |
| Grade | | | | 0.029 |
| | 367 (44.9) | REF | | |
| | 132 (16.1) | 0.43 | 0.14 – 1.32 | 0.14 |
| | 35 (4.3) | 4.17 | 0.66 – 26.44 | 0.13 |
| | 284 (34.7) | 2.82 | 0.95 – 8.34 | 0.06 |
| Assessor specialty | | | | |
| Gastroenterology | 716 (87.5) | REF | | |
| Non-gastroenterology | 102 (12.5) | 1.01 | 0.34 – 3.01 | 0.99 |
| Lifetime procedure count | | | | 0.002 |
| | 124 (15.2) | REF | | |
| | 145 (17.7) | 4.21 | 0.97 – 18.21 | 0.06 |
| | 166 (20.3) | 5.22 | 1.21 – 22.60 | 0.03 |
| | 111 (13.6) | 15.86 | 3.08 – 81.65 | 0.001 |
| | 68 (8.3) | 12.79 | 2.51 – 65.09 | 0.002 |
| | 81 (9.9) | 10.66 | 1.66 – 68.38 | 0.01 |
| | 57 (7.0) | 19.73 | 3.09 – 126.21 | 0.002 |
| | 66 (8.1) | 80.91 | 10.21 – 641.34 | < 0.001 |
| Case difficulty | | | | < 0.001 |
| Complicated | 166 (20.3) | REF | | |
| Moderate | 436 (53.3) | 2.30 | 1.28 – 4.13 | 0.01 |
| Easy | 216 (26.4) | 4.82 | 2.28 – 10.17 | < 0.001 |
| EGD certification | | | | |
| No | 171 (20.9) | REF | | |
| Yes | 647 (79.1) | 1.02 | 0.29 – 3.58 | 0.97 |
| Lifetime DOPS count | | | | |
| < 10 | 404 (49.4) | REF | | |
| ≥ 10 | 414 (50.6) | 2.62 | 1.31 – 5.25 | 0.01 |

OR, odds ratio; CI, confidence interval; GI, gastrointestinal; EGD, esophagogastroduodenoscopy.
1 P values are significant at < 0.05.
Discussion
In the era of competency-based medical education, it is incumbent on training programs to ensure that a robust system is in place to define and assess competence, and to monitor and support competency development during endoscopy training. In this UK-wide study involving 109 ERCP trainees, the largest ERCP training cohort presented to date, we present novel validity data collected over a 2-year period to support the formative ERCP DOPS as an in-training competency assessment tool. Specifically, we have presented internal structure validity (item-global correlations and scale reliability), consequential validity (contrasting groups analyses), and discriminative validity (competency development profiles and multivariable analyses). We have therefore demonstrated that DOPS can be used to assess, benchmark, and monitor the progression of specific ERCP competencies during training.
The ERCP DOPS measures 27 competencies relevant to pre-procedural, procedural, and post-procedural aspects of ERCP, in addition to ENTS. On interrogating its internal structure, item-global correlations were found to be lowest for competencies within the pre-procedure domain and highest within the “execution of selected therapy” domain. These results mirror the learning curve data, which show the therapeutic competencies to be the last to develop; this is perhaps unsurprising given the therapeutic objective of ERCP and the technically demanding nature of these skills. The DOPS showed excellent reliability (Cronbach’s alpha 0.961), which persisted after sensitivity analyses. Domain-specific Cronbach’s alpha values exceeded 0.8 for all domains except intubation and positioning (0.760), which may be explained by the heterogeneity of the three included items (intubation, visualization and positioning, and patient comfort) and could be considered for refinement in future iterations of DOPS. These results objectively demonstrate that the DOPS instrument is capable of measuring different skills, which converge in measuring the overall outcome of competence in ERCP.
In the context of ERCP training, objective and standardized competency assessment tools are scarce. The EUS and ERCP Skills Assessment Tool (TEESAT) is the only other assessment tool with robust validity data [12] [13] and is advocated by the American Society for Gastrointestinal Endoscopy. TEESAT measures 19 technical items and 5 non-technical items. In contrast to DOPS, TEESAT records procedural indications, previous interventions, application of specific pancreaticobiliary therapies, and procedural complications, but does not assess pre- or post-procedural competencies. The Rotterdam Assessment Form for ERCP (RAF-E) is a self-assessment tool that measures technical competencies on a visual analog scale [14] [15]. TEESAT data from 22 trainees (median 350 ERCP procedures) found that overall technical competence was achieved by 60 % (overall cannulation 68.4 %; stone clearance 85.7 %) and overall cognitive competence by 100 % [12]. Of note, these percentages were based on a less stringent definition of competence, in which trainees were allowed verbal assistance, whereas competency in DOPS was defined as requiring no verbal or physical assistance. Analyses of the RAF-E found that unassisted biliary cannulation rates reached 85 % after 200 procedures, and 68 % after 180 procedures with a virgin papilla [15]. To our knowledge, the Ekkelenkamp study was the only other to stratify competency outcomes by lifetime procedure count, but it was confined to 15 trainees from a single expert center, each with < 200 procedures [15], whereas the studies by Wani et al. present competency rates for an entire cohort of dedicated fellows with variable procedure counts [12] [13].
Our learning curve data provide unique insights into the development of each of the DOPS competencies, stratified by lifetime procedure count, up to and beyond the 300-procedure mark. Trainees met the DOPS benchmark score of 87 % after 200 – 249 procedures, at which point 79 % of DOPS were rated competent in stone therapy, thereby reaching the British Society of Gastroenterology (BSG) minimum standard of 75 % + [16]. Despite this, competency in important technical end points such as selective cannulation (65 %), sphincterotomy (66 %), and sphincteroplasty (30 %), in addition to the non-technical competency of judgment and decision making (84 %), had not been achieved. The 85 % + BSG standard for competent biliary cannulation was achieved after 300 procedures (89 %), but the benchmark was still not met for sphincteroplasty, stenting, or therapeutic decision making. These results contrast with the learning curve data from Ekkelenkamp et al. [15], which show that competencies in stenting and sphincterotomy develop earlier than selective cannulation. We believe this arises from assessor interpretation of DOPS (i. e. response bias), whereby assessors may rate trainees as not fully competent for a specific therapy if they were unable to complete the critical preceding step of biliary cannulation. In the interests of time, trainers may partially undertake some of the subsequent therapeutic elements before handing the scope back to the trainee. Another observation was that competency in therapeutic decision making occurred later than selective cannulation. It is likely that trainers continue to provide advice even when the scope is in the hands of an experienced and technically competent trainee (so-called “backseat driving”). Although the decision on a specific therapy (e. g. stent choice or stone extraction method) may be one that could be made by the trainee, this decision is often undertaken by the trainer on the trainee’s behalf. These issues should be addressed in future ERCP Train-the-Trainers courses.
Collectively, our findings complement single-operator data from the USA, which reported 80 % deep biliary cannulation rates after 350 – 400 supervised procedures [17], and Japanese data, which associated lifetime ERCP counts of ≤ 300 with cannulation times of > 15 minutes (odds ratio [OR] 2.08; P < 0.001), a known risk factor for post-ERCP pancreatitis [18]. Worldwide, there is variation in the minimum procedural requirements before competency can be assessed, ranging from 200 supervised procedures (USA) and 200 unassisted ERCPs (Australia and Canada) to 300 supervised procedures (UK) [19]. Our results support the 300-procedure minimum threshold recommended by the UK BSG [16], but inevitably reignite the debate on training requirements for independent practice. However, there are several caveats to our UK data that may affect generalizability to international training settings. Our results were inclusive of all UK trainees and not limited to those within dedicated fellowships or centers of excellence with high ERCP volume. Not all trainers were gastroenterologists, although DOPS competency rates did not vary by assessor specialty on multivariable analysis, which attests to the tool’s reliability. Furthermore, most UK procedures are performed under conscious sedation without propofol/anesthetic support, which is likely to affect procedural difficulty and the need for earlier trainer intervention.
In the quality assurance of endoscopy training, thresholds for minimum procedure numbers should be considered in conjunction with objective, robust, and validated competence assessment tools, and with other performance measures (e. g. key performance indicators). There have been calls by societies to shift emphasis away from minimum procedure counts in favor of competence assessment tools [20]. However, the outcome of competence is dependent on its definition, which may explain the heterogeneity in the ERCP training literature [21]. In our cohort, 20.3 % of assessed procedures were deemed complicated; these were significantly more likely to require trainer supervision than procedures of moderate (OR 2.30; P = 0.01) or easy (OR 4.82; P < 0.001) difficulty. As competency requires consistency of practice across a variety of cases, contexts, and difficulties [1], minimum procedure numbers are likely to remain a competency safeguard. Within the confines of UK training, trainees are increasingly expected to enroll in dedicated ERCP fellowship programs and/or continue mentored ERCP practice after completion of specialist training [22]. This was reflected in our study, where 23.9 % of “trainees” were consultants or associate specialists. Training programs should, in principle, agree on the definition of competence, which usually pertains to a minimum standard regarded as acceptable to patients and for unsupervised practice, with the expectation that further development would be required to achieve high quality performance standards, subject to performance monitoring during the newly independent period [1] [17] [23].
Several additional limitations should be discussed. First, DOPS assessments were performed by trainers rather than independent (neutral) assessors, which could introduce bias. Second, there were limitations related to the DOPS instrument itself; for example, papilla status was not recorded, although this field has since been introduced. There was also a considerable number of items rated not assessable. This was not due to missing data, but reflects the varied nature of ERCP, in which certain therapeutic interventions are not required in every case and so cannot be assessed. This affected the Cronbach’s alpha estimates, which treat not assessable responses as invalid, hence the rationale for sensitivity analyses. Third, learning curves were plotted against lifetime procedure counts, which relied on self-entered data on the JETS e-portfolio and may risk entry bias. However, JETS e-portfolio procedure counts have previously been validated against endoscopy reporting system databases [24] and are subject to local verification by training leads. Most UK trainees will have engaged with JETS before ERCP training (e. g. for gastroscopy and/or colonoscopy certification), so the lifetime procedure counts derived for ERCP are likely to be more accurate than for other procedures. Fourth, analyses were performed at the level of each DOPS, rather than using a trainee-level cumulative summation approach, owing to the variable procedure counts between DOPS. Finally, our analyses were centered on DOPS assessments and not on objectively verified procedure-level data and key performance indicators. This is a limitation compared with TEESAT, which measures specific indications, completion end points, and complication rates. Although these aspects are recorded in the self-reported JETS e-portfolio entries for ERCP procedures (Fig. 2s), they are not yet automatically linked to DOPS. The UK National Endoscopy Database achieved roll-out in 2018 and is envisaged to support ERCP functionality in the future [25]; this would upload ERCP training procedures in real time, facilitating cross-linkage of procedural data with DOPS and addressing the potential issue of entry bias.
Globally, no national credentialing/certification process exists for ERCP [26]. Instead, the privilege of independent practice remains at the discretion of local trainers and employers. With apprenticeship models being increasingly phased out in favor of competency-based curricula, there are growing calls by national societies to endorse a move towards certification [22] [26], in line with other modalities such as gastroscopy and colonoscopy [27]. Our results may inform efforts by national training bodies, such as JAG, to support the development of competency frameworks to quality assure ERCP training.
Conclusion
In this study involving 109 ERCP trainees from 80 UK training centers, we have provided evidence of internal structure, consequential, and discriminative validity to support the ERCP DOPS as a novel competency assessment tool. The study has also shed light on competency acquisition in ERCP, with ≥ 300 procedures required for ≥ 85 % of the cohort to demonstrate competence in biliary cannulation, suggesting the minimum procedure requirement of 300 ERCPs as an eligibility criterion for triggering high-stakes assessment for independent practice.
Competing interests
All authors are associated with the Joint Advisory Group on Gastrointestinal Endoscopy Quality Assurance of Training Working Group.
Acknowledgments
The authors would like to acknowledge James Hodson from the Institute of Translational Medicine, Birmingham, UK, for statistical assistance and Professor Sauid Ishaq, Russell’s Hall Hospital, Dudley, UK, for critically reviewing the manuscript.
References
- 1 Siau K, Hawkes ND, Dunckley P. Training in endoscopy. Curr Treat Options Gastroenterol 2018; 16: 345-361
- 2 Siau K, Green JT, Hawkes ND. et al. Impact of the Joint Advisory Group on Gastrointestinal Endoscopy (JAG) on endoscopy services in the UK and beyond. Frontline Gastroenterol 2019; 10: 93-106
- 3 Endoscopic retrograde cholangiopancreatography (ERCP). 2016 Available from: https://www.thejag.org.uk/Downloads/DOPS%20forms%20(international%20and%20reference%20use%20only)/Formative%20DOPS_ERCP.pdf [Accessed: 6 November 2018]
- 4 Siau K, Dunckley P, Valori R. et al. Changes in scoring of Direct Observation of Procedural Skills (DOPS) forms and the impact on competence assessment. Endoscopy 2018; 50: 770-778
- 5 Crossley J, Jolly B. Making sense of work-based assessment: ask the right questions, in the right way, about the right things, of the right people. Med Educ 2012; 46: 28-37
- 6 Messick S. Validity of psychological assessment: validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Am Psychol 1995; 50: 741
- 7 Cook DA, Hatala R. Validation of educational assessments: a primer for simulation and beyond. Adv Simul 2016; 1: 31
- 8 Rios J, Wells C. Validity evidence based on internal structure. Psicothema 2014; 26: 108-116
- 9 Hinkle DE, Wiersma W, Jurs SG. Applied statistics for the behavioral sciences. 5. Boston: Houghton Mifflin; 2003
- 10 Bland JM, Altman DG. Statistics notes: Cronbach’s alpha. BMJ 1997; 314: 572
- 11 Jørgensen M, Konge L, Subhi Y. Contrasting groups’ standard setting for consequences analysis in validity studies: reporting considerations. Adv Simul 2018; 3: 5
- 12 Wani S, Hall M, Wang AY. et al. Variation in learning curves and competence for ERCP among advanced endoscopy trainees by using cumulative sum analysis. Gastrointest Endosc 2016; 83: 711-719
- 13 Wani S, Keswani R, Hall M. et al. A prospective multicenter study evaluating learning curves and competence in endoscopic ultrasound and endoscopic retrograde cholangiopancreatography among advanced endoscopy trainees: the Rapid Assessment of Trainee Endoscopy Skills Study. Clin Gastroenterol Hepatol 2017; 15: 1758-1767
- 14 Ekkelenkamp VE, Koch AD, Haringsma J. et al. Quality evaluation through self-assessment: a novel method to gain insight into ERCP performance. Frontline Gastroenterol 2014; 5: 10-16
- 15 Ekkelenkamp VE, Koch AD, Rauws EA. et al. Competence development in ERCP: the learning curve of novice trainees. Endoscopy 2014; 46: 949-955
- 16 ERCP – the way forward, a standards framework. British Society of Gastroenterology ERCP Working Party; 2014 Available from: https://www.bsg.org.uk/asset/341DCD67-426A-44F4-910DD392C8A39606 [Accessed 28 February 2019]
- 17 Verma D, Gostout CJ, Petersen BT. et al. Establishing a true assessment of endoscopic competence in ERCP during training and beyond: a single-operator learning curve for deep biliary cannulation in patients with native papillary anatomy. Gastrointest Endosc 2007; 65: 394-400
- 18 Mandai K, Uno K, Fujii Y. et al. Number of endoscopic retrograde cholangiopancreatography procedures required for short biliary cannulation time. Gastroenterol Res Pract 2017; 2017: 1515260
- 19 Wani S, Keswani RN, Petersen B. et al. Training in EUS and ERCP: standardizing methods to assess competence. Gastrointest Endosc 2018; 87: 1371-1382
- 20 Jones DB, Hunter JG, Townsend CM. et al. SAGES rebuttal. Gastrointest Endosc 2017; 86: 751-754
- 21 Voiosu T, Bălănescu P, Voiosu A. et al. Measuring trainee competence in performing endoscopic retrograde cholangiopancreatography: a systematic review of the literature. 2019; 7: 239-249
- 22 Bekkali NL, Johnson GJ. Training in ERCP and EUS in the UK anno 2017. Frontline Gastroenterol 2017; 8: 124-128
- 23 Poppers DM, Cohen J. The path to quality colonoscopy continues after graduation. Gastrointest Endosc 2019; 89: 493-495
- 24 Ward ST, Mohammed MA, Walt R. et al. An analysis of the learning curve to achieve competency at colonoscopy using the JETS database. Gut 2014; 63: 1746
- 25 Lee TJW, Siau K, Esmaily S. et al. Development of a national automated endoscopy database: the United Kingdom National Endoscopy Database (NED). United Eur Gastroenterol J 2019;
- 26 Cotton PB, Feussner D, Dufault D. et al. A survey of credentialing for ERCP in the United States. Gastrointest Endosc 2017; 86: 866-869
- 27 Siau K, Anderson JT, Valori R. et al. Certification of UK gastrointestinal endoscopists and variations between trainee specialties: results from the JETS e-portfolio. Endosc Int Open 2019; 7: E551-E560