Endoscopy 2006; 38(3): 218-225
DOI: 10.1055/s-2005-870445
Original Article
© Georg Thieme Verlag KG Stuttgart · New York

Development of a Video Assessment Scoring Method to Determine the Accuracy of Endoscopist Performance at Screening Flexible Sigmoidoscopy

S. Thomas-Gibson¹, P. A. Rogers², N. Suzuki¹, M. E. Vance¹, M. D. Rutter¹, D. Swain¹, A. J. Nicholls¹, B. P. Saunders¹, W. Atkin²
  • ¹ Wolfson Unit for Endoscopy, St Mark’s Hospital, Harrow, United Kingdom
  • ² Cancer Research UK Colorectal Cancer Unit, St Mark’s Hospital, Harrow, United Kingdom
Further Information

W. Atkin, MPH, PhD

Cancer Research UK Colorectal Cancer Unit · St Mark’s Hospital

Watford Road · Harrow, HA1 3UJ · United Kingdom

Fax: +44-208-235-4277

Email: wendy.atkin@cancer.org.uk

Publication History

Submitted 10 January 2005

Accepted after revision 15 June 2005

Publication Date:
10 March 2006 (online)


Background and Aims: Variation in the adenoma detection rate (ADR) at flexible sigmoidoscopy screening has been shown to be due to variation in endoscopist performance. There are no objective methods for scoring an endoscopist’s performance reliably, and the aim of this study was to develop a valid and reliable objective scoring method using video footage of screening flexible sigmoidoscopies.
Methods: In a series of five experiments, experienced endoscopists (the scorers) independently scored a sample (n = 43) of the 40 000 flexible sigmoidoscopy extubations recorded as part of the United Kingdom Flexible Sigmoidoscopy Screening Trial (UK FSST). The scoring system, the parameters scored, and their definitions evolved over the course of the five experiments. The initial visual analogue score (range 0 - 100) used in the first two experiments evolved into a five-point score that ranged from 1 (E, poor) to 5 (A, excellent) in the last three experiments. The final parameters scored were: time spent viewing the mucosa, re-examination of poorly viewed areas, suctioning of fluid pools, distension of the lumen, lower rectal examination, and overall quality of the examination. The first four experiments scored one individual case per endoscopist; in experiment 5, an overall score was awarded for five cases performed by each endoscopist being assessed.
Results: Scoring five cases examined by an individual endoscopist using the A - E grading system was the most reliable method (intraclass correlation coefficient 0.89). Cluster analysis demonstrated that the endoscopists in the high-scoring ADR group (ADR 14.7 - 15.9 %) could be differentiated from those in the intermediate- and low-scoring ADR groups (ADR 8.6 - 12.6 %).
Conclusions: An objective scoring system for assessing the accuracy of performance at screening flexible sigmoidoscopy, based on video footage, is described. Endoscopists who might benefit from further training can be identified using this method.


Introduction

Endoscopic examination of the colon using flexible sigmoidoscopy is currently used for colorectal cancer (CRC) screening in many thousands of individuals. The United Kingdom Flexible Sigmoidoscopy Screening Trial (UK FSST), a multicentre randomized controlled trial, has been examining the hypothesis that a single flexible sigmoidoscopy screening examination around the age of 60 is a cost-effective and acceptable means of reducing the incidence and mortality of CRC; baseline results have been published [1] [2]. A significant finding was that the wide variation in the adenoma detection rate (ADR) between endoscopists (who were using the same protocol and equipment) when screening people aged between 55 and 64 was due to variations in endoscopist performance rather than to variations in the demographic or geographical features of the screened population [3]. Similar findings have been reported elsewhere [4].

The identification and removal of adenomas reduces the incidence of CRC [5]. Larger and more advanced adenomas demonstrate a higher rate of progression to malignancy, but if flexible sigmoidoscopy is to be a once-only or infrequent screening method, as is proposed in the UK, all adenomas need to be identified. It is accepted that adenomas are the precursors of most colorectal cancers, so ADRs are a reliable and objective marker of the screening endoscopist’s performance [6].

The ADR could be viewed as the gold standard for assessing the accuracy of performance at flexible sigmoidoscopy screening. However, it is difficult to obtain comparable data: demographic variations and varying rates of previous endoscopy alter the background prevalence of adenoma. There is a need in the UK to define flexible sigmoidoscopy competence, to guide training and accreditation, and to monitor endoscopists’ early screening performance before their ADRs stabilise.

In the UK the assessment of medical endoscopists’ competence is often informal and unstructured and this may be partly responsible for the variable quality in endoscopic performance [7]. The Joint Advisory Group for Gastrointestinal Endoscopy (JAG), a professional group which represents physicians, surgeons, radiologists and general practitioners in the UK, has recently issued guidelines for performance assessment in endoscopy [8], but these do not address performance accuracy in terms of the pathology detected. There is a need for a reliable assessment tool for use in the training, accreditation and daily practice of flexible sigmoidoscopy screening.

Rex [9] demonstrated that higher quality withdrawal technique is associated with a lower adenoma miss rate and described an assessment method using video recordings of the withdrawal phase of colonoscopy. The UK FSST made video recordings of the withdrawal phase of flexible sigmoidoscopy examinations performed by 13 endoscopists, each of whom screened around 3000 individuals; their ADRs were found to range from 8.6 % to 15.9 % [3]. This videotape archive provided an opportunity to build on Rex’s work and develop an objective performance score for the extubation phase of screening flexible sigmoidoscopy examination.


Methods

A performance score for assessing accuracy in screening flexible sigmoidoscopy was developed by undertaking a series of five experiments, in which experienced endoscopists (the scorers) scored a small random sample (n = 43) of the 40 000 flexible sigmoidoscopy examination extubations that had been videotaped as part of the UK FSST [2]. Flexible sigmoidoscopy screening was performed in 13 UK endoscopy units. At each centre, a single experienced, medically trained endoscopist performed around 3000 procedures. All centres used standard bowel preparation methods and equipment. Endoscopists were encouraged to remove all small polyps found during screening; larger polyps were removed later at colonoscopy. Both the endoscopist and the person who was undergoing the screening examination knew that the examination was being videotaped and that the video would be used (on an anonymous basis) for research and training purposes.

Four experienced endoscopists designed the scoring system and score sheets. In addition to the parameters described by Rex (“examination of the proximal side of haustral folds”, the “removal of fluid and faeces” (suctioning), “luminal distension”, the “time spent viewing”, and an estimation of the “percentage of mucosa visualised”), the overall quality of the extubation technique was scored in experiments 2 - 5. Up to six scorers took part in each experiment. A ‘gold standard’ procedure performed by the senior colonoscopist (B.P.S.) was videotaped, shown, and discussed at the start of the study.

The parameters that were scored and the scoring system used evolved over time, the definitions changing following discussion of the results by scorers at each stage. The initial two experiments (phase 1) graded performance on a scale of 0 - 100; experiment 1 used a visual analogue scale (VAS), and experiment 2 used the score line divided into quintiles as a visual aid. The three later experiments (phase 2) used a five-point grading system, ranging from A (excellent, scoring 5) to E (poor, scoring 1). The implications were that an endoscopist assigned a grade A or B would be competent to continue screening without further intervention; an endoscopist graded D or E would require training intervention; and an endoscopist with a C grading would need to be monitored. The parameters scored in each experiment are summarised in Table [1], and Table [2] shows the final version of the score sheet.
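
The grade-to-score mapping and the follow-up implied by each grade can be sketched in a few lines of Python. This is only an illustration of the rule described above; the function name and the returned labels are hypothetical and do not come from the study.

```python
# Numeric values follow the text: A (excellent) scores 5, E (poor) scores 1.
GRADE_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def recommended_action(grade: str) -> str:
    """Map an A-E grade to the follow-up implied in the text.

    The returned strings are illustrative labels, not wording from the study.
    """
    grade = grade.upper()
    if grade not in GRADE_SCORE:
        raise ValueError(f"unknown grade: {grade!r}")
    if grade in ("A", "B"):
        return "continue screening"
    if grade == "C":
        return "monitor"
    return "training intervention"


for g in "ABCDE":
    print(g, GRADE_SCORE[g], recommended_action(g))
```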

Table 1 The parameters scored in each experiment (“+”, scored; “-”, not on score sheet)

| Parameter | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 | Experiment 5 |
|---|---|---|---|---|---|
| Scoring method | VAS | VAS divided into quintiles | Grades A - E | Grades A - E | Grades A - E |
| Global parameters: | | | | | |
| Overall quality of examination | - | + Scored first | + Scored last | + Scored first | Parameters combined, scored first |
| Percentage “confidence no significant lesion missed” (i. e. all adenomas and all polyps > 2 mm) | - | - | + Scored last | + Scored first | (combined with overall quality) |
| Percentage of bowel mucosa not adequately visualised | - | - | + | - | - |
| Individual parameters: | | | | | |
| Time spent viewing the mucosa | + | + | + | +* | +* |
| Examination of proximal sides of folds | + | + | + | +* | +* Parameters combined |
| Re-examination of poorly viewed areas | + | + | + | +* | (combined with examination of proximal sides of folds) |
| Suctioning of fluid or faecal pools | + | + | + | +* | +* |
| Distension of the lumen for visualisation | + | + | + | +* | +* |
| Lower rectal examination | + | + | + | +* | +* |
| Was retroflexion performed? (Yes/No answer, not analysed) | - | - | + | - | - |

VAS, visual analogue scale.
* Individual parameters scored only if overall quality was grade C, D or E.
Table 2 Final version of the performance assessment score sheet

| Parameter | Excellent | Good | Watch carefully | Not good enough | Unacceptable |
|---|---|---|---|---|---|
| Overall quality of examination (i. e. no significant lesion missed) | A | B | C | D | E |
| Scoring of individual examination parameters (score only when “overall quality” scores are C, D or E): | | | | | |
| Time spent viewing the mucosa | A | B | C | D | E |
| Re-examination of poorly viewed areas (e. g. proximal sides of folds or following slippage) | A | B | C | D | E |
| Suctioning of fluid or faecal pools | A | B | C | D | E |
| Distension of the lumen for visualisation | A | B | C | D | E |
| Lower rectal examination | A | B | C | D | E |
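
The rule on the final score sheet, that individual parameters are graded only when overall quality is C, D or E, can be expressed as a small validation step. The sketch below is hypothetical: the class, the field names and the requirement that all five individual parameters be completed for grades C - E are assumptions made for illustration, not part of the published score sheet.

```python
from dataclasses import dataclass, field
from typing import Dict

GRADES = ("A", "B", "C", "D", "E")
# Parameter names follow the score sheet; the identifiers are hypothetical.
INDIVIDUAL_PARAMETERS = (
    "time_spent_viewing",
    "re_examination",
    "suctioning",
    "distension",
    "lower_rectal_examination",
)


@dataclass
class ScoreSheet:
    overall_quality: str
    individual: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the sheet breaks the scoring rule."""
        if self.overall_quality not in GRADES:
            raise ValueError("overall quality must be graded A-E")
        for name, grade in self.individual.items():
            if name not in INDIVIDUAL_PARAMETERS:
                raise ValueError(f"unknown parameter: {name}")
            if grade not in GRADES:
                raise ValueError(f"{name} must be graded A-E")
        needs_detail = self.overall_quality in ("C", "D", "E")
        # Assumption: a C-E sheet must grade every individual parameter.
        if needs_detail and set(self.individual) != set(INDIVIDUAL_PARAMETERS):
            raise ValueError("grades C-E require all individual parameters")
        if not needs_detail and self.individual:
            raise ValueError("individual parameters are not scored for A or B")


# A grade B sheet needs no further detail.
ScoreSheet("B").validate()
# A grade D sheet carries explanatory grades for feedback.
ScoreSheet("D", {name: "D" for name in INDIVIDUAL_PARAMETERS}).validate()
```

Keeping the explanatory grades optional for A and B mirrors the intent described in the text: they exist to give feedback on technique, not to contribute to the overall score.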

The videos were standardised by only selecting recordings of: a) examinations following excellent bowel preparation (to avoid having to adjust the results for suboptimal bowel preparation); b) examinations in men (flexible sigmoidoscopy is known to be more difficult in women [10] [11] [12]); c) examinations in which no polyps were found (to avoid scoring bias favouring procedures in which a polyp was detected); and d) examinations performed after the endoscopist had completed at least 1000 previous procedures (to allow their ADR to stabilise [3]).
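
The four selection criteria amount to a simple filter over the recordings. The following sketch assumes a hypothetical record layout (the trial's actual data format is not described here), so every field name is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    # All field names are hypothetical; they only mirror criteria a) to d).
    bowel_prep: str               # e.g. "excellent", "good", "poor"
    patient_sex: str              # "M" or "F"
    polyps_found: int             # number of polyps detected
    endoscopist_prior_cases: int  # procedures completed before this one


def is_eligible(rec: Recording) -> bool:
    """Return True if a recording meets all four standardisation criteria."""
    return (
        rec.bowel_prep == "excellent"            # a) excellent bowel preparation
        and rec.patient_sex == "M"               # b) examinations in men
        and rec.polyps_found == 0                # c) no polyps found
        and rec.endoscopist_prior_cases >= 1000  # d) at least 1000 previous procedures
    )


candidates = [
    Recording("excellent", "M", 0, 1500),
    Recording("excellent", "F", 0, 1500),
    Recording("good", "M", 0, 2000),
    Recording("excellent", "M", 1, 2000),
    Recording("excellent", "M", 0, 400),
]
eligible = [rec for rec in candidates if is_eligible(rec)]
print(len(eligible))
```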

Endoscopists were divided into three performance groups, based on their overall ADR adjusted for the age, gender, family history of CRC, and smoking history of screened subjects: high-performance adenoma detectors (ADR 14.7 - 15.9 %), intermediate-performance adenoma detectors (ADR 10.9 - 12.6 %) and low-performance adenoma detectors (ADR 8.6 - 9.8 %) [3]. Extubations were selected at random from endoscopists in each performance group by a researcher who was independent of the endoscopy unit (W.A.). The selected endoscopists were not the same in every experiment.
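
As a minimal illustration, the three performance groups can be encoded as ADR ranges. The ranges are those quoted above; how an ADR falling between the reported ranges would be handled is not stated, so raising an error in that case is an assumption of this sketch.

```python
# Adjusted ADR ranges (%) for each performance group, as quoted in the text.
ADR_GROUPS = {
    "high": (14.7, 15.9),
    "intermediate": (10.9, 12.6),
    "low": (8.6, 9.8),
}


def adr_performance_group(adr_percent: float) -> str:
    """Return the performance group whose reported range contains the ADR."""
    for group, (lower, upper) in ADR_GROUPS.items():
        if lower <= adr_percent <= upper:
            return group
    # Assumption: values outside the reported ranges are left unclassified.
    raise ValueError(f"ADR {adr_percent} % is outside the reported ranges")


for adr in (15.9, 11.3, 8.6):
    print(adr, adr_performance_group(adr))
```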

In each experiment, independent research assistants anonymised the videos, assigned a study number, and transferred the extubation phase of the procedures onto a tape in a random order. Scorers were blind to the identity of the examining endoscopists. They viewed and scored the procedures independently, without discussion. After each experiment, scorers discussed the reasons for any disparity in opinion. Changes were made to the scoring methods as the group progressed through the series of experiments. The aims, methodology, and results of each experiment are described in more detail in the Results section below.


Statistical Analysis

The VAS is a continuous measure. The five-point score was analysed as an ordered categorical variable and as a continuous measure. For continuous data, interobserver reliability was estimated with the intraclass correlation coefficient (ICC), which ranges from 0 to 1 [13].

Experiments 1 - 4 were designed as a two-way analysis of variance (ANOVA), from which the components of variance for variation between extubations ($\sigma_s^2$), systematic variation between scorers ($\sigma_m^2$) and additional random variation ($\sigma^2$) were estimated. Any negative estimate of a variance component was set to 0. The ICC ($\rho_I$) was calculated as:

$$\rho_I = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_m^2 + \sigma^2}$$
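
A minimal sketch of this calculation is given below. It assumes one score per extubation per scorer, estimates the three variance components from the two-way ANOVA mean squares by the method of moments, truncates negative estimates at 0, and takes the ICC in its usual form as the share of total variance attributable to differences between extubations. The function names are hypothetical, and the original analysis was run in Stata rather than with this code.

```python
from typing import List, Tuple


def variance_components(scores: List[List[float]]) -> Tuple[float, float, float]:
    """Estimate variance components from a complete two-way layout.

    scores[i][j] is the score given to extubation i by scorer j.
    Returns (between-extubation, between-scorer, residual) variances.
    """
    n = len(scores)      # number of extubations
    k = len(scores[0])   # number of scorers
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))

    # Negative estimates are set to 0, as described in the text.
    var_resid = max(ms_resid, 0.0)
    var_extubation = max((ms_rows - ms_resid) / k, 0.0)
    var_scorer = max((ms_cols - ms_resid) / n, 0.0)
    return var_extubation, var_scorer, var_resid


def icc(scores: List[List[float]]) -> float:
    """Intraclass correlation: between-extubation variance over total variance."""
    var_extubation, var_scorer, var_resid = variance_components(scores)
    total = var_extubation + var_scorer + var_resid
    return var_extubation / total if total > 0 else 0.0


# Made-up scores: three extubations (rows) rated by three scorers (columns).
example = [[4, 4, 5], [2, 3, 3], [1, 1, 2]]
print(round(icc(example), 2))
```

Because the between-scorer component sits in the denominator, a scorer who is consistently harsher than the others lowers the ICC even if the ranking of extubations is identical.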

In experiment 5, five procedures conducted by each of three different endoscopists were scored, and the components of variance were estimated from an extended two-way ANOVA with cases nested within endoscopists.

Pearson correlation coefficients were used to test for association between the different aspects of performance scores and between pairs of scorers, with Bonferroni adjustment for multiple testing.
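
For illustration, the two ingredients named here can be written out directly: the Pearson coefficient for a pair of score vectors, and the Bonferroni adjustment, which multiplies each p-value by the number of tests and caps the result at 1. The p-values in the example are invented, and their computation from the correlation coefficients is left to a statistics package.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up grades (as 1-5 scores) from two scorers for the same extubations.
scorer_a = [5, 4, 3, 2, 2]
scorer_b = [4, 4, 3, 1, 2]
print(round(pearson_r(scorer_a, scorer_b), 2))

# Hypothetical unadjusted p-values from three pairwise comparisons.
print(bonferroni([0.01, 0.04, 0.5]))
```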

Cluster analysis, using “average” linkage, was used to determine a natural grouping of the overall performance scores in order to assess whether it was possible to discriminate between endoscopists with different performance levels. These clusters were then compared with the ADR performance groups. Dendrograms were produced and the grouping undertaken by eye.
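
A minimal sketch of average-linkage clustering follows, applied for simplicity to one mean score per endoscopist. This is a simplification: the study grouped by eye from dendrograms, whereas the sketch stops at a fixed number of clusters, and the input values are the experiment 5 overall means from Table 4 used purely as an example.

```python
from itertools import combinations
from typing import Dict, List


def average_linkage(values: Dict[str, float], n_clusters: int) -> List[List[str]]:
    """Agglomerative clustering of labelled scores with average linkage.

    The distance between two clusters is the mean absolute difference
    over all pairs of members, one taken from each cluster.
    """
    clusters = [[name] for name in values]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            dists = [
                abs(values[p] - values[q])
                for p in clusters[a]
                for q in clusters[b]
            ]
            d = sum(dists) / len(dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # Merge the closest pair; a < b, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


overall_means = {"high": 4.00, "intermediate": 2.33, "low": 3.00}
print(average_linkage(overall_means, n_clusters=2))
# prints [['high'], ['intermediate', 'low']]
```

With these three values the intermediate and low endoscopists merge first, leaving the high-ADR endoscopist in a cluster of their own.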

The data were analysed using Stata 8.2 (StataCorp 2003 Stata Statistical Software release 8.0; Stata Corporation, College Station, Texas, USA).


Results


Phase 1 - Experiments 1 and 2

Aims: To assess which parameters of Rex’s scoring system could be used to score flexible sigmoidoscopy extubation technique and whether a VAS provides high interobserver agreement and is able to discriminate between endoscopists with different performance at adenoma detection.

Methods: In each of the two experiments, scorers rated six performance parameters for five extubations (one extubation from each of five endoscopists, one with a high ADR, two with intermediate ADRs, and two with low ADRs), a total of ten extubations. In experiment 2, we added a seventh, global assessment parameter, “overall quality”.

Results: The only parameter which showed a high level of agreement between scorers in either experiment was distension of the lumen (ICC 0.83 for experiment 1, 0.65 for experiment 2) (Table [3]). The level of agreement for overall quality in experiment 2 was poor (ICC 0.10).

In experiment 1, the endoscopist with a high ADR was scored significantly higher than endoscopists from the other performance groups (P < 0.001). In experiment 2, however, the endoscopist with the high ADR and one of the endoscopists with a low ADR both had significantly higher mean scores than the other three endoscopists (31 and 32 vs. 23, 25 and 24, P < 0.001).

Conclusions: The level of agreement between scorers using the VAS was unsatisfactory for all parameters other than distension of the lumen. Discrimination between endoscopists with different ADR performance levels based on scoring a single extubation was not good. The VAS resulted in wide variability in scores.


Phase 2 - Experiment 3

Aims: To assess whether interobserver agreement for the parameters assessed in phase 1 could be improved by using a five-point ordered categorical score and whether two additional global parameters, “percentage of bowel mucosa not visualised” and “confidence no lesion missed”, improved the reliability of the score for the overall accuracy of the examination. Another aim was to determine whether any of the parameters could discriminate between high-, intermediate- and low-performance adenoma detectors and whether any parameters were redundant.

Methods: Five scorers scored a single extubation from each of nine endoscopists, three from each of the different ADR performance level groups. Nine parameters were scored on the five-point scale (Table [1]). “Overall quality” was scored last and an additional parameter, “retroflexion performed?” was included. The suctioning, re-examination, and examination of proximal sides of folds parameters were scored only if the scorer felt this was applicable.

Results: Some scorers scored more harshly than others. Except for overall quality, the mean scores for all of the performance scores varied significantly between scorers (all P < 0.05).

There was total agreement between scorers for retroflexion and a high level of agreement for lower rectal examination (ICC 0.67, see Table [3]). The latter was clearly related to whether retroflexion was undertaken (P < 0.0001). Table [3] also shows that there was moderate agreement between scorers for several other parameters (ICC 0.4 - 0.6). Cluster analysis using the global “overall quality” parameter identified two clusters for the high performance group, with one endoscopist performing significantly better than the other two, and a third cluster for the rest (Table [4]). The only other parameter for which clustering discriminated between any of the performance groups was “confidence no lesion missed”, which again distinguished extubations from the high performance group from the rest (data not shown). The mean scores for all parameters reflected the clustering shown in Table [4].

Table 3 Interscorer agreement (interclinician reliability) in the five experiments, estimated using the intraclass correlation coefficients for performance parameters

| Parameter | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 | Experiment 5 |
|---|---|---|---|---|---|
| Scoring method | VAS 0 - 100 | VAS divided into quintiles | Grades A - E | Grades A - E | Grades A - E |
| Global parameters: | | | | | |
| Overall quality of examination based on individual extubation scores | | 0.10 | 0.45 | 0.72 | 0.13 |
| Overall quality of examination based on scoring performance for five sequential extubations | | | | | 0.89 |
| Percentage confidence no significant lesion missed | | | 0.43 | 0.73 | |
| Percentage of bowel mucosa not adequately visualised | | | 0.32 | | |
| Individual parameters: | | | | | |
| Time spent viewing the mucosa | 0.25 | 0.11 | 0.22 | | |
| Examination of proximal sides of folds | 0.34 | 0.00 | 0.37 | | |
| Re-examination of poorly viewed areas | 0.48 | 0.00 | 0.54 | | |
| Suctioning of fluid or faecal pools | 0.23 | 0.36 | 0.27 | | |
| Distension of the lumen for visualisation | 0.83 | 0.65 | 0.53 | | |
| Lower rectal examination | 0.17 | 0.17 | 0.67 | | |

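Table 4 Comparison of the cluster groups based on “overall quality of examination”, with adenoma detection rate (ADR) and performance group (high-, intermediate- or low-performance categories)

| Experiment number | Order of scoring extubation | Prior information: endoscopist’s ADR* (%) | Prior information: endoscopist’s performance group (based on ADR*) | Cluster† | Endoscopist’s mean “overall quality of examination” score over all scorers‡ |
|---|---|---|---|---|---|
| 3 | 1 | 15.9 | High | 2 | 2.8 |
| 3 | 7 | 14.7 | High | 1 | 4.0 |
| 3 | 8 | 14.7 | High | 2 | 3.2 |
| 3 | 6 | 12.6 | Intermediate | 3 | 2.6 |
| 3 | 3 | 11.3 | Intermediate | 3 | 2.8 |
| 3 | 4 | 10.9 | Intermediate | 3 | 1.8 |
| 3 | 2 | 9.8 | Low | 3 | 2.0 |
| 3 | 10 | 9.6 | Low | 3 | 1.6 |
| 3 | 5 | 8.6 | Low | 3 | 1.8 |
| 5 | 7 | 14.7 | High | 1 | 3.50 |
| 5 | 1 | 14.7 | High | 1 | 4.00 |
| 5 | 5 | 14.7 | High | 1 | 3.67 |
| 5 | 13 | 14.7 | High | 1 | 3.33 |
| 5 | 15 | 14.7 | High | 1 | 4.00 |
| 5 | 14 | 12.6 | Intermediate | 2 | 2.17 |
| 5 | 2 | 12.6 | Intermediate | 1 | 3.50 |
| 5 | 6 | 12.6 | Intermediate | 2 | 1.83 |
| 5 | 12 | 12.6 | Intermediate | 2 | 2.83 |
| 5 | 9 | 12.6 | Intermediate | 2 | 2.50 |
| 5 | 3 | 9.8 | Low | 2 | 2.67 |
| 5 | 4 | 9.8 | Low | 2 | 2.83 |
| 5 | 8 | 9.8 | Low | 2 | 2.50 |
| 5 | 11 | 9.8 | Low | 1 | 3.67 |
| 5 | 10 | 9.8 | Low | 2 | 2.50 |
| 5 | All | 14.7 | High | 1 | 4.00 |
| 5 | All | 12.6 | Intermediate | 2 | 2.33 |
| 5 | All | 9.8 | Low | 2 | 3.00 |

* Endoscopist’s adenoma detection rate based on all UK Flexible Sigmoidoscopy Screening Trial (UK FSST) study examinations.
† Cluster based on five-point quality of examination scores.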
‡ The mean scores in experiments 3 and 4 are based on scores for individual extubations; the mean scores in experiment 5 are based on one overall score given for five extubations.

Correlations between the scores for the three global assessment variables all exceeded 0.9, indicating very high association. The “percentage of bowel mucosa not adequately visualised” parameter had the lowest level of agreement between the scorers and was subsequently dropped. The examination of proximal sides of folds and re-examination parameters were also highly correlated (r = 0.89), as were retroflexion and lower rectal examination (r = 0.78). All these correlations were highly significant (P < 0.0001).

Conclusions: In comparison with the VAS, the five-point scale considerably improved the agreement for overall quality and lower rectal examination parameters. The overall quality parameter identified endoscopists with a high ADR, but could not discriminate between those with intermediate and low ADRs. Parameters that were so highly correlated that they could be considered to be measuring the same aspect of performance were combined (examination of proximal sides of folds with re-examination; and retroflexion with lower rectal examination). By definition, overall quality of performance equated to the “confidence no lesion missed” parameter. It was decided that scoring these “outcome” parameters was sufficient, and that the “explanatory” parameters should only be completed when the overall quality score was C, D or E, for the purpose of providing specific feedback on technique, in order to guide further training.


Phase 2 - Experiment 4

Aims: To examine whether scoring the overall outcome parameters first improved interobserver agreement and to establish whether the overall performance measures were both informative.

Methods: The extubations in experiment 3 were re-examined (several weeks later), asking scorers to score the overall quality and confidence no lesion missed parameters first. Only the scores for these two parameters were analysed.

Results: The interobserver agreement was much higher when the two overall composite scores were recorded first. For overall quality, the ICC increased from 0.45 to 0.72; for confidence no lesion missed, the ICC increased from 0.43 to 0.73 (Table [3]). The two composite scores were equally informative, with equivalent ICCs and clusters.

Conclusions: Scoring the overall assessment parameters first significantly improved interobserver agreement. The scores for the two parameters were strongly correlated and were therefore combined.


Phase 2 - Experiment 5

Aims: To investigate how a single overall score awarded for five extubations carried out by any single endoscopist watched sequentially compared with the scoring of the same extubations watched individually in a random order. We aimed to assess whether watching the videos sequentially improved interobserver agreement and improved discrimination between endoscopists with different ADRs. We also estimated intraendoscopist variance.

Methods: A total of 15 extubations, five from each of three endoscopists of differing performance levels, were placed on a tape, first in a random order and then in three blocks of five, so that the extubations for each endoscopist could be read sequentially. The six scorers first scored the randomly ordered extubations (15 scores) and then viewed each series of five extubations, giving an overall performance score for each endoscopist (three scores).

Results: Differences between scorers’ mean scores were apparent when scoring the 15 individual extubations but not when a single score was given for a series of five extubations (data not shown). The overall level of agreement between the scorers was low for the individual extubation scores (ICC 0.13), but improved considerably when a single score was given for a series of five examinations (ICC 0.89) (Table [3]).

Each endoscopist’s performance varied considerably between extubations (intraendoscopist variance ranged from 0.06 to 0.44), although the high-performance endoscopist showed the lowest variance. Of the 30 possible scores per endoscopist (five extubations, six scorers), the high-performance endoscopist had 20 A or B scores, 10 C scores and no D or E scores, and none of this endoscopist’s extubations were scored as C by all six scorers.

When a single overall score was given, all six scorers gave the high-performance adenoma detector a B and the low-performance adenoma detector a C; the intermediate-performance adenoma detector received two Cs and four Ds. Both of the more poorly performing endoscopists had only one extubation out of the five awarded no D or E scores.
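
The mean overall scores reported for experiment 5 in Table [4] (4.00, 2.33 and 3.00) follow directly from these grades once A - E are converted to 5 - 1. A short check, with hypothetical variable names:

```python
# A (excellent) scores 5 down to E (poor) scores 1, as defined in the Methods.
GRADE_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def mean_score(grades):
    """Mean numeric score for a list of A-E grades."""
    return sum(GRADE_SCORE[g] for g in grades) / len(grades)


# Overall grades awarded by the six scorers, as described in the text.
overall_grades = {
    "high": ["B"] * 6,
    "intermediate": ["C"] * 2 + ["D"] * 4,
    "low": ["C"] * 6,
}

for group, grades in overall_grades.items():
    print(group, round(mean_score(grades), 2))
# prints: high 4.0, intermediate 2.33, low 3.0 (one per line)
```

For the intermediate-performance endoscopist, the calculation is (2 × 3 + 4 × 2) / 6 = 14 / 6 ≈ 2.33.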

Clustering based on the individual extubation scores placed all the extubations from the high-ADR endoscopist together with the one high-scoring extubation from each of the other two endoscopists. The clustering did not differentiate between the intermediate- and the low-performance endoscopists. Clustering based on the overall score for five extubations also distinguished the high-performance endoscopist from the other two endoscopists (Table [4]).

Conclusions: Interobserver agreement improved considerably when one score was given to five extubations viewed sequentially, compared with the interobserver agreement found with individual scoring of extubations, allowing for better discrimination between endoscopists with different ADRs. Intraendoscopist variation in performance was lower for the high-performance endoscopist than it was for the other two endoscopists. These results provide evidence of the reliability and validity of a video assessment score for screening flexible endoscopy procedures.


Discussion

Objective methods for assessing endoscopic competency have been sought for many years [14] [15] [16]. Historically, competence was assumed once a certain number of supervised procedures had been performed, and professional bodies still recommend the performance of varying numbers of supervised procedures. For flexible sigmoidoscopy, the recommended minimum number of procedures varies from six to 50 [8] [17] [18] [19] [20] [21]. It is recognised that a numbers-based approach is fallible, however, and alternative methods of assessment are required [6] [22]. Some structured assessment tools have been described [8] [23], but most of these have not been validated and are often labour-intensive. Procedure time and depth of insertion have been proposed as competence measures [24], but these do not measure how accurately the mucosa is examined. In the USA, there is no formal or uniform accreditation process and guidelines are often developed by individual hospitals. The American Society for Gastrointestinal Endoscopy states that flexible sigmoidoscopy competence requires “visualisation of the splenic flexure” and “retroflexion in rectum” [20], but evidence suggests that these measures are also fallible [25] [26]. Withdrawal of the instrument is often taught first and is considered to be mastered more easily [21], but there is little literature on withdrawal technique and evaluation [27] [28]. To our knowledge, Rex [9] was the first to relate withdrawal technique to colonoscopic performance. We have continued his work and applied it to flexible sigmoidoscopy.

Surgical skills have been assessed using video recording [29] [30] [31] but, to our knowledge, this method has not been used to test competence in any clinical field on a formal basis. Its attractions are several, and our study simply required video home system (VHS) recording of the endoscopic view of live procedures. Endoscopists’ performance was probably improved by their knowing that the procedure was being recorded (the “Hawthorne effect”). However, performance still varied within and between endoscopists, suggesting that this method is likely to reflect day-to-day practice.

Our scoring system demonstrated validity by differentiating between the high-ADR endoscopists and the lower-ADR endoscopists. The five-point scale demonstrated good interobserver reliability when based on one score awarded for five sequential extubations.

Initially in this study, we used a VAS, which lacked validity and reliability. Scorers disliked the VAS and found assessment of the numerous parameters cumbersome and difficult. The A - E performance grading system is familiar and used generally within the UK education system. It improved agreement for individual “overall quality” extubation scores, but good interobserver agreement was only achieved when a single A - E score was given for five sequential extubations.

Good extubation technique is a composite of many factors. Initially, it was thought that individual parameters should be scored separately. However, it proved to be more reliable to give an overall grade and utilise the individual aspects as “explanatory” parameters. In general, individual parameters did not correlate with the overall score. Flexible sigmoidoscopy is a dynamic procedure and a parameter may be performed well in one segment of the colon but less well in others. For training and feedback, it is essential to break the procedure down into its component parts and the scoring system can be used to highlight areas of inadequate technique. The goal of clinical performance assessment is not to rank individuals but to ensure that defined levels of mastery are achieved and maintained [32]. This was achieved in our study, in that competent endoscopists were differentiated from those who could benefit from further training.

In a clinical screening programme it is essential that the quality of service is as close as possible to the gold standard. Objective performance assessment for CRC screening has never been undertaken, but should be incorporated into national CRC screening programmes to maintain high standards and public confidence.

We recognise the limitations of this study. The range of scores was narrow and few examinations were given grades A or E. However, if the scorers had assessed a larger group of endoscopists with a wider range of experience, they might have given a broader range of scores and hence improved validation. Alternatively, scorers may have poor discriminatory powers. Early on in this study, the scorers’ range of opinions was wide. This improved as the study progressed and could have been due to the scorers’ experience as well as improvement in the scoring system. Some inter-rater variability in scores persisted in the later experiments, but there will always be a degree of subjective opinion in such assessments. Scorers should be trained and their results monitored to ensure consistency and reliability.

Competency in extubation should not be considered in isolation. Other aspects of the procedure, such as depth of insertion, patient satisfaction, pathology pick-up rates and complications must also be considered. This assessment method may also be applicable to colonoscopy, for which, together with additional parameters such as caecal intubation rate, it would provide an assessment of overall performance. Our method also raises further questions: What factors are responsible for differences in technique? Would feedback with video review improve an individual’s performance? Is this scoring system applicable to diagnostic procedures?

We have developed a scoring system based on assessing video recordings of only five examinations, which could provide an objective assessment of flexible sigmoidoscopy examination performance. We suggest that endoscopists need to undergo rigorous training and accreditation prior to qualifying as a screening endoscopist and probably also require ongoing competence assessments. In setting up screening programmes, factors that influence performance must be understood in order to provide an effective service.


Acknowledgements

The authors would like to thank Dr Jim C. Brooker, Dr Shafi Quraishy and Dr Christopher B. Williams for their assistance.

Competing interests: None

In Brief

Another interesting by-product of the large UK sigmoidoscopy trial, addressing the question of how the diagnostic accuracy of an individual endoscopist can be objectively assessed. All examinations during this trial were videotaped, and a sample was taken for analysis. Various parameters relating to diagnostic scrutiny were developed and validated, and showed correlation with the adenoma detection rate.


References

  • 1 Atkin W S, Cuzick J, Northover J M, Whynes D K. Prevention of colorectal cancer by once-only sigmoidoscopy.  Lancet. 1993;  341 736-740
  • 2 UK Flexible Sigmoidoscopy Screening Trial Investigators . Single flexible sigmoidoscopy screening to prevent colorectal cancer: baseline findings of a UK multicentre randomised trial.  Lancet. 2002;  359 1291-1300
  • 3 Atkin W, Rogers P, Cardwell C. et al . Wide variation in adenoma detection rates at screening flexible sigmoidoscopy.  Gastroenterology. 2004;  126 1247-1256
  • 4 Bretthauer M, Skovlund E, Grotmol T. et al . Inter-endoscopist variation in polyp and neoplasia pick-up rates in flexible sigmoidoscopy screening for colorectal cancer.  Scand J Gastroenterol. 2003;  38 1268-1274
  • 5 Winawer S J, Zauber A G, Ho M N. et al . Prevention of colorectal cancer by colonoscopic polypectomy. The National Polyp Study Workgroup.  N Engl J Med. 1993;  329 1977-1981
  • 6 Rex D K, Bond J H, Winawer S. et al . Quality in the technical performance of colonoscopy and the continuous quality improvement process for colonoscopy: recommendations of the U.S. Multi-Society Task Force on Colorectal Cancer.  Am J Gastroenterol. 2002;  97 1296-1308
  • 7 Bowles C J, Leicester R, Romaya C. et al . A prospective study of colonoscopy practice in the UK today: are we adequately prepared for national colorectal cancer screening tomorrow?  Gut. 2004;  53 277-283
  • 8 Joint Advisory Group on Gastrointestinal Endoscopy . Guidelines for the training, appraisal and assessment of trainees in gastrointestinal endoscopy. 2004 [accessed 2005 Jul 5]. Available from: URL: http://www.thejag.org.uk/JAG_2004.pdf
  • 9 Rex D K. Colonoscopic withdrawal technique is associated with adenoma miss rates.  Gastrointest Endosc. 2000;  51 33-36
  • 10 Ramakrishnan K, Scheid D C. Predictors of incomplete flexible sigmoidoscopy.  J Am Board Fam Pract. 2003;  16 478-484
  • 11 Walter L C, de Garmo P, Covinsky K E. Association of older age and female sex with inadequate reach of screening flexible sigmoidoscopy.  Am J Med. 2004;  116 174-178
  • 12 Eloubeidi M A, Wallace M B, Desmond R, Farraye F A. Female gender and other factors predictive of a limited screening flexible sigmoidoscopy examination for colorectal cancer.  Am J Gastroenterol. 2003;  98 1634-1639
  • 13 Armitage P, Berry G, Matthews J NS. Statistical methods in medical research. 4th edn. Oxford; Blackwell Science 2002
  • 14 Hawes R, Lehman G A, Hast J. et al . Training resident physicians in fiberoptic sigmoidoscopy: how many supervised examinations are required to achieve competence?  Am J Med. 1986;  80 465-470
  • 15 Cass O W. Training to competence in gastrointestinal endoscopy: a plea for continuous measuring of objective end points.  Endoscopy. 1999;  31 751-754
  • 16 Lal S K, Barrison A, Heeren T, Schroy P C III. A national survey of flexible sigmoidoscopy training in primary care graduate and postgraduate education programs.  Am J Gastroenterol. 2004;  99 830-836
  • 17 Weissman G S, Winawer S J, Baldwin M P. et al . Multicenter evaluation of training of non-endoscopists in 30-cm flexible sigmoidoscopy.  CA Cancer J Clin. 1987;  37 26-30
  • 18 Health and Public Policy Committee, American College of Physicians . Clinical competence in colonoscopy.  Ann Intern Med. 1987;  107 772-774
  • 19 Saad J A, Pirie P, Sprafka J M. Relationship between flexible sigmoidoscopy training during residency and subsequent sigmoidoscopy performance in practice.  Fam Med. 1994;  26 250-253
  • 20 ASGE . Principles of training in gastrointestinal endoscopy. From the ASGE. American Society for Gastrointestinal Endoscopy.  Gastrointest Endosc. 1999;  49 845-853
  • 21 Maule W F. Screening for colorectal cancer by nurse endoscopists.  N Engl J Med. 1994;  330 183-187
  • 22 Ashley O S, Nadel M, Ransohoff D F. Achieving quality in flexible sigmoidoscopy screening for colorectal cancer.  Am J Med. 2001;  111 643-653
  • 23 Proctor D D, Price J, Dunn K A. et al . Prospective evaluation of a teaching model to determine competency in performing flexible sigmoidoscopies.  Am J Gastroenterol. 1998;  93 1217-1221
  • 24 Holman J R, Marshall R C, Jordan B, Vogelman L. Technical competency in flexible sigmoidoscopy.  J Am Board Fam Pract. 2001;  14 424-429
  • 25 Painter J, Saunders D B, Bell G D. et al . Depth of insertion at flexible sigmoidoscopy: implications for colorectal cancer screening and instrument design.  Endoscopy. 1999;  31 227-231
  • 26 Cutler A F, Pop A. Fifteen years later: colonoscopic retroflexion revisited.  Am J Gastroenterol. 1999;  94 1537-1538
  • 27 Shinya H, Cwern M, Karlstadt R. Colonoscopy: technique and training methods.  Surg Clin North Am. 1982;  62 869-876
  • 28 Cotton P, Williams C B. Practical gastrointestinal endoscopy. 5th edn. Oxford; Blackwell Scientific 2003
  • 29 Stranc M F, McDiarmid J G, Stranc L C. Video assessment of surgical technique.  Br J Plast Surg. 1991;  44 65-68
  • 30 Scott D J, Rege R V, Bergen P C. et al . Measuring operative performance after laparoscopic skills training: edited videotape versus direct observation.  J Laparoendosc Adv Surg Tech A. 2000;  10 183-190
  • 31 Eubanks T R, Clements R H, Pohl D. et al . An objective scoring system for laparoscopic cholecystectomy.  J Am Coll Surg. 1999;  189 566-574
  • 32 Miller G E. The assessment of clinical skills/competence/performance.  Acad Med. 1990;  65 (9 Suppl) S63-S67

