Background: Previous investigators have shown that novices are able to assess surgical skills as reliably as expert surgeons. The purpose of this study was to determine how novices and experts arrive at these graded scores when assessing laparoscopic skills, and the implications this may have for surgical education.
Methods: Four novices and four general laparoscopic surgeons evaluated 59 videos of a suturing task using a 5-point scale. Mean scores assigned to each video and the number of times evaluators changed their scores were compared between the novice and expert groups. Intraclass correlation coefficients (ICCs) were used to determine inter-rater and test-retest reliability. Evaluators were asked to report the number of videos they needed to watch before they could grade confidently and to describe how they distinguished between different levels of expertise.
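For readers wishing to reproduce this type of reliability analysis, a minimal sketch is given below. It assumes ratings are stored in a long-format table with hypothetical columns video, rater, and score, and it uses a two-way random-effects ICC as implemented in the pingouin package; the abstract does not state which ICC model the study used, so this is an illustrative choice rather than the authors' method.

```python
# Minimal sketch of an inter-rater reliability (ICC) analysis.
# The CSV path and column names are hypothetical; the specific ICC model
# used in the study is not stated in the abstract.
import pandas as pd
import pingouin as pg

# Each row: one evaluator's score for one video (5-point scale).
ratings = pd.read_csv("ratings.csv")  # columns: video, rater, score

icc = pg.intraclass_corr(
    data=ratings,
    targets="video",   # the 59 videos being rated
    raters="rater",    # the 8 evaluators (4 novices, 4 surgeons)
    ratings="score",   # the 5-point global rating
)

# Report the two-way random-effects, average-measures estimate (ICC2k),
# one common choice when a panel of raters scores every target.
print(icc.set_index("Type").loc["ICC2k", ["ICC", "CI95%"]])
```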
Results: There were no significant differences in the mean scores assigned by the two evaluator groups. Novices changed their scores more frequently than experts, but this difference did not reach statistical significance. Inter-rater reliability between the two groups was excellent (ICC = 0.91, CI 0.85-0.95), and test-retest reliability was good (ICC > 0.83). On average, novices and experts reported needing to watch 13.8 ± 2.4 and 8.5 ± 2.5 videos, respectively, before they could grade confidently. Both groups also identified similar qualitative indicators of skill (e.g., instrument control).
Conclusion: Evaluators with varying levels of expertise can reliably grade performance of an intracorporeal suturing task. Although novices were less confident in their grading, both groups assigned comparable scores and identified similar elements of suturing skill as important for assessment.
Keywords: Laparoscopic; Novice evaluators; Suturing skill; Video assessment.