[Automatic coding of pathologic cancer variables by the search of strings of text in the pathology reports. The experience of the Tuscany Cancer Registry]

Epidemiol Prev. 2005 Jan-Feb;29(1):57-60.
[Article in Italian]

Abstract

This study evaluates an automatic system that codes variables by searching for text strings in pathology reports, applied to the database of the Tuscany Cancer Registry. Incidence data for 2000 (n = 6297) and 2001 (n = 6291) were included for subjects with computerised pathology reports available. The system is based on queries (SQL) linked to functions (Visual Basic for Applications) running in Microsoft Access. Agreement between the data originally entered by the registrars and the variables coded by automatic reading was evaluated with Cohen's kappa. The following variables were analysed: cancer site (kappa = 0.87 between "manual" and automatic coding, for cases incident in 2001), morphology (kappa = 0.75), Berg's morphology groups (kappa = 0.87), behaviour (kappa = 0.70), grading (kappa = 0.90), Gleason (kappa = 0.90), focality (kappa = 0.86), laterality (kappa = 0.36), pT (kappa = 0.92), pN (kappa = 0.76), pM (kappa = 0.28), number of lymph nodes (kappa = 0.69), number of positive lymph nodes (kappa = 0.70), Breslow thickness (kappa = 0.94), Clark level (kappa = 0.91), Dukes (kappa = 0.74). Automatic string reading makes it possible to collect a very large amount of reliable information, and cancer registries should adopt it.
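The two steps described above, string-based coding and agreement measurement, can be illustrated with a minimal sketch. It is written in Python rather than the SQL and Visual Basic for Applications used by the registry, and it is not the registry's implementation. The pattern `PT_PATTERN`, the function names `extract_pt` and `cohen_kappa`, and the sample report texts are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not the registry's rule): a pT category
# such as "pT1", "pT2a", "pTis" or "pTx" appearing in free report text.
PT_PATTERN = re.compile(r"\bpT(is|[0-4][a-d]?|x)", re.IGNORECASE)


def extract_pt(report_text):
    """Return the first pT category found in the report text, or None."""
    match = PT_PATTERN.search(report_text)
    if match is None:
        return None
    return "pT" + match.group(1).lower()


def cohen_kappa(manual, automatic):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(manual) != len(automatic) or len(manual) == 0:
        raise ValueError("both sequences must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: share of cases where both codings coincide.
    observed = sum(m == a for m, a in zip(manual, automatic)) / n
    # Chance agreement: product of the marginal frequencies, summed per category.
    manual_counts = Counter(manual)
    auto_counts = Counter(automatic)
    expected = sum(manual_counts[c] * auto_counts[c] for c in manual_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Invented example reports and registrar codes, for illustration only.
reports = [
    "Carcinoma duttale infiltrante, pT2 pN1",
    "Melanoma, pT1a",
    "no stage reported",
    "adenocarcinoma pT3",
]
manual_codes = ["pT2", "pT1a", None, "pT2"]
automatic_codes = [extract_pt(text) for text in reports]

print(round(cohen_kappa(manual_codes, automatic_codes), 2))  # prints 0.67
```

In this toy example three of the four codes agree (observed agreement 0.75) and the chance agreement is 0.25, giving kappa = 0.67. A production system would need far richer patterns per variable; the sketch only shows the shape of the comparison.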

MeSH terms

  • Catchment Area, Health
  • Electronic Data Processing*
  • Humans
  • Italy / epidemiology
  • Neoplasms / epidemiology*
  • Registries*