Automatic Cancer Coding Service

Content Extraction and Coding of Cancer Pathology Reports


Welcome to HLA-GLOBAL's demonstration AI solution for detecting reportable cancer pathology reports and coding them to the International Classification of Diseases for Oncology, Third Edition (ICD-O-3).

This demonstration page is built so that you can see examples of what HLA's processing can achieve. The sample reports, drawn from many tumour streams, have been synthesised to give you a cross-section of results and to ensure that they contain no protected health information (PHI), in compliance with HIPAA standards.¹ The reports are preserved in their original HL7 format to help you understand the transformation made in producing the results.

The core solution is available as two services that receive HL7 messages:

  • Service 1 is the Reportability service, which determines whether a pathology report describes a reportable cancer case.

  • Service 2 extends Service 1 to extract the five classic attributes of a cancer (Site, Histology, Grade, Behaviour and Laterality) from reportable cancer cases.
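Both services start from an HL7 message rather than plain text. As a rough illustration only, the Python sketch below pulls the free-text report out of an HL7 v2 ORU message by joining the observation value field (OBX-5) of each OBX segment. The function name and the assumption that the report text sits in OBX-5 with "|" as the field separator are ours, not a description of HLA's interface; HL7 escape sequences and component separators are not handled.

```python
# Sketch only: segment and field layout follow HL7 v2 conventions, but the
# function name and the use of OBX-5 for report text are assumptions.

def extract_report_text(message: str) -> str:
    """Join the OBX-5 (observation value) fields of an HL7 v2 message, in order."""
    lines = []
    # HL7 v2 separates segments with carriage returns; tolerate \n and \r\n too.
    for segment in message.replace("\r\n", "\r").replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        # fields[0] is the segment ID, so OBX-5 is fields[5].
        if fields[0] == "OBX" and len(fields) > 5:
            lines.append(fields[5])
    return "\n".join(lines)


sample = (
    "MSH|^~\\&|LAB|HOSPITAL|REGISTRY|STATE|202401011200||ORU^R01|MSG0001|P|2.5\r"
    "OBR|1||S24-0001|PATH^Surgical pathology\r"
    "OBX|1|TX|RPT^Report text||FINAL DIAGNOSIS:\r"
    "OBX|2|TX|RPT^Report text||Left breast, invasive ductal carcinoma.\r"
)
print(extract_report_text(sample))
```

The extracted text, rather than the raw message, is what a reportability or coding model would then analyse.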

In an operational installation, the results from both tasks are provided in a standard format for reuse in any post-processing system.

These services are built on Statistical Natural Language Processing (SNLP) methods that use advanced AI algorithms to achieve high accuracy on these tasks.

Service 1 – Reportability

In HLA's laboratory experiments on real-world reports, the service identified reportable cancer pathology cases with 99% accuracy and only 0.6% false positives.

Service 2 – Coding

Coding accuracy is above 96% for more than 80% of reports. The remaining reports are routed to a Manual Processing Bay for your Certified Tumor Registrars (CTRs) to process. Relieved of many routine reports, CTRs can concentrate on the more complex and interesting reports and have more time to analyse them. Even the reports routed to Manual Processing are coded with 80%+ accuracy, giving CTRs a head start on the final coding of these difficult reports. HLA's in-house research continues to improve on these results.
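The page does not say how reports are selected for the Manual Processing Bay. One common approach, shown here purely as an assumption, is to route each coded report on a per-report confidence score: reports above a cut-off are accepted automatically, the rest go to CTRs with their draft codes attached. The class, threshold and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodedReport:
    report_id: str
    codes: dict          # draft ICD-O-3 codes, e.g. {"site": "C50.4", "histology": "8500/3"}
    confidence: float    # hypothetical model confidence in the coding, 0.0 to 1.0


# Hypothetical cut-off; in practice it would be tuned so that auto-accepted
# reports meet the target coding accuracy.
AUTO_ACCEPT_THRESHOLD = 0.90


def route(reports):
    """Split coded reports into an auto-accept list and a Manual Processing Bay list."""
    accepted, manual_bay = [], []
    for report in reports:
        if report.confidence >= AUTO_ACCEPT_THRESHOLD:
            accepted.append(report)
        else:
            manual_bay.append(report)  # still carries the draft codes for the CTR
    return accepted, manual_bay


accepted, manual_bay = route([
    CodedReport("R1", {"site": "C50.4", "histology": "8500/3"}, 0.97),
    CodedReport("R2", {"site": "C34.1", "histology": "8140/3"}, 0.62),
])
print([r.report_id for r in accepted], [r.report_id for r in manual_bay])  # ['R1'] ['R2']
```

Keeping the draft codes on manually routed reports is what gives CTRs a head start on the difficult cases.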

SNLP processing is compute-intensive: a pathology report can require up to 90 seconds of processing for all its fine detail to be properly recognised. For this reason, two processing options are available: "offline" and "real-time". "Offline" displays the results immediately because they have been precomputed and stored in the data file. "Real-time" computes the results using HLA's SNLP engine and provides a selection of performance data to help you understand the processing tasks. Both methods highlight annotations of many of the clinical entities relevant to a correct analysis of the report.

Because NAACCR requirements changed substantially in 2018, you can choose to compute results against either the 2018 requirements or the pre-2018 requirements.

Service 3 - Synoptic Report

Synoptic reports, sometimes called structured reports or templates, are an attempt to organise pathology content into a systematic layout. They are expected to make post-processing and data extraction easier and to overcome the problem of omitted information. These benefits have not been realised, for several reasons:

  1. Laboratories have been free to use their own names for the data items on the reports.

  2. Pathologists have been free to change the report by adding new fields and/or removing unused ones.

  3. Converting reports to HL7 for transmission has often reorganised their formatting, making them harder to process automatically.

Here we present our ongoing work on standardising synoptic reports. The demonstrations provide examples of breast and lung cancer reports. For the time being, our analysis and decomposition of synoptic reports is done on a lab-by-lab basis while we build up a method for systematically processing the variable formats used throughout the industry.
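Because each laboratory names its data items differently, a lab-by-lab approach amounts to maintaining a mapping from local labels to standard data items. The Python sketch below illustrates the idea with a small, hypothetical synonym table; the canonical keys, label variants and function name are our assumptions, and unrecognised labels are returned separately so they can be reviewed rather than silently dropped.

```python
# Hypothetical synonym table: canonical data item -> label variants seen at one lab.
FIELD_SYNONYMS = {
    "tumour_size": ["tumour size", "tumor size", "size of invasive carcinoma"],
    "histologic_type": ["histologic type", "histological type", "tumour type", "tumor type"],
    "margins": ["margins", "margin status", "surgical margins"],
}

# Reverse lookup: normalised label -> canonical key.
_LOOKUP = {variant: canon for canon, variants in FIELD_SYNONYMS.items() for variant in variants}


def normalise_synoptic(text):
    """Map 'Label: value' lines to canonical keys; return (mapped, unmapped)."""
    mapped, unmapped = {}, {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, value = line.split(":", 1)
        key = " ".join(label.split()).lower()  # collapse whitespace, ignore case
        canon = _LOOKUP.get(key)
        if canon is not None:
            mapped[canon] = value.strip()
        else:
            unmapped[label.strip()] = value.strip()
    return mapped, unmapped


report = "Tumor Size: 2.3 cm\nHistological Type: Invasive ductal carcinoma\nMargin Status: Clear\nComment: see addendum"
print(normalise_synoptic(report))
```

Reviewing the unmapped labels from each new lab is one way to grow the table as new formats appear.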

Service 4 - Grade Components

More than 20 grading systems are used in cancer pathology reporting, and each has its own important components. The examples in this tab cover only a few of these systems. The results show the grade value of each component, its range where provided, and the summary value and range.
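To make "component values, ranges and summary value" concrete, consider the Nottingham grade for breast carcinoma, one widely used multi-component system: tubule formation, nuclear pleomorphism and mitotic count are each scored 1 to 3, and the total (range 3 to 9) maps to grade 1 (3-5), grade 2 (6-7) or grade 3 (8-9). The Python sketch below implements that calculation; the function name is hypothetical, and this page does not state which grading systems the tab covers.

```python
def nottingham_grade(tubules, pleomorphism, mitoses):
    """Return (total score, grade) from the three Nottingham component scores."""
    components = {
        "tubule formation": tubules,
        "nuclear pleomorphism": pleomorphism,
        "mitotic count": mitoses,
    }
    # Each component has a defined range of 1 to 3.
    for name, score in components.items():
        if score not in (1, 2, 3):
            raise ValueError(f"{name} score must be 1, 2 or 3, got {score!r}")
    total = tubules + pleomorphism + mitoses  # summary range 3 to 9
    if total <= 5:
        grade = 1
    elif total <= 7:
        grade = 2
    else:
        grade = 3
    return total, grade


print(nottingham_grade(3, 2, 1))  # (6, 2)
```

An extractor would normally report the stated grade as written; a calculation like this is mainly useful for checking that stated components and summary agree.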

Service 5 - Biomarkers & Genetics

The biomarker analysis identifies the specimen and the results of analysing it. The reports were selected to show the variety of formats and layouts that labs use to present their report contents.
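As a small illustration of why layout matters, the Python sketch below handles just one common layout, a line of the form "marker: result (NN%)", for a handful of breast biomarkers. The marker list, the pattern and the function name are hypothetical; results stated in other layouts (tables, prose, split lines) would be missed, which is exactly the variety the sample reports are meant to show.

```python
import re

# Hypothetical pattern for one layout: "<marker>[ qualifier]: <result>[ (NN%)]".
BIOMARKER_LINE = re.compile(
    r"^\s*(?P<marker>ER|PR|HER2|Ki-?67)\b[^:\n]*:\s*"
    r"(?P<result>[^\n]+?)\s*"
    r"(?:\((?P<percent>\d{1,3})\s*%\))?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_biomarkers(report_text):
    """Return {MARKER: {"result": str, "percent": int or None}} for matching lines."""
    found = {}
    for m in BIOMARKER_LINE.finditer(report_text):
        found[m.group("marker").upper()] = {
            "result": m.group("result"),
            "percent": int(m.group("percent")) if m.group("percent") else None,
        }
    return found


text = "ER: Positive (95%)\nPR: Negative\nHER2 (IHC): 2+ (equivocal)\nKi-67: 20%"
print(extract_biomarkers(text))
```

Note that "Ki-67: 20%" keeps its percentage in the result text, because the sketch only treats a bracketed percentage as a separate value.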

Service 6 - Stage

The stage reports were selected to show the different formats in which staging appears. The standard T, N and M categories are presented along with the stage group, if present. In many reports the N and M values are implicit because no tissue was sampled to assess them, so the defaults NX and MX might be assigned. In these examples we have used only explicitly identifiable category values.

The results also display the staging classification, which indicates the time point at which the stage was determined, using the following classes:

  • c - Clinical class: staging after initial investigations but before treatment

  • p - Pathological class: the clinical class modified by any operative findings and by the pathology of resection specimens

  • y - staging performed after neoadjuvant therapy; used in combination as yc (after neoadjuvant therapy and before planned surgery) or yp (pathological staging of the resection after neoadjuvant therapy)

  • r - staging at disease recurrence or progression, also known as the retreatment classification

  • a - used for cancers first identified at autopsy

Classification values are not inferred when they are absent. Likewise, the system does not derive the stage group from the report contents; it displays only a stage group stated in the report.

The additional staging descriptors sn, f and m are reported only where they appear explicitly in the staging information; they are not inferred, even where that might be possible. G (grade) and R (residual tumour) are presented where reported, and their values are likewise not inferred. Lymphovascular invasion (LVI) is reported in the Invasion tab rather than in this section.
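The "explicit values only" rule can be illustrated with a minimal Python sketch that pulls T, N and M categories, their class prefixes and the (m), (sn) and (f) descriptors out of free text, leaving anything not written as None. TNM_PATTERN and parse_tnm are hypothetical illustrations, not HLA's extractor. They cover compact forms such as "ypT2(m) pN1a(sn) cM0" or "ypT2N1M0", ignore notations such as N0(i+), and do not split strings where a subcategory letter directly precedes the next category (for example "pT1cN0").

```python
import re

# Hypothetical pattern: optional class prefix (c, p, y, r, a or a pair such as
# yp), a T/N/M category, its value, and an optional (m)/(sn)/(f) descriptor.
TNM_PATTERN = re.compile(
    r"(?<![A-Za-z])"                       # not inside a longer word
    r"(?P<prefix>[cpyra]{0,2})"
    r"(?P<category>[TNM])"
    r"(?P<value>X|is|\d[a-d]?(?:mi)?)"
    r"(?!\d)"
    r"(?:\((?P<descriptor>m|sn|f)\))?"
)


def parse_tnm(text):
    """Return only the categories explicitly present; nothing is inferred."""
    result = {}
    for m in TNM_PATTERN.finditer(text):
        category = m.group("category")
        if category in result:
            continue  # keep the first occurrence of each category
        result[category] = {
            "class": m.group("prefix") or None,  # not copied from other categories
            "value": m.group("value"),
            "descriptor": m.group("descriptor"),
        }
    return result


print(parse_tnm("Pathologic stage: ypT2(m) pN1a(sn) cM0"))
print(parse_tnm("pT1c"))  # no N or M entry is invented
```

In "ypT2N1M0" the sketch reports a class only for T, because the yp prefix is written only once; deciding whether it applies to N and M as well would be an inference.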

Service 7 - Lymph Nodes

Lymph nodes are described in a variety of forms. The core content is the number of nodes examined and the number positive. Some reports specify the location of the nodes in more detail, while others give histopathological details such as macrometastases and micrometastases. Sentinel lymph nodes may be reported separately.
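The core node counts already come in several phrasings. The Python sketch below recognises a few of them ("2/15 lymph nodes", "2 of 15 axillary lymph nodes", "Lymph nodes: 2/15", and separate "examined"/"positive" lines) and returns None for any count that is not stated. The patterns and function name are hypothetical, and the sketch assumes that in a fraction the numerator is the positive count.

```python
import re

# Hypothetical patterns for a few common phrasings; real reports need many more.
FRACTION_PATTERNS = [
    # "2/15 lymph nodes", "2 of 15 axillary lymph nodes"
    re.compile(r"(\d+)\s*(?:/|of)\s*(\d+)\s+(?:[a-z]+\s+)?lymph\s+nodes?", re.I),
    # "Lymph nodes: 2/15"
    re.compile(r"lymph\s+nodes?\s*:\s*(\d+)\s*/\s*(\d+)", re.I),
]
EXAMINED = re.compile(r"nodes?\s+examined\s*:?\s*(\d+)", re.I)
POSITIVE = re.compile(r"nodes?\s+(?:positive|involved)\s*:?\s*(\d+)", re.I)


def node_counts(text):
    """Return (positive, examined); None where a count is not stated."""
    for pattern in FRACTION_PATTERNS:
        m = pattern.search(text)
        if m:
            return int(m.group(1)), int(m.group(2))  # assumes positive/examined order
    pos = POSITIVE.search(text)
    exam = EXAMINED.search(text)
    return (
        int(pos.group(1)) if pos else None,
        int(exam.group(1)) if exam else None,
    )


print(node_counts("Metastatic carcinoma in 2 of 15 axillary lymph nodes."))  # (2, 15)
```

Sentinel nodes and macro/micrometastasis details would need their own patterns on top of these counts.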

Service 8 - TILs

Tumour-infiltrating lymphocytes (TILs) are one of many prognostic indicators that can be assessed in specimens. Over time we will add more indicators to this page.

Service 9 - Invasions

Invasion takes several forms: lymphovascular, lymphatic, vascular, perineural and direct extension.
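Reports mention invasion both when it is present and when it is explicitly absent, so a usable extractor must tell the two apart. The Python sketch below detects the five forms with hypothetical keyword patterns and applies a crude sentence-level negation check, loosely in the spirit of NegEx. It is an assumption-laden illustration: a sentence such as "No vascular invasion, but perineural invasion present" would wrongly mark both forms absent.

```python
import re

# Hypothetical keyword patterns for each form of invasion.
INVASION_TERMS = {
    "lymphovascular": r"lymphovascular|lympho-vascular|\bLVI\b",
    "lymphatic": r"lymphatic",
    "vascular": r"(?<!lympho)(?<!lympho-)vascular|angioinvasion",
    "perineural": r"perineural|\bPNI\b",
    "direct extension": r"direct(?:ly)?\s+exten(?:sion|ds?)",
}
# Crude negation cues; any cue in the same sentence marks the form absent.
NEGATION = re.compile(r"\b(?:no|not|negative|absent|without)\b", re.I)


def invasion_status(text):
    """Return {form: "present" | "absent"} for forms mentioned in the text."""
    results = {}
    for sentence in re.split(r"[.;\n]+", text):
        for form, pattern in INVASION_TERMS.items():
            if re.search(pattern, sentence, re.I):
                results[form] = "absent" if NEGATION.search(sentence) else "present"
    return results


print(invasion_status("Lymphovascular invasion: present; perineural invasion: not identified."))
```

Forms that are never mentioned are simply left out, in keeping with reporting only what the report states.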

Service 10 - Margins

Margins are recorded in many different ways, which makes it difficult to display these results systematically.

You need to register to use the demonstration service

The demonstration page is available only to organisations responsible for coding cancer pathology reports. Please register here.

Note for full installation in your organisation

SNLP methods learn from a training set of examples. This base demonstration service has been trained on reports from a wide variety of pathology laboratories, so training for a new customer installation is not extensive. However, your service would need to be tuned to the range of pathology authors who send reports to your organisation, and in a real-world setting a sample of your files is needed to complete that tuning. Once the system has been customised on your sample documents, you will be able to use the service. The service is encrypted to protect privacy.

Contact HLA on to discuss how this service can assist your needs.

¹ Because they are synthesised, the de-identified sample reports may not be entirely internally consistent.