Monday, March 1, 2021

LOINC-SDTM mapping for Drug and Toxicology Lab Tests

This week I started working on a mapping between LOINC codes for Drug and Toxicology lab tests (LOINC class "DRUG/TOX") and the CDISC SDTM LB domain and controlled terminology (CT) for it.
This work is important not only for sponsors and CROs who obtain lab results accompanied by the LOINC code (which should be routine nowadays) and need to generate SDTM datasets, but also for being able to use "Real World Data" (RWD), e.g. from Electronic Health Records (EHRs). It is also of utmost importance for being able to (semi-)automatically generate CDISC Biomedical Concepts (BCs) from LOINC panel codes (groups of LOINC codes for tests that logically belong together), a topic on which I will speak (and give a demo) at the European CDISC Interchange 2021 in April.

The task is however, at first sight, enormous: this class contains 8314 LOINC codes (LOINC v.2.69) with 2605 distinct values for the analyte (LOINC "Component"). The published CDISC-LB mapping only contains mappings for 852 DRUG/TOX LOINC codes, so there are still 1800 "to go". Part of this can be automated, but a lot of manual work remains...

I first retrieved all the DRUG/TOX LOINC codes with their attributes from my local install of the LOINC database, and generated 2 worksheets (yes, I sometimes do use Excel). The first contains all the codes that have more than one target CDISC specimen type (LBSPEC), such as those with LOINC System="Ser/Plas" ("Serum or Plasma"), as these require more than 1 mapping row in the final database. E.g. for "Ser/Plas", this will lead to 3 rows: one with LBSPEC="SERUM" (NCI code C13325), one with LBSPEC="PLASMA" (NCI code C13356) and one with LBSPEC="SERUM OR PLASMA" (NCI code C105706). The second worksheet contains all the DRUG/TOX LOINC codes where a 1:1 mapping between the LOINC "System" and LBSPEC is expected.
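The split into "one row" and "several rows" cases can be sketched in a few lines of Python. This is only an illustration, not the way I actually built the worksheets: the lookup table, the function names and the placeholder LOINC code "EXAMPLE-0" are all made up for the example, and only the "Ser/Plas" entry is taken from the text above.

```python
# Hypothetical lookup table: LOINC "System" -> list of (LBSPEC, NCI code) targets.
# Only the "Ser/Plas" entry comes from the text; a real table would hold many more.
SYSTEM_TO_LBSPEC = {
    "Ser/Plas": [
        ("SERUM", "C13325"),
        ("PLASMA", "C13356"),
        ("SERUM OR PLASMA", "C105706"),
    ],
}


def expand_mapping_rows(loinc_code, system):
    """Return one mapping row per target LBSPEC for the given LOINC System."""
    targets = SYSTEM_TO_LBSPEC.get(system, [])
    return [
        {"LOINC": loinc_code, "SYSTEM": system, "LBSPEC": spec, "LBSPEC_NCI": nci}
        for spec, nci in targets
    ]


def needs_multiple_rows(system):
    """True if the System maps to more than one LBSPEC (first worksheet)."""
    return len(SYSTEM_TO_LBSPEC.get(system, [])) > 1


# "EXAMPLE-0" is a placeholder, not a real LOINC code.
for row in expand_mapping_rows("EXAMPLE-0", "Ser/Plas"):
    print(row["LBSPEC"], row["LBSPEC_NCI"])
```

A System value that is missing from the table simply yields no rows, which makes the still-unmapped cases easy to spot.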

Some of the work can be automated. For most of the LOINC "System" values, a mapping to LBSPEC already exists and can easily be reused. Some additional work may have to be done for the mapping between the LOINC "Method" and LBMETHOD. Attention also has to be paid to fasting statuses, "challenges" and "post-dose" entries (if any). But most of the manual work is on mapping the analyte (LOINC "Component") to LBTESTCD/LBTEST, as this is essentially the meaning of the LBTESTCD/LBTEST pair: it represents the analyte, i.e. the compound that is measured.
What is represented by --TESTCD/--TEST pairs in SDTM differs between domains. For example, in Vital Signs (VS), VSTESTCD/VSTEST represents the property that is measured (e.g. a blood pressure). In LB, the property that is measured is not directly represented by a variable. For example, if a concentration is measured, this can in LB only be seen from the actual values and units. In LOINC however, the "Property" is an essential part of the concept (one of the 5/6 "dimensions" of LOINC). In the LOINC-LB mapping published by CDISC, this has been solved by adding some "Non-Standard Variables" (NSVs), which then go into the SUPPLB dataset.

Then I started the huge work ...

For generating the mapping between the LOINC "Component" (i.e. the analyte) and LBTESTCD and LBTEST, I used the CDISC Library Browser which was of great help because it also displays "similar" ways of writing a term as well as synonyms. It also allows me to immediately add the CDISC-NCI code of LBTESTCD/LBTEST to the mapping, which is of utmost importance for connecting to other coding systems used in healthcare (like SNOMED-CT), e.g. using the Unified Medical Language System UMLS and its API and RESTful web services.

Here is a picture of a few rows of the mapping:


As I soon found out, the coverage of test codes for drug and toxicology lab testing in the CDISC-CT for LBTESTCD/LBTEST is very poor. After one day of mapping work, I estimated the coverage to be between 5 and 10%. This also means that for 100 drug/toxicology lab tests, we would need to submit 90-95 "new term requests" to CDISC for an LBTESTCD/LBTEST. Considering the 1800 codes not yet covered by the original LOINC-LB mapping, this would mean something like 1600 to 1700 "new term requests". I guess the CDISC-CT team will "not be amused" ...
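As a quick back-of-the-envelope check of that estimate, here is the arithmetic in Python. It assumes that the 5-10% coverage I observed after one day holds uniformly for all 1800 remaining codes, which is of course only a rough guess; the function name is made up for the example.

```python
def new_term_requests(unmapped_codes, coverage_low, coverage_high):
    """Return the (min, max) number of tests without an existing CDISC-CT term."""
    fewest = round(unmapped_codes * (1 - coverage_high))
    most = round(unmapped_codes * (1 - coverage_low))
    return fewest, most


# 1800 remaining codes, estimated CDISC-CT coverage between 5% and 10%
print(new_term_requests(1800, 0.05, 0.10))  # (1620, 1710)
```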

This urged me to rethink the problem.

Mapping is "bad" - personally I think it should be the last resort if nothing else works. 1:1 mapping can still be acceptable (but requires a large amount of work), but we are in deep trouble when such a 1:1 mapping is not possible.

Each unique LOINC "component" (i.e. the analyte) has a code itself: the "LOINC Part Code" (LP-codes). For example, the LP code for "Albumin" is LP6118-6. The LP code for Glucose is LP14635-4. The LP code for Doxycycline (one of the many not covered by CDISC-CT) is LP14992-9. This brought me to the idea "Why not use the 'LOINC Part Code' for LBTESTCD?".

Similarly, one could then use the "LOINC Part Name" for LBTEST. 

There are a few major objections against this, some of them having to do with the FDA-mandated use of the outdated SAS Transport 5 format for submissions.
The first is that LBTESTCD may not be longer than 8 characters. "LP14992-9" has 9. Also, the "LOINC Part Name" sometimes has more than 40 characters, the maximum length for LBTEST. Even if we drop the "LP" from the code, we still have a problem. For example, for "LP14992-9" this would reduce the code to "14992-9", but the SDTM rules (for the sake of SAS Transport 5) state that "Values of --TESTCD must be limited to eight characters and cannot start with a number, nor may they contain characters other than letters, numbers, or underscores". So even the dash "-" is not allowed ... Dropping the dash and the check digit is in my opinion not a good idea, as the check digit is an important measure against typing errors. Note that the rules for --TESTCD/--TEST are meant to make transposing possible in XPT datasets.

So, what we see once again, is that the SAS Transport 5 format is a "show stopper" for any "out of the box thinking".

The second thing I found out is that, with extremely few exceptions, each of the LOINC "Component" values, i.e. the analytes, has a SNOMED-CT code. For example, the SNOMED-CT code for Doxycycline is 372478003.

So, why not use the SNOMED-CT code of the analyte for LBTESTCD, with the SNOMED-CT name for LBTEST?

OK. Same problem: SNOMED-CT codes are often longer than 8 characters, and they start with a number, so they cannot be used for LBTESTCD due to this (stupid?) SDTM rule that is only there to satisfy the outdated SAS Transport 5 format. Using "LOINC Parts" and "SNOMED-CT" for test codes would also have the advantage of providing links to other codes and terms. After all, both are "hierarchical" and "network" coding systems. CDISC-CT just consists of ... lists.
For example, medicinal products containing Doxycycline are characterized by the SNOMED-CT code 10504007. And a "parent" code of it is "Substance with antimalarial mechanism of action" with SNOMED-CT code 373287002.

Here is a nice diagram taken from the "SNOMED-CT browser":

Can one do something similar with CDISC-CT? No way ...

So, why isn't CDISC using SNOMED-CT at all (except in the SDTM Trial Summary (TS) domain)?

An explanation is found on the CDISC website in the "knowledge base":

The first argument (SNOMED license) is not entirely correct. It should say "most governments". Even in Europe, where we are far behind the US in using SNOMED-CT, there is almost no country left that does not have a country license. Even then, the "Knowledge base" applies double standards: MedDRA is not free for anyone at all; one needs to have a (rather expensive) license. So arguing that some (a minority) would have to pay to use SNOMED-CT, while at the same time mentioning that MedDRA, for which one always has to pay, is mandated by regulatory agencies, is in my opinion inconsistent, to say the least.

Also the second argument, that SNOMED-CT does not have "definitions", is entirely incorrect. Every SNOMED-CT term does have a definition.
Furthermore, the "network" properties of SNOMED-CT are not mentioned at all. They should be.

Please note that I am not pleading for replacing all CDISC-CT by SNOMED-CT. There are many cases where this doesn't make sense. What we should however do is start discussing the use of LOINC codes, of LOINC parts for tests and possibly for post-coordination of test parts (where SNOMED-CT also does a better job), and of LOINC answers for standardized results. We should also start discussing the better use of SNOMED-CT within CDISC, especially within submission standards, and stop trying to keep LOINC and SNOMED-CT "out of the door". It is also to the advantage of pharma sponsors to use these terminologies, and I strongly think that especially sponsors who want to start using "real world data" should push CDISC harder to embrace LOINC and SNOMED-CT by providing webinars, trainings, implementation guides, etc.

CDISC is a founding member of the "Joint Initiative Council for Global Health Informatics Standardization" (JIC), together with LOINC and SNOMED, but this seems to be reflected in our work only marginally. That is really a pity.

And, we should not forget, clinical research is less than 5% of healthcare, and the other 95% is using SNOMED-CT and LOINC all the way ...

Reactions are as always very welcome!
And if you also feel that CDISC should take LOINC, UCUM, and SNOMED-CT more seriously, don't tell me, tell CDISC (e.g. the CSO).
