Saturday, February 29, 2020

The LOINC – CDISC-LB mapping and the SDTM-ETL software


The LOINC to CDISC-LB mapping has recently been published for "public review" by the CDISC Controlled Terminology team. It is meant to help sponsors generate values for the post-coordinated LB (lab) variables such as LBTESTCD, LBTEST, LBSPEC, LBMETHOD, etc., when the pre-coordinated LOINC code is provided by the laboratory. Sponsors should however not try to generate the LOINC code themselves.

Essentially, when the LOINC code is already provided in LBLOINC, it should not be necessary to generate values for at least LBTESTCD and LBTEST, as this introduces redundancy, and redundancy easily leads to errors. However, SDTM is already full of redundancies anyway, the most obvious being the DY variables, but it seems that the tools used by the FDA reviewers are not able to perform even the simplest derivations (such as a DY value) "on the fly".

So, how will sponsors use the LOINC to CDISC mapping? The mapping currently comes as an Excel file, and the final version is expected to be published as an Excel file as well. Excel files are barely usable for automation (JSON or XML would be better), and as far as I know, there are currently no plans to add the LOINC to CDISC mapping to the CDISC Library and make it available to software programs through the CDISC Library RESTful API.

Therefore, we decided to put some effort into making the mapping machine-readable, and to implement it in our popular SDTM-ETL software for generating SDTM and SEND submissions. The newest version (v.4.0, a major release) will probably become available in summer, and will (among other things) have full support for SDTM-IG 3.3 and for Define-XML 2.1.

In SDTM-ETL v.4.0, a new function "loinc2cdisc()" will be introduced, taking two arguments:

  • The first argument is a LOINC code, or a variable holding a LOINC code
  • The second argument is the SDTM variable name

So for example, LOINC code 13532-7 (Xanthochromia [Presence] of Cerebral spinal fluid Qualitative) returns the following values for these function calls:

  • loinc2cdisc('13532-7', 'LBTESTCD')            => XNTHCHR
  • loinc2cdisc('13532-7', 'LBTEST')                  => Xanthochromia
  • loinc2cdisc('13532-7', 'LBSPEC')                  => CEREBROSPINAL FLUID
         and so on. 
The user can then decide whether to use the value from the mapping, or to use information coming from other sources (such as the EDC system or the protocol). For example, for LOINC code 1751-7 (Albumin [Mass/volume] in Serum or Plasma), the function call loinc2cdisc('1751-7', 'LBSPEC') will deliver "SERUM OR PLASMA", but the user may know that the specimen was "PLASMA" and can then use the latter, overwriting the value from the mapping.
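The lookup-then-override logic described above can be sketched in a few lines of Python. This is only an illustration, not the SDTM-ETL implementation: the Python function loinc2cdisc is a stand-in for the scripting function of the same name, the helper prefer_source is hypothetical, and the table contains only the values quoted in this post.

```python
# Tiny in-memory excerpt of the mapping, restricted to values quoted in the text.
MAPPING = {
    "13532-7": {
        "LBTESTCD": "XNTHCHR",
        "LBTEST": "Xanthochromia",
        "LBSPEC": "CEREBROSPINAL FLUID",
    },
    "1751-7": {"LBSPEC": "SERUM OR PLASMA"},
}


def loinc2cdisc(loinc_code, variable):
    """Return the mapped value for an SDTM variable, or None if unmapped."""
    return MAPPING.get(loinc_code, {}).get(variable)


def prefer_source(mapped_value, source_value=None):
    """Hypothetical helper: a more specific source value wins over the mapping."""
    return source_value if source_value else mapped_value


# The mapping only knows "SERUM OR PLASMA"; the EDC system says "PLASMA".
print(loinc2cdisc("13532-7", "LBTESTCD"))
print(prefer_source(loinc2cdisc("1751-7", "LBSPEC"), "PLASMA"))
```

Returning None for an unmapped code or variable leaves it to the caller to decide how to handle gaps in the mapping.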

Of course, the value of the LOINC code will usually not be hard-coded in the SDTM-ETL mapping scripts, but will be obtained from somewhere else and stored in a local variable. For example:

$LOINC = path-to-source-
$LB.LBTESTCD = loinc2cdisc($LOINC, 'LBTESTCD');


In the SDTM-ETL software, the user will have the choice between two technical implementations for retrieving mapping information.

The first implementation uses our recently developed RESTful web service, which has been tested thoroughly by a number of volunteer customers, but for which the details and API have not been made public yet, as it still uses a draft of the mapping. As soon as the LOINC to CDISC mapping is published by CDISC as "final", we will update our RESTful web service, and make details and API publicly available. We expect that this can be done within 1-2 days after the final version has been published by CDISC.
Readers interested in using this RESTful web service right away (although it is using a draft version of the mapping) can contact me, and I will send all necessary details. But please do not use it for production yet!
We will also offer sponsor companies, CROs and other providers the option to install the RESTful web service on their own server systems, so that no API calls need to be made outside the company in order to use it.


And, as soon as CDISC has the complete mapping available through the "CDISC Library", we might decide to discontinue our own RESTful web service, and encourage the users to move to the one from the "CDISC Library".

In the second technical implementation, the SDTM-ETL software reads a CSV file containing the mappings. The file is parsed and made available to the "loinc2cdisc()" function.
This implementation has the advantage that the customer can easily add further mappings to the mapping file. After all, the mapping published by CDISC covers "only" about 1400 LOINC codes (albeit the most popular ones), whereas the LOINC database consists of over 92,000 codes, of which 80-90% are laboratory codes.
For example, LOINC code 1750-9 (Albumin [Mass/Volume] in Semen) is not present in the LOINC to CDISC mapping, but it could easily be added, as only one component is different from LOINC code 1751-7 (Albumin [Mass/volume] in Serum or Plasma), and "SEMEN" is in the CDISC controlled terminology for LBSPEC.
If I find the time, I would like to do such an exercise for all codes in the LOINC to CDISC mapping: I expect that we can at least double the number of LOINC codes for which there is a mapping available.
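As a rough sketch of this CSV-based approach, the Python snippet below parses a mapping file and derives the entry for 1750-9 from the one for 1751-7 by changing only the specimen. The column names (LOINC, LBTESTCD, LBTEST, LBSPEC), the helper names, and the ALB / Albumin values in the sample row are assumptions made for illustration; the actual file layout used by SDTM-ETL may differ.

```python
import csv
import io


def load_mapping(handle):
    """Parse a mapping CSV into {loinc_code: {sdtm_variable: value}}."""
    mapping = {}
    for row in csv.DictReader(handle):
        code = row.pop("LOINC").strip()
        # Skip empty cells so unmapped variables are simply absent.
        mapping[code] = {name: value.strip() for name, value in row.items() if value}
    return mapping


def derive_entry(mapping, base_code, new_code, **changes):
    """Copy the entry of base_code, apply the changes, and store it as new_code."""
    entry = dict(mapping[base_code])
    entry.update(changes)
    mapping[new_code] = entry
    return entry


# Assumed file layout; a real file would be opened with open(path, newline="").
SAMPLE_CSV = """LOINC,LBTESTCD,LBTEST,LBSPEC
1751-7,ALB,Albumin,SERUM OR PLASMA
"""

mapping = load_mapping(io.StringIO(SAMPLE_CSV))

# 1750-9 (Albumin in Semen) differs from 1751-7 only in the specimen.
derive_entry(mapping, "1751-7", "1750-9", LBSPEC="SEMEN")
print(mapping["1750-9"])
```

Any derived entry should still be checked against the CDISC controlled terminology before it is written back to the mapping file.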


Some of you may already have asked yourselves why I call this the "LOINC to CDISC" mapping, and not e.g. the "LOINC to LB" mapping as CDISC calls it. The reason is that LOINC goes far beyond lab test codes only, and that there is also a large number of LOINC test codes for e.g. vital signs (SDTM "VS" domain). As electronic health record systems do not use CDISC controlled terminology, but use LOINC codes for uniquely identifying the (vital signs) tests, it is obvious that the current mapping can, and in my opinion should, be extended to "vital signs". This would even be easier than for lab tests, as there are far fewer vital signs tests than laboratory tests. And, again in my opinion, the primary use of such mappings will soon be to extract information from electronic health records and to generate SDTM datasets from them "on the fly". I have already written something about this, and will demonstrate it at the CDISC Interchange in early April in Berlin, in my presentation "What's Up with LOINC and UCUM? From EHR Records to LB Dataset in Just a Few Minutes".

The same applies to a lesser extent to the "Questionnaires" domain, as LOINC is developing test codes for standardized questionnaires at a high pace in cooperation with HL7. These test codes have a hierarchy, and have been developed in such a way that individual questions can appear and be reused in different questionnaires. CDISC terminology is however still mostly just a "list", with little or no relationships between terms. But that is another discussion.

We expect to release SDTM-ETL v.4.0 with the new features discussed here in summer.