Monday, February 11, 2019

SDTM --STRESN: why we need UCUM - Part 2


Continued from part 1: SDTM --STRESN: why we need UCUM - Part 1

Another interesting statement from the same paper is:
"The process of converting the lab test results to SDTM is not without its own set of challenges. For example, Sponsors or CROs must map what they obtain from the labs to CDISC controlled terminology, which is an error prone process."
One of the arguments against LOINC (and UCUM) is that researchers would have to learn yet another coding system. Well, the same applies to CDISC-CT, doesn't it? But when all lab data come with the LOINC code (which is more and more the case) and with UCUM units (which is the case when the instrument performing the measurement also knows the LOINC code), no mapping would be needed at all, and the "standardization" to --STRESN could be automated completely.
This case is also described in the paper of Nataraj and Piper from SHIRE:

"Furthermore, the standardization of units, and the conversion of results to standard units, is often a significant challenge. As a starting point, it is important to map all units (both as collected in the original unit variable LBORRESU, and as standardized in the LBSTRESU variable) to CDISC terminology when possible. Currently the LB domain uses CDISC controlled terminology for units as represented in the UNIT codelist. There are instances when the units used in other systems such as electronic health records are based on the Unified Code for Units of Measurement (UCUM) standard. Currently there is no 1:1 relationship between the CDISC Controlled terms of units and UCUM. Until this is resolved, it will remain as a significant challenge for future data interoperability."


Essentially (my own interpretation!), SHIRE is asking here to be allowed to use UCUM notation in SDTM, as all the error-prone mapping would then disappear.
Coming back to the original request from Clem McDonald: what is the use case for the FDA?


To demonstrate this, I took a real SDTM LB and a VS dataset (from the FDA-CDISC pilot 2013) and replaced all units in ORRESU and STRESU by units in UCUM notation where necessary (some units are identical in both systems). Here is a snapshot:



One can also see that some of the units are valid in both CDISC-CT and UCUM.
All OK? Try to look with the eyes of a reviewer at the FDA, who looks at thousands of such data points. Nothing special?


OK, let us now use the NLM RESTful web services for UCUM to redo the conversion from LBORRES to LBSTRESN, this time automated. If the outcome differs by more than 0.1% from the value in the dataset, a warning is issued (by coloring the cell orange), and if the deviation is more than 0.5%, a conversion error is issued (by coloring the cell red). This takes no more than a few lines of Java code in the viewer (as it uses the NLM web services), and it is generic, i.e. it works for every UCUM unit (no reviewer programming required). Then we get:



This marks all the "basophils" measurements as containing a conversion error: the "standardized" values are a factor of 10 too high. As a consequence, the SDTM programmer marked them all as "HIGH", whereas they would have been marked as "NORMAL" if the conversion had been done correctly. With CDISC units, these errors would probably remain unnoticed, as CDISC-CT does not provide any conversion factors at all for the [UNIT] codelist, and surely not in a machine-readable way. And if the SDTM programmer had used the NLM RESTful web service for executing the conversions, which is only possible for units in UCUM notation, these errors would not have occurred in the first place.
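
The check itself is simple arithmetic. As a minimal sketch in Python (this is not the Java code of the viewer; the function name and the sample numbers are made up for illustration), the two thresholds could be applied like this once the recomputed value has been obtained from a UCUM conversion:

```python
def classify_conversion(reported: float, recomputed: float,
                        warn_pct: float = 0.1, error_pct: float = 0.5) -> str:
    """Compare a submitted --STRESN value with an independently recomputed one.

    Returns "ok", "warning" (orange cell) or "error" (red cell).
    """
    if recomputed == 0.0:
        # A relative deviation is undefined here; only an exact zero is accepted.
        return "ok" if reported == 0.0 else "error"
    deviation_pct = abs(reported - recomputed) / abs(recomputed) * 100.0
    if deviation_pct > error_pct:
        return "error"
    if deviation_pct > warn_pct:
        return "warning"
    return "ok"


# Hypothetical numbers: a value that is a factor of 10 too high is an error.
print(classify_conversion(reported=0.5, recomputed=0.05))       # error
print(classify_conversion(reported=100.2, recomputed=100.0))    # warning
print(classify_conversion(reported=100.05, recomputed=100.0))   # ok
```

The thresholds are parameters, so a reviewer who prefers other tolerances only needs to change two numbers.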

OK, maybe you did find the (somewhat obvious) errors, so let us try something a bit more difficult. We will have a look at the results of the VS dataset. Note that the UCUM notation for "mmHg" is "mm[Hg]", for "inches" it is "[in_i]", for "pounds" it is "[lb_av]" and for "Fahrenheit" it is "[degF]". The corresponding CDISC units from the [UNIT] codelist have no conversion factors between them, and certainly not in a machine-readable way.
Here is a snapshot from the VS (vital signs) dataset:



However, when we enable validation of the --ORRES to --STRESN conversion using the NLM RESTful web services, this is what we get:
 


Did you see the conversion errors yourself without grabbing your pocket calculator or without doing some programming yourself for each of the tests separately?
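
For those who do want to grab the pocket calculator: the arithmetic behind these vital signs checks can be sketched in a few lines of Python. This is only an illustration with a hard-coded table (the names `LINEAR_FACTORS`, `convert` and `deviation_pct` are my own for this sketch, and the reported value is invented); the point of the web services is precisely that no such table has to be maintained by hand. Note that Fahrenheit to Celsius is not a simple multiplication, as it also needs an offset.

```python
# Exact definitions: 1 [in_i] = 2.54 cm, 1 [lb_av] = 0.45359237 kg.
LINEAR_FACTORS = {
    ("[in_i]", "cm"): 2.54,
    ("[lb_av]", "kg"): 0.45359237,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between a few UCUM units used in the VS example."""
    if from_unit == to_unit:
        return value
    if (from_unit, to_unit) == ("[degF]", "Cel"):
        # Temperature needs an offset as well as a scale factor.
        return (value - 32.0) * 5.0 / 9.0
    return value * LINEAR_FACTORS[(from_unit, to_unit)]


def deviation_pct(reported: float, recomputed: float) -> float:
    """Relative deviation of the reported value, in percent."""
    return abs(reported - recomputed) / abs(recomputed) * 100.0


recomputed = convert(98.6, "[degF]", "Cel")
print(round(recomputed, 2))  # 37.0

# Invented example: a dataset that reports 36.0 Cel for 98.6 [degF]
# deviates by more than 0.5% and would be colored red.
print(deviation_pct(36.0, recomputed) > 0.5)  # True
```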


An even nicer demo, but a more hypothetical one, is for blood pressure, where I supposed that it was reported in "pounds per square inch" (UCUM notation "[psi]"). I guess no medical doctor reports blood pressure in pounds per square inch, but it serves the demo. Here are the data:
 


Also note the UCUM notation "{beats}/min", which uses so-called "annotations" to clearly separate the property (what is measured) from the unit itself. The CDISC equivalent is "BEATS/MIN", which mixes the two up.
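
Because annotations in curly braces do not change the meaning of a UCUM unit (an annotation standing on its own counts as the unit "1"), software can simply set them aside before doing any conversion. A minimal sketch in Python, assuming only simple unit strings; the function name `strip_annotations` is hypothetical and this is not how the NLM services are implemented:

```python
import re

# An annotation standing on its own: at the start, or after ".", "/" or "(".
STANDALONE = re.compile(r"(?<![^./(])\{[^{}]*\}")
# Any remaining annotation is attached to a unit, e.g. "mg{total}".
ATTACHED = re.compile(r"\{[^{}]*\}")


def strip_annotations(ucum_unit: str) -> str:
    """Return the unit with annotations removed (standalone ones become "1")."""
    without_standalone = STANDALONE.sub("1", ucum_unit)
    return ATTACHED.sub("", without_standalone)


print(strip_annotations("{beats}/min"))   # 1/min
print(strip_annotations("mg{total}/dL"))  # mg/dL
print(strip_annotations("mm[Hg]"))        # mm[Hg]
```

With "BEATS/MIN" no such separation is possible: a program cannot tell which part is the unit and which part is the thing being counted.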


Validation (not easily possible with CDISC units) using the NLM UCUM RESTful web services now leads to:
 



Molar units
I have been asked several times whether the RESTful web services can also do "molar conversions", e.g. from "mol/L" to "g/L". At this moment, this is not yet possible, but both the NLM and a project team of my students at the university are working on it. For such a conversion, the molecular weight of the component needs to be known. Here too, we cannot work with CDISC-CT, as the [LBTESTCD] codelist (which essentially contains the chemical compound being tested rather than the test itself) does not contain any information about molecular weights. Through the LOINC code of the test, the molecular weight of the component can easily be retrieved (through the "LOINC group"). A simple join between two tables in the LOINC database suffices.
So we will soon, in cooperation with NLM, have a RESTful web service for such conversions. The query string will look something like:
[base]/ucumtransform/from/mmol/L/to/g/dL/LOINC/15076-3
Where "15076-3" is the LOINC code for "Glucose [Moles/volume] in Urine".

Or the other way around:
[base]/ucumtransform/from/mg/dL/to/mmol/L/LOINC/5792-7
Where "5792-7" is the LOINC code for "Glucose [Mass/volume] in Urine by Test strip".
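
What such a service would have to compute can be sketched locally in Python. This is only an illustration of the arithmetic, not the planned web service: the small `MOLECULAR_WEIGHT` dictionary is a hypothetical stand-in for the lookup in the LOINC database, the unit tables cover just a handful of units, and 180.156 g/mol is the molecular weight of glucose.

```python
# Factors to the reference units mol/L and g/L (only a few units, for illustration).
MOLAR_TO_MOL_PER_L = {"mol/L": 1.0, "mmol/L": 1e-3, "umol/L": 1e-6}
MASS_TO_G_PER_L = {"g/L": 1.0, "g/dL": 10.0, "mg/dL": 0.01, "mg/L": 1e-3}

# Hypothetical stand-in for the LOINC lookup: molecular weight in g/mol per LOINC code.
MOLECULAR_WEIGHT = {"15076-3": 180.156, "5792-7": 180.156}  # both are glucose tests


def convert(value: float, from_unit: str, to_unit: str, loinc: str) -> float:
    """Convert between molar and mass concentration for the analyte of a LOINC test."""
    mw = MOLECULAR_WEIGHT[loinc]
    if from_unit in MOLAR_TO_MOL_PER_L and to_unit in MASS_TO_G_PER_L:
        grams_per_litre = value * MOLAR_TO_MOL_PER_L[from_unit] * mw
        return grams_per_litre / MASS_TO_G_PER_L[to_unit]
    if from_unit in MASS_TO_G_PER_L and to_unit in MOLAR_TO_MOL_PER_L:
        mol_per_litre = value * MASS_TO_G_PER_L[from_unit] / mw
        return mol_per_litre / MOLAR_TO_MOL_PER_L[to_unit]
    raise ValueError(f"unsupported conversion: {from_unit} -> {to_unit}")


# The two example requests above, with invented input values.
print(convert(5.5, "mmol/L", "g/dL", "15076-3"))
print(convert(100.0, "mg/dL", "mmol/L", "5792-7"))
```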
Note that in the "FDA pilot submission 2013", LBSPEC (permissible in SDTM) is missing, so we can only guess what the specimen was. LBCAT=CHEMISTRY suggests that it was "Serum/Plasma", for which the LOINC code would be 2345-7:



Once the extended RESTful web service is in place, it will (as requested by a number of pharma companies) also be possible to "standardize" from molar to mass units and vice versa, as in the above example, but this time in an automated way, at least when UCUM notation is used for the units.
For the FDA it will then also be possible to check the conversions as done by the sponsor, in exactly the same way as described before, using the NLM RESTful web services.

The combined use of LOINC and UCUM will also enable the FDA to start comparing laboratory results between studies and sponsors. If one sponsor "standardizes" glucose lab results to "mmol/L", another to "g/dL" and yet another to "g/L", there is currently (using CDISC-CT) no way to automate such comparisons. With the combination of LOINC and UCUM, it becomes extremely easy.
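
As a rough illustration of such a comparison, here is a Python sketch that brings glucose results from three sponsors to "mmol/L". The sponsor names and values are invented, the function name `normalize` is hypothetical, and the molecular weight of glucose (180.156 g/mol) is hard-coded where a real implementation would look it up through the LOINC code.

```python
GLUCOSE_MW = 180.156  # g/mol; in practice retrieved through the LOINC code

# How to get from each sponsor's "standard" unit to mmol/L.
TO_MMOL_PER_L = {
    "mmol/L": lambda v: v,
    "g/L": lambda v: v / GLUCOSE_MW * 1000.0,
    "g/dL": lambda v: v * 10.0 / GLUCOSE_MW * 1000.0,
}


def normalize(value: float, ucum_unit: str) -> float:
    """Express a glucose result in mmol/L, whatever unit the sponsor chose."""
    return TO_MMOL_PER_L[ucum_unit](value)


# Invented results from three sponsors, each with a different standard unit.
results = [
    ("sponsor A", 5.5, "mmol/L"),
    ("sponsor B", 0.099, "g/dL"),
    ("sponsor C", 0.99, "g/L"),
]
for sponsor, value, unit in results:
    print(sponsor, round(normalize(value, unit), 2), "mmol/L")
```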


Such comparisons and the conversion validation will significantly contribute to the safety of the patients. And that is what we are doing all this work for, isn't it?