Continued from part 1: SDTM --STRESN: why we need UCUM - Part 1
Another interesting statement from the same paper is:
"The process of
converting the lab test results to SDTM is not without its own set of
challenges. For example, Sponsors or CROs must map what they obtain from the
labs to CDISC controlled terminology, which is an error prone process."
One of the arguments against LOINC (and UCUM) is that
researchers would have to learn yet another coding system. Well, the same applies to
CDISC-CT, doesn't it? But when all lab data come with a LOINC code (which is
more and more the case) and with UCUM units (which is the case when the
instrument performing the measurement also knows the LOINC code),
no mapping is needed at all, and the "standardization" to
--STRESN can be automated completely.
This case is also described in the paper of Nataraj and Piper
from SHIRE:
"Furthermore, the
standardization of units, and the conversion of results to standard units, is
often a significant challenge. As a starting point, it is important to map all
units (both as collected in the original unit variable LBORRESU, and as
standardized in the LBSTRESU variable) to CDISC terminology when possible.
Currently the LB domain uses CDISC controlled terminology for units as
represented in the UNIT codelist. There are instances when the units used in
other systems such as electronic health records are based on the Unified Code
for Units of Measurement (UCUM) standard. Currently there is no 1:1
relationship between the CDISC Controlled terms of units and UCUM. Until this
is resolved, it will remain as a significant challenge for future data
interoperability."
Essentially (my own interpretation!) SHIRE is asking here
to be allowed to use UCUM notation in SDTM, as all the error-prone
mapping would then disappear.
Coming back to the original request from Clem McDonald: what
is the use case for the FDA?
In order to demonstrate this, I took a real SDTM LB and a VS
dataset (from the FDA-CDISC pilot 2013)
and replaced all units in --ORRESU
and --STRESU by units in UCUM
notation where necessary (as there is overlap). Here is a snapshot:
One also observes that some of the units are
valid in both CDISC-CT and UCUM.
All OK? Try to look with the eyes of a reviewer at the FDA,
who looks at thousands of such data points. Nothing special?
OK, let us now use the NLM RESTful web services for UCUM to redo the
conversion from LBORRES to LBSTRESN in an automated way. If the outcome differs
by more than 0.1% from the submitted value, a warning is issued (by coloring the
cell orange), and if the deviation is more than 0.5%, a conversion error is
issued (by coloring the cell red). This takes no more than a few lines of Java
code in the viewer (as it uses the NLM web services), AND it is generic, i.e. it
works for every UCUM unit (no reviewer programming required). Then we get:
This marks all the "basophils" measurements as
containing a conversion error: the "standardized" values are a
factor 10 too high. As a consequence, the SDTM programmer marked them all as
"HIGH", whereas they would have been marked as
"NORMAL" if the conversion had been done correctly. Using CDISC units,
these errors would probably remain unnoticed, as CDISC-CT does not provide any
conversion factors between the units of the [UNIT] codelist, and surely not in a
machine-readable way. And if the SDTM programmer had used the NLM
RESTful web service for executing the
conversions, which is only possible for units in UCUM notation, these
errors would not have occurred in the first place.
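The orange/red check described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Java of the viewer, and it is not the viewer's actual code: the function name `classify_conversion` is hypothetical, and the independently recomputed value (which in the viewer comes from the UCUM web service) is simply passed in as an argument.

```python
def classify_conversion(reported, recomputed, warn_pct=0.1, error_pct=0.5):
    """Compare a submitted --STRESN value with an independently recomputed one.

    Returns "OK", "WARNING" (deviation above warn_pct percent) or
    "ERROR" (deviation above error_pct percent).
    """
    if recomputed == 0:
        # A relative deviation is undefined here; only an exact match passes.
        return "OK" if reported == 0 else "ERROR"
    deviation_pct = abs(reported - recomputed) / abs(recomputed) * 100.0
    if deviation_pct > error_pct:
        return "ERROR"    # the viewer would color the cell red
    if deviation_pct > warn_pct:
        return "WARNING"  # the viewer would color the cell orange
    return "OK"


# Hypothetical values: a submitted result that is a factor 10 too high.
print(classify_conversion(reported=0.5, recomputed=0.05))
```

A factor-10 mistake like the one found for the basophils corresponds to a deviation of 900%, far above the error threshold.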
OK, maybe you did find the (somewhat obvious) errors, so let
us try something a bit more difficult. We will have a look at the results of
the VS dataset. Note that the UCUM notation for "mmHg" is
"mm[Hg]", for "inches" it is "[in_i]", for
"pounds" it is "[lb_av]" and for "Fahrenheit" it is
"[degF]". For none of these do the corresponding CDISC units from the [UNIT]
codelist have conversion factors between them, and certainly not in a
machine-readable way.
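To make concrete what "machine-readable conversion" means for these vital signs units, here is a small local sketch in Python. It does not call the NLM web services; the table `LINEAR_FACTORS` and the function `convert` are hypothetical names of my own, and only the exact definitions (1 inch = 2.54 cm, 1 avoirdupois pound = 0.45359237 kg) and the usual Fahrenheit-to-Celsius formula are used.

```python
# Exact definitions: 1 [in_i] = 2.54 cm, 1 [lb_av] = 0.45359237 kg.
LINEAR_FACTORS = {
    ("[in_i]", "cm"): 2.54,
    ("[lb_av]", "kg"): 0.45359237,
}


def convert(value, from_unit, to_unit):
    """Convert a value between two units given in UCUM notation."""
    if from_unit == to_unit:
        return value
    if (from_unit, to_unit) == ("[degF]", "Cel"):
        # Temperature needs an offset, not just a multiplication factor.
        return (value - 32.0) * 5.0 / 9.0
    try:
        return value * LINEAR_FACTORS[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(f"no conversion known from {from_unit} to {to_unit}") from None


print(convert(70, "[in_i]", "cm"))      # height
print(convert(150, "[lb_av]", "kg"))    # weight
print(convert(98.6, "[degF]", "Cel"))   # temperature
```

A generic UCUM service makes such a hand-maintained table unnecessary, since the conversion is derived from the unit definitions themselves.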
Here is a snapshot from the VS (vital signs) dataset:
However, when we enable validation of the --ORRES to --STRESN conversion
using the NLM RESTful web services, this is what we get:
Did you see the conversion errors yourself without grabbing
your pocket calculator or without doing some programming yourself for each of
the tests separately?
An even nicer demo, although more hypothetical, is for the blood
pressure, where I assumed that it was reported in "pounds per square
inch" (UCUM notation [psi]). I guess no medical doctor
reports blood pressure in pounds per square inch, but it is only for the
demo. Here are the data:
Also note the UCUM notation "{beats}/min", using
so-called "annotations"
to clearly separate the property (what is measured) from the unit itself. The CDISC
equivalent is "BEATS/MIN", which mixes the two up.
Now doing validation (not easily possible with CDISC units)
using the NLM UCUM RESTful web services leads to:
Molar units
I have been asked several times whether the RESTful web
services can also do "molar conversions", e.g. from "mol/L"
to "g/L". At this moment, this is not possible yet, but both the NLM
and a project team of my students at the university are working on it. For such a
conversion, the molecular weight of the component needs to be known. Here too,
we cannot work with CDISC-CT, as the [LBTESTCD] codelist (which essentially
contains the chemical compound that is being tested, rather than
the test itself) does not contain any information about molecular weights.
Through the LOINC code of the test, the molecular weight of the component can
easily be retrieved (through the "LOINC group"). A simple join
between 2 tables in the LOINC database suffices.
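The arithmetic behind such a molar conversion is simple once the molecular weight is known. The following Python sketch only illustrates that arithmetic and is not the planned web service: the function names are hypothetical, and the molecular weight of glucose (about 180.156 g/mol) is passed in by hand here instead of being looked up through the LOINC code.

```python
GLUCOSE_MW = 180.156  # g/mol for C6H12O6; entered by hand for this sketch


def mmol_per_l_to_mg_per_dl(value, molecular_weight):
    """mmol/L times g/mol gives mg/L; dividing by 10 gives mg/dL."""
    return value * molecular_weight / 10.0


def mg_per_dl_to_mmol_per_l(value, molecular_weight):
    """Inverse: mg/dL times 10 gives mg/L; dividing by g/mol gives mmol/L."""
    return value * 10.0 / molecular_weight


print(mmol_per_l_to_mg_per_dl(5.5, GLUCOSE_MW))   # roughly 99 mg/dL
print(mg_per_dl_to_mmol_per_l(100.0, GLUCOSE_MW))
```

The only test-specific input is the molecular weight, which is exactly the piece of information that the LOINC code gives access to.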
So we will soon, in cooperation with NLM, have a RESTful web service for such conversions. The query string will look something like:
[base]/ucumtransform/from/mmol/L/to/g/dL/LOINC/15076-3
Where "15076-3" is the LOINC code for
"glucose [moles/volume] in urine".
Or the other way around:
[base]/ucumtransform/from/mg/dL/to/mmol/L/LOINC/5792-7
Where "5792-7" is the LOINC code for "Glucose
[Mass/volume] in Urine by Test strip".
Remark that in the "FDA pilot submission 2013",
LBSPEC (permissible in SDTM) is missing, so that we can only guess what the
specimen was. LBCAT=CHEMISTRY lets us guess that it was
"Serum/Plasma", for which the LOINC code is then 2345-7:
Once the extended RESTful web service is in place, it will
(as requested by a number of pharma companies) also make it possible to
"standardize" from molar to mass units and vice versa, as in the
above example, but this time in an automated way, at least when UCUM notation
is used for the units.
For the FDA it will then also be possible to check the conversions as done by the sponsor, in exactly the same way as described before, using the NLM RESTful web services.
The combined use of LOINC and UCUM will also enable the FDA to start comparing laboratory results between studies and sponsors. If one sponsor "standardizes" glucose lab results to "mmol/L", another to "g/dL" and yet another to "g/L", there is currently (using the CDISC-CT) no way to automate such comparisons. With the combination of LOINC and UCUM, it becomes extremely easy.
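As a rough illustration of such a cross-sponsor comparison, the Python sketch below brings glucose results reported in different units onto one common unit (mmol/L). Everything in it is an assumption for the sake of the example: the sponsor records are made up, the names `TO_GRAMS_PER_LITER` and `to_mmol_per_l` are hypothetical, and the molecular weight is hard-coded instead of being derived from the LOINC code.

```python
GLUCOSE_MW = 180.156  # g/mol; hard-coded assumption for this sketch

# Factors that bring a mass concentration to g/L.
TO_GRAMS_PER_LITER = {"g/L": 1.0, "g/dL": 10.0, "mg/dL": 0.01}


def to_mmol_per_l(value, unit, molecular_weight):
    """Normalize a result in mmol/L, g/L, g/dL or mg/dL to mmol/L."""
    if unit == "mmol/L":
        return value
    grams_per_liter = value * TO_GRAMS_PER_LITER[unit]
    return grams_per_liter / molecular_weight * 1000.0


# Made-up records from three hypothetical sponsors.
records = [
    ("Sponsor A", 5.0, "mmol/L"),
    ("Sponsor B", 0.09, "g/dL"),
    ("Sponsor C", 0.9, "g/L"),
]
for sponsor, value, unit in records:
    print(sponsor, round(to_mmol_per_l(value, unit, GLUCOSE_MW), 3), "mmol/L")
```

Once all values are in one unit, they can be compared or pooled directly, whichever unit each sponsor originally chose.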
Such comparisons and the conversion validation will
significantly contribute to the safety of patients. And that is what we are
doing all this work for, isn't it?