Friday, January 5, 2018

CDISC-CT 2017-12-22: PK Units

A few days ago, I reported on "more madness" in the newest CDISC controlled terminology (version 2017-12-22), especially the addition of more CDISC lab test codes even though LOINC coding is made mandatory by the FDA anyway. When I see this, I sometimes ask myself who is the "better standardization organization": the FDA or us, CDISC?

Even the survey that CDISC did on LOINC (under the lead of the former CSO, who blocked all progress) was shaped in such a way that it was mostly about the difficulties of LOINC adoption, rather than about any of the advantages and opportunities of using LOINC.

But today I want to discuss another part of the new controlled terminology. If you inspect the "changes" file, you will notice that 235 "PK units of measure" have been added, bringing the number of CDISC "PK Units" to a total of 528.
This is crazy! Let me explain why.

Units for PK (pharmacokinetics) usually consist of a relatively high number of parts. An example is "g/mL/(mg/kg/day)" (gram per milliliter per (milligram per kilogram per day)). CDISC publishes these "units" as a list, and not as a system (as UCUM does). Taking into account that each of these parts can vary enormously in magnitude (e.g. the first part from nanogram to kilogram), you can already imagine the number of possible combinations that need to be added to the "list" to reach complete coverage. So, in principle, this list may and will grow almost without limit. UCUM, however, is a "system" in which any possible combination can be tested for validity, e.g. using the NLM RESTful web services and website as well as our RESTful web services. UCUM has the additional advantage that conversions can be completely automated, e.g. also using RESTful web services.
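To get a feeling for how quickly such a list grows, here is a minimal Python sketch that enumerates the entries needed for just one pattern, mass/volume/(mass/mass/time). The alternatives chosen for each part are assumptions made purely for illustration; they are not taken from the CDISC-CT file.

```python
from itertools import product

# Illustrative alternatives for each part of the pattern
# mass/volume/(mass/mass/time). These choices are assumptions
# made for this example only.
analyte_mass = ["ng", "ug", "mg", "g", "kg"]
volume = ["uL", "mL", "dL", "L"]
dose_mass = ["ug", "mg", "g"]
body_mass = ["g", "kg"]
time = ["h", "d"]

# Each combination is a separate entry that a flat list has to enumerate.
entries = [
    f"{a}/{v}/({d}/{b}/{t})"
    for a, v, d, b, t in product(analyte_mass, volume, dose_mass, body_mass, time)
]

print(len(entries))  # 5 * 4 * 3 * 2 * 2 = 240
print("g/mL/(mg/kg/d)" in entries)  # True
```

Even with these few alternatives, a single pattern already requires 240 list entries, and every additional prefix or time unit multiplies that number again.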

So, what I did was add the UCUM notation for each of the newly published CDISC "units". If you want a copy of the file, please just send me an e-mail. We will also soon add these UCUM notations to our web service to find the UCUM notation of any CDISC "unit".

Why does CDISC continue with "lists" of units that "must" grow into infinity and can never be complete? Why doesn't it allow UCUM notation, which is used by 99% of the medical world, whereas CDISC "units" are used by less than 1% of the world? "Not invented here"?

A few arguments I have heard or found in the past:

  • "UCUM expressions, in order to support computability, represent familiar units in unfamiliar ways, with curly brackets and other symbols" (CDISC CT team - see image above).
    When I inspect the UCUM notation I assigned to the new "PK units", however, I see nothing "unfamiliar", and even if that were the case, the advantages (like automation of unit conversions) far outweigh any disadvantages. Furthermore, we must take into account that implementors must learn the CDISC "units" in addition to their own notation, so this argument is nonsense. Why should people be forced to learn a notation that is not used in the healthcare world anyway? UCUM, however, is extremely popular in the healthcare world.
  • "The CDISC notation is very similar and strongly overlaps with UCUM notation. So there is no problem". I checked for this case (PK units) and found that for only about 80 of the 235 "PK units", so about one third, the CDISC "unit" and UCUM notation are identical.
  • "UCUM allows some alternative representations, like l or L for liter. For aggregators and others who want to have a single expression, this is not ideal". This is really nonsense! Any computer system can easily be taught that "1L" = "1l". It can even be automated, using a RESTful web service. For example, try for yourself:
UCUM also makes it easy to automate unit conversions, as it is a "system" and any UCUM notation can be reduced to a set of base units. For example, try to find out how many "millimeters of mercury" correspond to 25 "pounds per square inch". Can you find the conversion factor from what has been published by the CDISC-CT team?
Using UCUM, this is a "piece of cake", as both can easily be reduced to the base units ("g.s-2.m-1" in this case). Using UCUM, the answer can easily be found using one of the RESTful web services available (and "YES", you can also use these from within your SAS programs). Try it yourself:[psi]/to/mm[Hg]
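For readers who want to see the reduction to base units without calling a web service, the following is a minimal offline sketch in Python. The `Unit` class and `convert` function are hypothetical helpers written for this illustration, not part of any UCUM library. The numeric definitions (avoirdupois pound, international inch, standard gravity, and 133.322 kPa per meter of mercury) are typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical helper: a unit reduced to the base units g, m and s."""
    factor: float  # size of one unit expressed in the base units
    g: int = 0     # exponent of gram
    m: int = 0     # exponent of meter
    s: int = 0     # exponent of second

    def __mul__(self, other):
        return Unit(self.factor * other.factor,
                    self.g + other.g, self.m + other.m, self.s + other.s)

    def __truediv__(self, other):
        return Unit(self.factor / other.factor,
                    self.g - other.g, self.m - other.m, self.s - other.s)

    def __pow__(self, n):
        return Unit(self.factor ** n, self.g * n, self.m * n, self.s * n)

    def dims(self):
        return (self.g, self.m, self.s)


def convert(value, source, target):
    """Convert only when both units reduce to the same base-unit exponents."""
    if source.dims() != target.dims():
        raise ValueError("units do not measure the same property")
    return value * source.factor / target.factor


# Hand-entered definitions, expressed in g, m and s.
pound = Unit(453.59237, g=1)               # avoirdupois pound in gram
inch = Unit(0.0254, m=1)                   # international inch in meter
std_gravity = Unit(9.80665, m=1, s=-2)     # standard acceleration of free fall
psi = pound * std_gravity / inch ** 2      # pound-force per square inch
mm_hg = Unit(133.322 * 1000.0, g=1, m=-1, s=-2)  # 133.322 Pa, with 1 Pa = 1000 g.m-1.s-2

result = convert(25, psi, mm_hg)
print(psi.dims() == mm_hg.dims())  # True: both reduce to g.m-1.s-2
print(round(result, 2))            # approximately 1292.88
```

Both units end up with the same exponents for gram, meter and second, which is what makes the conversion possible in the first place; the factor then follows from simple division.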

One of the things I did is the following: I tried to find out how many of the 235 "PK units" can be converted into one another, using the UCUM notation and our RESTful web services. Using CDISC notation, this number is zero, as CDISC-CT does not provide any information at all about what the "units" mean and how they relate to other units.

So I wrote a "quick-and-dirty" Java program, using the aforementioned "UCUM RESTful web services", and found that of the 54,990 possible combinations (235x234 ordered pairs), there is a conversion factor for 1158 of them, meaning that the two units represent the same property. For a good number of them, the conversion factor is a power of 10, meaning that they differ only in the order of magnitude (just like "cm" and "m"). For example:

However, when using the CDISC-CT terms, there is no way at all to find out what that conversion factor is (note that in CDISC-CT, the notation "day" is used instead of the internationally recognized notation "d"), or that two "units" refer to the same property.
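The pairwise check described above can be sketched offline in a few lines. This is not the Java program mentioned in the text, and it does not call any web service: it is a Python illustration on a small, hand-picked sample of eight UCUM expressions (an assumption for this example), each reduced by hand to a factor and the exponents of gram, meter and second. Exact fractions are used so that the power-of-10 test is not disturbed by rounding.

```python
from fractions import Fraction
from itertools import permutations

# Sizes of the parts in base units (g, m3, s).
MASS = {"ng": Fraction(1, 10**9), "ug": Fraction(1, 10**6),
        "mg": Fraction(1, 10**3), "g": Fraction(1)}
VOLUME = {"mL": Fraction(1, 10**6), "L": Fraction(1, 10**3)}
TIME = {"min": Fraction(60), "h": Fraction(3600), "d": Fraction(86400)}


def concentration(mass, vol):
    return MASS[mass] / VOLUME[vol], (1, -3, 0)   # g.m-3


def auc(time, mass, vol):
    return TIME[time] * MASS[mass] / VOLUME[vol], (1, -3, 1)   # s.g.m-3


def clearance(vol, time):
    return VOLUME[vol] / TIME[time], (0, 3, -1)   # m3.s-1


# Small illustrative sample; not the 235 units from the CDISC-CT file.
UNITS = {
    "ng/mL": concentration("ng", "mL"), "ug/mL": concentration("ug", "mL"),
    "mg/L": concentration("mg", "L"), "g/L": concentration("g", "L"),
    "h.ng/mL": auc("h", "ng", "mL"), "d.ug/mL": auc("d", "ug", "mL"),
    "mL/min": clearance("mL", "min"), "L/h": clearance("L", "h"),
}


def is_power_of_ten(ratio):
    """True for exact ratios such as 1/1000, 1, 10 or 1000."""
    big, small = ((ratio.numerator, ratio.denominator) if ratio >= 1
                  else (ratio.denominator, ratio.numerator))
    if small != 1:
        return False
    while big % 10 == 0:
        big //= 10
    return big == 1


pairs = list(permutations(UNITS.items(), 2))  # ordered pairs, n * (n - 1)
convertible = [(a, b, fa / fb)
               for (a, (fa, da)), (b, (fb, db)) in pairs if da == db]
powers_of_ten = [p for p in convertible if is_power_of_ten(p[2])]
identical = [p for p in convertible if p[2] == 1]

print(len(pairs), len(convertible), len(powers_of_ten), len(identical))
# 56 16 12 2
```

In this sample, "ug/mL" and "mg/L" turn out to have a conversion factor of exactly 1, which is the kind of finding that is only possible when the notation itself carries the meaning of the unit.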

We also found a number of related terms for which the conversion factor is exactly 1.0. For example:

or: two ways of writing the same unit, which is forbidden by the SDTM-IG.
"Wait a minute" you will say, "these pair members correspond to different properties!".
And yes, you are right, but to what properties?

For example, for the first entry in the above list (here in CDISC notation): "day*g/mL/g" and "mg/mL/(mg/day)" have a conversion factor of 1 (difficult to find out when using CDISC notation) but do indeed correspond to different properties, the first being something like "day times gram (of what?) per milliliter (of what?) per gram (of what?)". Using CDISC notation, you will never find out about the "what?". Using UCUM, you can easily do so using "annotations".
For example (fictitious - I am not a specialist in pharmacokinetics): "d.g{analyteXYZ}/mL{blood}/g{drug}", which explains very well what the unit is about without endangering conversions - the annotations in curly brackets can be taken into account automatically. These "annotations" are exactly what the CDISC-CT team does not like at all: "... represent familiar units in unfamiliar ways, with curly brackets and other symbols. This is off-putting to some users." (sic). That the annotations help enormously was even recognized by LOINC, where such annotations have been standardized. E.g.:

showing that LOINC standardized on annotations like "RBCs" (red blood cells), "titer", "creat" (creatinine) and many others. Of the over 80,000 LOINC codes, over 7,600 have such an annotation in the "preferred UCUM unit", which is almost 10%.
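That annotations do not get in the way of computation can be shown with a small Python sketch. In UCUM, the text in curly brackets carries no value of its own, and an annotation standing alone counts as the unit "1". The function names below are hypothetical and the regular expressions are a simplification that assumes annotations are not nested.

```python
import re

ANNOTATION = re.compile(r"\{[^{}]*\}")
# An annotation at the start, or directly after ".", "/" or "(",
# is not attached to a unit symbol and stands for the unit "1".
STANDALONE = re.compile(r"(^|[./(])\{[^{}]*\}")


def strip_annotations(ucum):
    """Return the computable part of a UCUM expression."""
    without_standalone = STANDALONE.sub(r"\g<1>1", ucum)
    return ANNOTATION.sub("", without_standalone)


def list_annotations(ucum):
    """Return the annotation texts, i.e. the answers to the "of what?" questions."""
    return [match[1:-1] for match in ANNOTATION.findall(ucum)]


expr = "d.g{analyteXYZ}/mL{blood}/g{drug}"
print(strip_annotations(expr))           # d.g/mL/g
print(list_annotations(expr))            # ['analyteXYZ', 'blood', 'drug']
print(strip_annotations("mg/g{creat}"))  # mg/g
print(strip_annotations("{titer}"))      # 1
```

The stripped expression can be fed into any conversion routine, while the annotation texts remain available to tell a human or a program what is being measured.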

So, rather than extending an ever-growing list of "units" over and over again, the CDISC-CT team would do better to concentrate, in close cooperation with LOINC (the Regenstrief Institute), on standardizing such annotations for use in clinical research.
As LOINC coding for lab tests in SDTM is required by the FDA anyway, use of UCUM notation should be allowed immediately, and the CDISC-CT team should stop generating lists of units and work on "UCUM annotations" for use in clinical research instead.
This should bring the usability of SDTM to a much higher level than it has today.