A very interesting discussion (partially by e-mail, partially on the web) has been taken place in the last weeks on the question on whether --ORRESU ("Original Units") must be under control of terminology or not.
The question arose from another discussion on why there is a CDISC codelist for units, although there is already an international standard for that (UCUM), which is well used by the hospital world, and for which the use is mandatory in exchange standards for electronic health records (EHRs) such as HL7-CCD and CDA and ISO-21090.
As I was informed by someone from the FDA, the use of UCUM is also mandatory in SPL (Structured Product Labeling).
The discussion was also triggered by the observation that when one extracts lab data from EHRs and use that in clinical research and it leads to an SDTM submission, the value in LBORRESU in some cases leads to a validation error when using OpenCDISC. A typical example is "mm[Hg]" which is the international symbol (UCUM) for "millimiter mercury" as a unit for the property "pressure". OpenCDISC throws an error as "mm[Hg]" is not in the "UNIT" controlled terminology codelist.
Other people remarked that when using paper case report forms, the way the unit is written on the CRF varies very often (when not preprinted). For example, for a mass concentration, they found it written as "Gr/dl", "Gramm per deciliter", "Gramm/dL", "g/dL" etc.. Only the latter is correct CDISC controlled terminology.
So if I find "Gramm per deciliter" on the CRF, what must go into LBORRESU?
The SDTM-IG for LBORRESU states "original units" for the variable label, but the column "Controlled Terms, CodeList or format" states "(UNIT)", meaning that the value must come from the "UNIT" codelist.
Whether this codelist is extensible (by the sponsor) is not mentioned in the SDTM-IG.
In my opinion, the wording "original units" contradicts with the statement that the "UNIT" codelist must be used.
Only in an ideal world, every researcher would exactly use what is in the CDISC "UNIT" codelist when writing (as narrative text) the unit on the paper CRF.
A response came from a distinguished SDTM pioneer stating that one needs to distinguish between the "actual unit" that came from the CRF, and the way it is represented in SDTM, and which needs to come from the UNIT codelist.
So that would mean that if the researcher had written "mill.Hg" on the CRF, this would never appear in VSORRESU, but needs to be "translated" into "mmHg".
Similarly, when the data comes from an EHR, the UCUM unit "mm[Hg]" will never go into VSORRESU but needs to be translated into "mmHg" again.
If we do so, how much is ORRESU then still "original unit"? Isn't such a translation a "derivation"?
What about "translation errors"? Can we trust that the unit is always translated correctly into one from the UNIT codelist? I (and a lot of other in the discussion) have some serious doubts.
What about tests for which there is a unit available, but for which there is no unit present yet in the UNIT codelist for that property? Do we need to wait submitting to the FDA until the moment that our "new term request" is approved?
And isn't it a bit strange that CDISC just choose to ignore/exclude a standard (UCUM) that is very well used, and choose to develop its own list for units? One statement in the discussion was that UCUM is not well known in smaller labs and by a number of LIMS systems, but is that a reason to develop a whole new list, and then to say that the labs and LIMS should learn that new list, instead of stating that they should learn UCUM? It reminds me about the cartoon that you can find at http://xkcd.com/927/.
One very good suggestion was given by a representative of a biotech company, proposing to have two variables for the "original" units, one named e.g. --RPRESU (new), the "reported result unit" which is "as collected", so not under controlled terminology, and --ORRESU (existing), which is the translation into something that is in the "UNIT" codelist. As such, the "as collected" unit would not be lost, and the reviewer at the FDA can use the (--ORRESU) term that is under CDISC controlled terminology because that is familiar to him/here.
So, when the information comes from an EHR, e.g. "mm[Hg]" (UCUM) would go into VSRPRESU and would then be "translated" into "mmHg" (CDISC-CT) which then goes into VSORRESU.
Which of the two (if not both) would then be made "expected" is also an interesting discussion.
Though this doesn't take away my objections agains "the reinvention of the wheel", I think it could be a very good compromise.
Whether derived variables should go into SDTM at all is another discussion. For example, we must submit "AGE" in the demographics domain although that is usually never collected, but derived from BRTHDTH (birth date) and RFSTDTC (reference start date). Whether such derived variables should be in SDTM is surely related to the question "Is SDTM a database (design), and if so - is it a good one?" which I discussed in a previous blog entry.
My next blog entry will probably go about why SDTM has more and more derived variables.