Thursday, March 1, 2012

Null flavors in SDTM - a good idea?

CDISC recently published the "revised Trial Summary datasets" for the SDTM standard and implementation guide. One of the new "features" of the TS dataset is that it has a number of so-called "null flavors". You can think of a null flavor as something similar to "reason not done" (--REASND), but enumerated. For the TS dataset the enumerations (each a "flavor of null") are:
  • NI (no information)
  • INV (invalid)
  • OTH (other)
  • PINF (positive infinite)
  • NINF (negative infinite)
  • UNC (unencoded)
  • DER (derived)
  • UNK (unknown)
  • ASKU (asked but unknown)
  • NAV (temporarily unavailable)
  • NASK (not asked)
  • QS (quantity sufficient)
  • TRC (trace)
  • MSK (masked)
  • NA (not applicable)
with the explanation of each code given in parentheses.
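To make the size of this enumeration tangible, here is a minimal sketch of the list above as a Python enumeration. The class name `NullFlavor` and the helper `describe` are my own hypothetical names for illustration; only the codes and their explanations come from the list.

```python
from enum import Enum
from typing import Optional


class NullFlavor(Enum):
    """The 15 null flavor codes listed above (hypothetical class name)."""
    NI = "no information"
    INV = "invalid"
    OTH = "other"
    PINF = "positive infinite"
    NINF = "negative infinite"
    UNC = "unencoded"
    DER = "derived"
    UNK = "unknown"
    ASKU = "asked but unknown"
    NAV = "temporarily unavailable"
    NASK = "not asked"
    QS = "quantity sufficient"
    TRC = "trace"
    MSK = "masked"
    NA = "not applicable"


def describe(code: str) -> Optional[str]:
    """Return the explanation for a code, or None if the code is not in the list."""
    try:
        return NullFlavor[code].value
    except KeyError:
        return None
```

A lookup such as `describe("PINF")` returns the explanation text, and a code that is not in the list gives `None`.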

It is clear that the idea, and its CDISC-SDTM-TS implementation, come from the HL7-v3 world, as the enumerated values are (and not by accident, I believe) exactly the same as those in HL7-v3.
The use of "null flavors" is, however, highly contested even within HL7. For me, and for many others, it is for example very illogical that a value can be null and positive infinite at the same time.
So some implementations of HL7-v3, such as the Austrian "Entlassungsbrief" (similar to the US "Continuity of Care" (CCD) document), have limited the enumeration to the absolute minimum - in this case to only two allowed values.

Personally, I am always suspicious when an enumeration has more than, say, 5-6 values, especially when it is not about "hard" characteristics. You can make an enumeration for "gender", like F (female), M (male) and U (unknown), which is good enough for 99.9% of the cases.
But in the "null flavor" case here, everything is pretty subjective ...
For example, when to apply "NI" and when to apply "UNK"?
And when we know that a value is "<1mg", should we add it as a value, or should we set it to NULL and fill the "null flavor" with "TRC" (trace)?

And what to think about "positive infinite" and "negative infinite"? These are surely not null!
Wasn't this introduced due to the inability of SAS Transport 5 to use the "∞" character?
In XML (Schema) one can simply define a value as being of type "xs:double", which includes "INF" (positive infinite) and "-INF" (negative infinite).
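As an illustration of how naturally an infinite value fits into a floating-point type, here is a minimal Python sketch that reads the "xs:double" lexical forms "INF" and "-INF" into ordinary floats. The function name `parse_xs_double` is hypothetical, and this is not a full validator of the xs:double lexical space (Python's `float` accepts some spellings that XML Schema does not).

```python
import math


def parse_xs_double(lexical: str) -> float:
    """Convert an xs:double lexical value to a Python float (simplified sketch)."""
    text = lexical.strip()
    # Special values named by XML Schema for xs:double.
    if text == "INF":
        return math.inf
    if text == "-INF":
        return -math.inf
    if text == "NaN":
        return math.nan
    # Assumption: everything else is an ordinary decimal or scientific literal.
    return float(text)
```

With this, an upper limit of `parse_xs_double("INF")` compares as larger than any finite value, so no separate "null" marker is needed to say "unbounded".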

Let us look at why the authors of the new TS-SDTM introduced "null flavors". The argumentation (copied from the document) is:
"The proposal to include a null flavor variable to supplement the TSVAL variable in the Trial Summary dataset arose when it was realized that the Trial Summary model did not have a good way to represent the fact that a protocol placed no upper limit on the age of study subjects. When the trial summary parameter is AGEMAX, then TSVAL should have a value expressed as an ISO8601 time duration (e.g., P43Y for 43 years old or P6M for 6 months old). While it would be possible to allow a value such as NONE or UNBOUNDED to be entered in TSVAL, ..."
OK, but wait a minute ... why should the maximum age be expressed as an ISO-8601 "period"? It was never designed for that. And what about a maximum age criterion like "at least 30 years older than the age at which the subject last gave birth"? The latter could surely be a valid "age" criterion (but it could also be part of an inclusion criterion). So in my opinion, the developers of the SDTM have abused the "duration" data type here.
And for the AGEMAX parameter, shouldn't "unbounded" be "null" (i.e. there is none), or alternatively "∞"? But yes, the latter cannot be depicted in SAS Transport 5 ...
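To show what I mean by "unbounded is simply null", here is a minimal Python sketch, under my own assumptions, in which an empty AGEMAX value means "no upper limit" and no null flavor is needed. The function names `parse_agemax` and `is_within_agemax` are hypothetical, and the parser only understands simple year/month durations such as the P43Y and P6M examples from the quote, not the full ISO 8601 duration syntax.

```python
import re
from typing import Optional

# Assumption: only whole years and/or months, e.g. "P43Y", "P6M", "P1Y6M".
_DURATION = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?$")


def parse_agemax(tsval: Optional[str]) -> Optional[float]:
    """Return the maximum age in years, or None when there is no upper limit."""
    if tsval is None or tsval.strip() == "":
        return None  # null simply means "there is none"
    match = _DURATION.match(tsval.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {tsval!r}")
    years = int(match.group(1) or 0)
    months = int(match.group(2) or 0)
    return years + months / 12


def is_within_agemax(age_years: float, agemax: Optional[float]) -> bool:
    """A subject always passes when no upper limit is defined."""
    return agemax is None or age_years <= agemax
```

For example, `is_within_agemax(95, parse_agemax(""))` is true because no limit is set, while a 50-year-old fails against `parse_agemax("P43Y")`.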

What do you think? Is it a good idea to have "null flavors" in SDTM? Or do you think it isn't?
Just let me know ...


  1. Null flavors are important. Nulls can be part of the data element or be handled by some other mechanism.

  2. Thanks Vojtech!
    I must honestly say that I have changed my opinion a bit since I wrote this blog post. In Austria's EHR system, we have constrained the null flavors to only 3 or 4 values or so, which I think is OK. However, I am still convinced that using "PINF" to state that there is no upper limit is abuse. Semantically, one cannot say that a value is at the same time "null" and "infinite".