Saturday, November 1, 2014

No to "Null Flavors"

Last week, I attended (part of) the CDISC webinar about the upcoming "public review" batch of the SDTM-IG (v.3.3 - batch 2). It brought both good and bad news. First the bad news:
- even more new domains and many new variables. I am afraid that the CDISC SDTM trainings will soon need to be extended from the current 2 days to 3 days.

The good news is that the SDTM team now proposes that "non-standard" variables (which until now had to be "banished" to SUPPXX data sets) may be kept in the parent domain (where they belong), marked in the define.xml by Role="Non-Standard Identifier", Role="Non-Standard Qualifier" or Role="Non-Standard Timing".
This is something many of us have been asking for for years, essentially since define.xml 1.0 was published. You can read more about this in my prior blog entries "Why SUPPQUAL sucks" and "SDTM and non-standard variables".
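
To make the proposal a bit more concrete, here is a minimal Python sketch of how a tool could pick out the non-standard variables that stay in the parent domain. The XML fragment is a simplified, hypothetical stand-in for a define.xml (namespaces and most attributes are left out), and the OIDs, including the variables AEHOSPDY and AETRTEM, are made up for illustration. Only the three Role values are taken from the proposal.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical define.xml fragment: no namespaces, invented OIDs.
FRAGMENT = """
<ItemGroupDef OID="IG.AE" Name="AE">
  <ItemRef ItemOID="IT.AE.USUBJID" Role="Identifier"/>
  <ItemRef ItemOID="IT.AE.AETERM" Role="Topic"/>
  <ItemRef ItemOID="IT.AE.AEHOSPDY" Role="Non-Standard Timing"/>
  <ItemRef ItemOID="IT.AE.AETRTEM" Role="Non-Standard Qualifier"/>
</ItemGroupDef>
"""


def non_standard_items(xml_text):
    """Return the ItemOIDs whose Role starts with 'Non-Standard'."""
    root = ET.fromstring(xml_text.strip())
    return [
        ref.get("ItemOID")
        for ref in root.iter("ItemRef")
        # Role may be absent, so fall back to an empty string.
        if ref.get("Role", "").startswith("Non-Standard")
    ]


print(non_standard_items(FRAGMENT))
```

With this kind of marking, no separate SUPPXX data set has to be merged back into the parent domain to find these variables.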

Very recently, there was also a webinar given by Diane Wold about the use of "Null Flavors" in CDISC. Now, Diane is one of the people at CDISC whom I highly appreciate, but in my personal opinion, she is completely wrong in this case: "Null Flavors" are evil.

Let me explain. "Null Flavors" were developed by HL7 in HL7-v3 as a mechanism for the case where a value is not known, or cannot be represented by the HL7-v3 framework.
"Null flavors" are highly contested, even within HL7, e.g. see the blog "Smells like I dunno" by Keith Boone, one of the few "HL7-v3 gurus" and author of the best book about HL7-v3 and CDA.
One of the things I have against the "null flavors" is that they force people to categorize the reason why a data point is missing (or not representable in the HL7 framework). This categorization is extremely arbitrary, so it is of essentially no help when comparing data points. In my opinion, one should just record the reason as an extra data point (like --REASND in SDTM) as free text.
Another reason is that they encompass values that DEFINITELY are not null. Examples are "TRC" ("trace" - which is definitely not null), "QS" ("Quantity Sufficient") meaning "a bulk/large amount of material sufficient to fill up until a certain level" (can a large amount be "null"?), "PINF" ("positive infinite") and "NINF" ("negative infinite"), two amounts that every pupil in the last year of primary school knows are not null. Even worse, CDISC is abusing "PINF" in the trial design datasets to state "there is no upper limit" (on the number of participants). A very strange way to define this: first state that the maximum number of participants is NULL, and then add a "flavor" saying that it is unlimited. My school math teacher is probably turning in his grave now ...
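
The difference between "unlimited" and "unknown" can be sketched in a few lines of Python. This is only an illustration of the argument, not of how CDISC or HL7 encode anything: the function name within_limit and the choice of None for "unknown" are my own assumptions.

```python
import math
from typing import Optional


def within_limit(enrolled: int, max_participants: Optional[float]) -> Optional[bool]:
    """Check an enrollment count against an upper limit.

    None stands for a limit that is genuinely unknown (a real null);
    math.inf stands for 'no upper limit', which is an ordinary value.
    """
    if max_participants is None:
        return None  # nothing can be said about an unknown limit
    return enrolled <= max_participants


print(within_limit(500, math.inf))  # True: infinity compares like any number
print(within_limit(500, None))      # None: a null cannot be compared
```

Positive infinity takes part in the comparison like any other number, whereas a null needs special handling. Treating the former as a "flavor" of the latter mixes up two different things.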

In Austria, our national Electronic Health Record system is based on HL7-v3 and CDA. But we allow ONLY two "null flavors", which are really about nulls: one expresses that a patient has no Austrian social security number (e.g. tourists), the other that the patient does have an Austrian social security number, but we do not know it, e.g. because he/she forgot to bring the SSN card.
All other 13 "null flavors" are forbidden in the Austrian EHR.

My opinion is clear: we should not copy the errors the HL7 organization made.


  1. I think the @Role attribute should not be overloaded with a "Non-Standard" designation; we should have a separate @Standard attribute instead. "Non-Standard Timing" is confusing, since it describes a non-standard timing variable that still uses a standard time format (ISO 8601). Also, the @Role attribute is currently optional for standard domains, but it would have to become required for non-standard variables. For the rest, I would be really happy to retire the SUPPQUAL concept.

  2. Thanks Lex,

    Do you mean something like @def:Standard="Yes" and @def:Standard="No" in the define.xml?

  3. Yes, but maybe we only need @def:Standard="No", since the default means "Yes".
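
The default discussed in this thread could be read as follows, as a rough Python sketch. Note that def:Standard is only the attribute suggested in the comments above, not an existing define.xml attribute; the namespace URI, the OIDs and the variable AEHOSPDY are likewise assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "def" prefix; def:Standard itself is
# only the attribute proposed in the comments, not an existing one.
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"

FRAGMENT = f"""
<ItemGroupDef xmlns:def="{DEF_NS}" OID="IG.AE" Name="AE">
  <ItemRef ItemOID="IT.AE.AETERM" Role="Topic"/>
  <ItemRef ItemOID="IT.AE.AEHOSPDY" Role="Timing" def:Standard="No"/>
</ItemGroupDef>
"""


def is_standard(item_ref):
    """A missing def:Standard attribute is treated as 'Yes'."""
    return item_ref.get(f"{{{DEF_NS}}}Standard", "Yes") == "Yes"


root = ET.fromstring(FRAGMENT.strip())
flags = {ref.get("ItemOID"): is_standard(ref) for ref in root.iter("ItemRef")}
print(flags)
```

With the default in place, the @Role values stay as they are for every variable, and only the non-standard ones need the extra attribute.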