Saturday, June 30, 2018

CDISC-CT 2018-06-29: the madness doesn't stop!

Today, I downloaded the new CDISC-CT version 2018-06-29.

As usual, I started working from the XML file (named "SDTM Terminology.odm.xml") to fill my databases, which are used by our free CDISC RESTful Web Services, and to bring the XML into a format more suitable for modern tools like the "SDTM-ETL" transformation software.

I quickly found out that something is TERRIBLY WRONG: in the XML file, it states "2016-06-29" everywhere (at least 5 times), where it should be "2018-06-29".
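
A check like this is easy to automate. The following is only a minimal sketch, assuming that it is enough to scan the raw text of the file for ISO dates and to report every date that differs from the expected release date. The function name `unexpected_dates` and the sample fragment are made up for illustration; the fragment is not copied from the real file.

```python
import re
from collections import Counter

# Matches ISO 8601 calendar dates such as 2018-06-29,
# also when they are directly followed by a time part ("T00:00:00").
ISO_DATE = re.compile(r"(?<!\d)\d{4}-\d{2}-\d{2}(?!\d)")


def unexpected_dates(text, expected):
    """Count every ISO date in `text` that differs from `expected`."""
    return Counter(d for d in ISO_DATE.findall(text) if d != expected)


# Made-up fragment for illustration only (assumption, not the real file content).
sample = (
    '<ODM CreationDateTime="2016-06-29T00:00:00" SourceSystemVersion="2018-06-29">'
    '<Study OID="CT.2016-06-29"/></ODM>'
)

for date, count in unexpected_dates(sample, "2018-06-29").items():
    print(f"unexpected date {date} found {count} time(s)")
```

For a real file, one would read the content first, e.g. with `open(path, encoding="utf-8").read()`, and pass it to the function. Such a scan also reports legitimate other dates, so a human still needs to look at the result.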

So I asked myself: "Did the CDISC-CT team do quality control on these files?"

It is good that I found this: had I trusted the file and used it to fill my databases, they would have been considerably corrupted.

Then I had a look at the changes.
I immediately found out that 23 codes have been removed completely (i.e. "deleted"). This is very bad practice! Good practice is to either "deprecate" such codes or to "flag" them as no longer current. So, if a sponsor worked hard on a submission with a slightly older CT version, thinks everything is all right, and then sends it to the FDA, and the FDA uses the latest CDISC-CT (as it usually does), a large number of validation errors may suddenly come up, although the sponsor did not do anything wrong. It is the CDISC-CT team who ignored good practice in standards development!
Also, I wonder why these codes were deleted. Was it that the CT team recognized that such a code was not appropriate and thus retracted it? Again, did they do QC on their earlier additions? Or did they just throw in codes with the idea that these can be deleted later anyway?

Worse than entirely deleting codes is changing the meaning of a code. As a matter of principle, this should never happen. Changing the definition of a code is extremely dangerous, as one and the same code is then used for possibly two different things, depending on the codelist version. So, the good practice (followed by all other SDOs that I know) is to "deprecate" the old code and to assign a new code to the object with the new definition.
The CDISC-CT team completely ignores this principle and, just like that, changes the definitions of no fewer than 184 codes for use in SDTM.

Of course, I can already hear their argument that the definition changes are all minor changes. A change is a change, however, and how can a machine tell whether a change is minor or not? Machines are not that far along yet; only humans can do so. But honestly, are you going to inspect all 184 codes whose definition was changed?
And even then, some changes are serious! For example, for TESTCD=VITB12, the definition changes from "A measurement of the Vitamin B12 in a serum specimen" to "A measurement of the Vitamin B12 in a biological specimen", which is an enormous widening of the scope of the code.
Or did they forget to do QC when introducing that term in the earlier version 2018-03-30?
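
Finding the removals and redefinitions themselves is the easy part. Here is a minimal sketch, assuming that each CT version has already been loaded into a simple mapping from code to definition (how to extract that from the XML is not shown). The function name `diff_codelists` and the `PLACEHOLDER_*` entries are hypothetical; only the two VITB12 definitions are the ones quoted above.

```python
def diff_codelists(old, new):
    """Compare two {code: definition} mappings from consecutive CT versions."""
    removed = sorted(old.keys() - new.keys())    # in the old version only
    added = sorted(new.keys() - old.keys())      # in the new version only
    # Present in both versions, but with a different definition text.
    redefined = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return removed, added, redefined


# VITB12 definitions as quoted above; all other entries are placeholders.
ct_2018_03_30 = {
    "VITB12": "A measurement of the Vitamin B12 in a serum specimen",
    "PLACEHOLDER_A": "hypothetical term that is later deleted",
    "PLACEHOLDER_B": "hypothetical term that stays unchanged",
}
ct_2018_06_29 = {
    "VITB12": "A measurement of the Vitamin B12 in a biological specimen",
    "PLACEHOLDER_B": "hypothetical term that stays unchanged",
    "PLACEHOLDER_C": "hypothetical term that is newly added",
}

removed, added, redefined = diff_codelists(ct_2018_03_30, ct_2018_06_29)
print("removed:", removed)
print("added:", added)
print("redefined:", redefined)
```

Note that such a comparison flags every change in the definition text, whether it is a corrected typo or a widening of scope. It cannot decide which changes are "minor", which is exactly the problem.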

LOINC coding has been made mandatory by the FDA for laboratory test codes. As LOINC codes are much more precise than CDISC lab test codes, there is no longer any reason to develop further CDISC laboratory test codes. Nevertheless, the CT team has added 188 (one hundred and eighty-eight) new lab test codes.

Essentially, the value of LBTESTCD corresponds to the "first dimension" of the LOINC name for lab tests, i.e. the "component/analyte" that is measured. If it is the same thing, why don't we at CDISC just align that with the one from LOINC? Wouldn't that be much better?
Some arguments that might apply:
a) LOINC "component" can have more than 8 characters, CDISC-LBTESTCD is limited to 8 characters (Oh my God!)
b) We don't know LOINC (well: you should)
c) We always did it this way ...
d) Not invented here ...
e) Not in LOINC yet

In the latter case, it would be much, much better to submit a "new term request" to LOINC, and not to CDISC. The process is similar to that for CDISC-CT, and the speed with which a new term can be approved is about the same, but the quality control at LOINC (taken care of by the Regenstrief Institute) is much, much better.

For the very first of the "new" lab test codes ("AMBRBTL" = "Amobarbital") I could easily find a LOINC code, for example LOINC code 72399-9 "Amobarbital [Mass/volume] in Blood by Confirmatory method". So why not just take the LOINC "component" "Amobarbital" as the new LBTESTCD? Oh yes, it is more than 8 characters ...