Saturday, December 30, 2017

CDISC-CT 2017-12-22: more madness

In my previous posts, I reported about the madness that goes on in the development of CDISC (especially) lab test codes, and the problems related to the CDISC approach. For example, the CDISC approach does not allow to define tests like "maximum in the last 24 hours" or "average over the last 24 hours" (e.g. for vital signs temperature, blood pressure, or concentration of a substance in blood or urine). Such definitions are however an integral part of the LOINC coding system, over the "time aspect".

Now that the FDA has mandated the use of LOINC coding for laboratory tests, it would be expected that CDISC stops the development of an alternative system for lab tests. The latest CDISC controlled terminology (dated 2017-12-22) however again contains over 40 new lab test codes.

Why?

There are several reasons for this.

First of all, we need to take into account that CDISC lab test codes are NOT lab test codes, they only specify "what" is measured. This corresponds to the "analyte/component part" in LOINC. So, for example, the CDISC "GLUC" ("glucose") "test code" essentially represents hundreds of different tests where glucose is somehow (presence, qualitatively or quantitatively) measured. So, CDISC-CT is "post-coordinated", meaning that it needs to be combined with content from other variables to uniquely describe a test. In practice, however, this does not work: with the CDISC system, reviewers can never find out whether test A in one study of one sponsor is the same as test B in another study from another sponsor: only the LOINC code can do this, and this is exactly why the FDA started requiring LOINC coding for lab tests.
If we read the latest "Recommendations for submissions of LOINC codes", published by FDA, CDISC and the Regenstrief Institute, we read that even when LOINC codes are submitted, it is still mandatory to populate the CDISC-CT "lab test code" (which it isn't), and all other "classic" "identifying variables" such as the specimen, the method, etc.. I.m.o. this is stupid, as it adds redundancy to the record. For example, if the provided LOINC code has contents that deviate from the contents of LBTESTCD, LBSPEC, LBMETHOD, which of both then contains the truth? The LOINC code or the CDISC test code? I.m.o., this testifies that CDISC is still not ready for giving up it's own system (which is not a system, but just a list based on tradition), but needed to accept the decision of the FDA, though with displeasure. 
One of the arguments of CDISC for their "postcoordination" approach has always been that "research is unique", "does not dictate any tests" and that for many lab tests in research, there is no LOINC code. The latter is essentially not correct, as I have found out in the recent years. I estimate that for over 80% (if not over 90%) of the "test" codes published, there is at least one LOINC code (often many more) in the LOINC system. As I stated, LBTESTCD essentially corresponds to the "analyte/component" part of the LOINC system, and my conservative estimate is that for over 98% of the CDISC "test codes", there is an entry in the "analyte/component" list of LOINC (the latter can be obtained as part of a separarte database from the LOINC website).
The real reason for CDISC not giving up their system is probably (besides "not invented here") that CDISC is sticking to the 8-character limit for LBTESTCD. The "analyte/component" part in LOINC does not have this limitation.

What we see in the newest (2017-12-22) version of the CDISC-CT for lab tests is that for almost each of the new terms (when looking at the "CDISC definition", a corresponding entry in the "analyte/component" part can be found. The only major difference is that CDISC then additionally assigns a <8-character code to it. So, we are seeing the CDISC LBTESTCD values evolving into an <8-character representation of the "analyte/component" part of LOINC - if it wasn't that yet.

In the next months, I want to try to do some research on how "equal" LBTESTCD/LBTEST is with the "analyte/component" part, using a quantitative approach, for example by text comparison techniques like by calculating the "Levenshtein distance" between the value of LBTEST (or the CDISC definition) and the "analyte/component" part of LOINC.
The hypothesis of my research will be that LBLOINC is nothing else than a copy of the "analyte/component" part of LOINC, but then restricted to 8-characters.

If the hypothesis is found to be true, we might as well replace LBTESTCD/LBTEST with the "analyte/component" part of LOINC if we do want to keep a "post-coordinated" approach for SDTM (which I doubt we really need). This essentially would mostly correspond to what I proposed a few years ago in my article "An Alternative CDISC-Submission Domain for Laboratory Data (LB) for Use with Electronic Health Record Data", which i.m.o. combines "best of both worlds".

In order to have such a "best of both worlds" approach (my article can just be a starting point), we however need to remove the 8-character limitation on xxTESTCD, which is there for historical reasons only, and not for any technical reasons anymore. The SDTM team however seems not to be prepared to change anything there.

In my next blog entry, I will probably write something about the more than 230 "PK units" that have been added to the newest CT version, although there is a UCUM notation for each of them. 
Unfortunately, the title of that post will probably also need to contain the wording "CDISC-CT madness" ...

1 comment: