Sunday, October 22, 2017

CDISC-CT: the madness goes on

I recently installed the newest version (2017-09-29) of the CDISC controlled terminology (CDISC-CT) in my databases, for use with our CDISC RESTful web services.

While doing so, I noticed that more than 30 new lab test codes (LBTESTCD/LBTEST) have been added, even though the FDA has mandated the use of LOINC coding (in variable LBLOINC) as of March 15th, 2018. The SDTM-IG still states that LBLOINC is the "Dictionary-derived LOINC Code for LBTEST", but in most real cases it is just the other way around: LBTESTCD and LBTEST are derived from the LOINC code, as that is what is delivered (or should be delivered) by the central or hospital labs. So essentially, LBLOINC should be the "topic variable", not LBTESTCD.

More than two years ago, I published an article, "An alternative CDISC-Submission Domain for Laboratory Data (LB) for Use with Electronic Health Record Data", which was well received, except within CDISC. I had hoped it would become the starting point for a discussion within CDISC about how to avoid "reinventing the wheel" in SDTM, and how to connect better to the controlled terminologies and coding systems used worldwide in healthcare. It seems, however, that the CDISC-CT refuses to correct its course, and keeps developing "lists of terms" that have no connection with what is used in healthcare and in science in general.

Apart from terms for lab tests, CDISC has also developed lists of ... microorganisms (codelist MICROORG, NCI code C85491), which in the latest version contains 1506 terms.

When I recently discussed this codelist with people from a tropical medicine institute, they asked me about its systematics and taxonomy. Unfortunately, I had to admit that the codelist 'MICROORG' has neither a system nor a taxonomy: it is just a list. They then asked me why CDISC does not use a system that is used worldwide and that has a taxonomy and relations between the organisms, such as the NCBI taxonomy of cellular organisms.

The CDISC-CT codelist "MICROORG" contains the term "Absidia", without any information about its relationship to other organisms. Only the "CDISC Definition" column states, as narrative (i.e. unstructured) text, that it is a fungus. Such narrative text is barely machine-interpretable, and thus also unsuitable for use in, for example, artificial intelligence systems.
Just for fun, I entered "Absidia" in the NCBI taxonomy browser. This is the result I got:

It not only shows me that there are many different Absidia species, it also shows me that the genus belongs to the family Cunninghamellaceae, in the order Mucorales, which is in the subphylum Mucoromycotina, which is in ..., i.e. we can easily retrieve the whole taxonomy. Through the taxonomy ID (4828), we could easily use RESTful web services to have our own systems find out information about this organism and, for example, build "networks of knowledge" (I haven't checked yet whether such a RESTful service is provided by NCBI; one is surely provided by UMLS).
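As far as I know, NCBI does offer such a service: its E-utilities interface returns a taxonomy record as XML via `efetch` with `db=taxonomy`. The Python sketch below shows how the lineage could be pulled out of such a record by a program instead of being read by a human from a narrative definition. It parses an abridged, hand-written sample that contains only the ranks named above (a real response is much longer); the element names `Taxon`, `LineageEx`, `Rank` and `ScientificName` follow the efetch taxonomy layout as I understand it, so check them against a live response before relying on this.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities efetch endpoint; db=taxonomy returns the taxonomy record as XML.
EFETCH_URL = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
              "?db=taxonomy&id={taxid}&retmode=xml")


def fetch_taxon_xml(taxid: int) -> bytes:
    """Download the raw taxonomy XML for one NCBI taxonomy ID (needs network access)."""
    with urllib.request.urlopen(EFETCH_URL.format(taxid=taxid), timeout=30) as resp:
        return resp.read()


def parse_lineage(xml_data: str | bytes) -> list[tuple[str, str]]:
    """Return (rank, scientific name) pairs from the top of the lineage down to the taxon."""
    root = ET.fromstring(xml_data)
    taxon = root.find("Taxon")
    if taxon is None:
        raise ValueError("no Taxon element in response")
    lineage = [
        (t.findtext("Rank", ""), t.findtext("ScientificName", ""))
        for t in taxon.findall("LineageEx/Taxon")
    ]
    # The queried taxon itself is not part of LineageEx, so append it last.
    lineage.append((taxon.findtext("Rank", ""), taxon.findtext("ScientificName", "")))
    return lineage


# Abridged, hand-written sample in the efetch layout (NOT a real NCBI response);
# it only contains the ranks mentioned in the text above.
SAMPLE = """<TaxaSet><Taxon>
  <TaxId>4828</TaxId><ScientificName>Absidia</ScientificName><Rank>genus</Rank>
  <LineageEx>
    <Taxon><ScientificName>Mucoromycotina</ScientificName><Rank>subphylum</Rank></Taxon>
    <Taxon><ScientificName>Mucorales</ScientificName><Rank>order</Rank></Taxon>
    <Taxon><ScientificName>Cunninghamellaceae</ScientificName><Rank>family</Rank></Taxon>
  </LineageEx>
</Taxon></TaxaSet>"""

if __name__ == "__main__":
    for rank, name in parse_lineage(SAMPLE):
        print(f"{rank:>10}: {name}")
    # Live lookup, when online: parse_lineage(fetch_taxon_xml(4828))
```

The point is that the relationship "Absidia is a fungus in the order Mucorales" becomes structured data that a program can walk, rather than a sentence in a definition column.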

Does CDISC-CT provide this functionality? Not at all. It does not even tell us how to derive the CDISC term (which is surely not used in laboratories) from the commonly used term, such as the NCBI term.

So, for microorganisms too, it makes no sense for CDISC to "reinvent the wheel" by developing and maintaining "yet another codelist".
In my opinion, CDISC should stop developing codelists for which better, internationally recognized systems or nomenclatures already exist. Examples are LOINC for lab tests, UCUM for units, and NCBI for microorganisms. CDISC should deprecate its own codelists ("lists of terms") wherever such a better, internationally recognized system exists.
Some people will immediately object that this will lead to extra columns in the SDTM tables and undermine the SDTM systematics of "test code / test name / test result", where for each domain only one (CDISC) codelist is allowed for the test code.
My proposal is that, where such a better system exists, there would not only be a "test code" column, but also a "codelist system" column, containing either the CDISC codelist name or the name of the international coding system. For the latter, we can take the code systems used in FHIR as a model:

If only one code system is used within a table, it can simply be declared as an "ExternalCodeList" in the define.xml.
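To make the proposal a bit more concrete, here is a minimal Python sketch, assuming a hypothetical column `LBTESTSYS` that holds the code system for each `LBTESTCD` value (the column name, OIDs and the version placeholder are invented for illustration). When all rows use the same system, the column can be described in define.xml by a `CodeList` that only contains an `ExternalCodeList` reference; namespaces and the other define.xml attributes are left out.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical LB rows: besides the test code, each row names the code system
# the code comes from. "LBTESTSYS" is an invented column name.
ROWS = [
    {"USUBJID": "S-001", "LBTESTCD": "2345-7", "LBTESTSYS": "LOINC"},
    {"USUBJID": "S-001", "LBTESTCD": "718-7", "LBTESTSYS": "LOINC"},
]


def codelist_for(rows: list[dict], column: str, system_column: str,
                 version: str) -> ET.Element:
    """Build a simplified define.xml-style CodeList for one column.

    If all rows use the same code system, the CodeList just points to it via
    ExternalCodeList. If systems are mixed, a single external reference is not
    enough and the per-row system column is needed, so we raise instead.
    """
    systems = {row[system_column] for row in rows}
    if len(systems) != 1:
        raise ValueError(f"{column} mixes code systems: {sorted(systems)}")
    (system,) = systems
    codelist = ET.Element("CodeList", OID=f"CL.{column}.{system}",
                          Name=f"{column} ({system})", DataType="text")
    ET.SubElement(codelist, "ExternalCodeList", Dictionary=system, Version=version)
    return codelist


if __name__ == "__main__":
    # "x.yy" is a placeholder for the LOINC release actually used for coding.
    element = codelist_for(ROWS, "LBTESTCD", "LBTESTSYS", "x.yy")
    print(ET.tostring(element, encoding="unicode"))
```

The design choice here is that the per-row system column only carries information when codes from several systems are mixed; otherwise a single external reference in the metadata says it all.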
Adopting such external code systems also means that in many of the "Findings" SDTM tables, only the "record qualifiers" that are really necessary need to be kept. For example, when using LOINC for lab tests, LBCAT, LBSCAT, LBSPEC and LBMETHOD become obsolete, as they are already provided by the LOINC code itself, and can easily be retrieved (together with a lot of even more useful information) by any modern application through RESTful web services.
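The idea that qualifiers such as the specimen or method "come with" the LOINC code can be sketched in a few lines of Python. The small lookup table below is a hand-copied, illustrative subset of LOINC parts (using LOINC's own column names such as `SYSTEM` and `METHOD_TYP`); a real application would retrieve them from a LOINC web service or the LOINC table. The mapping of LOINC parts onto SDTM variables is an assumption made for illustration, not an official CDISC or LOINC mapping.

```python
from __future__ import annotations

# Illustrative, hand-copied subset of LOINC parts for two common codes.
# A real application would look these up instead of hard-coding them.
LOINC_PARTS = {
    "2345-7": {"COMPONENT": "Glucose", "PROPERTY": "MCnc", "SYSTEM": "Ser/Plas",
               "METHOD_TYP": "", "CLASS": "CHEM"},
    "718-7": {"COMPONENT": "Hemoglobin", "PROPERTY": "MCnc", "SYSTEM": "Bld",
              "METHOD_TYP": "", "CLASS": "HEM/BC"},
}


def derived_qualifiers(loinc_code: str) -> dict[str, str]:
    """Derive LB-style qualifiers from the LOINC code instead of storing them per row.

    Assumed (hypothetical) mapping: COMPONENT -> LBTEST, SYSTEM -> LBSPEC,
    METHOD_TYP -> LBMETHOD, CLASS -> LBCAT.
    """
    parts = LOINC_PARTS[loinc_code]
    return {
        "LBTEST": parts["COMPONENT"],
        "LBSPEC": parts["SYSTEM"],
        "LBMETHOD": parts["METHOD_TYP"],
        "LBCAT": parts["CLASS"],
    }


if __name__ == "__main__":
    for code in LOINC_PARTS:
        print(code, derived_qualifiers(code))
```

Only LBLOINC (and the result) would then need to be stored per record; everything else is a lookup.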

A major roadblock on the way to a considerably better SDTM, with "biomedical concepts" instead of "rows in tables", is still the SAS Transport format. It does not allow us to have codes longer than 8 characters (oh my God, what time are we living in?), nor does it allow a compact format for test results like in FHIR:

or even to provide different codes in different code systems:
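For readers who have not seen such a FHIR resource, the Python sketch below builds a compact Observation as JSON: a single result whose test is coded in two code systems at once, with the unit coded in UCUM. `http://loinc.org` and `http://unitsofmeasure.org` are the FHIR system URIs for LOINC and UCUM; the CDISC system URI is a made-up placeholder, and the result value is invented for illustration.

```python
import json

# A compact FHIR-style Observation: one result, with the test coded in two code
# systems at once and the unit coded in UCUM. The value is made up, and
# "urn:example:cdisc-lbtestcd" is a placeholder, not an official identifier.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [
            {"system": "http://loinc.org", "code": "2345-7",
             "display": "Glucose [Mass/volume] in Serum or Plasma"},
            {"system": "urn:example:cdisc-lbtestcd", "code": "GLUC",
             "display": "Glucose"},
        ]
    },
    "valueQuantity": {"value": 95, "unit": "mg/dL",
                      "system": "http://unitsofmeasure.org", "code": "mg/dL"},
}

print(json.dumps(observation, indent=2))
```

Nothing in this representation limits codes or identifiers to 8 characters, and adding a further coding (e.g. a local lab code) is just one more entry in the `coding` array.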

So, in order to make the next "quantum leap" and move SDTM out of the 20th century, we must not only start using the internationally recognized code systems (instead of developing and maintaining our own "reinvention of the wheel" codelists), we must also finally get rid of SAS Transport 5 and move to a modern XML / JSON / RDF representation of SDTM data. For the latter, we need the cooperation of the FDA, which still mandates the use of this 30-year-old, completely outdated format.

For the CDISC controlled terminology, we need a change in mentality within the CDISC-CT development teams. If that doesn't happen, it is time for the CDISC board to take action:


  1. Great article! The sad thing is that even in cases where there is no technical 8-character limit (e.g. the short name for the codelist, which is not used in submissions), the CDISC CT applies this 8-character restriction.

  2. Good stuff, could not agree more.