Saturday, January 27, 2018

Biomedical Concepts, LOINC and CDISC

Very recently, my colleagues from A3 Informatics published a very interesting article on "Understanding Biomedical Concepts", explaining how a "metadata repository" (MDR) with biomedical concepts (BCs) can help with development, starting from the CRF.

Unfortunately, most protocols are imprecise about what exactly should be measured. Even our "CDISC Therapeutic Area User Guides" (TAUGs) suffer from the same problem. Here is an extract of the "TAUG-Diabetes", which can be downloaded from the CDISC website:

It lists a number of relevant tests, but does not describe them in detail at all. So it leaves it to the sponsors, to the CRF developers or, in the worst case, to the site to decide which test exactly will be performed. This means that when the FDA wants to compare the results from ten diabetes studies, it might find twenty different ways of measuring e.g. triglycerides or, even worse, find values that originate from different tests but received the same values for LBTESTCD, LBCAT and LBMETHOD in SDTM. So for FDA reviewers, it looks as if the triglycerides measurement from study A is identical to that of study D, although this is not the case at all. The reason is that the combination of LBTESTCD, LBCAT, LBSPEC and LBMETHOD does NOT uniquely describe a laboratory test.

So, I started annotating the TAUG-Diabetes, or at least tried to start. Here are some first results, from page 20, "Lipid Panel":
  • Amylase/Serum: LOINC 1798-8: Amylase [Enzymatic activity/volume] in Serum or Plasma
  • Triglycerides, Serum, Plasma: LOINC 2571-8: Triglyceride [Mass/volume] in Serum or Plasma
  • Total Cholesterol, Serum, Plasma: LOINC 2093-3: Cholesterol [Mass/volume] in Serum or Plasma
and so on. I did not try to annotate everything: that is something the TAUG development team, with much better knowledge of the use case, should have done (they didn't).
When it comes to lab tests, using LOINC coding is most appropriate. It is therefore very surprising that the word "LOINC" does not even appear in the TAUG-Diabetes.
Also note that such an annotation exercise does not always lead to a single test (on the contrary). For example, "glucose in serum, plasma" leads to 645 (!) different tests.
On page 21 of the TAUG-Diabetes, table "Kidney function", a test "blood urea nitrogen" is listed, for which one can find 11 different tests in LOINC:

most of which will end up with the same combination of "identifying variable" values in SDTM, suggesting that the results come from exactly the same test although this may not be true.
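To make the problem concrete, here is a minimal Python sketch that groups results by their SDTM "identifying variables" and flags every combination behind which more than one LOINC code hides. The records are hypothetical: `LOINC-A`, `LOINC-B` and `LOINC-C` are placeholders, not real LOINC codes, and the SDTM values are only illustrative.

```python
from collections import defaultdict

# Hypothetical lab results: SDTM identifying variables plus the LOINC code
# the lab actually used. "LOINC-A".."LOINC-C" are placeholders, not real codes.
records = [
    {"LBTESTCD": "BUN", "LBSPEC": "SERUM", "LBMETHOD": "", "LOINC": "LOINC-A"},
    {"LBTESTCD": "BUN", "LBSPEC": "SERUM", "LBMETHOD": "", "LOINC": "LOINC-B"},
    {"LBTESTCD": "BUN", "LBSPEC": "SERUM", "LBMETHOD": "", "LOINC": "LOINC-A"},
    {"LBTESTCD": "BUN", "LBSPEC": "BLOOD", "LBMETHOD": "", "LOINC": "LOINC-C"},
]

IDENTIFYING = ("LBTESTCD", "LBSPEC", "LBMETHOD")


def ambiguous_combinations(rows):
    """Return {identifying-variable tuple: set of LOINC codes} for every
    tuple that is shared by results carrying more than one LOINC code."""
    codes_per_key = defaultdict(set)
    for row in rows:
        key = tuple(row[var] for var in IDENTIFYING)
        codes_per_key[key].add(row["LOINC"])
    return {key: codes for key, codes in codes_per_key.items() if len(codes) > 1}


for key, codes in ambiguous_combinations(records).items():
    print(key, "->", sorted(codes))
# prints: ('BUN', 'SERUM', '') -> ['LOINC-A', 'LOINC-B']
```

Note that an empty result only tells you that no collision occurs in the data at hand; it does not prove that an SDTM combination identifies a single test in general.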

Coming back to the Biomedical Concepts (BCs): the more I think about it, the more I become convinced that LOINC codes are just implementations of BCs.
The typical example that is given when explaining BCs is "diastolic blood pressure" as a "vital sign". Please see the A3 Informatics article for a picture.

The BC "systolic blood pressure" consists of the test itself (CDISC coded SYSBP or NCI coded C25298; the hyperlinks will lead you to the RESTful web services that you can use in your own applications), a body position (sitting, standing, supine), a unit (almost always millimeters of mercury, mmHg), and a result that is expressed as an integer or a floating-point number.

Is this the complete picture? No, it isn't. "Systolic blood pressure" is one of the tests in a "blood pressure panel", which is part of the "vital signs test panel". In CDISC, the latter is an SDTM domain (VS), but for the middle part ("blood pressure panel") there is no CDISC term as far as I know.

When using LOINC, each of these can be assigned a LOINC code, with the remark that LOINC codes are "pre-coordinated". So you will find a LOINC code for each combination of parts of the BC that makes sense. SDTM-CT is mostly "post-coordinated", meaning that you take the parts and assemble them using LBTESTCD (which is not the test code, but essentially the code for the analyte or compound that is measured), LBSPEC (specimen, e.g. "blood", "serum", "urine", ...), LBMETHOD (e.g. "dip stick") etc., combining them in a record AFTER (hence "post") the test has been done. This post-coordination requires extensive validation to find out whether the combination makes sense, whereas with a pre-coordinated code you know in advance that the combination makes sense. For example, it does not make sense in SDTM to combine the test code "height" with the position "sitting", and you will need to write software to check this. In LOINC, you just won't find a code for "body height, sitting".
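The difference in validation effort can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `ALLOWED_POSITIONS` is a hypothetical, deliberately tiny rule table (a real post-coordination rule set would be far larger), and the pre-coordinated table lists only the two LOINC codes quoted in this post.

```python
# Post-coordinated: the parts arrive separately, so a rule table is needed to
# reject combinations that make no sense. Hypothetical, deliberately tiny.
ALLOWED_POSITIONS = {
    "SYSBP": {"SITTING", "STANDING", "SUPINE", "PRONE"},
    "HEIGHT": {"STANDING"},
}


def validate_post_coordinated(testcd: str, position: str) -> bool:
    """True if the (test code, position) pair is listed as a valid combination."""
    return position in ALLOWED_POSITIONS.get(testcd, set())


# Pre-coordinated: the combination *is* the code, so validity reduces to
# "does this code exist?". Only codes quoted in this post are listed.
PRE_COORDINATED = {
    "8459-0": ("SYSBP", "SITTING"),
    "8461-6": ("SYSBP", "SUPINE"),
}


def is_known_code(loinc_code: str) -> bool:
    return loinc_code in PRE_COORDINATED


print(validate_post_coordinated("HEIGHT", "SITTING"))  # False: the rule table rejects it
print(validate_post_coordinated("SYSBP", "SITTING"))   # True
print(is_known_code("8459-0"))                         # True
```

The post-coordinated check is only as good as the rule table someone has to write and maintain; the pre-coordinated check needs no rules at all, only the code list.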

The more I think about it, the more I become convinced that LOINC codes are "implementations" of BCs. For example, if you take the BC "systolic blood pressure", select "sitting" for the position and specify that you want a number as the outcome ("quantitative measurement"), this will lead you to LOINC code

8459-0 (Systolic blood pressure--sitting, quantitative).
BUT, you can also easily find out that this test is a member of the "blood pressure panel" with LOINC code 35094-2:

but it is also a member of the panel "Orthostatic blood pressure panel" (LOINC code 34553-8):

which additionally contains 3 "types" of heart rate.

and all of these are "vital signs measurements" (LOINC code 29274-8). Other such "panels" are "Vital signs, weight & height panel" (LOINC code 34565-2) and "Vital signs, weight, height, head circumference, oxygen saturation & BMI panel" (LOINC code 85353-1), each of them forming a tree structure. For example, for "Vital signs, weight & height panel" (LOINC code 34565-2):

where it is interesting to see that this tree structure also contains "Body position with respect to gravity" (LOINC code 8361-8) with possible values "standing", "sitting" and "lying":

And here we see that even LOINC is not perfect or complete: it does not differentiate between "supine" (lying horizontally with the face and torso facing up) and "prone" (face and torso facing down), although we do find a (pre-coordinated) term for "systolic blood pressure, supine" (8461-6). I did not, however, find a LOINC code for "systolic blood pressure, prone", which may be related to the fact that it does not seem to make a difference for the value itself. When I then looked back at the picture in the A3 Informatics article, I found that it lists "supine", but not "prone". However, in the CDISC "VS Codetable", the combination "systolic blood pressure" with "prone" is listed as a valid combination. Something to discuss when we develop and coordinate BCs...
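The idea of a LOINC code as an "implementation" of a BC, and the prone/supine gap just described, can be sketched as a lookup from filled-in BC slots to a pre-coordinated code. This is a hypothetical illustration: the table holds only the two codes mentioned in this post (so it is not a complete mapping), and `CDISC_VS_POSITIONS` is restricted to the three positions discussed here.

```python
from typing import Optional

# BC "systolic blood pressure" with two open slots: position and scale.
# Only the two LOINC codes mentioned in this post; not a complete mapping.
LOINC_IMPLEMENTATIONS = {
    ("SITTING", "quantitative"): "8459-0",
    ("SUPINE", "quantitative"): "8461-6",
}

# Positions discussed above that the CDISC VS codetable accepts for SYSBP.
CDISC_VS_POSITIONS = {"SITTING", "SUPINE", "PRONE"}


def resolve_sysbp(position: str, scale: str = "quantitative") -> Optional[str]:
    """Return the LOINC code implementing the BC for the chosen slot values,
    or None if no pre-coordinated code is known."""
    return LOINC_IMPLEMENTATIONS.get((position, scale))


print(resolve_sysbp("SITTING"))  # 8459-0
print(resolve_sysbp("PRONE"))    # None

# CDISC-valid positions without a (known) pre-coordinated LOINC code:
print(sorted(p for p in CDISC_VS_POSITIONS if resolve_sysbp(p) is None))
# ['PRONE']
```

A list like the last one is exactly the kind of input that would be useful when developing and coordinating BCs across CDISC and LOINC.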

Now, is our BC picture for "systolic blood pressure" perfect? Not at all!
It does not account for tests like "maximum systolic blood pressure in a time period of 24 hours" or "mean systolic blood pressure in 10 hours". These kinds of tests cannot be handled by CDISC controlled terminology at all! That such tests are important was recently discussed during the development of the "TAUG-Ebola", where "maximum body temperature within 24 hours" is a very important indicator. But the CDISC-SDTM and CDISC-CT teams could not find a way to represent this test in SDTM! For Ebola, the same applies to "highest pulse in 24 hours".
So for our systolic blood pressure BC, we still have to add information about timing. And this is again where LOINC helps us enormously, as one of the "dimensions" of the LOINC system is the "time aspect". For an "ordinary" systolic blood pressure, the value of the "time aspect" is "Pt" ("point in time"). But we also find many other systolic blood pressure tests where this value is not "Pt" (in other words: not "now"):

and a number more ...
Similarly, we find different tests with different values for the "time aspect" for "body temperature" and "pulse", but also for "glucose in urine", where a very important one, "Glucose [Mass/volume] in 24 hour Urine" (LOINC code 21305-8), is again not well covered by CDISC-CT and SDTM (SDTM suggests adding an "end of collection" to the "start of collection"; see further on).
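To show how the time aspect is part of the code itself, here is a minimal Python sketch that stores LOINC's six axes for a few codes and groups them by time aspect. The axis values are given as I understand them and should be double-checked against the LOINC database before reuse; 2345-7 ("Glucose [Mass/volume] in Serum or Plasma") is not mentioned above and is added only for contrast. `LoincAxes` and `codes_by_time_aspect` are hypothetical names.

```python
from typing import NamedTuple


class LoincAxes(NamedTuple):
    """The six LOINC axes; the method axis is often left empty."""
    component: str
    prop: str        # the LOINC "property" axis, e.g. MCnc = mass concentration
    time: str        # the "time aspect", e.g. Pt = point in time, 24H = 24 hours
    system: str
    scale: str
    method: str = ""


# Illustrative axis values; verify against LOINC before relying on them.
CODES = {
    "2345-7": LoincAxes("Glucose", "MCnc", "Pt", "Ser/Plas", "Qn"),
    "21305-8": LoincAxes("Glucose", "MCnc", "24H", "Urine", "Qn"),
    "2571-8": LoincAxes("Triglyceride", "MCnc", "Pt", "Ser/Plas", "Qn"),
}


def codes_by_time_aspect(component: str) -> dict:
    """Map time aspect -> list of LOINC codes for one component."""
    result = {}
    for code, axes in CODES.items():
        if axes.component == component:
            result.setdefault(axes.time, []).append(code)
    return result


print(codes_by_time_aspect("Glucose"))
# {'Pt': ['2345-7'], '24H': ['21305-8']}
```

With a pre-coordinated code, "24 hours" travels with the result; in SDTM it has to be reconstructed from LBDTC and LBENDTC.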

Ready? Looks like it ... But how do we state "high systolic blood pressure"? This may be of interest e.g. when the patient was asked "did you already have high blood pressure five years ago?" and there is no possibility to find out the exact numbers. LOINC has codes for this too: for example, for the question "Do you have high blood pressure?", the LOINC code is 64496-3.

We made some statements about relations between LOINC codes. How can you find out about these? You can of course start browsing through the "LOINC details" pages and following links, but a better way is probably to write an application that uses the UMLS RESTful web services of the National Library of Medicine and filters the results on LOINC as the coding system: the UMLS tries to describe relations between ALL possible coding systems in the medical world (including CDISC-CT). One of the students at the university is currently developing an interactive graphical user interface to build such "networks". This GUI will e.g. allow filtering on LOINC and SNOMED-CT, so that you can also include the SNOMED-CT terms and relationships.
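Once relations have been retrieved, building such a network and filtering it on coding system is straightforward. Below is a minimal offline sketch, assuming the relations have already been fetched into simple tuples; it does not call the UMLS web services. "LNC" and "SNOMEDCT_US" are UMLS source abbreviations, but the relation label `has_member` and the identifiers `SCT-X`/`SCT-Y` are placeholders, and the two LOINC panel memberships are the ones described earlier in this post.

```python
from collections import defaultdict

# Hypothetical relation triples, assumed already retrieved:
# (source vocabulary, from-code, relation, to-code).
RELATIONS = [
    ("LNC", "35094-2", "has_member", "8459-0"),
    ("LNC", "34553-8", "has_member", "8459-0"),
    ("SNOMEDCT_US", "SCT-X", "related_to", "SCT-Y"),
]


def build_network(relations, sources):
    """Adjacency list restricted to the given source vocabularies."""
    graph = defaultdict(list)
    for source, a, rel, b in relations:
        if source in sources:
            graph[a].append((rel, b))
    return dict(graph)


def parents_of(graph, code):
    """All nodes with an edge pointing to `code`, sorted."""
    return sorted(a for a, edges in graph.items() if any(b == code for _, b in edges))


loinc_only = build_network(RELATIONS, {"LNC"})
print(parents_of(loinc_only, "8459-0"))  # ['34553-8', '35094-2']
```

Passing `{"LNC", "SNOMEDCT_US"}` instead would keep the SNOMED-CT edges in the same network.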

If we look at the "6 dimensions" of LOINC, I become more and more convinced that for vital signs and laboratory tests, these six dimensions form the "ingredients" of the BCs for these domains. For other domains, other coding systems may be suitable. For example, for domains about microorganisms, the NCBI coding and taxonomy is probably very suitable, although I haven't looked into this yet. For many other domains, SNOMED-CT is probably very suitable.
And yet SNOMED-CT is not used at all in SDTM or CDISC, except for a few parameters in the "trial summary" domain.

How does this fit with SDTM? Not well.
For laboratory tests, we have known for a long time that the "identifying variables" (LBTESTCD/LBTEST, LBSPEC, LBMETHOD, ...) do not uniquely identify lab tests. For "24h urine", the SDTM-IG states that "the start date/time of the collection goes into LBDTC and the end date/time of collection goes into LBENDTC", which at first glance seems OK. However, when the data come from an EHR or from the hospital information system (HIS), the exact "start of collection" and "end of collection" times will often not be known, and the sponsor will probably (need to) derive LBENDTC by simply adding 24 hours to the collection date/time. You can already guess where this leads when it is known that it was "24h urine", but only the start (or only the end) date of collection was known, without a time. This is what we call "imputation".
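The derivation a sponsor might be tempted to do can be sketched as follows. This is a minimal illustration, not an SDTM-IG rule: `derive_lbendtc` is a hypothetical helper, and it assumes LBDTC is an ISO 8601 value at minute precision ("YYYY-MM-DDThh:mm"). Even in the "good" case, the result silently assumes the collection lasted exactly 24 hours.

```python
from datetime import datetime, timedelta
from typing import Optional


def derive_lbendtc(lbdtc: str, hours: int = 24) -> Optional[str]:
    """Derive an end-of-collection timestamp by adding `hours` to LBDTC.

    Only handles ISO 8601 values with a time at minute precision.
    For a date-only value, return None instead of guessing a time,
    because any time filled in here would be an imputation.
    """
    if "T" not in lbdtc:
        return None
    start = datetime.strptime(lbdtc, "%Y-%m-%dT%H:%M")
    return (start + timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M")


print(derive_lbendtc("2018-01-27T08:30"))  # 2018-01-28T08:30
print(derive_lbendtc("2018-01-27"))        # None
```

Returning None for a date-only start makes the gap visible; a pipeline that instead fills in a default time would produce an LBENDTC that looks exact but is not.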
In vital signs it is even worse, as has been shown by the "Ebola" case "maximum temperature in 24 hours", which cannot be modeled in VS at all.

So essentially, when the LOINC code is known (from the lab itself or because it was predefined), there is no reason at all to populate --TESTCD, --SPEC, --METHOD (and --POS in the case of VS), as it is all already in the code. Even worse, "deriving" these variables may, and will, lead to confusion. Therefore, when the data come e.g. from EHRs, we should use alternative SDTM domains that are suited for EHRs or other systems where a pre-coordinated code is available. For LB, I already made a proposal for such a domain 3 years ago, which can be used both for the pre-coordinated and for the post-coordinated use case. It is a proposal that needs to be adapted for the more general case, also allowing more flexibility in SDTM. For example, if an exact code for a test is available (whatever the domain is), one essentially only needs to provide three or four values: the code, the code system, the value and the unit. Sometimes additional variables will need to be added, but whether they are really needed essentially depends on the code system. For example, when a LOINC code is used, there is no need at all to provide a categorized "specimen" or "method". But these may be necessary when an NCBI code is used for the microorganism that is tested. What kind of information needs to be added to the code itself is something we need to investigate for each domain (with "we" I essentially mean the SDTM teams).
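Such a minimal record (code, code system, value, unit) with code-system-dependent extras could look like the Python sketch below. It is a hypothetical illustration, not my earlier LB proposal: `CodedResult` and the rule table `EXTRA_REQUIRED` are invented names, and the rule table simply encodes the argument above (nothing extra for LOINC; specimen and method possibly needed for NCBI). 562 is the NCBI Taxonomy identifier for Escherichia coli.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical rule table: extra variables a code system needs on top of
# code, value and unit.
EXTRA_REQUIRED: Dict[str, Set[str]] = {
    "LOINC": set(),
    "NCBI": {"specimen", "method"},
}


@dataclass
class CodedResult:
    code: str
    code_system: str
    value: str
    unit: str = ""
    extras: Dict[str, str] = field(default_factory=dict)

    def missing_extras(self) -> Set[str]:
        """Extra variables the code system requires but this record lacks."""
        required = EXTRA_REQUIRED.get(self.code_system, set())
        return required - set(self.extras)


lab = CodedResult("2571-8", "LOINC", "150", "mg/dL")
micro = CodedResult("562", "NCBI", "detected")  # 562 = Escherichia coli
print(lab.missing_extras())            # set()
print(sorted(micro.missing_extras()))  # ['method', 'specimen']
```

The record itself stays the same for every domain; only the rule table has to be worked out per code system, which is the investigation I would leave to the SDTM teams.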

This will also be a very nice exercise for trying to develop BCs for more complicated cases, like the microbiology domains MS and MB (using NCBI or SNOMED-CT coding) or MI (microscopic findings). In my opinion, CDISC should free up resources by stopping the development of some "reinvention of the wheel" codelists, and assign these resources to the development of BCs.


  1. Great post. I completely agree we need to use LOINC more. I think we also need to distinguish between observations and analyses of observations. For example, maximum 24-hour systolic blood pressure is an analysis of two or more observations and requires very different metadata to describe it fully. It does not belong in LOINC, in my humble opinion.

  2. Thank you very much for posting this blog about the use of LOINC codes; it gives a lot of useful information for folks who want to learn more about LOINC.
    However, please let me give you more information about the challenges that sponsor/CRO or lab data managers face when they have to map lab tests either to the SDTM terminology or to LOINC:
    Selecting the appropriate LOINC code for a lab test requires a lot of information from the labs themselves, and this information is not always clear enough to enable a sponsor/CRO or even a Lab Data Manager to properly assign the right LOINC code. Please keep in mind that labs develop their own methodology for every test, which they change and adapt on a regular basis. Even during the course of a study, a lab can modify the methodology in a way that impacts the LOINC assignment.
    It is really hard for Lab DMs working in the labs themselves to get the appropriate LOINC code for the tests they have to report, because LOINC is of no interest to labs. They have their own test codes that uniquely identify a test and a method in a predefined sample type. They do not have any interest in investing resources to duplicate their own code list (up to several hundred thousand test codes) into another one.
    Coming back to the main topic, I do agree that LOINC should be used to report lab results because it is much more precise than LBTESTCD/LBMETHOD/LBSPEC; however, the LOINC code should always be provided by the labs themselves. It should never be selected by sponsors/CROs, because they do not have the level of information and skills to properly assign the right LOINC code for an analysis conducted by another party like a central lab.
    Even selecting LBTESTCD/LBMETHOD/LBSPEC can be really challenging for sponsors/CROs due to some uncertainty about the tests performed by the central lab. I recommend insisting on getting LBTESTCD from the lab DMs, who can provide it more accurately. Please keep in mind that some test analyses can be interpreted differently on the sponsor/CRO side than on the lab side. I suggest testing your labs with the well-known test “Neutrophils” and checking with them whether the LBTESTCD assigned to this test corresponds to what your labs are doing.

  3. Thank you Alexandre for these valuable comments. I completely agree that it should essentially not be up to the CRO, and surely not up to the sponsor, to assign LOINC codes. This has also been recognized and described in the joint paper by FDA, CDISC and Regenstrief, "Recommendations for the Submission of LOINC Codes in Regulatory Applications to the U.S. Food and Drug Administration": "the Working Group recommends that sponsors not attempt to derive a mapping to LOINC as they create the SDTM files". The LOINC codes should come from the labs, ideally not by mapping between local codes and LOINC codes, but from the instruments themselves. But the reality may indeed well be different. I understand the pain of labs very well, as we have seen this in my home country Austria, where LOINC coding for lab tests has been mandated by law, in order to enable lab data to go into the (national) electronic health record each citizen has. We observed that most (hospital) labs mapped their internal codes to LOINC. This was indeed a major effort, even though some good tools (such as RELMA) are available for free. But they all succeeded in implementing LOINC.
    The FDA recently postponed the date on which the requirements take effect by 2 years, giving the industry more time. One of the important things seems to be that most sponsors will need to renegotiate or renew their contracts with laboratories. A model could be that labs get extra money when they provide the LOINC code with each test result. This would be well-spent money, as it would make the mapping work for the sponsor or its service provider easier.
    Even more important i.m.o. is that the protocol clearly describes which lab tests need to be performed by providing the LOINC code (or preferred LOINC codes), which would again make the life of the CRO easier when generating the SDTM datasets.
    Regarding the latter, FDA and CDISC should give sponsors the freedom to either submit the LOINC code, or submit LBTESTCD-LBSPEC-..., but not require both, which is unfortunately not recognized yet.