Introduction and background
The CDISC controlled terminology team has done a great job
in developing a mapping between the pre-coordinated LOINC terminology and the
post-coordinated SDTM-LB variables. The mapping is currently in public review.
It has been published as an Excel file, but can,
with some effort, be converted into a relational database table. It currently
contains approximately 2400 mappings for about 1500 LOINC codes. This already
indicates that in some cases there is more than one mapping, e.g. when the
LOINC "system" is "Ser/Plas", which can be translated to
SDTM LBSPEC being "SERUM", "PLASMA", or "SERUM OR
PLASMA".
These mappings make it possible to automate the generation of LB
datasets from other sources, such as hospital HL7-v2 messages
and, especially usefully, electronic health records, as we recently demonstrated.
These 1400 LOINC codes surely cover a good part of the most
common tests, but what should one do when one gets test results for which the LOINC
code is not among the 1400 mapped ones? Revert to manual work?
However, if one considers that the mapping must follow some
systematics (which were partially published by CDISC), we can try to extend the
mapping to cases and codes that it does not cover. For example, the LOINC code "1750-9" (Albumin [Mass/Volume] in Semen) is not in the mapping, but only one of the 5
parts defining the LOINC code differs from LOINC code "1751-7" (Albumin [Mass/Volume] in Serum or Plasma), and "Semen" is in the CDISC controlled terminology for "Specimen Type", to be used in
LBSPEC. The NCI code for "Semen" is C13277.
The most important parts for these codes would then be:
| SDTM-LB variable | LOINC 1751-7 Albumin [Mass/Volume] in Serum or Plasma | LOINC 1750-9 Albumin [Mass/Volume] in Semen |
|---|---|---|
| LBTESTCD | ALB | ALB |
| LBTEST | Albumin | Albumin |
| LBSPEC | either "SERUM", "PLASMA" or "SERUM OR PLASMA" | SEMEN |
| LBCAT | CHEMISTRY | CHEMISTRY |
| LBMETHOD | null | null |
So, replacing the value of one of the 5 or 6 "parts" in the LOINC "short name" (as we usually call it) with any other value for which a mapping is available would allow us to extend the LOINC-to-SDTM-LB mapping by a good number of entries.
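The substitution idea can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the `LbMapping` class, the `derive_for_system` helper and the `SYSTEM_TO_LBSPEC` dictionary are hypothetical names introduced here, and the values are the ones from the Albumin example above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LbMapping:
    """One row of a LOINC-to-SDTM-LB mapping (hypothetical, reduced structure)."""
    loinc_code: str
    lbtestcd: str
    lbtest: str
    lbspec: str
    lbcat: str
    lbmethod: Optional[str] = None


# Existing mapping: one of the three LBSPEC variants for "Ser/Plas".
albumin_ser_plas = LbMapping("1751-7", "ALB", "Albumin", "SERUM OR PLASMA", "CHEMISTRY")

# Known LOINC "system" -> LBSPEC pairs; only the pair from the example is listed.
SYSTEM_TO_LBSPEC = {"Semen": "SEMEN"}


def derive_for_system(base, new_loinc_code, new_system):
    """Copy `base`, swapping only the LOINC code and the specimen.

    Returns None when no LBSPEC is known for the new system.
    """
    lbspec = SYSTEM_TO_LBSPEC.get(new_system)
    if lbspec is None:
        return None
    return replace(base, loinc_code=new_loinc_code, lbspec=lbspec)


albumin_semen = derive_for_system(albumin_ser_plas, "1750-9", "Semen")
print(albumin_semen)
```

Everything except the LOINC code and LBSPEC is carried over unchanged, which is exactly why a single differing part makes the new mapping cheap to generate.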
Methods
Some of the systematics can easily be retrieved when one
knows and understands both LOINC and SDTM-LB, others will be harder. The easy
ones are:
- There is an almost 1:1 mapping between the LOINC "component" and the CDISC "LBTESTCD".
- There is a 1:1 relationship between SDTM "LBTESTCD" and "LBTEST".
- Except for the case "Ser/Plas", there could well be a 1:1 relationship between the LOINC "system" (the body or anatomical system) and LBSPEC.
- In principle, there should be a 1:1 relationship between the LOINC "example UCUM units" and the CDISC "example LBORRESU".
- There probably is a 1:1 relationship between the LOINC "method" and the CDISC "LBMETHOD", where it is expected that the former has many more terms.
For the latter, for example, if we look up the LOINC method and
CDISC LBMETHOD in the database and order by "method", we can nicely
see the close relationship between the two.
These "systematics" could easily be used to
considerably extend the mappings, and even to automate the process of generating
the extended dataset. One technology that could be used for this is Machine
Learning (ML), but as I am not an expert at all in this area, I needed something
else (P.S. the experts in ML can see this as a nice challenge). So I looked
further and came up with the following algorithm:
- Iterate over all the LOINC codes in the existing mapping published by CDISC
- For each entry, take the 5 parts from the "LOINC short name" ("method" will not very often be populated)
- Iterate over the 5 parts, and use only 4 of them to look up all "similar" LOINC codes in the complete LOINC database (which has over 92,000 rows), i.e. codes that differ in only one of the parts
- This generates a good number of new LOINC codes, all with the same values for 4 of the 5 parts, and a different value for the 5th part.
- Using the existing LOINC-to-SDTM-LB mapping, try to find a mapping for the 5th part
- If successful, generate a new mapping, and store it in a separate table that has the same structure as the LOINC-to-SDTM-LB table
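The lookup step of this algorithm could be sketched as follows. This is a minimal sketch, not the actual program: it uses Python's `sqlite3` with a tiny in-memory table as a stand-in for the real LOINC database, the `find_similar` function is a hypothetical name, and the `loinc_num` column and the four sample rows are assumptions for illustration (the casing of the property values follows the SQL query used in this post). "Method" is ignored here for simplicity.

```python
import sqlite3

# The 5 base parts, named after the columns of the LOINC table.
PARTS = ("component", "property", "time_aspct", "system", "scale_typ")


def find_similar(conn, entry, free_part):
    """Return (loinc_num, value) rows equal to `entry` on all parts but `free_part`."""
    if free_part not in PARTS:
        raise ValueError(f"unknown part: {free_part}")
    fixed = [p for p in PARTS if p != free_part]
    # Column names come from the fixed PARTS tuple; values are bound as parameters.
    where = " AND ".join(f"{p} = ?" for p in fixed)
    sql = (f"SELECT loinc_num, {free_part} FROM loinc "
           f"WHERE {where} AND {free_part} != ? ORDER BY loinc_num")
    params = [entry[p] for p in fixed] + [entry[free_part]]
    return conn.execute(sql, params).fetchall()


# Tiny in-memory stand-in for the LOINC table (the real one has over 92,000 rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loinc (loinc_num TEXT, component TEXT, property TEXT, "
             "time_aspct TEXT, system TEXT, scale_typ TEXT)")
conn.executemany("INSERT INTO loinc VALUES (?, ?, ?, ?, ?, ?)", [
    ("1751-7", "Albumin", "Mcnc", "Pt", "Ser/Plas", "Qn"),
    ("1750-9", "Albumin", "Mcnc", "Pt", "Semen", "Qn"),
    ("1754-1", "Albumin", "Mcnc", "Pt", "Urine", "Qn"),
    ("54347-0", "Albumin", "Scnc", "Pt", "Ser/Plas", "Qn"),
])

# The entry from the existing mapping that we start from (LOINC 1751-7).
entry = dict(zip(PARTS, ("Albumin", "Mcnc", "Pt", "Ser/Plas", "Qn")))
for part in PARTS:
    print(part, find_similar(conn, entry, part))
```

Each returned value for the free part is then looked up in the existing mapping table; only when that lookup succeeds is a new mapping stored.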
For our example, extending the mapping of LOINC code
"1751-7" (Albumin [Mass/Volume] in Serum or Plasma) to other
specimen types ("system" column in LOINC) results in the SQL
query:
SELECT * FROM loincdb_267.loinc WHERE component='Albumin' and property='Mcnc' and time_aspct='Pt' and scale_typ='Qn' AND system != 'Ser/Plas';
This leads to 25 hits.
For each of the "system" values, we then do a
lookup in the existing LOINC-to-SDTM-LB table, and for a good number of
"system" values (specimen / anatomical system) we find a mapping, e.g.:
| LOINC system | CDISC LBSPEC |
|---|---|
| Urine | URINE |
| Amnio fld | AMNIOTIC FLUID |
| CSF | CEREBROSPINAL FLUID |
| Body fld | FLUID |
| Plr fld | PLEURAL FLUID |
| Semen | SEMEN |
| Synv fld | SYNOVIAL FLUID |
and so on.
For each of these 25 hits, we can now easily generate a mapping, as many things are the same. We need to take some care with different values for "method" (and thus also "LBMETHOD") and for the units, but we can take the same approach for "method" by finding a suitable LBMETHOD value in the original LOINC-to-SDTM-LB table. The same goes for the units, where we can opt to copy the "UCUM unit" value into "example LBORRESU" when no mapping is found.
In order to later facilitate curation of the results, all
our actions are logged into a log file.
But there is an even easier scenario: of the 2400 mappings,
only about 680 have a "method" attached. In LOINC, "method"
is only provided when it is absolutely necessary to distinguish a test (i.e. because of different
expected values) from the usual test with no "method" mentioned. So if, for each of the already existing
mappings, we look for all LOINC codes in the complete LOINC database that have the same
values for the 5 base "parts" but a different value for the
"method", and then try to map the newly found "method" to
LBMETHOD by searching for method-LBMETHOD pairs, this could lead to a
considerable number of new mappings.
Six scenarios were defined. In each of them, one of the 6
"parts" was left free, i.e. we look for all LOINC codes in the LOINC
database where this part can have any value (except for the one from the
mapping entry), while all others are fixed to the values of the mapping entry. As
"method" in LOINC is not always provided, we never fixed it, but
always allowed it to vary. This also means that for each additional
"method" found, an attempt was made to map it to one of the existing
LBMETHOD values in the mapping dataset. In case nothing was found, we decided
to generate a new LBMETHOD value by "uppercasing" the value from the
LOINC method. I think this is reasonable, as the CDISC "Method"
codelist (C85492) is extensible anyway, and the value in LBMETHOD contains important
information for distinguishing the test from other tests. In such a case,
this is also documented in the log file.
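The fallback rule for "method" could look roughly like the sketch below. The function name `map_method`, the logger name and the single entry in `KNOWN_METHODS` are assumptions made for illustration; in practice the method-LBMETHOD pairs would be read from the existing mapping table rather than hard-coded.

```python
import logging

log = logging.getLogger("loinc_extension")

# Illustrative method -> LBMETHOD pair (an assumed example, not copied from
# the CDISC mapping file).
KNOWN_METHODS = {"Electrophoresis": "ELECTROPHORESIS"}


def map_method(loinc_method, known=KNOWN_METHODS):
    """Return (lbmethod, was_generated) for a LOINC method value."""
    if not loinc_method:
        # No method in LOINC: LBMETHOD stays null.
        return None, False
    if loinc_method in known:
        return known[loinc_method], False
    # Nothing found: generate a new LBMETHOD value by uppercasing,
    # and document that in the log for later curation.
    generated = loinc_method.upper()
    log.warning("No LBMETHOD for method %r, generated %r", loinc_method, generated)
    return generated, True


print(map_method("Electrophoresis"))
print(map_method("Some new method"))
print(map_method(None))
```

Returning the `was_generated` flag alongside the value makes it easy to mark such rows in the output table, so that a curator can review them first.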
When leaving the LOINC "component" part free, we
only stored a mapping when the new component did not force us to generate a new
LBTESTCD/LBTEST pair, i.e. we only store it when there is already a mapping
between "component" and LBTESTCD/LBTEST available in the original
mappings developed by CDISC. Generating new values for LBTESTCD (which is allowed,
as the associated codelist is extensible) would have been difficult anyway, as
there is this stupid rule that LBTESTCD values may not be longer than 8
characters, and many of the LOINC "component" values are considerably
longer than 8 characters.
For the "example LBORRESU", we opted to always start from the "UCUM Unit" provided by the LOINC database, and map it, using existing entries in the mapping database. If no existing "CDISC unit" can be obtained, we copied the UCUM unit into the "example LBORRESU" and documented this in the log file.
For the LOINC database, only entries for which the status is
"active" are used. This excludes entries that are
"deprecated" or "discouraged".
All this was done automatically, i.e. executed by a software
program, without manual intervention. This also means that curation to further
fine-tune the mappings may be necessary.
Results
The following table contains an overview of the results,
especially indicating the number of new mappings obtained.
| Scenario Number | Scenario Description | Number of new Mappings |
|---|---|---|
| 1 | Take LOINC code from CDISC mapping, fix "Component", "Property", "Time aspect", "System" and "Scale", and look for other entries in the LOINC database that have a different value for "Method" | 616 |
| 2 | "Component", "Property", "System", "Scale" are fixed. Look for other entries in the LOINC database that have a different value for the "Time aspect" | 204 |
| 3 | "Component", "System", "Scale" and "Time aspect" are fixed. Look for other entries in the LOINC database that have a different value for "Property" | 497 |
| 4 | "Component", "Property", "Scale" and "Time aspect" are fixed. Look for entries in the LOINC database that have a different value for "System" | 2444 |
| 5 | "Component", "Property", "Time aspect" and "System" are fixed. Look for entries in the LOINC database that have a different value for "Scale" | 22 |
| 6 | "Property", "Time aspect", "System" and "Scale" are fixed. Look for entries in the LOINC database that have a different value for "Component" | 3056 |
Some of the new mappings obtained starting from the LOINC code 1751-7 (Albumin [Mass/Volume] in Serum or Plasma) in the LOINC-to-SDTM-LB mapping are:
Scenario 1: "method" is different:
- LOINC 61151-7: Albumin [Mass/volume] in Serum or Plasma by Bromocresol green (BCG) dye binding method
- LOINC 61152-5: Albumin [Mass/volume] in Serum or Plasma by Bromocresol purple (BCP) dye binding method
Scenario 2: "Time aspect" is different:
No additional mappings
Scenario 3: "property" is different:
- LOINC 54347-0: Albumin [Moles/volume] in Serum or Plasma
- LOINC 62234-0: Albumin [Moles/volume] in Serum or Plasma by Bromocresol purple (BCP) dye binding method
- LOINC 62235-7: Albumin [Moles/volume] in Serum or Plasma by Bromocresol green (BCG) dye binding method
Scenario 4: "System" is different:
- LOINC 1745-9: Albumin [Mass/volume] in Amniotic fluid
- LOINC 1748-3: Albumin [Mass/volume] in Pleural fluid
- LOINC 1749-1: Albumin [Mass/volume] in Peritoneal fluid
- LOINC 1750-9: Albumin [Mass/volume] in Semen
- LOINC 1752-5: Albumin [Mass/volume] in Synovial fluid
- LOINC 1754-1: Albumin [Mass/volume] in Urine
- LOINC 32293-3: Albumin [Mass/volume] in Unspecified specimen
- LOINC 40599-3: Albumin [Mass/volume] in Peritoneal dialysis fluid
- LOINC 51693-0: Albumin [Mass/volume] in Pericardial fluid
- LOINC 54346-2: Albumin [Mass/volume] in Stool
- LOINC 61195-4: Albumin [Mass/volume] in Serum or Plasma from Fetus
- LOINC 61196-2: Albumin [Mass/volume] in Urine from Fetus
- LOINC 2861-3: Albumin [Mass/volume] in Cerebral spinal fluid by Electrophoresis
- LOINC 2863-9: Albumin [Mass/volume] in Synovial fluid by Electrophoresis
- LOINC 43212-0: Albumin [Mass/volume] in Body fluid by Electrophoresis
Scenario 5: "Scale" is different:
No additional mappings
Scenario 6: "Component" is different:
- 148 new codes, e.g. LOINC 10338-2, "Barbiturates [Mass/Volume] in Serum or Plasma"
So, based on the mapping for LOINC code 1751-7 "Albumin [Mass/Volume] in Serum or Plasma", we could derive 168 new mappings. Remember that none of these mappings was present in the original dataset: the software tests for this during execution and excludes any mapping that already exists.
Additional work
For scenario 6, we realize that our
decision to only include "component" values for which there is
already a suitable LBTESTCD/LBTEST pair available in the original mapping may
lead to some "missed" new mappings. For example, the LOINC code 10332-5 "Cortisol [Mass/volume] in Serum or
Plasma --pre 250 ug corticotropin IM" is rejected because the value for
"component", "Cortisol^pre 250 ug corticotropin IM", has
no equivalent LBTESTCD/LBTEST. However, there is a mapping for the first
subpart "Cortisol", so we essentially could add it to the mappings
if we find a way to accommodate the second part "pre 250 ug corticotropin
IM" in an LB variable. Probably, this should go into LBTPT. This is surely
something we want to look into in the near future.
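As a first idea of how this could work, here is a small sketch that splits the "component" at the first "^" and looks up only the first subpart. It is not implemented in our software yet: `split_component` and `propose_mapping` are hypothetical names, the LBTESTCD/LBTEST pair listed for "Cortisol" is an assumed value for illustration, and putting the remainder into LBTPT is only the tentative choice mentioned above.

```python
# Assumed component -> (LBTESTCD, LBTEST) lookup; in practice this would come
# from the existing LOINC-to-SDTM-LB mapping table.
COMPONENT_TO_TEST = {"Cortisol": ("CORTISOL", "Cortisol")}


def split_component(component):
    """Split a LOINC component at the first "^" into (first_subpart, remainder).

    The remainder is None when the component has no "^"; any further "^"
    characters are left inside the remainder.
    """
    first, sep, rest = component.partition("^")
    return first, (rest if sep else None)


def propose_mapping(component):
    """Propose LB values for a component, or None if the first subpart is unmapped."""
    first, remainder = split_component(component)
    test = COMPONENT_TO_TEST.get(first)
    if test is None:
        return None
    lbtestcd, lbtest = test
    # Tentative: carry the second part in LBTPT, to be confirmed during curation.
    return {"LBTESTCD": lbtestcd, "LBTEST": lbtest, "LBTPT": remainder}


print(propose_mapping("Cortisol^pre 250 ug corticotropin IM"))
```

Such proposals would still need curation, as the remainder is free text taken straight from LOINC.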
The next step is to curate the new mappings. This is
necessary as not everything can be fully automated. For example, as explained by the CDISC team in the Excel worksheet, the LOINC
"Time aspect" either maps to the CDISC variable LBTPT, or to
one of the supplemental qualifiers LBPTFL ("Point in Time Flag") or
LBPDUR ("Planned Duration"). It is not yet clear whether an unambiguous
rule for this can be derived.
Another example is the assignment of LBFAST ("Fasting Flag"). Usually, this can be derived from the LOINC "Component" part having two sub-parts, delimited by the "^" character, such as in:
LOINC 14771-0 (Fasting glucose
[Moles/volume] in Serum or Plasma, Component="Glucose^post CFst"), which maps to LBTESTCD=GLUC, LBTEST=Glucose,
LBSPEC="SERUM OR PLASMA", LBFAST=Y.
It might be that such mappings can be
generated automatically, but we are not sure about that.
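For the single case shown here, an automatic rule could be as simple as the following sketch. The function name `fasting_flag` is hypothetical, and only the "post CFst" value from the example is recognized; whether other sub-part values should also set the flag is exactly the kind of question that needs curation.

```python
def fasting_flag(component):
    """Return "Y" when the second sub-part of the component is "post CFst", else None."""
    # Everything after the first "^" is the second sub-part (empty if there is none).
    _, _, second = component.partition("^")
    return "Y" if second.strip().lower() == "post cfst" else None


print(fasting_flag("Glucose^post CFst"))
print(fasting_flag("Glucose"))
```

Returning None rather than "N" for all other cases avoids claiming a non-fasting state that LOINC does not actually state.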
Furthermore, we want to exclude duplicates, except for
1:N mappings, such as for the "Ser/Plas" case, for which the original
mapping database contains 3 mappings, one for LBSPEC="PLASMA", one
for LBSPEC="SERUM" and one for LBSPEC="SERUM OR PLASMA".
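One simple way to do this, sketched below under assumptions, is to treat two mappings as duplicates only when they are identical in every field. The three "Ser/Plas" rows then survive, because they differ in LBSPEC. The function name `drop_duplicates` and the reduced row structure (a dict with just two keys) are made up for this illustration.

```python
def drop_duplicates(mappings):
    """Remove rows identical in every field, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for mapping in mappings:
        # A hashable fingerprint of the whole row; keys are unique, so sorting
        # never has to compare the values.
        key = tuple(sorted(mapping.items()))
        if key not in seen:
            seen.add(key)
            unique.append(mapping)
    return unique


rows = [
    {"loinc_code": "1751-7", "lbspec": "PLASMA"},
    {"loinc_code": "1751-7", "lbspec": "SERUM"},
    {"loinc_code": "1751-7", "lbspec": "SERUM OR PLASMA"},
    {"loinc_code": "1750-9", "lbspec": "SEMEN"},
    {"loinc_code": "1750-9", "lbspec": "SEMEN"},  # true duplicate
]
print(drop_duplicates(rows))
```

In a relational database, the same effect could be obtained with a uniqueness constraint over all mapping columns rather than over the LOINC code alone.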
And finally, we will need to repeat everything when CDISC
publishes the final LOINC-to-SDTM-LB mapping, store the result in a separate
database table, and make these additional mappings available through a RESTful
web service, as we did already for the draft LOINC-to-SDTM-LB mapping.
As these extra mappings have not been developed by CDISC, and have not undergone the same quality assurance, we do not want to mix them up with the ones developed by CDISC, so our RESTful web service will surely have a parameter to state that the "extended" mappings need to be searched too.
Something we also want to work on in the future is a mapping
between the LOINC "vital sign" codes and the CDISC-VS domain and its
variables. The background is that electronic health records do not use
CDISC coding for vital signs: they
use LOINC coding.